I recently implemented a simple socks5 proxy in C using the basic Unix socket APIs [1]. Here are some tips I learned from it.
Doing network programming directly on the basic Unix socket API, rather than on top of a library, is something like leaving a safe zone. You face every challenge, big or small, some of which you never even noticed with the help of a library: domain name lookup, choosing between IPv4 and IPv6, choosing an appropriate I/O model, and so on.
Things that once were simple may become difficult, and things that once were obscure may become clear. It's an old friend in new clothes.
Here is the routine for a TCP socket on the server side. UDP is connectionless and simpler.
1. open a file for the socket with the socket system call
Since everything in Unix is a file, so is a socket. A file descriptor is needed for later operations on the socket.
2. bind the socket to an address with the bind system call
A client connects to a server by specifying the IP and port of that server, so a server socket must be bound to an address. A host may have multiple IPs; just choose the one you'd like.
3. mark the socket as accepting connections with the listen system call
4. accept a client connection with the accept system call
5. send or receive data with the read or write system call
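The server-side routine can be sketched roughly as follows. This is a minimal illustration and not the code from dt.c: make_listener and echo_once are hypothetical helper names, the sketch handles IPv4 only, and the backlog of 16 is an arbitrary choice. Note that it calls listen between bind and accept, which is what makes the socket ready to accept connections.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to ip:port and start
 * listening on it. Returns the listening descriptor, or -1 on error.
 * Passing port 0 lets the kernel pick a free port. */
int make_listener(const char *ip, uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* open the socket */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                /* network byte order */

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {                   /* 16: arbitrary backlog */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one client, echo back whatever a single
 * read returns, then close the connection.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_once(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL); /* accept a connection */
    if (client < 0)
        return -1;

    char buf[512];
    ssize_t n = read(client, buf, sizeof buf);  /* receive data */
    if (n > 0)
        n = write(client, buf, (size_t)n);      /* send it back */
    close(client);
    return n;
}
```

A real server would call the accept step in a loop and handle partial writes; both are left out to keep the steps visible.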
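The client side is similar to the server side, but instead of binding to an address and accepting connections, it connects to the server.
1. open a file for the socket with the socket system call
2. connect to the server with the connect system call
3. send or receive data with the read or write system call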
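The client-side steps can be sketched in the same spirit. Again this is only an illustration under assumptions: connect_to is a hypothetical helper name, and the sketch accepts a numeric IPv4 address only (no domain name lookup).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to ip:port (IPv4 only).
 * Returns the connected descriptor, or -1 on error. */
int connect_to(const char *ip, uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* open the socket */
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                /* network byte order */

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* ready for read and write */
}
```

After a successful call, the returned descriptor is used with read and write exactly like the descriptor returned by accept on the server side.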
Abbreviations are common in the names of socket functions and constants. Here are some you should know.
inet_ntop
- Function that converts an IPv4 or IPv6 address from binary to text form
inet: a common prefix that refers to Internet Protocol related things
n: stands for numeric; in numeric form, an IPv4 address is a 4-byte number and an IPv6 address is a 16-byte number
p: stands for presentation, the human-readable text form of an IP address, for example 127.0.0.1 (IPv4) or 2606:4700:3034::6815:478a (IPv6)
inet_pton
- Function that does the opposite of inet_ntop
ntohs
- Function that converts an unsigned short integer from network byte order to host byte order (I'll explain byte order later)
n: stands for network
h: stands for host
s: stands for short
htons
- Function that converts an unsigned short integer from host byte order to network byte order
AF_INET
- Constant that stands for Address Family Internet Protocol v4
AF_INET6
- Constant that stands for Address Family Internet Protocol v6
sockaddr
- socket address
sockaddr_in
- socket address for IPv4
sockaddr_in6
- socket address for IPv6
addrinfo
- address information
Addresses are a bit of trouble, in my opinion. An address consists of an IP address and a port. The IP address can be IPv4 or IPv6, or we can look it up from a domain name, and there are 4 structures for addresses. The port is just an integer, but we must take care with its byte order. Let's go through them one by one.
sockaddr, sockaddr_in and sockaddr_in6 are structures that represent an address.
sockaddr is the generic type and can stand for an IPv4 or an IPv6 address, sockaddr_in represents only an IPv4 address, and sockaddr_in6 represents only an IPv6 address.
So a pointer to sockaddr_in or sockaddr_in6 can be cast to a pointer to sockaddr.
Their declarations are as follows:
struct sockaddr {
    unsigned short sa_family;    // address family, AF_xxx
    char sa_data[14];            // 14 bytes of protocol address
};
/* IPv4 address */
struct sockaddr_in {
    short int sin_family;        // Address family, AF_INET
    unsigned short int sin_port; // Port number
    struct in_addr sin_addr;     // Internet address
    unsigned char sin_zero[8];   // Same size as struct sockaddr
};
struct in_addr {
    uint32_t s_addr;             // that's a 32-bit int (4 bytes)
};
/* IPv6 address */
struct sockaddr_in6 {
    u_int16_t sin6_family;       // address family, AF_INET6
    u_int16_t sin6_port;         // port number, Network Byte Order
    u_int32_t sin6_flowinfo;     // IPv6 flow information
    struct in6_addr sin6_addr;   // IPv6 address
    u_int32_t sin6_scope_id;     // Scope ID
};
struct in6_addr {
    unsigned char s6_addr[16];   // IPv6 address
};
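To show how the generic sockaddr relates to the two concrete structures, here is a small sketch that receives a sockaddr pointer, checks the address family, and casts to the matching type. sockaddr_port is a hypothetical helper name introduced only for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: read the port (in host byte order) from a generic
 * sockaddr by checking sa_family and casting to the concrete type.
 * Returns -1 for an address family this sketch does not handle. */
int sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:   /* the pointer really refers to a sockaddr_in */
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:  /* the pointer really refers to a sockaddr_in6 */
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}
```

The cast in the other direction is what bind, connect and accept expect: they take a sockaddr pointer, so a sockaddr_in or sockaddr_in6 is passed as (struct sockaddr *)&addr together with its size.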
An IP address can be obtained from a domain name through a DNS lookup with the getaddrinfo function, which returns another address structure, addrinfo, declared as follows:
struct addrinfo {
    int ai_flags;              // AI_PASSIVE, AI_CANONNAME, etc.
    int ai_family;             // AF_INET, AF_INET6, AF_UNSPEC
    int ai_socktype;           // SOCK_STREAM, SOCK_DGRAM
    int ai_protocol;           // use 0 for "any"
    size_t ai_addrlen;         // size of ai_addr in bytes
    struct sockaddr *ai_addr;  // struct sockaddr_in or _in6
    char *ai_canonname;        // full canonical hostname
    struct addrinfo *ai_next;  // linked list, next node
};
This structure contains a lot of information. The ai_addr field is a sockaddr pointer, and the ai_next field makes each addrinfo a node in a linked list.
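A lookup that walks this linked list might look like the sketch below. resolve_first is a hypothetical helper name; the sketch assumes a TCP lookup (SOCK_STREAM), accepts either address family, and simply takes the first IPv4 or IPv6 entry it finds.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host/service and write the first IPv4 or
 * IPv6 result into out as text.
 * Returns the address family of that result, or -1 on failure. */
int resolve_first(const char *host, const char *service,
                  char *out, socklen_t outlen)
{
    struct addrinfo hints, *res, *p;
    int family = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next) {  /* walk the linked list */
        const void *bin;

        if (p->ai_family == AF_INET)
            bin = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
        else if (p->ai_family == AF_INET6)
            bin = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        else
            continue;

        if (inet_ntop(p->ai_family, bin, out, outlen) != NULL) {
            family = p->ai_family;
            break;
        }
    }

    freeaddrinfo(res);  /* the list is allocated by getaddrinfo */
    return family;
}
```

A client would usually go one step further and try socket and connect on each node in turn, using ai_family, ai_socktype, ai_protocol, ai_addr and ai_addrlen directly, until one connection succeeds.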
As described earlier, the inet_ntop and inet_pton functions are used to convert between the different forms of an IP address.
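As a small illustration of the two conversions, the sketch below parses a text address into its binary form and formats it back. ip_roundtrip is a hypothetical helper name used only here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: convert text to binary with inet_pton, then back
 * to text with inet_ntop. family is AF_INET or AF_INET6.
 * Returns 0 on success, -1 if either conversion fails. */
int ip_roundtrip(int family, const char *text, char *out, socklen_t outlen)
{
    /* 16 bytes is enough for both the IPv4 and the IPv6 binary form. */
    unsigned char bin[sizeof(struct in6_addr)];
    memset(bin, 0, sizeof bin);

    if (inet_pton(family, text, bin) != 1)      /* presentation -> numeric */
        return -1;
    return inet_ntop(family, bin, out, outlen) != NULL ? 0 : -1;
}
```

The output buffer should be at least INET_ADDRSTRLEN bytes for IPv4 and INET6_ADDRSTRLEN bytes for IPv6.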
Byte order is a simple and natural concept, just like left-handedness and right-handedness in daily life.
The so-called big endian byte order puts the most significant byte first, while the so-called little endian byte order puts the least significant byte first.
For example, 443 is the default port for HTTPS. Its binary form is 00000001 10111011, or 1bb in hexadecimal.
The port field takes 2 bytes in the socks5 packet.
In big endian byte order, the first byte of the port is 00000001 (or 1) and the second byte of the port is 10111011 (or bb).
In little endian byte order, the first byte of the port is 10111011 (or bb) and the second byte of the port is 00000001 (or 1).
Network byte order is big endian, but some hosts use little endian byte order. We can convert received network data with the ntohs function, or use htons to do the opposite when sending data through the network.
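For the 2-byte port field, the conversion in both directions can be sketched as below. port_from_wire and port_to_wire are hypothetical helper names, and the buffer pointer is assumed to point at the port field inside a packet.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 2-byte port stored in network (big endian)
 * byte order and return it in host byte order. */
uint16_t port_from_wire(const unsigned char *p)
{
    uint16_t net;
    memcpy(&net, p, sizeof net);  /* copy instead of casting: p may be unaligned */
    return ntohs(net);            /* network -> host */
}

/* Hypothetical helper: store a host byte order port into a packet buffer
 * in network byte order. */
void port_to_wire(uint16_t port, unsigned char *p)
{
    uint16_t net = htons(port);   /* host -> network */
    memcpy(p, &net, sizeof net);
}
```

With the bytes 01 bb in the buffer, port_from_wire returns 443 on both little endian and big endian hosts, because ntohs is a no-op where host order already matches network order.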
The thread model and the I/O model are big topics that impact performance. There are two basic questions here:
1. How to handle multiple requests at the same time
A thread or process can be created per request, or a multiplexing mechanism can be used within a single process.
Besides the thread-per-request model, you can explore more advanced and complicated thread models, for example a few threads that receive packets from the network and some other threads that do the work required by each request.
2. I/O model
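As one possible illustration of multiplexing within a single process, the sketch below uses poll to wait on several descriptors at once and reads from the first one that has data. It is only an assumption about how such a loop could start, not the approach taken in dt.c; read_first_ready and MAX_WATCHED are hypothetical names, and select or epoll could be used instead.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_WATCHED 64  /* assumption: small fixed limit for this sketch */

/* Hypothetical helper: wait up to timeout_ms for any of the descriptors
 * to become readable, then read from the first ready one.
 * Stores its index in *which and returns the result of read,
 * or -1 on timeout or error. */
ssize_t read_first_ready(const int *fds, int count, int timeout_ms,
                         void *buf, size_t len, int *which)
{
    struct pollfd pfds[MAX_WATCHED];

    if (count <= 0 || count > MAX_WATCHED)
        return -1;

    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;   /* interested in readable data */
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;                 /* 0 means timeout, negative means error */

    for (int i = 0; i < count; i++) {
        if (pfds[i].revents & (POLLIN | POLLHUP)) {
            *which = i;
            return read(fds[i], buf, len);
        }
    }
    return -1;
}
```

A proxy built this way would call such a function in a loop, with the listening socket and every client socket in the watched set, instead of blocking on a single descriptor.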
[1] dt.c: source code of a simple socks5 proxy server
Written by Songziyu @China Dec. 2023