Notes of a Unix Socket Programming Newbie

I recently implemented a simple SOCKS5 proxy in C using the basic Unix socket APIs [1]. Here are some tips I learned along the way.

1 Leave the safe zone

Doing network programming directly with the basic Unix socket API, rather than on top of a library, is like leaving a safe zone. You face every challenge, big or small, including some you never noticed while a library was helping you: domain name lookup, choosing between IPv4 and IPv6, picking an appropriate I/O model, and so on.

Things that once were simple may become difficult, and things that once were obscure may become clear. It's an old friend in new clothes.

2 Socket programming routine

Here is the routine for a TCP socket. UDP is connectionless and simpler.

2.1 Server side

1. open a file for the socket with the socket system call
Since everything in Unix is a file, so is a socket. A file descriptor is needed for later operations on the socket.

2. bind the socket to an address with the bind system call
A client connects to a server by specifying the IP and port of that server, so a server socket must be bound to an address. A host may have multiple IPs; just choose the one you'd like.

3. mark the socket as a listening socket with the listen system call

4. accept a client connection with the accept system call
accept returns a new file descriptor for each connection, and the listening socket stays open for further clients.

5. send or receive data with the write or read system call
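
The server-side steps can be sketched roughly as follows. This is only a minimal illustration, not the code from [1]: the helper names make_listener and echo_once are hypothetical, the server binds to the loopback address only, and error handling is kept to a minimum. Note the listen call between bind and accept, which tells the kernel to queue incoming connections.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1:port.
 * Pass port 0 to let the kernel pick a free port.
 * Returns the listening file descriptor, or -1 on error. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* open the socket */
    if (fd < 0)
        return -1;

    /* Allow quick restarts of the server on the same port. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* bind */
        listen(fd, 16) < 0) {                                   /* listen */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept one client, echo back whatever a single
 * read returns, then close the connection.
 * Returns the number of bytes echoed, or -1 on error. */
ssize_t echo_once(int listen_fd)
{
    int client = accept(listen_fd, NULL, NULL); /* accept */
    if (client < 0)
        return -1;

    char buf[512];
    ssize_t n = read(client, buf, sizeof buf);  /* receive */
    if (n > 0 && write(client, buf, (size_t)n) != n)  /* send */
        n = -1;
    close(client);
    return n;
}
```

A real server would call echo_once (or something like it) in a loop, and would probably loop on read and write as well, since a single call may transfer fewer bytes than requested.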

2.2 Client side

It's similar to the server side, but without the bind step, and with connect in place of accept.
1. open a file for the socket with the socket system call

2. connect to the server with the connect system call

3. send or receive data with the write or read system call
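
The client side might look like the sketch below. The helper name tcp_connect is hypothetical, and it assumes the server address is given as IPv4 text such as "127.0.0.1"; connecting by domain name or over IPv6 needs more work.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to an IPv4 address given as text.
 * Returns a connected file descriptor, or -1 on error. */
int tcp_connect(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                /* network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                              /* not a valid IPv4 text */

    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* open the socket */
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;                                  /* ready for read/write */
}
```

After a successful call, the returned descriptor is used with write and read like any other file descriptor, and closed with close when done.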

3 Function names

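Abbreviations are common in the names of socket functions and constants. Here are some you should know.

AF: address family, as in AF_INET (IPv4) and AF_INET6 (IPv6)
sockaddr: socket address
in, in6: internet (IPv4) and internet version 6, as in sockaddr_in and sockaddr_in6
h, n: host and network byte order, as in htons (host to network short) and ntohs (network to host short)
p, n: presentation (text) and network (binary) form, as in inet_pton and inet_ntop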

4 Address

An address is a small trouble in my opinion. It consists of an IP address and a port. The IP address can be IPv4 or IPv6, or it can be looked up from a domain name, and there are 4 structures for addresses. The port is simply an integer, but we should take care with its byte order. Let's go through them one by one.

4.1 sockaddr, sockaddr_in, sockaddr_in6

sockaddr, sockaddr_in and sockaddr_in6 are structures that represent an address. sockaddr is the generic form that the socket functions accept, and it can stand for either an IPv4 or an IPv6 address. sockaddr_in represents only an IPv4 address, and sockaddr_in6 only an IPv6 address.

So a pointer to a sockaddr_in or a sockaddr_in6 can be cast to a pointer to sockaddr, and the family field tells which of the two it really is.

Their declarations are as follows


struct sockaddr {
  unsigned short sa_family; // address family, AF_xxx
  char sa_data[14]; // 14 bytes of protocol address
};

/* IPv4 address */
struct sockaddr_in {
  short int sin_family; // Address family, AF_INET
  unsigned short int sin_port; // Port number
  struct in_addr sin_addr; // Internet address
  unsigned char sin_zero[8]; // Same size as struct sockaddr
};

struct in_addr {
  uint32_t s_addr; // that's a 32-bit int (4 bytes)
};

/* IPv6 address */
struct sockaddr_in6 {
  u_int16_t sin6_family; // address family, AF_INET6
  u_int16_t sin6_port; // port number, Network Byte Order
  u_int32_t sin6_flowinfo; // IPv6 flow information
  struct in6_addr sin6_addr; // IPv6 address
  u_int32_t sin6_scope_id; // Scope ID
};
struct in6_addr {
  unsigned char s6_addr[16]; // IPv6 address
};
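
To make the casting concrete, here is a small sketch. The helper names addr_port and make_any_addr are hypothetical. It also uses struct sockaddr_storage, which is not mentioned above: it is a standard structure large enough to hold either address type, which I assume here as a convenient container.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: read the port (in host byte order) from a generic
 * address by checking sa_family and casting to the concrete type.
 * Returns -1 for an address family it does not know. */
int addr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}

/* Hypothetical helper: fill `out` with the wildcard address of the given
 * family and the given port. Returns the length to pass to bind or
 * connect, or 0 if the family is not supported. */
socklen_t make_any_addr(int family, uint16_t port, struct sockaddr_storage *out)
{
    memset(out, 0, sizeof *out);
    if (family == AF_INET) {
        struct sockaddr_in *a = (struct sockaddr_in *)out;
        a->sin_family = AF_INET;
        a->sin_port = htons(port);
        a->sin_addr.s_addr = htonl(INADDR_ANY);
        return sizeof *a;
    }
    if (family == AF_INET6) {
        struct sockaddr_in6 *a = (struct sockaddr_in6 *)out;
        a->sin6_family = AF_INET6;
        a->sin6_port = htons(port);
        a->sin6_addr = in6addr_any;
        return sizeof *a;
    }
    return 0;
}
```

The returned length matters because functions such as bind and connect take the size of the concrete structure along with the sockaddr pointer.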

4.2 DNS lookup

An IP address can be obtained from a domain name through a DNS lookup with the getaddrinfo function, which returns a new address structure, addrinfo, declared as follows


struct addrinfo {
  int ai_flags; // AI_PASSIVE, AI_CANONNAME, etc.
  int ai_family; // AF_INET, AF_INET6, AF_UNSPEC
  int ai_socktype; // SOCK_STREAM, SOCK_DGRAM 
  int ai_protocol; // use 0 for "any" 
  size_t ai_addrlen; // size of ai_addr in bytes
  struct sockaddr *ai_addr; // struct sockaddr_in or _in6
  char *ai_canonname; // full canonical hostname 
  struct addrinfo *ai_next; // linked list, next node
};

This structure contains lots of information. The ai_addr field is a sockaddr pointer, and the ai_next field links the results together as a linked list.
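
A lookup could be sketched like this. The helper name resolve_host is hypothetical, and the choice of AF_UNSPEC with SOCK_STREAM in the hints is my assumption for a TCP client that accepts either IP version. The loop walks the ai_next list and the result must be released with freeaddrinfo.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `host` and write the first IPv4 or IPv6
 * address found, as text, into `out`.
 * Returns the number of addresses in the result list, or -1 on failure. */
int resolve_host(const char *host, const char *service,
                 char *out, socklen_t outlen)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, whichever exists */
    hints.ai_socktype = SOCK_STREAM;  /* TCP only, avoids duplicate entries */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int count = 0;
    for (p = res; p != NULL; p = p->ai_next) {   /* walk the linked list */
        const void *raw;
        if (p->ai_family == AF_INET)
            raw = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
        else if (p->ai_family == AF_INET6)
            raw = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        else
            continue;

        /* Only the first address is converted to text. */
        if (count == 0 && inet_ntop(p->ai_family, raw, out, outlen) == NULL) {
            freeaddrinfo(res);
            return -1;
        }
        count++;
    }
    freeaddrinfo(res);                /* the list is heap allocated */
    return count;
}
```

In a real client it is usually more convenient to pass p->ai_family, p->ai_socktype, p->ai_addr and p->ai_addrlen straight to socket and connect, trying each entry until one succeeds.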

4.3 Conversion between two IP forms

The inet_ntop and inet_pton functions convert between the two IP forms: the binary form stored in the address structures, and the text form such as 192.168.0.1.

4.4 Byte order

It's a simple and natural concept, just like left-handedness and right-handedness in daily life.

The so-called big endian byte order puts the most significant byte first, while the so-called little endian byte order puts the least significant byte first.

For example, 443 is the default port for HTTPS. Its binary form is 00000001 10111011, or 0x01BB in hexadecimal.
The port field takes 2 bytes in a SOCKS5 packet and is sent in network byte order, which is big endian, so the byte 0x01 comes first and 0xBB second.

Many hosts use little endian byte order, so we convert a value received from the network with the ntohs function, and use htons to do the opposite when sending data through the network.
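
Using the port 443 example, the conversion might be sketched like this. The helper names port_from_wire and port_to_wire are hypothetical; they copy the two bytes with memcpy so that the code does not depend on how the packet buffer is aligned.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 2-byte port field as it appears in a
 * packet (network byte order) and return it in host byte order. */
uint16_t port_from_wire(const unsigned char wire[2])
{
    uint16_t net;
    memcpy(&net, wire, sizeof net);
    return ntohs(net);              /* network to host, when receiving */
}

/* Hypothetical helper: write a host byte order port into a 2-byte
 * packet field in network byte order. */
void port_to_wire(uint16_t port, unsigned char wire[2])
{
    uint16_t net = htons(port);     /* host to network, when sending */
    memcpy(wire, &net, sizeof net);
}
```

With the bytes 0x01 0xBB on the wire, port_from_wire returns 443 on both little endian and big endian hosts, because ntohs is a no-op where the host order already matches the network order.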

5 I/O model, multiplexing, multi-thread, multi-process

These are big topics that affect performance. There are two basic questions here:
1. How to handle multiple requests at the same time.
A thread or process can be created per request, or a multiplexing mechanism can be used within a single process.

Besides the thread-per-request model, you can explore more advanced and complicated threading models, for example a few threads that receive packets from the network and some other threads that do the work required by each request.

2. I/O model
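
As one illustration of the multiplexing option from question 1, the sketch below waits on several file descriptors at once with poll. This is only one of several mechanisms (select and epoll are others), and I am not claiming it is what [1] uses. The helper name wait_readable and the MAX_WATCHED limit are hypothetical.

```c
#include <poll.h>

#define MAX_WATCHED 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: block until one of `fds` has data to read (or has
 * been closed by the peer), or until `timeout_ms` milliseconds pass.
 * Returns the index of the first ready descriptor, or -1 on timeout
 * or error. */
int wait_readable(const int *fds, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHED];
    if (count <= 0 || count > MAX_WATCHED)
        return -1;

    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* interested in readability */
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;                  /* 0 is a timeout, negative an error */

    for (int i = 0; i < count; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;
    return -1;
}
```

A single-process server would put the listening socket and every client socket in the array, then call accept or read on whichever descriptor is reported ready, so that one slow client does not block the others.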

6 Reference

[1] dt.c, source code of a simple SOCKS5 proxy server

Written by Songziyu @China Dec. 2023