Detailed explanation of Socket

高洛峰
2016-12-13 10:17:38

We all appreciate the value of exchanging information, but how do processes on a network communicate? When we open a browser every day, how does the browser process talk to the web server? When you chat on QQ, how does the QQ process talk to the server, or to the QQ process on your friend's machine? All of these rely on sockets. So what is a socket, what types of sockets are there, and what are the basic socket functions? These are the questions this article sets out to answer. The main contents are as follows:

1. How to communicate between processes in the network?

2. What is Socket?

3. Basic operations of socket

3.1, socket() function

3.2, bind() function

3.3, listen(), connect() function

3.4, accept() function

3.5, read(), write() and other functions

3.6, close() function

4. Detailed explanation of TCP’s three-way handshake to establish a connection in socket

5. Detailed explanation of TCP’s four-way handshake to release a connection in socket

6. An example (hands-on practice)

7. A question for you; replies are welcome!

1. How to communicate between processes in the network?

There are many ways of local inter-process communication (IPC), but they can be summarized into the following 4 categories:

Message passing (pipes, FIFOs, message queues)

Synchronization (mutexes, condition variables, read-write locks, file and record locks, semaphores)

Shared memory (anonymous and named)

Remote procedure calls (Solaris doors and Sun RPC)

But these are not the topic of this article. What we want to discuss is how processes communicate across a network. The first problem to solve is how to uniquely identify a process; without that, communication is impossible. Locally, a process can be uniquely identified by its process ID (PID), but that does not work across a network. The TCP/IP protocol suite solves this for us: the IP address at the network layer uniquely identifies a host in the network, and the "protocol + port" at the transport layer uniquely identifies an application (process) on that host. The triplet (IP address, protocol, port) can therefore identify a process on the network, and processes use this identifier to communicate with each other.

Applications that use TCP/IP usually communicate through one of two application programming interfaces: the sockets of BSD UNIX, or the TLI of UNIX System V (now obsolete). Today almost all applications use sockets, and in the Internet era, process communication over the network is everywhere. This is why I say "everything is a socket".

2. What is Socket?

We now know that processes on a network communicate through sockets, so what is a socket? Sockets originated in Unix, and one of the basic philosophies of Unix/Linux is that "everything is a file", which can be operated on with the "open -> read/write -> close" pattern. My understanding is that a socket is one implementation of this pattern: a socket is a special kind of file, and the socket functions are operations on it (read/write I/O, open, close). We will introduce these functions later.

The origin of the word socket

The word was first used in networking in IETF RFC 33, released on February 12, 1970 and written by Stephen Carr, Steve Crocker and Vint Cerf. According to the Computer History Museum, Crocker wrote: "The elements of the name space are called sockets. A socket forms one end of a connection, and a connection is fully specified by a pair of sockets." The Computer History Museum added: "This is about 12 years earlier than BSD's socket interface definition."

3. Basic operations of socket

Since a socket is an implementation of the "open -> read/write -> close" pattern, it provides a function interface for each of these operations. The following uses TCP as an example to introduce the basic socket interface functions.

3.1, socket() function

int socket(int domain, int type, int protocol);

The socket() function corresponds to the open operation on an ordinary file. Opening an ordinary file returns a file descriptor; socket() creates a socket descriptor, which uniquely identifies a socket. Like a file descriptor, the socket descriptor is passed as an argument to subsequent calls, for example to read and write.

Just as you can pass different arguments to fopen to open different files, you can pass different arguments to socket() to create different kinds of sockets. The three parameters of the socket function are:

domain: the protocol domain, also known as the protocol family. Commonly used protocol families include AF_INET, AF_INET6, AF_LOCAL (also called AF_UNIX, the Unix domain socket), AF_ROUTE, etc. The protocol family determines the address type of the socket, and the corresponding address type must be used in communication. For example, AF_INET means the address is a combination of an IPv4 address (32-bit) and a port number (16-bit), while AF_UNIX means the address is an absolute pathname.

type: the socket type. Commonly used socket types include SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET, SOCK_SEQPACKET, etc. (these answer the earlier question of what types of sockets exist).

protocol: as the name suggests, the protocol to use. Commonly used values include IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP, IPPROTO_TIPC, etc., which correspond to the TCP, UDP, SCTP and TIPC transport protocols respectively (I will discuss TIPC separately).

Note: type and protocol cannot be combined arbitrarily. For example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol for the given type is selected automatically.

When we call socket() to create a socket, the returned descriptor exists in a protocol family (address family, AF_XXX) space but has no specific address. To assign an address to it, call bind(); otherwise the system automatically assigns a random port when connect() or listen() is called.
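
As a rough illustration of the three parameters, the sketch below wraps socket() in a small helper and asks the kernel which type it recorded for the descriptor. The helper names make_socket and socket_type_of are made up for this example; only socket(), getsockopt() and perror() are real library calls.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a socket and report failures.
 * Returns the descriptor, or -1 on error. */
int make_socket(int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);
    if (fd == -1)
        perror("socket");
    return fd;
}

/* Hypothetical helper: returns the type (SOCK_STREAM, SOCK_DGRAM, ...)
 * that the kernel recorded for fd, or -1 on error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

For example, make_socket(AF_INET, SOCK_STREAM, 0) lets the system pick TCP as the default protocol for a stream socket, and make_socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) names UDP explicitly. A mismatched pair such as SOCK_STREAM with IPPROTO_UDP should make socket() fail and return -1.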

3.2, bind() function

As mentioned above, the bind() function assigns a specific address in an address family to the socket. For AF_INET and AF_INET6, for example, it assigns a combination of an IPv4 or IPv6 address and a port number to the socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The three parameters of the function are:

sockfd: the socket descriptor, created by the socket() function, which uniquely identifies a socket. The bind() function binds a name to this descriptor.

addr: a const struct sockaddr * pointer to the protocol address to be bound to sockfd. The address structure depends on the address family used when the socket was created. For example, IPv4 uses:

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

IPv6 uses:

struct sockaddr_in6 {
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* port number */
    uint32_t        sin6_flowinfo; /* IPv6 flow information */
    struct in6_addr sin6_addr;     /* IPv6 address */
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */
};

struct in6_addr {
    unsigned char   s6_addr[16];   /* IPv6 address */
};

The Unix domain uses:

#define UNIX_PATH_MAX    108

struct sockaddr_un {
    sa_family_t sun_family;               /* AF_UNIX */
    char        sun_path[UNIX_PATH_MAX];  /* pathname */
};

addrlen: the length of the address.

A server usually binds a well-known address (such as an IP address + port number) when it starts, so that clients can connect to it to use the service. A client does not need to specify an address; the system automatically assigns a port number and combines it with the client's own IP address. This is why a server usually calls bind() before listen(), while a client does not call it and instead lets the system choose an address during connect().
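
A minimal sketch of filling in a sockaddr_in and binding it is shown below. The helper names bind_ipv4 and local_port are invented for illustration, and both take values in host byte order. Passing port 0 is used here only so that the kernel picks a free port; a real server would pass its well-known port.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: bind fd to an IPv4 address and port, both given in
 * HOST byte order. Returns 0 on success, -1 on error (errno set by bind()). */
int bind_ipv4(int fd, uint32_t host_addr, uint16_t host_port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(host_addr);  /* convert to network order */
    addr.sin_port = htons(host_port);         /* convert to network order */
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}

/* Hypothetical helper: returns the local port of fd in host byte order,
 * or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1)
        return -1;
    return ntohs(addr.sin_port);
}
```

After bind_ipv4(fd, INADDR_LOOPBACK, 0) succeeds, local_port(fd) reports which port the system chose, which is the same mechanism a client relies on when it skips bind().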

Network byte order and host byte order

Host byte order is what we usually call big-endian or little-endian: different CPUs store the bytes of a multi-byte integer in memory in different orders, and the order a given machine uses is called its host byte order. The standard definitions of Big-Endian and Little-Endian are:

a) Little-Endian means that the low-order bytes are placed at the low address end of memory, and the high-order bytes at the high address end.

b) Big-Endian means that the high-order bytes are placed at the low address end of memory, and the low-order bytes at the high address end.

Network byte order: a 4-byte, 32-bit value is transmitted in the following order: first bits 0~7, then bits 8~15, then bits 16~23, and finally bits 24~31 (bits are numbered here with bit 0 as the most significant, as in protocol header diagrams). This transfer order is called big-endian. Because all binary integers in the TCP/IP headers must be in this order when transmitted over the network, it is also called network byte order. Byte order, as the name suggests, is the order in which the bytes of a value larger than one byte are stored in memory; single-byte data has no byte order issue.

So, when binding an address to a socket, first convert values from host byte order to network byte order, and do not assume that the host uses Big-Endian like the network does. This mistake has caused real disasters! It has caused many inexplicable bugs in our company's project code, so never make any assumption about the host byte order, and always convert to network byte order before assigning a value to the socket address.
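
To make the conversion concrete, here is a small sketch. The function names are made up for this example; htons() and memcpy() are the only library calls. It checks which byte the host stores first and shows the bytes that htons() produces for a port number.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Returns 1 if this host is little-endian, 0 if it is big-endian. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* the byte stored at the lowest address */
    return first == 0x02;
}

/* Copies the in-memory bytes of a port converted with htons() into out[2].
 * out[0] is the byte that goes on the wire first. */
void port_wire_bytes(uint16_t host_port, unsigned char out[2])
{
    uint16_t net = htons(host_port);

    memcpy(out, &net, sizeof(net));
}
```

Port 6666 is 0x1A0A, so port_wire_bytes(6666, out) yields 0x1A followed by 0x0A on any host, whatever host_is_little_endian() returns. Receiving code uses ntohs() and ntohl() to convert back.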

3.3, listen(), connect() function

If you are a server, after calling socket() and bind() you call listen() to listen on the socket. When a client then calls connect() to issue a connection request, the server receives that request.

int listen(int sockfd, int backlog);

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The first parameter of listen() is the socket descriptor to listen on, and the second parameter is the maximum number of connections that can be queued for that socket. A socket created by socket() is active by default; listen() turns it into a passive socket that waits for client connection requests.

The first parameter of the connect function is the client's socket descriptor, the second parameter is the server's socket address, and the third parameter is the length of the socket address. The client establishes a connection with the TCP server by calling the connect function.
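
The two roles can be sketched side by side as follows. The helper names listen_on_loopback and connect_to_loopback are hypothetical, and the use of 127.0.0.1 with a kernel-chosen port is an assumption made so the sketch can run on a single machine.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical server-side helper: create a TCP socket listening on
 * 127.0.0.1. Binding to port 0 asks the kernel for a free port, which is
 * stored in *port (host byte order). Returns the listening descriptor,
 * or -1 on error. */
int listen_on_loopback(uint16_t *port, int backlog)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1 ||      /* active socket becomes passive */
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Hypothetical client-side helper: connect a new TCP socket to
 * 127.0.0.1:port. Returns the connected descriptor, or -1 on error. */
int connect_to_loopback(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that the client never calls bind(): its local address and port are assigned during connect().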

3.4, accept() function

After the TCP server calls socket(), bind() and listen() in sequence, it listens on the specified socket address. After the TCP client calls socket() and connect() in sequence, it sends a connection request to the TCP server. When the request arrives, the server calls accept() to accept it, and the connection is established. Network I/O can then begin, much like read and write I/O on an ordinary file.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

The first parameter of accept() is the server's socket descriptor, the second parameter is a struct sockaddr * pointer used to return the client's protocol address, and the third parameter points to the length of that address. If accept() succeeds, its return value is a new descriptor generated by the kernel, which represents the TCP connection to the client.

Note: the first parameter of accept() is the server's socket descriptor, created when the server calls socket() at startup; it is called the listening socket descriptor. The descriptor returned by accept() is the connected socket descriptor. A server usually creates only one listening socket descriptor, which exists for the lifetime of the server. The kernel creates one connected socket descriptor for each client connection the server process accepts, and the server closes it when it has finished serving that client.
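
The sketch below shows one way to use the second and third parameters of accept() to learn who connected. The helper name accept_client is made up for this example, and it assumes an IPv4 (AF_INET) listening socket.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: accept one client on an IPv4 listening socket.
 * On success, writes the client's dotted-decimal address into ip (iplen
 * should be at least INET_ADDRSTRLEN) and its port (host byte order) into
 * *port, and returns the new CONNECTED descriptor. Returns -1 on error. */
int accept_client(int listenfd, char *ip, size_t iplen, uint16_t *port)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof(peer);  /* in: buffer size; out: bytes filled in */
    int connfd = accept(listenfd, (struct sockaddr *)&peer, &len);

    if (connfd == -1)
        return -1;

    if (inet_ntop(AF_INET, &peer.sin_addr, ip, (socklen_t)iplen) == NULL &&
        iplen > 0)
        ip[0] = '\0';              /* address did not fit; leave it empty */
    *port = ntohs(peer.sin_port);
    return connfd;
}
```

The returned descriptor is different from listenfd: the caller reads and writes on the returned descriptor, closes it when that client is done, and keeps listenfd open for the next accept().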

3.5, read(), write() and other functions

Everything is now in place: the connection between the server and the client has been established, and the network I/O calls can be used to read and write, which is how different processes on the network communicate. The network I/O functions come in the following groups:

read()/write()

recv()/send()

readv()/writev()

recvmsg()/sendmsg()

recvfrom()/sendto()

I recommend recvmsg()/sendmsg(). These two are the most general I/O functions; in fact, they can replace all the other functions above. The declarations are as follows:

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);

#include <sys/types.h>
#include <sys/socket.h>

ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t recv(int sockfd, void *buf, size_t len, int flags);

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags, const struct sockaddr *dest_addr, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);

ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);

The read function reads from fd. On success, read returns the number of bytes actually read. A return value of 0 means end of file has been reached (for a socket, the peer has closed the connection); a value less than 0 means an error occurred. If the error is EINTR, the read was interrupted by a signal; if it is ECONNRESET, there is a problem with the network connection.

The write function writes up to count bytes from buf to the file descriptor fd. On success it returns the number of bytes written; on failure it returns -1 and sets errno. In network programs there are two possibilities when writing to a socket descriptor: 1) the return value is greater than 0, meaning that some or all of the data has been written; 2) the return value is less than 0, meaning an error occurred, which we handle according to the error type. If the error is EINTR, the write was interrupted by a signal; if it is EPIPE, there is a problem with the network connection (the peer has closed the connection).

I will not introduce each pair of I/O functions one by one; for details, see the man pages or search Baidu or Google. send/recv are used in the example below.
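
For readers curious about the recommended sendmsg()/recvmsg() pair, here is a minimal sketch of how struct msghdr and struct iovec fit together. The helper names send_two_parts and recv_into are invented for this example, and only the msg_iov and msg_iovlen fields are used; the remaining fields are zeroed.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send a header and a body with a single sendmsg()
 * call (gather I/O), without first copying them into one buffer.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_two_parts(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    memset(&msg, 0, sizeof(msg));   /* no address, no control data, no flags */
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(fd, &msg, 0);
}

/* Hypothetical helper: receive into one buffer with recvmsg().
 * Returns the number of bytes received, 0 at end of file, -1 on error. */
ssize_t recv_into(int fd, void *buf, size_t len)
{
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = buf;
    iov.iov_len  = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return recvmsg(fd, &msg, 0);
}
```

With a single iovec and no flags, recv_into behaves like recv(); filling in msg_name and msg_namelen as well would cover what recvfrom()/sendto() do.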

3.6, close() function

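After the server and client have established a connection and finished their read and write operations, the corresponding socket descriptor must be closed, just as fclose is called to close a file once you are done with it.

#include <unistd.h>

int close(int fd);

The default behavior of close on a TCP socket is to mark the socket as closed and return to the calling process immediately. The descriptor can no longer be used by the calling process, that is, it can no longer be passed as the first argument to read or write.

Note: close only decrements the reference count of the socket descriptor by 1. Only when the reference count reaches 0 does the TCP client send a connection termination request to the server.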
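
The reference-count behavior can be observed with a small experiment. The sketch below is only an approximation of the TCP case: it uses a socketpair() of Unix domain stream sockets as a stand-in for a connection and dup() to create a second reference, and the function names are made up. MSG_DONTWAIT is assumed to be available, as it is on Linux and the BSDs.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer of fd has closed (end of file), 0 if the connection
 * is still open, -1 on any other error. Never blocks. */
int peer_has_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}

/* Returns 0 if the peer sees end of file only after the LAST descriptor
 * referring to the socket is closed, -1 otherwise. */
int close_refcount_demo(void)
{
    int sv[2], copy, after_first, after_last;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    copy = dup(sv[0]);               /* second reference to the same socket */
    if (copy == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[0]);                    /* count 2 -> 1: still open */
    after_first = peer_has_closed(sv[1]);
    close(copy);                     /* count 1 -> 0: really closed */
    after_last = peer_has_closed(sv[1]);
    close(sv[1]);

    return (after_first == 0 && after_last == 1) ? 0 : -1;
}
```

The same counting applies when a descriptor is shared across fork(): a server's parent process can close its copy of a connected descriptor without ending the connection that the child is still using.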

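4. Detailed explanation of TCP’s three-way handshake to establish a connection in socket

We know that TCP performs a "three-way handshake" to establish a connection, that is, it exchanges three segments. The general flow is as follows:

The client sends a SYN J to the server

The server responds to the client with a SYN K and acknowledges SYN J with ACK J+1

The client then sends an acknowledgment ACK K+1 to the server

That completes the three-way handshake. But in which socket functions does this handshake take place? See the figure below: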

Figure 1. The TCP three-way handshake in socket

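As the figure shows, when the client calls connect, a connection request is triggered and a SYN J packet is sent to the server; connect then blocks. The server, which is listening and blocked in accept, receives the SYN J packet and replies to the client with SYN K, ACK J+1. When the client receives SYN K, ACK J+1, connect returns and the client acknowledges SYN K. When the server receives ACK K+1, accept returns. At this point the three-way handshake is complete and the connection is established.

Summary: the client's connect returns at the second step of the three-way handshake, while the server's accept returns at the third step.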

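5. Detailed explanation of TCP’s four-way handshake to release a connection in socket

The previous section described how the TCP three-way handshake establishes a connection and which socket functions are involved. Now we describe the four-way handshake that releases a connection. See the figure below: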

Figure 2. The TCP four-way handshake in socket

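The process shown in the figure is as follows:

An application process first calls close to actively close the connection, and its TCP sends a FIN M;

The other end receives FIN M, performs a passive close, and acknowledges the FIN. The receipt of the FIN is also passed to the application process as an end-of-file, because receiving a FIN means the application process will never receive any more data on this connection;

Some time later, the application process that received the end-of-file calls close to close its socket, which causes its TCP to send a FIN N as well;

The TCP of the original sender, on receiving this FIN, acknowledges it.

In this way there is one FIN and one ACK in each direction.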

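6. An example (hands-on practice)

Enough talk; let's practice. Below is a simple server and client (using TCP): the server listens on local port 6666 indefinitely, accepts each connection request, and receives the message the client sends; the client connects to the server and sends one message.

Server code: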

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<errno.h>
#include<unistd.h>      /* close() */
#include<sys/types.h>
#include<sys/socket.h>
#include<netinet/in.h>

#define MAXLINE 4096

int main(int argc, char** argv)
{
    int    listenfd, connfd;
    struct sockaddr_in     servaddr;
    char    buff[MAXLINE + 1];   /* +1 leaves room for the terminating '\0' */
    ssize_t n;

    /* Create a TCP socket. */
    if( (listenfd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ){
        printf("create socket error: %s(errno: %d)\n",strerror(errno),errno);
        exit(1);
    }

    /* Bind to port 6666 on every local interface, in network byte order. */
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(6666);

    if( bind(listenfd, (struct sockaddr*)&servaddr, sizeof(servaddr)) == -1){
        printf("bind socket error: %s(errno: %d)\n",strerror(errno),errno);
        exit(1);
    }

    /* Turn the socket into a listening socket with a backlog of 10. */
    if( listen(listenfd, 10) == -1){
        printf("listen socket error: %s(errno: %d)\n",strerror(errno),errno);
        exit(1);
    }

    printf("======waiting for client's request======\n");
    while(1){
        /* The client's address is not needed here, so pass NULL. */
        if( (connfd = accept(listenfd, (struct sockaddr*)NULL, NULL)) == -1){
            printf("accept socket error: %s(errno: %d)\n",strerror(errno),errno);
            continue;
        }
        n = recv(connfd, buff, MAXLINE, 0);
        if( n == -1 ){
            printf("recv error: %s(errno: %d)\n",strerror(errno),errno);
        } else {
            buff[n] = '\0';
            printf("recv msg from client: %s\n", buff);
        }
        close(connfd);
    }

    close(listenfd);   /* not reached: the loop above never exits */
    return 0;
}
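
The client side is not shown in this copy of the article, so the following is only a sketch of what a matching client might look like, not the original author's code. The helper name send_message is made up; it connects to the given IPv4 address and port, sends one short message with send(), and closes the socket.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical client helper: connect to ip:port over TCP, send msg, close.
 * Intended for short messages. Returns 0 on success, -1 on error. */
int send_message(const char *ip, uint16_t port, const char *msg)
{
    struct sockaddr_in servaddr;
    int sockfd;
    ssize_t sent;

    /* Build the server address in network byte order. */
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1) {
        fprintf(stderr, "invalid IPv4 address: %s\n", ip);
        return -1;
    }

    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        return -1;
    }

    /* No bind(): the system picks the client's local address and port. */
    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1) {
        perror("connect");
        close(sockfd);
        return -1;
    }

    sent = send(sockfd, msg, strlen(msg), 0);
    close(sockfd);
    return sent == (ssize_t)strlen(msg) ? 0 : -1;
}
```

With the server above running on the same machine, a main() that calls send_message("127.0.0.1", 6666, "hello") should make the server print the received message.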

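Of course, the code above is very simple and has many shortcomings; it only demonstrates how the basic socket functions are used. But however complex a network program is, it uses these same basic functions. The server above is iterative: it handles the next client's request only after it has finished the current one. Such a server has very limited capacity, and real servers need to handle requests concurrently. To do so, the server can fork() a new process, or create a new thread, to handle each request.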
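
As one possible shape for the fork() approach, the sketch below hands each accepted connection to a child process. The names serve_client and handle_in_child are hypothetical, and the echo behavior in serve_client is only a placeholder for real request handling.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-client work: echo one message back to the client. */
static void serve_client(int connfd)
{
    char buf[4096];
    ssize_t n = recv(connfd, buf, sizeof(buf), 0);

    if (n > 0)
        send(connfd, buf, (size_t)n, 0);
}

/* Hands connfd to a new child process. In the parent, returns the child's
 * pid, or -1 if fork() failed. Pass listenfd < 0 if there is none. */
pid_t handle_in_child(int listenfd, int connfd)
{
    pid_t pid = fork();

    if (pid == 0) {               /* child */
        if (listenfd >= 0)
            close(listenfd);      /* the child does not accept new clients */
        serve_client(connfd);
        close(connfd);
        _exit(0);
    }

    /* Parent (or fork failure): drop our reference to the connection.
     * If a child exists, it still holds its own reference, so this close
     * does not terminate the connection. */
    close(connfd);
    return pid;
}
```

In the server loop this would replace the recv/printf/close steps: after connfd = accept(...), call handle_in_child(listenfd, connfd) and go straight back to accept(). The parent also needs to reap finished children, for example with waitpid(), so that they do not linger as zombies.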

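7. Try it yourself

Here is a question for you; replies are welcome! Are you familiar with network programming on Linux? If so, write programs that do the following:

Server:

Receive messages from the client at address 192.168.100.2; if the message is "Client Query", print "Receive Query"

Client: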

Send the messages "Client Query test", "Client Query" and "Client Query Quit" in sequence to the server at address 192.168.100.168, and then exit.

The IP address appearing in the question can be determined according to the actual situation.

