Home >php教程 >PHP开发 >Introduction to socket communication

Introduction to socket communication

高洛峰
高洛峰Original
2016-12-13 10:09:181096browse

We are well aware of the value of information exchange, so how do processes in the network communicate? For example, when we open the browser to browse the web every day, how does the browser process communicate with the web server? When you use QQ to chat, how does the QQ process communicate with the server or the QQ process where your friends are? Do all these rely on sockets? So what is a socket? What are the types of sockets? There are also basic functions of socket, which are what this article wants to introduce. The main contents of this article are as follows:

1. How to communicate between processes in the network?

2. What is Socket?

3. Basic operations of socket

3.1, socket() function

3.2, bind() function

3.3, listen(), connect() function

3.4, accept() function

3.5, read( ), write() function, etc.

3.6. close() function

4. Detailed explanation of TCP’s three-way handshake to establish a connection in socket

5. Detailed explanation of TCP’s four-way handshake to release connection in socket

6. An example (Practice (For a moment)

7. Leave a question, everyone is welcome to reply! ! !

1. How to communicate between processes in the network?

There are many ways of local inter-process communication (IPC), but they can be summarized into the following 4 categories:

Message passing (pipeline, FIFO, message queue)

Synchronization (mutex, condition variable, read-write lock, File and write record locks, semaphores)

Shared memory (anonymous and named)

Remote procedure calls (Solaris gates and Sun RPC)

But these are not the topic of this article! What we are going to discuss is how to communicate between processes in the network? The first problem to solve is how to uniquely identify a process, otherwise communication will be impossible! A process can be uniquely identified locally by the process PID, but this does not work on the network. In fact, the TCP/IP protocol family has helped us solve this problem. The "ip address" of the network layer can uniquely identify the host in the network, while the "protocol + port" of the transport layer can uniquely identify the application (process) in the host. In this way, the triplet (ip address, protocol, port) can be used to identify the network process, and process communication in the network can use this mark to interact with other processes.

Applications that use the TCP/IP protocol usually use application programming interfaces: sockets of UNIX BSD and TLI of UNIX System V (already obsolete) to achieve communication between network processes. For now, almost all applications use sockets, and now is the Internet era. Process communication on the network is ubiquitous. This is why I say "everything is socket".

2. What is Socket?

We already know above that processes in the network communicate through sockets, so what is a socket? Sockets originated from Unix, and one of the basic philosophies of Unix/Linux is that "everything is a file" and can be operated in the "open -> read and write write/read -> close" mode. My understanding is that Socket is an implementation of this mode. Socket is a special file, and some socket functions are operations on it (read/write IO, open, close). We will introduce these functions later.

The origin of the word socket

The first use in the field of networking was found in the document IETF RFC33 released on February 12, 1970, written by Stephen Carr, Steve Crocker and Vint Cerf. According to the Computer History Museum, Croker wrote: "Elements of a namespace may be called socket interfaces. A socket interface forms one end of a connection, and a connection may be fully specified by a pair of socket interfaces. "The Computer History Museum added: "This is about 12 years earlier than the BSD socket interface definition."

3. Basic operation of socket

Since socket is part of the "open-write/read-close" mode. implementation, then the socket provides functional interfaces corresponding to these operations. The following uses TCP as an example to introduce several basic socket interface functions.

3.1, socket() function

int socket(int domain, int type, int protocol);

socket function corresponds to the opening operation of ordinary files. The ordinary file open operation returns a file descriptor, and socket() is used to create a socket descriptor (socket descriptor), which uniquely identifies a socket. This socket descriptor is the same as the file descriptor. It is used in subsequent operations. It is used as a parameter to perform some read and write operations.

Just like you can pass different parameter values ​​to fopen to open different files. When creating a socket, you can also specify different parameters to create different socket descriptors. The three parameters of the socket function are:

domain: the protocol domain, also known as the protocol family. Commonly used protocol families include AF_INET, AF_INET6, AF_LOCAL (or AF_UNIX, Unix domain socket), AF_ROUTE, etc. The protocol family determines the address type of the socket, and the corresponding address must be used in communication. For example, AF_INET determines to use a combination of ipv4 address (32-bit) and port number (16-bit), and AF_UNIX determines to use an absolute path. Name as address.

type: Specify the socket type. Commonly used socket types include SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, SOCK_PACKET, SOCK_SEQPACKET, etc. (What are the types of sockets?).

protocol: Hence the name, which means specifying the protocol. Commonly used protocols include IPPROTO_TCP, IPPTOTO_UDP, IPPROTO_SCTP, IPPROTO_TIPC, etc., which respectively correspond to the TCP transmission protocol, UDP transmission protocol, STCP transmission protocol, and TIPC transmission protocol (I will discuss this protocol separately!).

Note: The above type and protocol cannot be combined at will. For example, SOCK_STREAM cannot be combined with IPPROTO_UDP. When protocol is 0, the default protocol corresponding to the type type is automatically selected.

When we call socket to create a socket, the returned socket descriptor exists in the protocol family (address family, AF_XXX) space, but does not have a specific address. If you want to assign an address to it, you must call the bind() function, otherwise the system will automatically assign a port randomly when calling connect() or listen().

3.2. bind() function

As mentioned above, the bind() function assigns a specific address in an address family to the socket. For example, corresponding to AF_INET and AF_INET6, an ipv4 or ipv6 address and port number combination is assigned to the socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen); The three parameters of the

function are:

sockfd: the socket descriptor, which is created through the socket() function and uniquely identifies it. a socket. The bind() function will bind a name to this descriptor.

addr: a const struct sockaddr * pointer pointing to the protocol address to be bound to sockfd. This address structure varies according to the address protocol family when the address creates the socket. For example, ipv4 corresponds to:

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */};/* Internet address. */struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */};
ipv6对应的是: 
struct sockaddr_in6 { 
    sa_family_t     sin6_family;   /* AF_INET6 */ 
    in_port_t       sin6_port;     /* port number */ 
    uint32_t        sin6_flowinfo; /* IPv6 flow information */ 
    struct in6_addr sin6_addr;     /* IPv6 address */ 
    uint32_t        sin6_scope_id; /* Scope ID (new in 2.4) */ };struct in6_addr { 
    unsigned char   s6_addr[16];   /* IPv6 address */ };
Unix域对应的是: 
#define UNIX_PATH_MAX    108struct sockaddr_un { 
    sa_family_t sun_family;               /* AF_UNIX */ 
    char        sun_path[UNIX_PATH_MAX];  /* pathname */ };

addrlen: corresponds to the length of the address.

Usually the server will be bound to a well-known address (such as IP address + port number) when it is started to provide services, and customers can connect to the server through it; the client does not need to specify it, the system automatically assigns one Port number and its own IP address combination. This is why the server usually calls bind() before listening, but the client does not call it. Instead, the system randomly generates one during connect().

Network byte order and host byte order

Host byte order is what we usually call big endian and little endian modes: different CPUs have different byte order types, these byte order refers to the integers in memory The order in which they are saved is called host order. The standard definitions of Big-Endian and Little-Endian are quoted as follows:

 a) Little-Endian means that the low-order bytes are arranged at the low address end of the memory, and the high-order bytes are arranged at the high address end of the memory.

 b) Big-Endian means that the high-order bytes are arranged at the low address end of the memory, and the low-order bytes are arranged at the high address end of the memory.

Network byte order: 4-byte 32-bit values ​​are transmitted in the following order: first 0~7bit, then 8~15bit, then 16~23bit, and finally 24~31bit. This transfer order is called big-endian. Because all binary integers in the TCP/IP header are required to be in this order when transmitted over the network, it is also called network byte order. Byte order, as the name suggests, is the order in which data larger than one byte is stored in memory. There is no order issue with data of one byte.

So: When binding an address to a socket, please first convert the host byte order to network byte order, and do not assume that the host byte order uses Big-Endian like the network byte order. There have been murders caused by this problem! This problem has caused many inexplicable problems in the company's project code, so please remember not to make any assumptions about the host byte order, and be sure to convert it into network byte order before assigning it to the socket.

3.3, listen(), connect() function

If you are a server, after calling socket(), bind(), listen() will be called to listen to the socket. If the client calls connect() at this time, it will issue Connection request, the server will receive this request.

int listen(int sockfd, int backlog);
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The first parameter of the listen function is the socket descriptor to be monitored, and the second parameter is the maximum number of connections that can be queued for the corresponding socket. The socket created by the socket() function is an active type by default, and the listen function changes the socket to a passive type, waiting for the client's connection request.

The first parameter of the connect function is the client's socket descriptor, the second parameter is the server's socket address, and the third parameter is the length of the socket address. The client establishes a connection with the TCP server by calling the connect function.

3.4, accept() function

After the TCP server calls socket(), bind(), and listen() in sequence, it will listen to the specified socket address. After calling socket() and connect() in sequence, the TCP client sends a connection request to the TCP server. After the TCP server listens to this request, it will call the accept() function to receive the request, so that the connection is established. Then you can start network I/O operations, which are similar to ordinary file read and write I/O operations.

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

accept函数的第一个参数为服务器的socket描述字,第二个参数为指向struct sockaddr *的指针,用于返回客户端的协议地址,第三个参数为协议地址的长度。如果accpet成功,那么其返回值是由内核自动生成的一个全新的描述字,代表与返回客户的TCP连接。

注意:accept的第一个参数为服务器的socket描述字,是服务器开始调用socket()函数生成的,称为监听socket描述字;而accept函数返回的是已连接的socket描述字。一个服务器通常通常仅仅只创建一个监听socket描述字,它在该服务器的生命周期内一直存在。内核为每个由服务器进程接受的客户连接创建了一个已连接socket描述字,当服务器完成了对某个客户的服务,相应的已连接socket描述字就被关闭。

3.5、read()、write()等函数

万事具备只欠东风,至此服务器与客户已经建立好连接了。可以调用网络I/O进行读写操作了,即实现了网咯中不同进程之间的通信!网络I/O操作有下面几组:

read()/write()

recv()/send()

readv()/writev()

recvmsg()/sendmsg()

recvfrom()/sendto()

我推荐使用recvmsg()/sendmsg()函数,这两个函数是最通用的I/O函数,实际上可以把上面的其它函数都替换成这两个函数。它们的声明如下:

      #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);
       ssize_t write(int fd, const void *buf, size_t count);

       #include <sys/types.h>
       #include <sys/socket.h>

       ssize_t send(int sockfd, const void *buf, size_t len, int flags);
       ssize_t recv(int sockfd, void *buf, size_t len, int flags);

       ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,                      const struct sockaddr *dest_addr, socklen_t addrlen);
       ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,                        struct sockaddr *src_addr, socklen_t *addrlen);

       ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
       ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);

read函数是负责从fd中读取内容.当读成功时,read返回实际所读的字节数,如果返回的值是0表示已经读到文件的结束了,小于0表示出现了错误。如果错误为EINTR说明读是由中断引起的,如果是ECONNREST表示网络连接出了问题。

write函数将buf中的nbytes字节内容写入文件描述符fd.成功时返回写的字节数。失败时返回-1,并设置errno变量。 在网络程序中,当我们向套接字文件描述符写时有俩种可能。1)write的返回值大于0,表示写了部分或者是全部的数据。2)返回的值小于0,此时出现了错误。我们要根据错误类型来处理。如果错误为EINTR表示在写的时候出现了中断错误。如果为EPIPE表示网络连接出现了问题(对方已经关闭了连接)。

其它的我就不一一介绍这几对I/O函数了,具体参见man文档或者baidu、Google,下面的例子中将使用到send/recv。

3.6、close()函数

在服务器与客户端建立连接之后,会进行一些读写操作,完成了读写操作就要关闭相应的socket描述字,好比操作完打开的文件要调用fclose关闭打开的文件。

#include <unistd.h>
int close(int fd);

close一个TCP socket的缺省行为时把该socket标记为以关闭,然后立即返回到调用进程。该描述字不能再由调用进程使用,也就是说不能再作为read或write的第一个参数。

注意:close操作只是使相应socket描述字的引用计数-1,只有当引用计数为0的时候,才会触发TCP客户端向服务器发送终止连接请求。

4、socket中TCP的三次握手建立连接详解

我们知道tcp建立连接要进行“三次握手”,即交换三个分组。大致流程如下:

客户端向服务器发送一个SYN J

服务器向客户端响应一个SYN K,并对SYN J进行确认ACK J+1

客户端再想服务器发一个确认ACK K+1

只有就完了三次握手,但是这个三次握手发生在socket的那几个函数中呢?请看下图:

Introduction to socket communication

图1、Introduction to socket communication

从图中可以看出,当客户端调用connect时,触发了连接请求,向服务器发送了SYN J包,这时connect进入阻塞状态;服务器监听到连接请求,即收到SYN J包,调用accept函数接收请求向客户端发送SYN K ,ACK J+1,这时accept进入阻塞状态;客户端收到服务器的SYN K ,ACK J+1之后,这时connect返回,并对SYN K进行确认;服务器收到ACK K+1时,accept返回,至此三次握手完毕,连接建立。

总结:客户端的connect在三次握手的第二个次返回,而服务器端的accept在三次握手的第三次返回。

5、socket中TCP的四次握手释放连接详解

上面介绍了socket中TCP的三次握手建立过程,及其涉及的socket函数。现在我们介绍socket中的四次握手释放连接的过程,请看下图:

Introduction to socket communication

图2、socket中发送的TCP四次握手

图示过程如下:

某个应用进程首先调用close主动关闭连接,这时TCP发送一个FIN M;

另一端接收到FIN M之后,执行被动关闭,对这个FIN进行确认。它的接收也作为文件结束符传递给应用进程,因为FIN的接收意味着应用进程在相应的连接上再也接收不到额外数据;

一段时间之后,接收到文件结束符的应用进程调用close关闭它的socket。这导致它的TCP也发送一个FIN N;

接收到这个FIN的源发送端TCP对它进行确认。

这样每个方向上都有一个FIN和ACK。

6、一个例子(实践一下)

说了这么多了,动手实践一下。下面编写一个简单的服务器、客户端(使用TCP)——服务器端一直监听本机的6666号端口,如果收到连接请求,将接收请求并接收客户端发来的消息;客户端与服务器端建立连接并发送一条消息。

服务器端代码:

服务器端

#include
#include
#include
#include
#include
#include
#include

#define MAXLINE 4096

int main(int argc, char** argv)
{
    int    listenfd, connfd;
    struct sockaddr_in     servaddr;
    char    buff[4096];
    int     n;
    if( (listenfd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ){
    printf("create socket error: %s(errno: %d)/n",strerror(errno),errno);
    exit(0);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(6666);
    if( bind(listenfd, (struct sockaddr*)&servaddr, sizeof(servaddr)) == -1){
    printf("bind socket error: %s(errno: %d)/n",strerror(errno),errno);
    exit(0);
    }
    if( listen(listenfd, 10) == -1){
    printf("listen socket error: %s(errno: %d)/n",strerror(errno),errno);
    exit(0);
    }
    printf("======waiting for client&#39;s request======/n");
    while(1){
    if( (connfd = accept(listenfd, (struct sockaddr*)NULL, NULL)) == -1){
        printf("accept socket error: %s(errno: %d)",strerror(errno),errno);
        continue;
    }
    n = recv(connfd, buff, MAXLINE, 0);
    buff[n] = &#39;/0&#39;;
    printf("recv msg from client: %s/n", buff);
    close(connfd);
    }
    close(listenfd);
}

客户端代码:

客户端

#include
#include
#include
#include
#include
#include
#include

#define MAXLINE 4096

int main(int argc, char** argv)
{
    int    sockfd, n;
    char    recvline[4096], sendline[4096];
    struct sockaddr_in    servaddr;
    if( argc != 2){
    printf("usage: ./client <ipaddress>/n");
    exit(0);
    }
    if( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0){
    printf("create socket error: %s(errno: %d)/n", strerror(errno),errno);
    exit(0);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(6666);
    if( inet_pton(AF_INET, argv[1], &servaddr.sin_addr) <= 0){
    printf("inet_pton error for %s/n",argv[1]);
    exit(0);
    }
    if( connect(sockfd, (struct sockaddr*)&servaddr, sizeof(servaddr)) < 0){
    printf("connect error: %s(errno: %d)/n",strerror(errno),errno);
    exit(0);
    }
    printf("send msg to server: /n");
    fgets(sendline, 4096, stdin);
    if( send(sockfd, sendline, strlen(sendline), 0) < 0)
    {
    printf("send msg error: %s(errno: %d)/n", strerror(errno), errno);
    exit(0);
    }
    close(sockfd);
    exit(0);
}

当然上面的代码很简单,也有很多缺点,这就只是简单的演示socket的基本函数使用。其实不管有多复杂的网络程序,都使用的这些基本函数。上面的服务器使用的是迭代模式的,即只有处理完一个客户端请求才会去处理下一个客户端的请求,这样的服务器处理能力是很弱的,现实中的服务器都需要有并发处理能力!为了需要并发处理,服务器需要fork()一个新的进程或者线程去处理请求等。

7、动动手

留下一个问题,欢迎大家回帖回答!!!是否熟悉Linux下网络编程?如熟悉,编写如下程序完成如下功能:

服务器端:

接收地址192.168.100.2的客户端信息,如信息为“Client Query”,则打印“Receive Query”

客户端:

向地址192.168.100.168的服务器端顺序发送信息“Client Query test”,“Cleint Query”,“Client Query Quit”,然后退出。

题目中出现的ip地址可以根据实际情况定。


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn