
Detailed explanation of the sticky packet problem in Python socket network programming

不言 (Original)
2018-04-28 13:36:00

This article explains the sticky packet problem in Python socket network programming in detail. It is shared here for reference.

1. Details of the sticky packet problem

1. Only TCP exhibits sticky packets; UDP never does

Your program does not have the right to operate the network card directly. It works through the interface that the operating system exposes to user programs. Every time your program sends data to a remote host, the data is actually copied from user space to kernel space. This copy costs resources and time, and switching between user space and kernel space too often reduces sending efficiency. To improve transmission efficiency, the sender's TCP stack therefore often collects enough data before sending it to the other side in one go. If several consecutive sends each carry very little data, TCP usually merges them into a single TCP segment according to an optimization algorithm and sends them together, so the receiver gets "sticky" data in which several messages are glued together.

2. First, understand how a socket sends and receives messages

The sender may send data 1 KB at a time while the receiving application reads it 2 KB at a time, or 3 KB, or any other amount. In other words, the application cannot see how the data was divided into sends. TCP is therefore a stream-oriented protocol, and this is why sticky packets occur so easily. UDP, by contrast, is a connectionless, message-oriented protocol: each UDP datagram is one message, and the application must read data in units of whole messages rather than an arbitrary number of bytes at a time. This is very different from TCP. How is a message defined? The data that the other side writes or sends in one call can be regarded as one message. What you need to know is that when the other side sends a message, no matter how the lower layers fragment it, the TCP layer puts the segments back in order before the data appears in the kernel buffer.

For example, a TCP-based socket client uploads a file to a server. The file content is sent as a byte stream, segment by segment. The receiver, which knows nothing about the structure of that byte stream, cannot tell where the file begins and where it ends.
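The stream behaviour described above can be observed inside a single process. The following is a minimal sketch added for illustration, not code from the original article: it assumes socket.socketpair() as a stand-in for a connected TCP client and server, and the helper name read_stream_chunks is hypothetical. It sends two small messages and collects whatever recv() hands back.

```python
import socket

def read_stream_chunks(expected_len, bufsize=1024):
    """Send two small messages and return the chunks that recv() produced."""
    # socketpair() returns two connected stream sockets in one process
    # (an assumed stand-in for a TCP client/server pair, chosen for brevity).
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b'hello')   # first "message"
        sender.sendall(b'feng')    # second "message"
        chunks = []
        received = 0
        while received < expected_len:
            chunk = receiver.recv(bufsize)
            if not chunk:          # peer closed before everything arrived
                break
            chunks.append(chunk)
            received += len(chunk)
        return chunks
    finally:
        sender.close()
        receiver.close()

chunks = read_stream_chunks(len(b'hellofeng'))
print(chunks)  # how the bytes are split into chunks is up to the OS
```

How many chunks come back depends on the operating system and on timing. The only guarantee is that joining the chunks yields the bytes in the order they were sent, which is exactly why the receiver cannot recover the original message boundaries on its own.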

3. Causes of sticky packets

3-1 Direct cause

The sticky packet problem arises mainly because the receiver does not know where the boundaries between messages lie, and so does not know how many bytes to read at a time.

3-2 Root cause

Sticky packets caused by the sender come from the TCP protocol itself. To improve transmission efficiency, the sender often collects enough data before sending a TCP segment. If several consecutive sends each carry very little data, TCP usually merges them into one TCP segment according to an optimization algorithm and sends them together, so the receiver gets sticky data.
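The optimization algorithm in question is Nagle's algorithm (it is named in the summary that follows), and it can be switched off per socket with the standard TCP_NODELAY option. The sketch below is an illustration added here, with a hypothetical helper name make_nodelay_socket. It only asks the kernel to send small writes sooner; the receiver can still read several messages in one recv, so it is not a substitute for a framing scheme.

```python
import socket

def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the kernel to send small writes immediately
    # instead of coalescing them into a larger segment.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

s = make_nodelay_socket()
# Read the option back; a non-zero value means Nagle's algorithm is off.
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay != 0)
s.close()
```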

3-3 Summary

  1. TCP (Transmission Control Protocol) is connection-oriented and stream-oriented and provides a highly reliable service. Both ends (client and server) must each hold one socket of a connected pair. To send many small packets to the receiver more efficiently, the sender uses an optimization (Nagle's algorithm) that merges several pieces of data that are small and sent at short intervals into one larger block before packaging it. The receiver then cannot easily tell the pieces apart, so a sound unpacking mechanism must be provided. In other words, stream-oriented communication has no message boundaries.

  2. UDP (User Datagram Protocol) is connectionless and message-oriented and provides an efficient service. It does not use the block-merging optimization. Because UDP supports one-to-many communication, the skbuff (socket buffer) on the receiving end uses a chained structure to record each arriving UDP packet. Each UDP packet carries a message header (source address, port, and so on), so the receiver can easily distinguish and process packets. In other words, message-oriented communication preserves message boundaries.

  3. TCP is based on a data stream, so the messages sent and received cannot be empty. Both the client and the server need to handle empty input to keep the program from hanging. UDP is based on datagrams: even if you enter empty content (just press Enter), the message is not empty, because the UDP protocol still wraps it in a message header. The experiment is omitted here.

UDP's recvfrom blocks. One recvfrom(x) corresponds to exactly one sendto(y) and completes as soon as that datagram has been received, returning at most x bytes. If y > x, the excess data is lost. UDP therefore never produces sticky packets, but it can lose data and is unreliable.

TCP does not lose data. If one recv does not take everything, the next recv continues from where the previous one stopped. The sender clears data from its buffer only after it has received the ACK for it. The data is reliable, but packets can stick together.
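The message boundaries of UDP, including the empty message from point 3, can be checked with a short loopback experiment. This is a minimal sketch added for illustration; the helper name udp_roundtrip is hypothetical, and it assumes that datagrams sent over the loopback interface arrive intact and in order, which UDP itself does not guarantee.

```python
import socket

def udp_roundtrip(messages, bufsize=1024):
    """Send each message as one datagram over loopback and collect what recvfrom() returns."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP has no delivery guarantee, so never wait forever
        addr = receiver.getsockname()
        for msg in messages:
            sender.sendto(msg, addr)      # one sendto() call produces one datagram
        received = []
        for _ in messages:
            data, _peer = receiver.recvfrom(bufsize)  # one recvfrom() call consumes one datagram
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()

print(udp_roundtrip([b'hello', b'', b'feng']))
```

Each recvfrom returns one datagram, so the three messages come back separately, and the empty one arrives as an empty bytes object instead of being mistaken for a closed connection.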

2. Sticky packets occur in two situations:

1. The sender waits until enough data has accumulated in its buffer before sending, which produces sticky packets (the sends are very close together and the data is very small, so TCP's optimization algorithm merges them into one packet)

Client

#_*_coding:utf-8_*_
import socket
BUFSIZE=1024
ip_port=('127.0.0.1',8080)
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
res=s.connect_ex(ip_port)
s.send('hello'.encode('utf-8'))
s.send('feng'.encode('utf-8'))

Server

#_*_coding:utf-8_*_
from socket import *
ip_port=('127.0.0.1',8080)
tcp_socket_server=socket(AF_INET,SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)
conn,addr=tcp_socket_server.accept()
data1=conn.recv(10)
data2=conn.recv(10)
print('----->',data1.decode('utf-8'))
print('----->',data2.decode('utf-8'))
conn.close()

2. The receiver does not read the packets in its buffer promptly, so several packets are read together (the client sends a piece of data, the server reads only a small part of it, and on the next read the server gets the data left over in the buffer, which produces sticky packets)

Client

#_*_coding:utf-8_*_
import socket
BUFSIZE=1024
ip_port=('127.0.0.1',8080)
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
res=s.connect_ex(ip_port)
s.send('hello feng'.encode('utf-8'))

Server

#_*_coding:utf-8_*_
from socket import *
ip_port=('127.0.0.1',8080)
tcp_socket_server=socket(AF_INET,SOCK_STREAM)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)
conn,addr=tcp_socket_server.accept()
data1=conn.recv(2) # the first recv does not read the whole message
data2=conn.recv(10)# the next recv first takes the leftover old data, then any new data
print('----->',data1.decode('utf-8'))
print('----->',data2.decode('utf-8'))
conn.close()

3. Sticky packet example:

Server

import socket
import subprocess
din=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
ip_port=('127.0.0.1',8080)
din.bind(ip_port)
din.listen(5)
conn,deer=din.accept()
data1=conn.recv(1024)
data2=conn.recv(1024)
print(data1)
print(data2)

Client:

import socket
import subprocess
din=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
ip_port=('127.0.0.1',8080)
din.connect(ip_port)
din.send('helloworld'.encode('utf-8'))
din.send('sb'.encode('utf-8'))

4. When packets are split

When the data in the sender's buffer is larger than what fits in one packet under the MTU of the network card, TCP splits the data from that send into several packets and sends them one after another.

Supplementary question 1: Why is TCP reliable and UDP unreliable?

When TCP transmits data, the sender first puts the data into its own buffer, and the protocol then sends the buffered data to the peer. When the peer acknowledges the data with an ACK, the sender clears it from the buffer; if no ACK arrives in time, the sender retransmits the data. This is why TCP is reliable.

When UDP sends data, the peer does not return any acknowledgment, so UDP is unreliable.

Supplementary question 2: What do send(byte stream), recv(1024), and sendall mean?

The 1024 passed to recv means that at most 1024 bytes are taken from the buffer in one call.

The byte stream passed to send is first placed in the local buffer, and the protocol then sends the buffer's contents to the peer. If the byte stream is larger than the remaining buffer space, send may accept only part of it and returns the number of bytes it actually took; the rest is not sent unless you call send again. sendall calls send in a loop, so no data is left unsent.
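The relationship between send and sendall can be made concrete with a hand-written loop. The sketch below is an illustration of the idea, not the actual implementation of sendall; the function name send_all is hypothetical, and socket.socketpair() is used only so the example runs in one process.

```python
import socket

def send_all(sock, data):
    """Keep calling send() until every byte has been handed to the kernel."""
    view = memoryview(data)
    total_sent = 0
    while total_sent < len(view):
        sent = sock.send(view[total_sent:])  # may accept fewer bytes than offered
        if sent == 0:
            raise ConnectionError('socket connection broken')
        total_sent += sent
    return total_sent

a, b = socket.socketpair()
payload = b'x' * 5000
sent = send_all(a, payload)
a.close()                      # closing lets the reader see end-of-stream

received = b''
while True:
    chunk = b.recv(1024)       # at most 1024 bytes per call
    if not chunk:              # empty bytes: the sender closed the connection
        break
    received += chunk
b.close()
print(sent, len(received))
```

The return value of send is the key detail: ignoring it is what makes data appear to be "lost" when the buffer is nearly full.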

5. How to solve the sticky packet problem?

The root of the problem is that the receiver does not know the length of the byte stream the sender is about to transmit. The solution is therefore for the sender to tell the receiver the total size of the byte stream before sending it, and for the receiver to then loop until it has received all of the data.

5-1 Simple solution (treating the symptom):

Add a time.sleep between the client's sends to keep packets from sticking. The server must also sleep between its receives for this to work.

Client:

# Client
import socket
import time
import subprocess
din=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
ip_port=('127.0.0.1',8080)
din.connect(ip_port)
din.send('helloworld'.encode('utf-8'))
time.sleep(3)
din.send('sb'.encode('utf-8'))

Server:

# Server
import socket
import time
import subprocess
din=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
ip_port=('127.0.0.1',8080)
din.bind(ip_port)
din.listen(5)
conn,deer=din.accept()
data1=conn.recv(1024)
time.sleep(4)
data2=conn.recv(1024)
print(data1)
print(data2)

This approach has many flaws. You do not know when a transmission finishes, so you cannot choose the right pause length: too long is inefficient, and too short does not prevent sticking. This method is therefore unsuitable.

5-2 Common solution (addressing the root cause):

The root of the problem is that the receiver does not know the length of the byte stream the sender is about to transmit, so the sender must announce the total size before sending the data, and the receiver then loops until it has received that many bytes.

Add a custom fixed-length header to the byte stream. The header contains the length of the byte stream, and the header and the data are sent to the peer in turn. The peer first takes the fixed-length header out of the buffer, and then reads the real data.

Use the struct module to pack the length into a fixed 4 or 8 bytes. With the format "i", struct.pack produces a 4-byte signed integer, so it can only hold values up to 2147483647 (at most 10 digits). For larger sizes, you can put the length into a JSON string first and pack the length of that string instead.
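The fixed-length header idea can be tried out in one process before moving to a real client and server. The following is a minimal sketch under stated assumptions: the helper names recv_exact, send_msg, and recv_msg are hypothetical, the '!I' format (4-byte unsigned length in network byte order) is a choice made for this sketch and differs from the "i" format used in the article's listings, and socket.socketpair() stands in for a TCP connection.

```python
import socket
import struct

# 4-byte unsigned length in network byte order (an assumed choice for this sketch)
HEADER = struct.Struct('!I')

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed mid-message')
        buf.extend(chunk)
    return bytes(buf)

def send_msg(sock, payload):
    """Prefix the payload with its length and send both."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(sock):
    """Read the length prefix, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

a, b = socket.socketpair()
send_msg(a, b'helloworld')   # two messages sent back to back
send_msg(a, b'sb')
first = recv_msg(b)
second = recv_msg(b)
a.close()
b.close()
print(first, second)
```

Even though both messages are already waiting in the receive buffer, each recv_msg call returns exactly one of them, because the reader never asks for more bytes than the header announced.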

Ordinary client

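# _*_ coding: utf-8 _*_
import socket
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8880))  # connect to the server
while True:
    # send a command, then receive the reply
    cmd = input('Enter a command>>: ').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))  # send the command

    # first receive the fixed-length header
    header_struct = phone.recv(4)  # 4 bytes (a robust client would loop until all 4 have arrived)
    unpack_res = struct.unpack('i', header_struct)
    total_size = unpack_res[0]  # total length of the body

    # then receive the body
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # keep receiving until the whole body has arrived
        recv_data = phone.recv(min(1024, total_size - recv_size))  # never read past this reply
        if not recv_data:  # the server closed the connection early
            break
        recv_size += len(recv_data)  # count the bytes that actually arrived
        total_data += recv_data
    # 'gbk' matches command output on a Chinese-locale Windows console; use 'utf-8' elsewhere
    print('Reply: %s' % total_data.decode('gbk'))
phone.close()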

Ordinary server

# _*_ coding: utf-8 _*_
import socket
import subprocess
import struct

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create the TCP socket ("buy a phone")
phone.bind(('127.0.0.1', 8880))  # bind the address ("insert the SIM card")
phone.listen(5)  # maximum number of queued connections
print('start running.....')
while True:  # connection loop
    conn, addr = phone.accept()  # wait for a client ("wait for a call")
    print(conn, addr)
    while True:  # communication loop
        # receive a command
        cmd = conn.recv(1024)  # read at most 1024 bytes
        if not cmd:  # empty bytes: the client closed the connection
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE   # standard error
                               )
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # send the header first: struct turns the length into a fixed-length bytes object
        header = struct.pack('i', len(stdout) + len(stderr))  # build the header
        conn.sendall(header)

        # then send the command's output
        conn.sendall(stdout)
        conn.sendall(stderr)
    conn.close()
phone.close()


5-3 Optimized solution (addressing the root cause)

The better way to solve the sticky packet problem is for the server to enrich the header and use a dictionary to describe the content it is about to send. A dictionary cannot be transmitted over the network directly, so it is serialized into a JSON string and then encoded to bytes. Because the length of those JSON bytes is not fixed, the struct module is used to pack that length into a fixed-size prefix. The server sends the prefix, the JSON header, and the data to the client; the client reads the prefix, then the header, decodes the header, and uses it to receive the complete data.

Ultimate Edition Client

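# _*_ coding: utf-8 _*_
import socket
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
phone.connect(('127.0.0.1', 8080))  # connect to the server
while True:
    # send a command, then receive the reply
    cmd = input('Enter a command>>: ').strip()
    if not cmd:
        continue
    phone.send(cmd.encode('utf-8'))  # send the command

    # first receive the length of the header
    header_len = struct.unpack('i', phone.recv(4))[0]  # unpack the 4 bytes back into an int

    # then receive the header
    header_bytes = phone.recv(header_len)  # still bytes at this point
    header_json = header_bytes.decode('utf-8')  # the JSON text of the dictionary
    header_dic = json.loads(header_json)  # deserialize to get the dictionary
    total_size = header_dic['total_size']  # total length of the data

    # finally receive the data
    recv_size = 0
    total_data = b''
    while recv_size < total_size:  # keep receiving until everything has arrived
        recv_data = phone.recv(min(1024, total_size - recv_size))  # never read past this reply
        if not recv_data:  # the server closed the connection early
            break
        # recv may return fewer bytes than requested, so count what actually arrived
        recv_size += len(recv_data)
        total_data += recv_data  # the accumulated result
    # 'gbk' matches command output on a Chinese-locale Windows console; use 'utf-8' elsewhere
    print('Reply: %s' % total_data.decode('gbk'))
phone.close()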

Ultimate Edition Server

# _*_ coding: utf-8 _*_
import socket
import subprocess
import struct
import json

phone = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create the TCP socket ("buy a phone")
phone.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow the address to be reused after a restart
phone.bind(('127.0.0.1', 8080))  # bind the address ("insert the SIM card")
phone.listen(5)  # maximum number of queued connections
print('start running.....')
while True:  # connection loop
    conn, addr = phone.accept()  # wait for a client ("wait for a call")
    print(conn, addr)

    while True:  # communication loop
        # receive a command
        cmd = conn.recv(1024)  # read at most 1024 bytes
        if not cmd:  # empty bytes: the client closed the connection
            break
        print('Received: %s' % cmd.decode('utf-8'))

        # run the command
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,  # standard output
                               stderr=subprocess.PIPE   # standard error
                               )
        stdout = res.stdout.read()
        stderr = res.stderr.read()

        # build the header
        header_dic = {
            'total_size': len(stdout) + len(stderr),  # total size of the data
            'filename': None,
            'md5': None
        }
        header_json = json.dumps(header_dic)  # a str
        header_bytes = header_json.encode('utf-8')  # bytes, but of variable length

        # first send the length of the header
        conn.sendall(struct.pack('i', len(header_bytes)))  # fixed-length prefix
        # then send the header
        conn.sendall(header_bytes)
        # finally send the command's output
        conn.sendall(stdout)
        conn.sendall(stderr)
    conn.close()
phone.close()

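6. The struct module

Anyone who knows C will be familiar with the role of struct in that language: it defines a structure that contains data of different types (int, char, bool, and so on), which makes it convenient to handle an object of that structure as a whole. In network communication, most data is transferred as a binary stream (binary data). Passing strings raises few problems, but passing basic types such as int or char requires a mechanism that packs a given structure into a binary byte string for transmission, and the receiver needs a matching mechanism to unpack it and restore the original structured data. Python's struct module provides exactly this. Its main purpose is to convert between Python values and C structs represented as Python byte strings (from the documentation: "This module performs conversions between Python values and C structs represented as Python strings."). The struct module offers a few very simple functions; some examples follow.

1. Basic pack and unpack

struct packs and unpacks data according to format specifiers. For example: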

# This module can convert a value, such as a number, into a fixed-length bytes object
import struct
# res = struct.pack('i',12345)
# print(res,len(res),type(res)) # length is 4
res2 = struct.pack('i',12345111)
print(res2,len(res2),type(res2)) # length is also 4
unpack_res =struct.unpack('i',res2)
print(unpack_res) #(12345111,)
# print(unpack_res[0]) #12345111

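As a further example, take a tuple that contains three types of data (int, string, and float) and a struct object created with the format 'I3sf', where I means an unsigned int, 3s means a string three bytes long, and f means a float. The tuple is packed and unpacked with the struct object's pack and unpack methods. The output shows that after pack the values have become a binary byte string, and unpack converts that byte string back into a tuple. Note that the precision of the float changes: the f format stores a 32-bit single-precision value, while a Python float is a 64-bit double. The number of bytes occupied by the packed data closely matches the corresponding struct in C.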
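The 'I3sf' example described above comes without a listing, so here is a minimal sketch of what it might look like. The sample values (1, b'abc', 2.7) and the variable names are assumptions made for illustration.

```python
import struct

values = (1, b'abc', 2.7)          # int, 3-byte string, float (sample values are assumptions)
packer = struct.Struct('I3sf')     # unsigned int, 3-byte string, 32-bit float
packed = packer.pack(*values)      # a bytes object of length packer.size
unpacked = packer.unpack(packed)   # back to a tuple

print(packed)
print(unpacked)                    # the float comes back slightly different from 2.7
print(packer.size)                 # includes any padding the native layout adds
```

The float does not survive the round trip exactly because 2.7 has no exact 32-bit representation, which is the precision change the text refers to.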

2. The available format characters are listed in the table in the official struct documentation.
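As a quick complement to that table, the sizes of a few common format characters can be queried directly, and the value range of each can be probed by trying to pack a number. This is a small sketch added for illustration; the helper name fits_in is hypothetical, and the '<' prefix is used so that the standard sizes apply regardless of platform.

```python
import struct

# '<' selects little-endian byte order with standard sizes and no padding
sizes = {fmt: struct.calcsize('<' + fmt)
         for fmt in ('b', 'h', 'i', 'I', 'q', 'Q', 'f', 'd')}
print(sizes)

def fits_in(fmt, number):
    """Return True if number can be packed with the given format character."""
    try:
        struct.pack('<' + fmt, number)
        return True
    except struct.error:           # raised when the value is out of range
        return False

print(fits_in('i', 2**31 - 1))     # largest 4-byte signed value
print(fits_in('i', 2**31))         # one past the limit
print(fits_in('q', 1073741824000)) # a file size that needs 8 bytes
```

This is why a length that may exceed 2147483647, such as the size of a very large file, needs an 8-byte format like q or has to travel inside a JSON header.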

3. Basic usage

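import json,struct
# Suppose the client uploads a file a.txt of about 1 TB (1073741824000 bytes)
# To avoid sticky packets, a custom header is required
header={'file_size':1073741824000,'file_name':'/a/b/c/d/e/a.txt','md5':'8f6fbf8347faa4924a76856701edb0f3'} # data size, file path, and md5 value

# Serialize the header and convert it to bytes so it can be transmitted
head_bytes=bytes(json.dumps(header),encoding='utf-8')

# So the receiver knows the header length, use struct to pack that number into a fixed length: 4 bytes
head_len_bytes=struct.pack('i',len(head_bytes)) # these 4 bytes hold a single number, the length of the header

# The client starts sending (conn is its connected socket)
conn.send(head_len_bytes) # first send the header length, 4 bytes
conn.send(head_bytes) # then send the header bytes
conn.sendall(file_content) # then send the real content as bytes

# The server starts receiving (s is its connected socket)
head_len_bytes=s.recv(4) # first receive 4 bytes: the packed header length
x=struct.unpack('i',head_len_bytes)[0] # extract the header length
head_bytes=s.recv(x) # receive x bytes: the header itself
header=json.loads(head_bytes.decode('utf-8')) # decode the header

# Finally, receive the real data according to the header
real_data_len=header['file_size']
recv_size=0
while recv_size<real_data_len:
    data=s.recv(min(1024,real_data_len-recv_size)) # never read past this file
    if not data: break # connection closed early
    recv_size+=len(data) # process data here, for example write it to a file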
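The listing above refers to sockets and file content that are not defined in it. For a version that runs end to end, here is a minimal sketch under stated assumptions: socket.socketpair() stands in for the client/server connection, the helper names recv_exact, send_with_header, and recv_with_header are hypothetical, and a small in-memory payload replaces the large file.

```python
import hashlib
import json
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('connection closed mid-message')
        buf.extend(chunk)
    return bytes(buf)

def send_with_header(sock, file_name, body):
    """Send a 4-byte header length, the JSON header, then the body."""
    header = {'file_size': len(body),
              'file_name': file_name,
              'md5': hashlib.md5(body).hexdigest()}
    head_bytes = json.dumps(header).encode('utf-8')
    sock.sendall(struct.pack('i', len(head_bytes)))  # fixed 4 bytes
    sock.sendall(head_bytes)
    sock.sendall(body)

def recv_with_header(sock):
    """Mirror of send_with_header: returns (header dict, body bytes)."""
    head_len = struct.unpack('i', recv_exact(sock, 4))[0]
    header = json.loads(recv_exact(sock, head_len).decode('utf-8'))
    body = recv_exact(sock, header['file_size'])
    return header, body

client, server = socket.socketpair()
send_with_header(client, 'a.txt', b'hello feng' * 100)
header, body = recv_with_header(server)
client.close()
server.close()
print(header['file_name'], header['file_size'])
print(hashlib.md5(body).hexdigest() == header['md5'])
```

Only the 4-byte prefix has a fixed size; everything after it is self-describing, so new fields can be added to the header dictionary without changing the framing.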

