The C language has no dedicated string type; a string is essentially a char[] array. The size of a C array is fixed when the array is created and cannot be changed afterwards, and the last element of the character array is always the null character '\0'.
The following shows a C string with the value "Redis":
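As a small illustration (the names redis_str and c_string_storage are made up for this sketch), the string "Redis" occupies six bytes in C: five visible characters plus the terminating null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Redis" stored the C way: five visible characters plus a trailing '\0'. */
static const char redis_str[] = {'R', 'e', 'd', 'i', 's', '\0'};

/* Bytes occupied by a C string, including its terminator. */
static size_t c_string_storage(const char *s) {
    assert(s != NULL);
    return strlen(s) + 1;
}
```

Here sizeof(redis_str) is 6 while strlen(redis_str) is 5, because strlen does not count the terminator.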
Redis does not use C strings directly; instead it builds its own type, the simple dynamic string (SDS). Strings in Redis are stored using the SDS structure. For example, key-value pairs that contain strings are implemented with SDS underneath.
The SDS structure is defined in sds.h
struct sdshdr {
    int len;    // length of the string stored in the SDS
    int free;   // number of unused bytes in the buf array
    char buf[]; // character array that holds the string
};
The last byte holds the null character '\0', following the C string convention, so that a string stored in an SDS can reuse some functions of the C standard library.
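To make the layout concrete, here is a minimal sketch that builds such a header for the value "Redis". The helpers sds_sketch_new and sds_sketch_len are hypothetical names for this example, not functions from sds.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The header layout described above. */
struct sdshdr {
    int len;    /* bytes used in buf, not counting the trailing '\0' */
    int free;   /* unused bytes in buf */
    char buf[]; /* string data, always null-terminated */
};

/* Hypothetical helper: allocate a header holding a copy of init. */
static struct sdshdr *sds_sketch_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof *sh + n + 1); /* +1 for '\0' */
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1); /* copies the terminator as well */
    return sh;
}

/* O(1): the length is read from the header instead of scanning buf. */
static int sds_sketch_len(const struct sdshdr *sh) {
    return sh->len;
}
```

For "Redis" this gives len = 5 and free = 0, and because buf ends with '\0' it can still be passed to functions such as strcmp or printf.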
Redis uses SDS rather than plain C strings mainly because C strings have the following shortcomings:
Length lookup is O(N): to obtain the length of a C string, the program has to traverse the whole string until it reaches the null character '\0'.
Buffer overflow: if too little memory has been allocated, a string append operation can overflow the buffer.
Memory reallocation: every time a string is grown or truncated, the program must reallocate the array that holds the C string. Memory reallocation involves complex algorithms and may require system calls, so it is usually relatively time-consuming.
Null character problem: a null character cannot be stored in the middle of a C string; otherwise the program would mistake it for the end of the string when traversing. Because of this limitation, C strings can only hold text data and are not suitable for binary data such as pictures, audio and video, or compressed files.
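The length and null character problems can be seen in a few lines. In this sketch (payload and visible_length are illustrative names), the data is 13 bytes long, but C string functions only see the first 5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 13 bytes of data with a null byte in the middle; the compiler
 * appends one more '\0' at the end, so the array holds 14 bytes. */
static const char payload[] = "Redis\0Cluster";

/* What C string functions see: the scan stops at the first '\0'. */
static size_t visible_length(const char *s) {
    assert(s != NULL);
    return strlen(s); /* O(N): walks the bytes one by one */
}
```

Everything after the embedded '\0' is invisible to strlen, which is why a plain C string cannot carry arbitrary binary data.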
1. SDS records the string length in the len attribute, so obtaining the length takes O(1) time. As a result, the Redis STRLEN command also runs in O(1).
2. The SDS space allocation strategy prevents buffer overflow: before an SDS is modified, the API first checks whether the SDS has enough space for the modification. If not, it automatically expands the SDS to the required size and only then performs the modification.
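The check-then-grow idea can be sketched as follows. This is a simplified illustration under our own assumptions, not the code from sds.c: sds_sketch_empty and sds_sketch_cat are hypothetical helpers, and the growth step here allocates exactly what is needed, without any extra space.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* Hypothetical helper: empty SDS with room for cap bytes. */
static struct sdshdr *sds_sketch_empty(int cap) {
    assert(cap >= 0);
    struct sdshdr *sh = malloc(sizeof *sh + (size_t)cap + 1);
    if (sh == NULL) return NULL;
    sh->len = 0;
    sh->free = cap;
    sh->buf[0] = '\0';
    return sh;
}

/* Append t, growing the buffer first if free space is insufficient.
 * Returns the (possibly moved) header, or NULL if allocation fails,
 * in which case the original header is still valid. */
static struct sdshdr *sds_sketch_cat(struct sdshdr *sh, const char *t) {
    int add = (int)strlen(t);
    if (sh->free < add) {
        int newlen = sh->len + add;
        struct sdshdr *grown = realloc(sh, sizeof *sh + (size_t)newlen + 1);
        if (grown == NULL) return NULL;
        sh = grown;
        sh->free = add; /* exactly enough room for this append */
    }
    memcpy(sh->buf + sh->len, t, (size_t)add + 1); /* includes '\0' */
    sh->len += add;
    sh->free -= add;
    return sh;
}
```

Because the caller never writes into buf directly, an append cannot run past the end of the allocation the way an unchecked strcat can.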
3. Fewer memory reallocations when modifying strings: the free attribute records the number of unused bytes in the buf array.
Through the free attribute, Redis implements two optimization strategies: space pre-allocation and lazy space release.
Space pre-allocation: when an SDS is grown, the program allocates not only the space the modification requires but also additional unused space. This reduces the number of memory reallocations when growth operations are performed repeatedly.
Lazy space release: when an SDS is truncated, the program does not immediately reclaim the memory occupied by the bytes that were cut off. Instead, it records them in the free attribute for future use. The unused space may come in handy when the SDS grows again, in which case the growth operation does not necessarily require a memory reallocation.
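Both strategies come down to simple bookkeeping. The sketch below assumes the growth policy found in the Redis sds.c source (double the required size below 1 MB, otherwise add 1 MB); the article itself does not state the exact numbers, and the names SKETCH_MAX_PREALLOC, prealloc_capacity, sds_counters, and lazy_truncate are ours.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold: 1 MB. */
#define SKETCH_MAX_PREALLOC ((size_t)1024 * 1024)

/* Space pre-allocation: capacity to allocate when the string must
 * grow to `needed` bytes. */
static size_t prealloc_capacity(size_t needed) {
    if (needed < SKETCH_MAX_PREALLOC)
        return needed * 2;                /* small strings: double */
    return needed + SKETCH_MAX_PREALLOC;  /* large strings: add 1 MB */
}

/* Only the two counters matter for lazy release. */
struct sds_counters {
    size_t len;
    size_t free;
};

/* Lazy space release: shrink len and credit the difference to free.
 * No memory is returned to the allocator. */
static void lazy_truncate(struct sds_counters *c, size_t newlen) {
    assert(newlen <= c->len);
    c->free += c->len - newlen;
    c->len = newlen;
}
```

For example, growing to 13 bytes would reserve 26, and truncating an 11-byte string to 5 bytes leaves 6 bytes recorded as free for later appends.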
4. Binary safety: the buf byte array in the SDS structure is binary safe. The end of the data is determined by len rather than by the first null character, so buf can hold binary data as well as text.
SDS still keeps the C string convention of ending the data with the null character '\0'. It does so in order to reuse some functions of the C string library, such as those for appending strings.
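A short sketch of what binary safety means in practice. The helper sds_sketch_newlen is a hypothetical name; it copies exactly n bytes, whatever they contain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* Hypothetical helper: store exactly n bytes, which may include '\0'. */
static struct sdshdr *sds_sketch_newlen(const void *data, int n) {
    assert(data != NULL && n >= 0);
    struct sdshdr *sh = malloc(sizeof *sh + (size_t)n + 1);
    if (sh == NULL) return NULL;
    sh->len = n;                      /* the authoritative length */
    sh->free = 0;
    memcpy(sh->buf, data, (size_t)n); /* byte copy, no scanning for '\0' */
    sh->buf[n] = '\0';                /* terminator kept for C functions */
    return sh;
}
```

Storing the 13 bytes of "Redis\0Cluster" this way keeps len at 13 and preserves the bytes after the embedded null, even though strlen on buf would still report 5.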
Three encodings of Redis string:
int: stores an 8-byte long integer (long, up to 2^63-1)
embstr: an SDS (Simple Dynamic String) in embstr format, stores short strings of at most 44 bytes
raw: an SDS in raw format, stores long strings greater than 44 bytes
The int encoding is for numbers, while raw and embstr both represent strings. What are the similarities and differences between these two? Let's analyze them below.
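The choice between the three encodings can be sketched as a small classifier. This is an illustration only: choose_encoding, parses_as_long_long, and the SKETCH_ names are hypothetical, and the integer check is simplified (Redis uses its own parser, which differs in details such as how it treats leading zeros).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum sketch_encoding { SKETCH_ENC_INT, SKETCH_ENC_EMBSTR, SKETCH_ENC_RAW };

#define SKETCH_EMBSTR_LIMIT 44 /* threshold taken from the article */

/* Simplified check: does the whole string parse as a long long? */
static int parses_as_long_long(const char *s, size_t n) {
    char *end;
    if (n == 0 || n > 20) return 0; /* 20 chars cover any 64-bit value */
    if (s[0] != '-' && !isdigit((unsigned char)s[0])) return 0;
    errno = 0;
    (void)strtoll(s, &end, 10);
    return errno == 0 && end == s + n; /* in range, nothing left over */
}

/* Pick an encoding for a null-terminated value. */
static enum sketch_encoding choose_encoding(const char *s) {
    assert(s != NULL);
    size_t n = strlen(s);
    if (parses_as_long_long(s, n)) return SKETCH_ENC_INT;
    return n <= SKETCH_EMBSTR_LIMIT ? SKETCH_ENC_EMBSTR : SKETCH_ENC_RAW;
}
```

On a live server, the encoding actually chosen for a key can be inspected with the OBJECT ENCODING command.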
The picture shows the difference between the two. You can see that embstr saves redisObject and SDS in a continuous 64-byte space, so that only one memory allocation is required. For raw, the separation of SDS and redisObject requires two memory allocations and takes up more memory space.
As you can see, since Redis 3.2 embstr uses a structure called sdshdr8, whose metadata takes only 3 bytes, whereas the SDS header in earlier versions takes 8 bytes. Out of the 64 bytes in total, subtracting the redisObject (16 bytes), the SDS metadata, and 1 byte for the terminating '\0' leaves 44 bytes for the actual content with sdshdr8, and 39 bytes with the older header.
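The arithmetic can be checked with sizeof. In this sketch the two header structs are reduced stand-ins (the real sdshdr8 is declared packed, which does not change its size here), ROBJ_BYTES assumes the 16-byte redisObject mentioned above, hdr_legacy assumes a 4-byte int, and embstr_payload is a name invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header before Redis 3.2: two ints, 8 bytes with a 4-byte int. */
struct hdr_legacy {
    int len;
    int free;
    char buf[];
};

/* sdshdr8-style header: three one-byte fields, 3 bytes in total. */
struct hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

#define ARENA_BYTES 64 /* total space used by an embstr object */
#define ROBJ_BYTES 16  /* redisObject size assumed from the article */

/* Bytes left for content after redisObject, SDS header, and '\0'. */
static size_t embstr_payload(size_t sds_header_bytes) {
    assert(ROBJ_BYTES + sds_header_bytes + 1 <= ARENA_BYTES);
    return ARENA_BYTES - ROBJ_BYTES - sds_header_bytes - 1;
}
```

With these definitions, 64 - 16 - 3 - 1 gives 44 for the sdshdr8-style header, and 64 - 16 - 8 - 1 gives 39 for the older one.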
When the string is less than or equal to 44 bytes, Redis uses the embedded string creation method to reduce memory allocation and memory fragmentation.
The following picture shows the process of createEmbeddedStringObject creating an embedded string:
In short, just remember that Redis allocates one contiguous block of memory and places the redisObject structure and the SDS structure in it side by side.
In this way, for strings no longer than 44 bytes, memory fragmentation and the overhead of two memory allocations can be avoided.
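The single-allocation layout can be sketched as follows. This is a simplified stand-in for what createEmbeddedStringObject does, not the Redis source: sketch_obj and sketch_hdr8 are reduced structs with illustrative fields, and create_embedded is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-ins for the SDS header and redisObject. */
struct sketch_hdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

struct sketch_obj {
    unsigned type;
    int refcount;
    void *ptr; /* points at the string bytes */
};

/* One malloc holds the object, the SDS header, the bytes, and '\0'. */
static struct sketch_obj *create_embedded(const char *data, size_t n) {
    assert(data != NULL && n <= 44);
    struct sketch_obj *o =
        malloc(sizeof *o + sizeof(struct sketch_hdr8) + n + 1);
    if (o == NULL) return NULL;
    /* The header starts right after the object, in the same block. */
    struct sketch_hdr8 *sh = (struct sketch_hdr8 *)(o + 1);
    o->type = 0; /* type tag omitted in this sketch */
    o->refcount = 1;
    o->ptr = sh->buf;
    sh->len = (uint8_t)n;
    sh->alloc = (uint8_t)n;
    sh->flags = 0;
    memcpy(sh->buf, data, n);
    sh->buf[n] = '\0';
    return o;
}
```

A single free(o) releases everything. A raw string would instead need one allocation for the object and another for the SDS, with ptr pointing to the second block.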
SDS is the efficient string implementation used in Redis. Its advantages include automatic expansion, binary safety, and O(1) length lookup. In practice, SDS makes string operations efficient and avoids some common string problems, such as buffer overflow. With a deeper understanding of the internal structure and implementation principles of SDS, we can better understand the underlying mechanisms of Redis and make better use of Redis in our applications.