Memory optimization is crucial for writing performant software systems. When a program has a finite amount of memory to work with, many issues can arise if that memory isn't used efficiently. That's why memory optimization is critical for better overall performance.
Go inherits many of C's advantageous features, but what I notice is that many people who use it do not know the full power of this language. One of the reasons may be a lack of knowledge about how it works at a low level, or a lack of experience with languages like C or C++. I mention C and C++ because the foundations of Go are pretty much built on the wonderful features of C/C++. It's no coincidence that I quote Ken Thompson's interview at Google I/O 2012:
For me, the reason I was enthusiastic about Go is because just about the same time that we were starting on Go, I read (or tried to read) the C++0x proposed standard, and that was a convincer for me.
Today, we're going to talk about how we can optimize our Go programs, and more specifically, how best to use structs in Go. Let's first say what a structure is:
A struct is a user-defined data type that groups related variables of different types under a single name.
To fully understand where the problem lies, we should mention that modern processors do not read 1 byte at a time from memory. So how does the CPU fetch the data or instructions stored in memory?
In computer architecture, a word is the unit of data that a processor can handle in a single operation. It's a fixed-size group of bits (binary digits). The word size of a processor determines its ability to handle data efficiently. Common word sizes include 8, 16, 32, and 64 bits. Some processor architectures also support a half word, which is half the number of bits in a word, and a double word, which is two contiguous words.
Nowadays the most common architectures are 32-bit and 64-bit. A 32-bit processor can access 4 bytes at a time, which means its word size is 4 bytes. A 64-bit processor can access 8 bytes at a time, which means its word size is 8 bytes.
When we store the data in memory, each 32-bit data word has a unique address, as shown below.
Figure 1 - Word-Addressable Memory
We can read the data in memory and load it to one register using the load word (lw) instruction.
Now that we know the theory, let's see how it works in practice. To describe what happens with structures, I will demonstrate with the C language. A struct in C is a composite data type that allows you to group multiple variables together and store them in the same block of memory. As we said earlier, how the CPU accesses data depends on the given architecture. Every data type in C has alignment requirements.
So let's say we have the following simple structures:
    // structure 1
    typedef struct example_1 {
        char c;
        short int s;
    } struct1_t;

    // structure 2
    typedef struct example_2 {
        double d;
        int s;
        char c;
    } struct2_t;
And now try to calculate the size of these structures:
Size of Structure 1 = Size of (char + short int) = 1 + 2 = 3.
Size of Structure 2 = Size of (double + int + char) = 8 + 4 + 1 = 13.
The real sizes using a C program might surprise you.
    #include <stdio.h>

    // structure 1
    typedef struct example_1 {
        char c;
        short int s;
    } struct1_t;

    // structure 2
    typedef struct example_2 {
        double d;
        int s;
        char c;
    } struct2_t;

    int main(void) {
        // sizeof yields a size_t, so %zu is the matching format specifier
        printf("sizeof(struct1_t) = %zu\n", sizeof(struct1_t));
        printf("sizeof(struct2_t) = %zu\n", sizeof(struct2_t));
        return 0;
    }
Output
    sizeof(struct1_t) = 4
    sizeof(struct2_t) = 16
As we can see, the size of the structures is different from those we calculated.
What is the reason for that?
C and Go employ a technique known as "struct padding" to ensure data is appropriately aligned in memory, which can significantly affect performance due to hardware and architectural constraints. Data padding and alignment conform to the requirements of the system's architecture, mainly to optimize CPU access times by ensuring data boundaries align with word sizes.
Let's go through an example to illustrate how Go handles padding and alignment. Consider the following struct:
    type Employee struct {
        IsAdmin bool
        Id      int64
        Age     int32
        Salary  float32
    }
A bool is 1 byte, int64 is 8 bytes, int32 is 4 bytes and float32 is 4 bytes, so 1 + 8 + 4 + 4 = 17 bytes in total.
Let's validate the struct size by running a small Go program:
    package main

    import (
        "fmt"
        "unsafe"
    )

    type Employee struct {
        IsAdmin bool
        Id      int64
        Age     int32
        Salary  float32
    }

    func main() {
        var emp Employee
        fmt.Printf("Size of Employee: %d\n", unsafe.Sizeof(emp))
    }
Output
Size of Employee: 24
The reported size is 24 bytes, not 17. This discrepancy is due to memory alignment. To understand how alignment works, we need to inspect the structure and visualize the memory it occupies.
Figure 2 - Unoptimized Memory Layout
The Employee struct consumes 8*3 = 24 bytes. You can see the problem now: there are empty holes in the layout of Employee (the gaps created by the alignment rules are called “padding”).
Padding Optimization and Performance Impact
Understanding how memory alignment and padding can affect the performance of an application is crucial. Specifically, data alignment impacts the number of CPU cycles required to access fields within a struct. This influence arises mostly from CPU cache effects, rather than raw clock cycles themselves, as cache behavior depends heavily on data locality and alignment within memory blocks.
Modern CPUs fetch data from memory into a faster intermediary called cache, organized in fixed-size blocks (commonly 64 bytes). When data is well-aligned and localized within the same or fewer cache lines, the CPU can access it more quickly due to reduced cache loading operations.
Consider two Go structs that hold the same fields in a different order: Misaligned, whose field order forces extra padding, and Aligned, whose fields are ordered to avoid it. The rest of this section uses them to illustrate poor versus optimal alignment.
How Alignment Affects Performance
The CPU reads data in word-sized units rather than byte by byte. As described at the beginning, a word is 8 bytes on a 64-bit system and 4 bytes on a 32-bit system, so the CPU reads from addresses that are multiples of its word size. Suppose an 8-byte field such as passportId were stored without padding at an unaligned address, for example starting at byte 1. To fetch it, the CPU would need two cycles instead of one: the first cycle fetches bytes 0 to 7 and the second fetches the rest. This is inefficient, which is why data structure alignment is needed. By simply aligning the data, the compiler ensures that passportId can be retrieved in one CPU cycle.
Figure 3 - Comparing Memory Access Efficiency
Padding is the key to achieving data alignment. Padding occurs because modern CPUs are optimized to read data from memory at aligned addresses. This alignment allows the CPU to read the data in a single operation.
Figure 4 - Simply aligning the data
Without padding, data may be misaligned, leading to multiple memory accesses and slower performance. Therefore, while padding might waste some memory, it ensures that your program runs efficiently.
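The padding rule itself is simple arithmetic: each field starts at the next multiple of its alignment, and the total size is rounded up to a multiple of the largest alignment. The sketch below models that rule by hand. The `field` type and the `alignUp` and `layoutSize` helpers are hypothetical names for this illustration; it is a simplified model, not the compiler's actual implementation.

```go
package main

import "fmt"

// field describes one struct member by its size and alignment in bytes.
type field struct {
	size, align uintptr
}

// alignUp rounds offset up to the next multiple of align.
// align must be a power of two.
func alignUp(offset, align uintptr) uintptr {
	return (offset + align - 1) &^ (align - 1)
}

// layoutSize walks the fields in declaration order, adding padding before
// each field and after the last one, and returns the total size.
func layoutSize(fields []field) uintptr {
	var offset, maxAlign uintptr = 0, 1
	for _, f := range fields {
		offset = alignUp(offset, f.align) + f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return alignUp(offset, maxAlign)
}

func main() {
	// bool, int64, int32, float32: the Employee struct from earlier,
	// using 64-bit sizes and alignments.
	employee := []field{{1, 1}, {8, 8}, {4, 4}, {4, 4}}
	fmt.Println(layoutSize(employee)) // 24
}
```

Feeding in the sizes of the two C structures (1 and 2 bytes, then 8, 4 and 1 bytes) gives 4 and 16, the same values `sizeof` reported.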
Padding Optimization Strategies
The Aligned struct consumes less memory simply because its fields are in a better order than those of Misaligned. Because of padding, two structs that each hold 13 bytes of data end up occupying 16 bytes and 24 bytes respectively. Hence, you save memory by simply reordering your struct fields.
Figure 5 - Optimizing Field Order
Improperly aligned data can slow performance as the CPU might need multiple cycles to access misaligned fields. Conversely, correctly aligned data minimizes cache line loads, which is crucial for performance, especially in systems where memory speed is a bottleneck.
A simple benchmark that traverses a collection of each struct type bears this out: traversing Aligned takes less time than traversing Misaligned.
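One way to run such a comparison yourself is sketched below. It is not the author's benchmark: the field names other than passportId, the element count `n`, and the `sumMisaligned` and `sumAligned` helpers are assumptions for this illustration. It uses `testing.Benchmark` so it can run as an ordinary program, and the measured times will vary with the machine and the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical field layouts: 13 bytes of data in two different orders.
type Misaligned struct {
	age        uint8
	passportId uint64
	siblings   uint32
}

type Aligned struct {
	passportId uint64
	siblings   uint32
	age        uint8
}

const n = 1 << 20 // number of elements to traverse (arbitrary)

var sink uint64 // keeps the compiler from discarding the loops

// sumMisaligned reads one field from every element of the slice.
func sumMisaligned(items []Misaligned) uint64 {
	var total uint64
	for i := range items {
		total += items[i].passportId
	}
	return total
}

// sumAligned does the same work over the smaller struct.
func sumAligned(items []Aligned) uint64 {
	var total uint64
	for i := range items {
		total += items[i].passportId
	}
	return total
}

func main() {
	mis := make([]Misaligned, n)
	ali := make([]Aligned, n)

	resMis := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumMisaligned(mis)
		}
	})
	resAli := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumAligned(ali)
		}
	})

	fmt.Printf("Misaligned: %d ns/op\n", resMis.NsPerOp())
	fmt.Printf("Aligned:    %d ns/op\n", resAli.NsPerOp())
}
```

Because each Aligned element is smaller, more elements fit in every cache line, so the traversal touches less memory for the same number of elements.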
Padding is added to make sure each struct field lines up properly in memory based on its alignment needs, as we saw earlier. But while it enables efficient access, padding can also waste space if the fields aren't ordered well.
Understanding how to properly align struct fields to minimize memory waste due to padding is important for efficient memory usage, especially in performance-critical applications. Below, I will provide an example with a poorly aligned structure and then show an optimized version of the same structure.
In a poorly aligned struct, fields are ordered without considering their sizes and alignment requirements, which can lead to added padding and increased memory usage. Take a Person struct whose fields are declared in the order bool, float64, int32, string.
On a 64-bit system its total size is 1 (bool) + 7 (padding) + 8 (float64) + 4 (int32) + 4 (padding) + 16 (string) = 40 bytes.
An optimized structure arranges fields from the largest alignment requirement to the smallest, significantly reducing or eliminating the need for additional padding. For the Person struct this means declaring the fields in the order float64, string, int32, bool.
The total size is then 8 (float64) + 16 (string) + 4 (int32) + 1 (bool) + 3 (padding) = 32 bytes.
Measuring both versions with unsafe.Sizeof on a 64-bit system confirms these totals: 40 bytes for the poorly aligned struct and 32 bytes for the optimized one.
Reducing the structure size from 40 bytes to 32 bytes means a 20% reduction in memory usage per instance of Person. This can lead to considerable savings in applications where many such instances are created or stored, improving cache efficiency and potentially reducing the number of cache misses.
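To put the 20% figure in perspective, the arithmetic can be sketched as follows. The instance count of one million is a made-up number for illustration, and `reduction` is a hypothetical helper.

```go
package main

import "fmt"

// reduction returns the percentage saved when an element shrinks
// from oldSize to newSize bytes.
func reduction(oldSize, newSize int) float64 {
	return 100 * float64(oldSize-newSize) / float64(oldSize)
}

func main() {
	const oldSize, newSize = 40, 32 // bytes per instance, from the text
	const count = 1_000_000         // hypothetical number of instances

	fmt.Printf("reduction: %.0f%%\n", reduction(oldSize, newSize))
	fmt.Printf("saved for %d instances: %d bytes\n", count, count*(oldSize-newSize))
}
```

For one million instances the saving is 8,000,000 bytes, roughly 8 MB that no longer has to pass through the cache.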
Conclusion
Data alignment is a pivotal factor in optimizing memory utilization and enhancing system performance. By arranging struct data correctly, memory usage becomes not only more efficient but also faster in terms of CPU read times, contributing significantly to overall system efficiency.
