How much faster can Go's new function calling convention be?
This article looks at how much Go gains from the change to its function calling convention.
In the article "Go Function Calling Conventions" (readers unfamiliar with the topic are encouraged to read it first), we discussed the function calling convention of the Go language.
A function calling convention is the agreement that a function's caller and callee must both follow. It mainly covers how arguments and return values are passed and in what order.
Arguments are generally passed in one of two ways: in registers or on the stack.
Before Go 1.17, Go passed arguments on the stack, which sidesteps the differences between the register sets of different CPUs. The biggest advantage of this approach is that it is simple to implement and keeps the compiler easy to maintain. The drawback is equally clear: it sacrifices some performance, because the CPU accesses registers much faster than it accesses memory.
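For performance reasons, most languages adopt a register-based calling convention. Go decided to change as well: version 1.17 implements a new register-based calling convention, initially for linux/amd64, darwin/amd64, and windows/amd64. Consider the following program.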
package main

//go:noinline
func add(i, j int) int {
	return i + j
}

func main() {
	add(100, 200)
}
On darwin/amd64, we compile this code with Go 1.17 and with Go 1.16 and obtain the assembly listings below.
Go 1.17 assembly
$ go version
go version go1.17 darwin/amd64
$ go tool compile -S main.go
"".add STEXT nosplit size=4 args=0x10 locals=0x0 funcid=0x0
    0x0000 00000 (main.go:4) TEXT "".add(SB), NOSPLIT|ABIInternal, $0-16
    0x0000 00000 (main.go:4) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:4) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:4) FUNCDATA $5, "".add.arginfo1(SB)
    0x0000 00000 (main.go:5) ADDQ BX, AX
    0x0003 00003 (main.go:5) RET
    0x0000 48 01 d8 c3 H...
"".main STEXT size=54 args=0x0 locals=0x18 funcid=0x0
    0x0000 00000 (main.go:8) TEXT "".main(SB), ABIInternal, $24-0
    0x0000 00000 (main.go:8) CMPQ SP, 16(R14)
    0x0004 00004 (main.go:8) PCDATA $0, $-2
    0x0004 00004 (main.go:8) JLS 47
    0x0006 00006 (main.go:8) PCDATA $0, $-1
    0x0006 00006 (main.go:8) SUBQ $24, SP
    0x000a 00010 (main.go:8) MOVQ BP, 16(SP)
    0x000f 00015 (main.go:8) LEAQ 16(SP), BP
    0x0014 00020 (main.go:8) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0014 00020 (main.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0014 00020 (main.go:9) MOVL $100, AX
    0x0019 00025 (main.go:9) MOVL $200, BX
    0x001e 00030 (main.go:9) PCDATA $1, $0
    0x001e 00030 (main.go:9) NOP
    0x0020 00032 (main.go:9) CALL "".add(SB)
    0x0025 00037 (main.go:10) MOVQ 16(SP), BP
    0x002a 00042 (main.go:10) ADDQ $24, SP
    0x002e 00046 (main.go:10) RET
    0x002f 00047 (main.go:10) NOP
    0x002f 00047 (main.go:8) PCDATA $1, $-1
    0x002f 00047 (main.go:8) PCDATA $0, $-2
    0x002f 00047 (main.go:8) CALL runtime.morestack_noctxt(SB)
    0x0034 00052 (main.go:8) PCDATA $0, $-1
    0x0034 00052 (main.go:8) JMP 0
    ...
Go 1.16 assembly
$ go1.16.4 version
go version go1.16.4 darwin/amd64
$ go1.16.4 tool compile -S main.go
"".add STEXT nosplit size=19 args=0x18 locals=0x0 funcid=0x0
    0x0000 00000 (main.go:4) TEXT "".add(SB), NOSPLIT|ABIInternal, $0-24
    0x0000 00000 (main.go:4) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:4) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:5) MOVQ "".j+16(SP), AX
    0x0005 00005 (main.go:5) MOVQ "".i+8(SP), CX
    0x000a 00010 (main.go:5) ADDQ CX, AX
    0x000d 00013 (main.go:5) MOVQ AX, "".~r2+24(SP)
    0x0012 00018 (main.go:5) RET
    0x0000 48 8b 44 24 10 48 8b 4c 24 08 48 01 c8 48 89 44 H.D$.H.L$.H..H.D
    0x0010 24 18 c3 $..
"".main STEXT size=71 args=0x0 locals=0x20 funcid=0x0
    0x0000 00000 (main.go:8) TEXT "".main(SB), ABIInternal, $32-0
    0x0000 00000 (main.go:8) MOVQ (TLS), CX
    0x0009 00009 (main.go:8) CMPQ SP, 16(CX)
    0x000d 00013 (main.go:8) PCDATA $0, $-2
    0x000d 00013 (main.go:8) JLS 64
    0x000f 00015 (main.go:8) PCDATA $0, $-1
    0x000f 00015 (main.go:8) SUBQ $32, SP
    0x0013 00019 (main.go:8) MOVQ BP, 24(SP)
    0x0018 00024 (main.go:8) LEAQ 24(SP), BP
    0x001d 00029 (main.go:8) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x001d 00029 (main.go:8) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x001d 00029 (main.go:9) MOVQ $100, (SP)
    0x0025 00037 (main.go:9) MOVQ $200, 8(SP)
    0x002e 00046 (main.go:9) PCDATA $1, $0
    0x002e 00046 (main.go:9) CALL "".add(SB)
    0x0033 00051 (main.go:10) MOVQ 24(SP), BP
    0x0038 00056 (main.go:10) ADDQ $32, SP
    0x003c 00060 (main.go:10) RET
    0x003d 00061 (main.go:10) NOP
    0x003d 00061 (main.go:8) PCDATA $1, $-1
    0x003d 00061 (main.go:8) PCDATA $0, $-2
    0x003d 00061 (main.go:8) NOP
    0x0040 00064 (main.go:8) CALL runtime.morestack_noctxt(SB)
    0x0045 00069 (main.go:8) PCDATA $0, $-1
    0x0045 00069 (main.go:8) JMP 0
Don't be alarmed by the amount of assembly. Only the following few lines matter here.
// Go 1.17: assembly for passing the arguments and making the call
"".add STEXT nosplit size=4 args=0x10 locals=0x0 funcid=0x0
    ...
    0x0000 00000 (main.go:5) ADDQ BX, AX
    ...
"".main STEXT size=54 args=0x0 locals=0x18 funcid=0x0
    ...
    0x0014 00020 (main.go:9) MOVL $100, AX
    0x0019 00025 (main.go:9) MOVL $200, BX
    0x001e 00030 (main.go:9) PCDATA $1, $0
    0x001e 00030 (main.go:9) NOP
    0x0020 00032 (main.go:9) CALL "".add(SB)
    ...

// Go 1.16: assembly for passing the arguments and making the call
"".add STEXT nosplit size=19 args=0x18 locals=0x0 funcid=0x0
    ...
    0x0000 00000 (main.go:5) MOVQ "".j+16(SP), AX
    0x0005 00005 (main.go:5) MOVQ "".i+8(SP), CX
    0x000a 00010 (main.go:5) ADDQ CX, AX
    0x000d 00013 (main.go:5) MOVQ AX, "".~r2+24(SP)
    ...
"".main STEXT size=71 args=0x0 locals=0x20 funcid=0x0
    ...
    0x001d 00029 (main.go:9) MOVQ $100, (SP)
    0x0025 00037 (main.go:9) MOVQ $200, 8(SP)
    0x002e 00046 (main.go:9) PCDATA $1, $0
    0x002e 00046 (main.go:9) CALL "".add(SB)
    ...
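Do you see the difference?
In the Go 1.17 assembly, the argument values 100 and 200 are placed directly in registers AX and BX, and add operates on those registers. In Go 1.16, the arguments are addressed and passed through offsets from the stack pointer register SP, which points to the top of the stack.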
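According to the Go 1.17 release notes, this compiler change improves Go programs in two respects: runtime performance and binary size. For a representative set of Go packages and programs, the release notes cite performance improvements of about 5% and a typical reduction in binary size of about 2%.
First, we compare the sizes of the compiled binaries.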
$ go build -o main1.17 main.go
$ go1.16.4 build -o main1.16 main.go
$ ls -al main1.17 main1.16
-rwxr-xr-x 1 slp staff 1200640 Dec 26 21:09 main1.16
-rwxr-xr-x 1 slp staff 1142208 Dec 26 21:09 main1.17
As shown, the binary that Go 1.17 builds with the register-based calling convention is about 4.9% smaller than the one Go 1.16 builds with stack-based passing.
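The size reduction can be checked from the two byte counts in the ls output. The following is a minimal sketch; the helper name percentReduction is chosen here for illustration and does not come from the article.

```go
package main

import "fmt"

// percentReduction reports how much smaller newSize is than oldSize,
// as a percentage of oldSize. (Hypothetical helper for illustration.)
func percentReduction(oldSize, newSize float64) float64 {
	return (oldSize - newSize) / oldSize * 100
}

func main() {
	// Byte counts taken from the ls output: main1.16 and main1.17.
	fmt.Printf("%.2f%%\n", percentReduction(1200640, 1142208)) // prints 4.87%
}
```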
Next, we compare execution speed with a benchmark.
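The article does not show the benchmark source. A minimal sketch of what it might look like follows; only the name BenchmarkIt is taken from the output, and the loop body is an assumption. To keep the example a single runnable program, it calls testing.Benchmark from main; in a real project the benchmark function would normally live in a _test.go file and be run with go test -bench=.

```go
package main

import (
	"fmt"
	"testing"
)

// add mirrors the function from the article. The noinline directive keeps
// the compiler from inlining it, so every iteration performs a real call.
//
//go:noinline
func add(i, j int) int {
	return i + j
}

// BenchmarkIt is an assumed reconstruction: the article shows only the
// benchmark's name and results, not its body.
func BenchmarkIt(b *testing.B) {
	x, y := 100, 200
	for n := 0; n < b.N; n++ {
		add(x, y)
	}
}

func main() {
	// testing.Benchmark runs a single benchmark outside of "go test"
	// and returns the iteration count and timing.
	res := testing.Benchmark(BenchmarkIt)
	fmt.Println(res)
}
```

Building and running this with each Go toolchain in turn gives a rough per-call comparison, although the absolute numbers depend on the machine.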
// Go 1.17
$ go test -bench=.
goos: darwin
goarch: amd64
pkg: workspace/add
cpu: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
BenchmarkIt-8   918887481   1.257 ns/op
PASS
ok workspace/add 1.299s

// Go 1.16
$ go1.16.4 test -bench=.
goos: darwin
goarch: amd64
pkg: workspace/add
cpu: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
BenchmarkIt-8   801041754   1.469 ns/op
PASS
ok workspace/add 1.336s
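The time per operation drops from 1.469 ns/op to 1.257 ns/op, an improvement of about 14%.
As we often say, Go keeps improving with every iteration, and a better Go is worth both looking forward to and helping to build.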
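The 14% figure is the reduction in time per operation. The same measurements can also be read as a speedup ratio, which comes out slightly higher. A small sketch of both calculations, with function names chosen here for illustration:

```go
package main

import "fmt"

// timeReduction returns the percentage by which the time per operation fell.
func timeReduction(oldNs, newNs float64) float64 {
	return (oldNs - newNs) / oldNs * 100
}

// speedup returns how many times faster the new measurement is.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// ns/op values from the benchmark output: Go 1.16, then Go 1.17.
	const go116, go117 = 1.469, 1.257
	fmt.Printf("time per op reduced by %.1f%%\n", timeReduction(go116, go117)) // 14.4%
	fmt.Printf("speedup: %.2fx\n", speedup(go116, go117))                     // 1.17x
}
```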
To reduce the performance cost of stack-based argument passing, Go 1.17 introduces register-based argument passing in the compiler. In Go 1.17 it is supported only on amd64 (linux, darwin, and windows); Go 1.18 extends support to the arm64, ppc64, and ppc64le platforms.
As Go's release notes state, the new function calling convention brings improvements in two respects: compiled binaries are smaller and programs run faster. To stay compatible with existing assembly functions, the compiler generates adapter functions that convert between the old and new calling conventions.