Direct Observation with Go Tooling
Today I investigated a hunch using some nice tooling that ships with the Go toolchain.
At work I’m building a tool that generates nginx config so that nginx can act as a dispatcher for all of the various software the outside world interacts with. Much of this work has already been done, with different goals, in ingress-nginx, so I’ve been using bits and pieces of that project’s code as a jumping-off point.
Today I found this section:
func NewBufferPool(s int) *BufferPool {
	return &BufferPool{
		Pool: sync.Pool{
			New: func() interface{} {
				b := bytes.NewBuffer(make([]byte, s)) // create *bytes.Buffer
				b.Reset()                             // reset it
				return b
			},
		},
	}
}
The comments are mine and mark the lines relevant to this post. My hunch was that the above could be written more neatly, and might even perform better, as:
b := bytes.NewBuffer(make([]byte, 0, s))
That makes a zero-length byte slice with a capacity of s. The Go compiler has surprised me before by optimizing away silly stuff (like setting map entries to their zero value, as someone might do in Python), so I figured I’d check to be sure.
First I created a little test program:
package main

import (
	"bytes"
	"fmt"
)

func main() {
	s := 1024
	p := bytes.NewBuffer(make([]byte, 0, s))
	p.Reset()
	fmt.Printf("%#v\n", p)
}
Running the above produces &bytes.Buffer{buf:[]uint8{}, off:0, lastRead:0}, but that’s not actually what I’m interested in. Go ships with a tool to display the actual assembly (or some vague layer atop assembly) of the built code. Using go tool objdump -S -s main.main ./binary we can get the full assembly of a given function: -s takes a regular expression naming the symbols to disassemble, and -S interleaves the corresponding Go source lines. Here’s a subset of the output from that:
p := bytes.NewBuffer(make([]byte, 0, s))
0x48ec81 488d0518170100 LEAQ 0x11718(IP), AX
0x48ec88 48890424 MOVQ AX, 0(SP)
0x48ec8c 48c744240800000000 MOVQ $0x0, 0x8(SP)
0x48ec95 48c744241000040000 MOVQ $0x400, 0x10(SP)
0x48ec9e e8fddefaff CALL runtime.makeslice(SB)
0x48eca3 488b442418 MOVQ 0x18(SP), AX
0x48eca8 4889442450 MOVQ AX, 0x50(SP)
func NewBuffer(buf []byte) *Buffer { return &Buffer{buf: buf} }
0x48ecad 488d0dcc1c0200 LEAQ 0x21ccc(IP), CX
0x48ecb4 48890c24 MOVQ CX, 0(SP)
0x48ecb8 e8a3caf7ff CALL runtime.newobject(SB)
0x48ecbd 488b7c2408 MOVQ 0x8(SP), DI
0x48ecc2 48c7471000040000 MOVQ $0x400, 0x10(DI)
0x48ecca 833d2fdf0e0000 CMPL $0x0, runtime.writeBarrier(SB)
0x48ecd1 0f858d000000 JNE 0x48ed64
0x48ecd7 488b442450 MOVQ 0x50(SP), AX
0x48ecdc 488907 MOVQ AX, 0(DI)
p.Reset()
0x48ecdf 90 NOPL
b.buf = b.buf[:0]
0x48ece0 48c7470800000000 MOVQ $0x0, 0x8(DI)
b.off = 0
0x48ece8 48c7471800000000 MOVQ $0x0, 0x18(DI)
b.lastRead = opInvalid
0x48ecf0 c6472000 MOVB $0x0, 0x20(DI)
fmt.Printf("%#v\n", p)
My plan was to diff the old and the new binaries, but with all the addresses and machine code in place I knew that wouldn’t work, so I wrote a small script, simpledump, to filter the above output into something at least slightly more stable:
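#!/bin/sh
# simpledump: disassemble main.main from the binary named in $1, replacing the
# address and machine-code columns with two tabs so separate builds diff cleanly.
go tool objdump -S -s main.main "$1" | perl -p -e "s/^\s+0x[0-9a-f]{6}\t+[0-9a-f]+\t+/\t\t/"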
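If you’d rather not depend on perl, the same filter can be written in Go. This is a sketch that reads objdump output on stdin and applies the identical regular expression (Go’s RE2 syntax accepts the same \s and \t escapes); like the perl version, it assumes a six-hex-digit address column. The file name filter.go used below is just an illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// addrCols matches the leading address and machine-code columns,
// using the same pattern as the perl one-liner.
var addrCols = regexp.MustCompile(`^\s+0x[0-9a-f]{6}\t+[0-9a-f]+\t+`)

// normalize replaces the address and encoding columns with two tabs,
// leaving Go source lines and instruction text untouched.
func normalize(line string) string {
	return addrCols.ReplaceAllString(line, "\t\t")
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		fmt.Println(normalize(sc.Text()))
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Usage would look like go tool objdump -S -s main.main ./reset | go run filter.go. Absolute jump targets such as JNE 0x48ed64 are still left in place, which is why they show up as changed lines in the diff.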
Using that, my output becomes:
func NewBuffer(buf []byte) *Buffer { return &Buffer{buf: buf} }
LEAQ 0x21ccc(IP), CX
MOVQ CX, 0(SP)
CALL runtime.newobject(SB)
MOVQ 0x8(SP), DI
MOVQ $0x400, 0x10(DI)
CMPL $0x0, runtime.writeBarrier(SB)
JNE 0x48ed64
MOVQ 0x50(SP), AX
MOVQ AX, 0(DI)
p.Reset()
NOPL
b.buf = b.buf[:0]
MOVQ $0x0, 0x8(DI)
b.off = 0
MOVQ $0x0, 0x18(DI)
b.lastRead = opInvalid
MOVB $0x0, 0x20(DI)
fmt.Printf("%#v\n", p)
I built the old binary and named it reset with go build -o reset, then created the new version (code below) and named it noreset with go build -o noreset.
package main

import (
	"bytes"
	"fmt"
)

func main() {
	s := 1024
	p := bytes.NewBuffer(make([]byte, 0, s))
	fmt.Printf("%#v\n", p)
}
Finally, I diffed the two to see whether my version would indeed be different (and hopefully skip unneeded steps) by running diff -U5 <(simpledump reset) <(simpledump noreset) in bash, since the <(...) process substitution isn’t available in plain sh. Here’s the relevant section of the diff:
func NewBuffer(buf []byte) *Buffer { return &Buffer{buf: buf} }
LEAQ 0x21ccc(IP), CX
MOVQ CX, 0(SP)
CALL runtime.newobject(SB)
MOVQ 0x8(SP), DI
- MOVQ $0x400, 0x8(DI)
MOVQ $0x400, 0x10(DI)
CMPL $0x0, runtime.writeBarrier(SB)
- JNE 0x48ed6c
+ JNE 0x48ed4b
MOVQ 0x50(SP), AX
MOVQ AX, 0(DI)
- fmt.Printf("%#v\n", p)
- NOPL
- b.buf = b.buf[:0]
- MOVQ $0x0, 0x8(DI)
- b.off = 0
- MOVQ $0x0, 0x18(DI)
- b.lastRead = opInvalid
- MOVB $0x0, 0x20(DI)
}
XORPS X0, X0
MOVUPS X0, 0x58(SP)
- LEAQ 0x2dd55(IP), AX
+ LEAQ 0x2dd76(IP), AX
MOVQ AX, 0x58(SP)
MOVQ DI, 0x60(SP)
return Fprintf(os.Stdout, format, a...)
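As I’d expected, the new version is actually simpler, but barely: the diff drops the inlined Reset, which is just three stores (to b.buf’s length, b.off, and b.lastRead), plus one extra MOVQ. My hunch is that this would not measurably affect performance unless something else is wrong, especially since sync.Pool only calls New when it has no cached buffer to hand out. Still, the code is neater, works, and is slightly simpler. Cool.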
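Side note: I’m not totally sure why the fmt.Printf source line evaporates from the diff. My only guess is that stuff gets short enough to be inlined; the call itself is still there, since the last line of the diff, return Fprintf(os.Stdout, format, a...), is Printf’s own body inlined into main. But I really don’t know.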
If you are interested in learning Go, these are my recommendations:
(The following includes affiliate links.)
If you don’t already know Go, you should definitely check out The Go Programming Language. It’s not just a great Go book but a great programming book in general with a generous dollop of concurrency.
Another book to consider learning Go with is Go Programming Blueprints. It has a nearly interactive style where you write code, see it get syntax errors (or whatever), fix it, and iterate. It’s a useful book that shows you don’t have to get every program working perfectly on the first compile.
Posted Thu, Oct 10, 2019