Here is my code.
Code A:
func RandBytes(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		for j := 0; j < 4; j++ {
			if i+j < len(b) {
				b[i+j] = letters[(int31&0b11111111)%lettersLen]
				int31 = int31 >> 8
			}
		}
	}
}
Code B:
func RandBytes(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		b[i] = letters[(int31&0b11111111)%lettersLen]
		int31 = int31 >> 8
		if i+1 < len(b) {
			b[i+1] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+2 < len(b) {
			b[i+2] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+3 < len(b) {
			b[i+3] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
	}
}
And some benchmark test code:
func BenchmarkRandBytes(b *testing.B) {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	buf := make([]byte, 100)
	for i := 0; i < b.N; i++ {
		RandBytes(r, buf)
	}
}
The two versions look equivalent, but code A gives:
goos: windows
goarch: amd64
pkg: github.com/pingcap/go-ycsb/pkg/util
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkRandBytes
BenchmarkRandBytes-12    3272442    377.9 ns/op
PASS
while code B gives:
goos: windows
goarch: amd64
pkg: github.com/pingcap/go-ycsb/pkg/util
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkRandBytes
BenchmarkRandBytes-12    4012189    295.9 ns/op
PASS
Code A takes about 28% longer per call than code B (377.9 ns/op vs. 295.9 ns/op). Why?
I expected them to perform the same. My Go version: go version go1.21.1 windows/amd64
Both versions produce exactly the same bytes from the same sequence of Int31 values, so the difference is not in what they compute but in the machine code the Go compiler (gc) generates for each. In a benchmark this small, loop overhead, bounds checks, and inlining decisions all show up directly in the ns/op figure.
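To confirm that the two versions really compute the same thing, the sketch below runs both from the same seed and compares the output for several buffer lengths, including lengths that are not multiples of 4. The question does not show letters or lettersLen, so the 52-letter alphabet here is an assumption, and randBytesA, randBytesB and sameOutput are illustrative names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Hypothetical definitions: the question does not show letters or
// lettersLen, so this sketch assumes a 52-letter alphabet.
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const lettersLen = int32(len(letters))

// randBytesA is code A: an inner loop over the 4 bytes of each Int31.
func randBytesA(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		for j := 0; j < 4; j++ {
			if i+j < len(b) {
				b[i+j] = letters[(int31&0b11111111)%lettersLen]
				int31 = int31 >> 8
			}
		}
	}
}

// randBytesB is code B: the same work with the inner loop unrolled by hand.
func randBytesB(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		b[i] = letters[(int31&0b11111111)%lettersLen]
		int31 = int31 >> 8
		if i+1 < len(b) {
			b[i+1] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+2 < len(b) {
			b[i+2] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+3 < len(b) {
			b[i+3] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
	}
}

// sameOutput fills an n-byte buffer with each version, both starting
// from the same seed, and reports whether the results are identical.
func sameOutput(seed int64, n int) bool {
	a := make([]byte, n)
	b := make([]byte, n)
	randBytesA(rand.New(rand.NewSource(seed)), a)
	randBytesB(rand.New(rand.NewSource(seed)), b)
	return string(a) == string(b)
}

func main() {
	for _, n := range []int{0, 1, 3, 4, 5, 100} {
		fmt.Println(n, sameOutput(42, n)) // each line ends in true
	}
}
```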
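Code B is code A with the inner loop unrolled by hand. As of Go 1.21, gc does not unroll loops automatically, so in code A every byte also pays for incrementing j, comparing it with 4, and jumping back to the top of the inner loop. Code B removes that work, and it also skips the length check for b[i], which the outer loop condition already guarantees. Its fixed offsets (i+1, i+2, i+3) are also simpler for the compiler to analyze, which may let it drop bounds checks that code A keeps.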
How large the gap is depends on the compiler version and the target architecture, so another Go release or CPU may show a smaller difference, a larger one, or none at all.
To see exactly what the compiler emits for each version, look at the assembly output:
go build -gcflags="-S" yourfile.go
This prints the generated assembly to the terminal. Find RandBytes in the output for each version and compare them: in code A, expect the extra increment, compare, and jump of the inner j loop, and note that every CALL runtime.panicIndex marks a bounds check the compiler could not prove unnecessary.
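A compact way to see bounds-check elimination at work is the hint idiom the standard library uses in encoding/binary (for example in binary.LittleEndian.PutUint32): indexing the highest position first lets the compiler prove the lower indices are in range. The two functions below (hypothetical names) store the same four bytes with and without the hint. Building the file with go build -gcflags="-d=ssa/check_bce/debug=1" typically reports a check on each store in put4NoHint but only on the hint line in put4Hint, although the exact report can vary between Go releases.

```go
package main

import "fmt"

// put4NoHint (hypothetical) stores v little-endian into b[0..3].
// The compiler typically keeps a separate bounds check for each index,
// because b might be shorter than 4 and earlier stores must still happen.
func put4NoHint(b []byte, v uint32) {
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
}

// put4Hint (hypothetical) does the same, but checks the highest index
// first. Once b[3] is known to be valid, b[0]..b[2] are valid too.
func put4Hint(b []byte, v uint32) {
	_ = b[3] // bounds check hint, the same idiom encoding/binary uses
	b[0] = byte(v)
	b[1] = byte(v >> 8)
	b[2] = byte(v >> 16)
	b[3] = byte(v >> 24)
}

func main() {
	x := make([]byte, 4)
	y := make([]byte, 4)
	put4NoHint(x, 0x04030201)
	put4Hint(y, 0x04030201)
	fmt.Println(x, y) // [1 2 3 4] [1 2 3 4]
}
```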
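Microbenchmarks like this are also sensitive to noise (CPU frequency changes, background processes, code alignment), so run each version several times, for example with go test -bench=RandBytes -count=10, and compare the two sets of results with benchstat (golang.org/x/perf/cmd/benchstat), which reports whether the difference is statistically significant.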
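For a quick side-by-side check from an ordinary program, testing.Benchmark can run a benchmark function outside go test. The sketch below alternates the two versions for a few rounds so that slow drift affects both similarly. The letters/lettersLen definitions, the fixed seed, and the names randBytesA, randBytesB, benchA and benchB are assumptions for illustration; for numbers you can feed to benchstat, go test -bench with -count is still the better tool.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// Hypothetical alphabet: the question does not show letters or lettersLen.
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const lettersLen = int32(len(letters))

// randBytesA is code A from the question.
func randBytesA(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		for j := 0; j < 4; j++ {
			if i+j < len(b) {
				b[i+j] = letters[(int31&0b11111111)%lettersLen]
				int31 = int31 >> 8
			}
		}
	}
}

// randBytesB is code B from the question.
func randBytesB(r *rand.Rand, b []byte) {
	for i := 0; i < len(b); i += 4 {
		int31 := r.Int31()
		b[i] = letters[(int31&0b11111111)%lettersLen]
		int31 = int31 >> 8
		if i+1 < len(b) {
			b[i+1] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+2 < len(b) {
			b[i+2] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
		if i+3 < len(b) {
			b[i+3] = letters[(int31&0b11111111)%lettersLen]
			int31 = int31 >> 8
		}
	}
}

// benchA and benchB mirror the question's benchmark (one *rand.Rand and a
// 100-byte buffer) but use a fixed seed so runs are repeatable. Each calls
// its function directly, so neither pays for an indirect call.
func benchA(b *testing.B) {
	r := rand.New(rand.NewSource(1))
	buf := make([]byte, 100)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		randBytesA(r, buf)
	}
}

func benchB(b *testing.B) {
	r := rand.New(rand.NewSource(1))
	buf := make([]byte, 100)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		randBytesB(r, buf)
	}
}

func main() {
	// Alternate A and B so that slow drift (CPU clock changes, background
	// load) affects both versions roughly equally.
	for round := 1; round <= 3; round++ {
		resA := testing.Benchmark(benchA)
		resB := testing.Benchmark(benchB)
		fmt.Printf("round %d: A %d ns/op, B %d ns/op\n", round, resA.NsPerOp(), resB.NsPerOp())
	}
}
```

If the ordering of A and B within a round changes which one looks faster, the measured gap is probably dominated by noise rather than by the code itself.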