140x Faster String to Byte and Byte to String Conversions with Zero Allocation in Go

Jose Sitanggang
4 min read · Oct 21, 2023
Source: https://www.cnblogs.com/kkbill/p/13069596.html

Converting a string to a byte slice, or a byte slice to a string, normally allocates new memory because Go copies the data. However, a string and a []byte have a very similar memory representation. The main difference is that a slice carries a capacity so that it can grow as needed, whereas a string is immutable and doesn't need a capacity.

Strings and slices are built-in types in Go, so we cannot view their definitions directly. However, the documentation of the reflect package describes their runtime representation as two structs: StringHeader for strings and SliceHeader for slices. Let's include those definitions here to make it easier to follow:

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

type StringHeader struct {
	Data uintptr
	Len  int
}

The Data field holds the memory address of the first element of the backing array, which is where the data is stored. The backing array has a fixed size once it is allocated, which is why a slice tracks a capacity (Cap): it records how much room is available before a larger backing array has to be allocated. If you are interested in learning how a slice grows, please refer to my article titled "Exploring Go Slice Internal Implementation in C++."
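To make the layout described above concrete, here is a minimal sketch that prints the size of both descriptors in machine words and shows a slice growing within its capacity. It assumes the standard gc toolchain; headerWords is a hypothetical helper name introduced only for this illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words the string and []byte
// descriptors occupy (assumption: the standard gc toolchain).
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	fmt.Println("string header words:", sw) // Data + Len
	fmt.Println("slice header words: ", bw) // Data + Len + Cap

	b := make([]byte, 3, 8)
	fmt.Println(len(b), cap(b)) // length 3, capacity 8
	b = append(b, 'x')          // still fits in the existing backing array
	fmt.Println(len(b), cap(b)) // length 4, capacity 8
}
```

The one extra word in the slice descriptor is the Cap field, which is the only thing a string lacks.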

To convert a slice of bytes to a string, we simply need to drop the Cap field and copy the data pointer and the length into a StringHeader. How can we do that? Go doesn't normally let us manage memory manually as we do in C and C++. However, there is a special package called unsafe that lets us step around Go's type safety. It's essential to remember that, in most cases, we should avoid doing this unless we require very high performance and are aware of the risks, such as Use After Free or modifying memory that a string assumes is immutable.

import (
	"reflect"
	"unsafe"
)

// BytesToString converts bytes to a string without memory allocation.
// NOTE: The given bytes MUST NOT be modified since they share the same backing array
// with the returned string.
func BytesToString(b []byte) string {
	// Obtain SliceHeader from []byte.
	sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&b))

	// Construct StringHeader from SliceHeader.
	// NOTE: Data is a plain uintptr here, so it does not keep the bytes alive
	// for the garbage collector. This is one reason these types are deprecated.
	stringHeader := reflect.StringHeader{Data: sliceHeader.Data, Len: sliceHeader.Len}

	// Convert StringHeader to a string.
	s := *(*string)(unsafe.Pointer(&stringHeader))
	return s
}

Since both SliceHeader and StringHeader are now deprecated, we can use a simpler version, as suggested by the Go documentation. It relies on unsafe.SliceData and unsafe.String, which were added in Go 1.20:

func BytesToString(b []byte) string {
	// Ignore if your IDE shows an error here; it's a false positive.
	p := unsafe.SliceData(b)
	return unsafe.String(p, len(b))
}
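As a quick check that the unsafe version really shares memory instead of copying, the sketch below compares the data pointers of both results. sharesBacking is a hypothetical helper written for this example and is not part of the zerocast package.

```go
package main

import (
	"fmt"
	"unsafe"
)

// BytesToString is the unsafe conversion from the article (Go 1.20+).
func BytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// sharesBacking reports whether s starts at the same address as the
// backing array of b, i.e. whether no copy was made.
func sharesBacking(s string, b []byte) bool {
	return unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	b := []byte("hello, gopher")

	copied := string(b)        // standard conversion
	shared := BytesToString(b) // unsafe conversion

	fmt.Println(copied == shared)         // true: same contents
	fmt.Println(sharesBacking(copied, b)) // false: the data was copied
	fmt.Println(sharesBacking(shared, b)) // true: same backing array
}
```

Because the string and the slice point at the same bytes, any later write to b would also change the string, which is why the NOTE on the function forbids modifying the slice.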

We can apply the same concept to convert a string into a slice of bytes by filling in the Cap field of the SliceHeader. Please note that we must set the capacity to be equal to the length of the string. If the capacity were larger, append would write past the end of the string's data, and the memory right after it may already belong to another object, because the string's bytes sit in a contiguous block with no spare room reserved behind them. With the capacity equal to the length, growing the slice allocates a new backing array instead. Let's take a look at the code below:

// StringToBytes converts a string to a byte slice without memory allocation.
// NOTE: The returned byte slice MUST NOT be modified since it shares the same backing array
// with the given string.
func StringToBytes(s string) []byte {
	// Get StringHeader from string
	stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))

	// Construct SliceHeader with capacity equal to the length
	sliceHeader := reflect.SliceHeader{Data: stringHeader.Data, Len: stringHeader.Len, Cap: stringHeader.Len}

	// Convert SliceHeader to a byte slice
	return *(*[]byte)(unsafe.Pointer(&sliceHeader))
}

Or, in a simpler version that uses unsafe.StringData and unsafe.Slice:

func StringToBytes(s string) []byte {
	p := unsafe.StringData(s)
	b := unsafe.Slice(p, len(s))
	return b
}
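The following sketch illustrates why the capacity matters. It checks that the converted slice has a capacity equal to its length, and that appending to it allocates a new array rather than touching the string. appendByte is a hypothetical helper used only for this demonstration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringToBytes is the unsafe conversion from the article (Go 1.20+).
func StringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// appendByte appends c to a zero-copy view of s and returns the result.
// Because the view has cap == len, append has to allocate a new array.
func appendByte(s string, c byte) []byte {
	view := StringToBytes(s)
	return append(view, c)
}

func main() {
	s := "hello"
	view := StringToBytes(s)
	fmt.Println(len(view), cap(view)) // 5 5

	grown := appendByte(s, '!')
	fmt.Println(string(grown)) // hello!
	fmt.Println(s)             // hello
	// The grown slice lives in a new array, so the string is untouched.
	fmt.Println(unsafe.SliceData(grown) == unsafe.StringData(s)) // false
}
```

Note that the example only reads from the view. Writing to it, for example view[0] = 'j', is exactly what the NOTE on the function forbids, and with a string literal it may crash the program because the data can live in read-only memory.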

To demonstrate that there is no allocation, let's run some benchmarks.

  • The BenchmarkStringToBytesStandard benchmark involves conversion using []byte("string").
  • The BenchmarkBytesToStringStandard benchmark involves conversion using string([]byte{'b', 'y', 't', 'e'}).
  • The BenchmarkStringToBytes and BenchmarkBytesToString benchmarks use the unsafe conversion method.

$ go test -run=xxx -bench=. ./...
goos: darwin
goarch: arm64
pkg: github.com/josestg/zerocast
BenchmarkStringToBytesStandard-10       7207893    151.1 ns/op     1792 B/op    1 allocs/op
BenchmarkBytesToStringStandard-10       8589217    143.8 ns/op     1792 B/op    1 allocs/op
BenchmarkStringToBytes-10            1000000000      0.3904 ns/op     0 B/op    0 allocs/op
BenchmarkBytesToString-10            1000000000      0.3912 ns/op     0 B/op    0 allocs/op
PASS
ok github.com/josestg/zerocast 3.830s

benchmark code: github.com/josestg/zerocast

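As we can see, both BenchmarkStringToBytes and BenchmarkBytesToString report 0 B/op and 0 allocs/op, and the time per operation drops from roughly 150 ns to about 0.39 ns. That is well over 140 times faster for this input. Since the standard conversion copies every byte, the size of the gap depends on the length of the data.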

You can find the code in the GitHub repository at github.com/josestg/zerocast. If you have any questions, please feel free to ask in the comment section below. Thank you!

EDIT
After doing some research on the Go source code and a few open-source projects, I found that this technique is also used in the following places. Let me share my results with you.

1. strings.Builder
2. github.com/google/gvisor
3. Kubernetes
4. CockroachDB
5. Ethereum
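To illustrate the first item, here is a simplified sketch modeled on the idea behind strings.Builder: bytes are accumulated in a private slice and handed out as a string without copying. miniBuilder is a hypothetical type written for this example; it is not the standard library's code and leaves out its safety checks.

```go
package main

import (
	"fmt"
	"unsafe"
)

// miniBuilder is a hypothetical, stripped-down imitation of strings.Builder.
type miniBuilder struct {
	buf []byte
}

// WriteString appends s to the internal buffer.
func (m *miniBuilder) WriteString(s string) {
	m.buf = append(m.buf, s...)
}

// String returns the accumulated bytes as a string without copying them.
// This is reasonable only because buf is private and append-only: bytes
// already exposed through a returned string are never overwritten.
func (m *miniBuilder) String() string {
	return unsafe.String(unsafe.SliceData(m.buf), len(m.buf))
}

func main() {
	var m miniBuilder
	m.WriteString("zero")
	m.WriteString("-copy")
	fmt.Println(m.String()) // zero-copy
}
```

The design point is ownership: the conversion is harmless here because no caller can reach the slice and mutate it behind the string's back.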

Thank you for reading and clapping. I truly appreciate it.

Originally published at https://www.josestg.com on October 21, 2023.
