Spot the Bug: Golang Hashes
Suppose we want to compute the md5, sha1, and sha256 sums for some string “Foo Bar”. What’s wrong with this routine?
Problem
Go’s standard library defines the interface type hash.Hash, which its docs describe as “the common interface implemented by all hash functions.”
For example, the packages crypto/md5, crypto/sha1, and crypto/sha256 each define an implementation of that interface.
In the following example, we invoke the Sum method on that interface to hash a string, “Foo Bar”, three ways. But something is wrong.
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"fmt"
)

func main() {
	// convert the demo input to a byte slice
	input := []byte("Foo Bar")

	// calculate the hashes
	a := md5.New().Sum(input)
	b := sha1.New().Sum(input)
	c := sha256.New().Sum(input)

	// print the hex-encoded hashes
	fmt.Printf("%x\n", a)
	fmt.Printf("%x\n", b)
	fmt.Printf("%x\n", c)
}
Here is the output:
↳ go run main.go
466f6f20426172d41d8cd98f00b204e9800998ecf8427e
466f6f20426172da39a3ee5e6b4b0d3255bfef95601890afd80709
466f6f20426172e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Hint
Notice the prefix 466f6f20426172 in each of the hash outputs. It would be a remarkable coincidence for three different hash functions to produce the same leading bytes, and indeed this is no coincidence at all!
If we decode the hex, we see that it actually holds the hash input “Foo Bar”! That can’t be right.
↳ echo 466f6f20426172 | xxd -r -p | xxd
00000000: 466f 6f20 4261 72 Foo Bar
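The same check can be done in Go itself. This is only a minimal sketch: decodePrefix is a hypothetical helper name, not something from the original example, and it relies on hex.DecodeString from the standard encoding/hex package.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodePrefix is a hypothetical helper that turns a hex string
// back into the text it encodes.
func decodePrefix(s string) (string, error) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// the shared prefix from the three outputs above
	text, err := decodePrefix("466f6f20426172")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%q\n", text) // prints "Foo Bar"
}
```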
Answer
Let’s consult the docs for the hash.Hash interface:
type Hash interface {
	// Write (via the embedded io.Writer interface) adds more data to the running hash.
	// It never returns an error.
	io.Writer

	// Sum appends the current hash to b and returns the resulting slice.
	// It does not change the underlying hash state.
	Sum(b []byte) []byte

	// Reset resets the Hash to its initial state.
	Reset()

	// Size returns the number of bytes Sum will return.
	Size() int

	// BlockSize returns the hash's underlying block size.
	// The Write method must be able to accept any amount
	// of data, but it may operate more efficiently if all writes
	// are a multiple of the block size.
	BlockSize() int
}
We learn that the Sum method does not do what one might intuit (or at least, I was surprised). Sum does not hash its argument at all. Instead, it appends the current hash, i.e. the hash of everything written so far via Write, to its argument and returns the resulting slice.
So, the prefix is the hex-encoded “Foo Bar” input. The tail comes from the initial state of the running hash, i.e. the hash of an empty byte sequence.
Algorithm | Prefix | Blank Hash |
---|---|---|
md5 | 466f6f20426172 | d41d8cd98f00b204e9800998ecf8427e |
sha1 | 466f6f20426172 | da39a3ee5e6b4b0d3255bfef95601890afd80709 |
sha256 | 466f6f20426172 | e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 |
How do we fix the example?
Correct Usage
We can use the Write method of the embedded io.Writer interface to feed the input into the hash’s running state, then call Sum with a nil argument so that nothing is prepended. Alternatively, just use one of the package-level convenience functions, such as md5.Sum, sha1.Sum, and sha256.Sum256.
package main

import (
	"crypto/md5"
	"crypto/sha1"
	"crypto/sha256"
	"fmt"
)

func main() {
	// convert the demo input to a byte slice
	input := []byte("Foo Bar")

	// calculate the hashes
	ha := md5.New()
	ha.Write(input)
	a := ha.Sum(nil)

	hb := sha1.New()
	hb.Write(input)
	b := hb.Sum(nil)

	hc := sha256.New()
	hc.Write(input)
	c := hc.Sum(nil)

	// alternate: calculate the hashes
	alta := md5.Sum(input)
	altb := sha1.Sum(input)
	altc := sha256.Sum256(input)

	// print the hex-encoded hashes
	fmt.Printf("%x\n%x\n", a, alta)
	fmt.Printf("%x\n%x\n", b, altb)
	fmt.Printf("%x\n%x\n", c, altc)
}
Conclusion
If you’re a meticulous, hawk-eyed reader of the docs, then you likely found this exercise pretty silly. But I tend to skim the details, especially when a method’s name and signature look so self-explanatory and simple: for example, Sum([]byte) []byte on a hash interface.
In dev work as in life, humbling mistakes like these can be a helpful reminder for me to be careful and check my assumptions more thoroughly, especially when working with something new.