I'm glad Go executable files are large

January 1, 2020    go dlang

TL;DR: Well, I’m not exactly glad Go binaries are so large, but if you haven’t read that article, it basically explains that Go binaries are large because that’s what it takes to give users some awesome stack traces for crashes, plus runtime support. For those who don’t appreciate how big a deal this is, I’m going to compare Go’s approach to stack traces with that of another system programming language, just to explain why, most of the time, the Go leadership makes the right decisions.

We probably take this for granted, but Go, a system programming language that is largely succeeding at replacing previous use cases of C and C++, has some pretty kick-ass stack traces, and we get fat binaries because of it. I don’t think that’s a bad trade-off.

If you’re not familiar with Go stack-traces, here’s a Go program that intentionally crashes at runtime:

package main

import "fmt"

type Greeting struct {
	msg string
}

func (g *Greeting) Hello() {
	fmt.Println(g.msg)
}

func main() {
	g := &Greeting{"hello"}
	g.Hello()
	g = nil // oops
	g.Hello()
}

Here’s what gets printed to stderr when the program runs:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1092fc5]

goroutine 1 [running]:
main.(*Greeting).Hello(...)
        /Users/bithavoc/.../buggy.go:10
main.main()
        /Users/bithavoc/.../buggy.go:17 +0x95

If this executable crashed in production, we would just hit our cloud tooling to check the logs, look for the stack trace and find the place where the error happened so we can fix it quickly. Thanks to the stack trace it’s pretty obvious that the error is around line 10 of the file buggy.go, reached from the call on line 17.
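If you would rather have that trace in your own log format than as raw stderr output, here is a minimal sketch of catching the panic and grabbing the trace as text. The standard library pieces are recover and runtime/debug.Stack; captureCrash and mentions are hypothetical helper names used only for illustration, and Hello returns the message instead of printing it to keep the example small.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strings"
)

type Greeting struct {
	msg string
}

// Hello reads g.msg, which panics when g is nil.
func (g *Greeting) Hello() string {
	return g.msg
}

// captureCrash (hypothetical helper) calls Hello on a nil *Greeting,
// recovers the resulting panic and returns the panic message followed
// by the stack trace of the current goroutine.
func captureCrash() (report string) {
	defer func() {
		if r := recover(); r != nil {
			// The deferred function runs on top of the panicking stack,
			// so the trace still contains the frames that crashed.
			report = fmt.Sprintf("panic: %v\n%s", r, debug.Stack())
		}
	}()
	var g *Greeting // nil on purpose
	return g.Hello()
}

// mentions (hypothetical helper) reports whether the crash report
// contains the given text, e.g. a function name you are looking for.
func mentions(report, text string) bool {
	return strings.Contains(report, text)
}

func main() {
	report := captureCrash()
	fmt.Println(report)
	fmt.Println("has panic message:", mentions(report, "nil pointer dereference"))
	fmt.Println("has captureCrash frame:", mentions(report, "captureCrash"))
}
```

Recovering like this only makes sense at the edges of a program (a request handler, a worker loop). Everywhere else, letting the process crash and reading the trace from the logs is usually the simpler option.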

Compare Go and its stack-traces to another system programming language: D.

D has been trying to be a better C++ for 20 years and recently introduced an explicit “better C” mode. Here’s the equivalent of the Go program in D:

import std.stdio;

class Greeting {
  void hello() {
    writeln("hello");
  }
}

void main() {
  auto g = new Greeting();
  g.hello();
  g = null; //oops
  g.hello();
}

Here’s what you get when it crashes:

➜  buggy git:(master) ✗ dmd buggy.d
➜  buggy git:(master) ✗ ls
buggy   buggy.d buggy.o
➜  buggy git:(master) ✗ ./buggy
hello
[1]    2236 segmentation fault  ./buggy

A segmentation fault is what you get, just like in C and C++.

OK, maybe it’s because I’m running this on a Mac. The D playground (run.dlang.io) should be running something else, like Linux, so let’s see what it does over there:

buggy hello program running on run.dlang.io

Shit, it seems like it’s a different problem: program killed by signal 11. Not really, signal 11 is still a segmentation fault. I tried all the compilers available, no luck. Feel free to try it yourself if you don’t believe me.
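If you want to double-check that signal 11 and SIGSEGV are the same thing, a tiny Go sketch can print the number behind the constant. The segvNumber helper is a hypothetical name for illustration; signal numbers are platform-defined, and 11 is the value used on Linux and macOS.

```go
package main

import (
	"fmt"
	"syscall"
)

// segvNumber (hypothetical helper) returns the numeric value of
// SIGSEGV on the platform this program is compiled for.
func segvNumber() int {
	return int(syscall.SIGSEGV)
}

func main() {
	// %v prints the signal's description via its String method.
	fmt.Printf("SIGSEGV is signal %d: %v\n", segvNumber(), syscall.SIGSEGV)
}
```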

No, it is not a bug, this is by design. When you look at D’s glorious page on Interfacing with C++ it states:

D takes a pragmatic approach that assumes a couple modest accommodations can solve a significant chunk of the problem:

  • matching C++ name mangling conventions
  • matching C++ function calling conventions
  • matching C++ virtual function table layout for single inheritance

Still, even if D doesn’t reinvent the wheel when it comes to mangling conventions, there should be something in the standard library that helps me print a decent stack trace when the OS sends that signal 11, right? Well, there is, according to the Vibe.d web framework:

import etc.linux.memoryerror;
static if (is(typeof(registerMemoryErrorHandler)))
	registerMemoryErrorHandler();

It doesn’t work:

buggy hello program running on run.dlang.io with register memory error handler

Let’s try it in Docker. I read the source code of registerMemoryErrorHandler and there seems to be some conditional compilation around Linux and the C runtime. Based on how the Docker images are named, I assume dlang2/dmd-alpine uses musl and that I should be using dlang2/dmd-ubuntu for glibc:

➜  buggy git:(master) ✗ docker build -t bithavoc/dhelloworld .
Sending build context to Docker daemon  1.017MB
Step 1/4 : FROM dlang2/dmd-ubuntu
 ---> a846a21654dc
Step 2/4 : ADD . .
 ---> Using cache
 ---> caf46a2f1a4e
Step 3/4 : RUN dmd -g buggy.d
 ---> Running in e347f96737be
buggy.d(4): Error: function declaration without return type. (Note that constructors are always named `this`)
buggy.d(4): Error: no identifier for declarator `registerMemoryErrorHandler()`
The command '/bin/sh -c dmd -g buggy.d' returned a non-zero code: 1

It doesn’t work.

I gave up, again. I tried using D a while back and the lack of stack traces was a deal breaker for me. Back in 2015 I even asked in an HN comment, hoping that someone more knowledgeable about D would give me some insight on how to get stack traces. You can read the thread: even Walter Bright (back when he was still leading the language) replied, pointing out the issue in my program, but never told me how to get the stack traces.

Other users suggested I use Vagrant, addr2line and some other demangling sorcery. I managed to get registerMemoryErrorHandler working on an Ubuntu VM but never in Docker, and even after 5 years it still doesn’t work in Docker. If there is a solution out there, it’s not very obvious. If there isn’t, then it probably means no one in the D leadership cares about it.

A recent post on D’s vision of the future from the new leader states that string interpolation is on the list of priorities, and not a freaking readable stack trace, which says a lot about D, its design decisions and priorities.

Conclusion

You may complain that Go binaries are fat, but there is a well-justified reason for it:

prior to 1.2, the Go linker was emitting a compressed line table, and the program would decompress it upon initialization at run-time. in Go 1.2, a decision was made to pre-expand the line table in the executable file into its final format suitable for direct use at run-time, without an additional decompression step.

As HN user nickw pointed out:

This is a good choice I think and the author of the article missed the most important point - it uses less memory to have an uncompressed table. This sounds paradoxical but if a table has to be expanded at runtime then it has to be loaded into memory.

So the initial stack traces were working, but they doubled down on saving RAM over disk space. You would have to care to do something like that.
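That line table is what the runtime consults whenever it has to turn a program counter into a function name, file and line. Here is a minimal sketch of doing the same lookup by hand with runtime.Callers and runtime.CallersFrames; whereAmI is a hypothetical helper name, not something taken from the article quoted above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// whereAmI (hypothetical helper) resolves the program counter of its
// caller into a function name, a file name and a line number, using
// the line table embedded in the executable.
func whereAmI() (string, string, int) {
	pcs := make([]uintptr, 1)
	// Skip 2 frames: runtime.Callers itself and whereAmI.
	if runtime.Callers(2, pcs) == 0 {
		return "unknown", "unknown", 0
	}
	// CallersFrames does the pc -> function/file/line translation and
	// accounts for inlined calls.
	frame, _ := runtime.CallersFrames(pcs).Next()
	return frame.Function, filepath.Base(frame.File), frame.Line
}

func main() {
	fn, file, line := whereAmI()
	fmt.Printf("%s at %s:%d\n", fn, file, line)
}
```

The lookup needs no debugger and no separate symbol file, because the table travels inside the binary. That is the disk space being paid for.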

I know D doesn’t have all the millions of dollars Google pours into Go every year, but other system programming languages have decent stack traces too. Zig, whose main developer lives off donations starting at 5 USD/mo, has very accurate stack traces:

➜  zig ./zig build-exe hello.zig && ./hello
attempt to unwrap null
...hello.zig:11:35: 0x1003d2876 in _greeting (hello.o)
    const y = obviouslyNullPointer.?.*;
                                  ^
...hello.zig:5:13: 0x1003d2639 in _main.0 (hello.o)
    greeting();
            ^
...lib/zig/std/special/start.zig:204:37: 0x1003d24c3 in _main (hello.o)
            const result = root.main() catch |err| {
                                    ^
???:?:?: 0x7fff67d1a404 in ??? (???)

Don’t get me wrong, I like D and I want it to succeed. I just wish the D team had some of the culture and pragmatism around design decisions that is making Go succeed.

D has the most beautiful generics and compile-time templating I’ve ever seen, but it doesn’t have stack traces, so it’s hard for me to use in production. Go has neither generics nor compile-time templating like D’s, but in reality I don’t need them either. I’ve learned that after years of pushing Go code to production, because I’m confident that if my Go program crashes in production I’ll get decent stack traces to work with.

This is one more reason I keep using Go, I trust the Go team will keep making the right decisions for me.