+
AMDG

Goodman's Oak

Miscellaneous Thoughts, Projects, and Ruminations on Life, the Universe, and Everything

The Importance of Undefined Behavior

Donald P. Goodman III 26 Dec 1200 (30 Dec 2016)

C has often been called a language with multiple foot-guns; that is, a language that makes it easy to shoot yourself in the foot. One of those foot-guns is undefined behavior.

When C was being designed, the idea was that it should be easy to port to multiple architectures, which meant that the standard couldn't be too exacting. Obviously, it's pretty exacting; it's a programming language. But it's not too exacting. That makes it much easier to write conforming implementations, and very effectively so; writing and maintaining a fully-functioning C compiler is a feasible goal even for a single programmer, while for most languages the very notion is risible.

However, that does mean that the programmer needs to be aware that the compiler might do something unexpected in certain cases, because the behavior in these cases is undefined. Keep an eye out for these!

I just ran into one such issue while revamping some of the build processes for the dozenal suite, specifically that for the dozdc calculator. One of the many functions that dozdc provides is exponentiation; that is, raising one number to the power of another. This is accomplished with the ^ operator, by doing things like the following:

8 2 ^ =

Because dozdc works on a stack, this raises 8 to the power of 2, which gives us 54 (in decimal, 64). When I compiled the program with gcc, as I have since I wrote it many years ago, this function worked perfectly fine. But then, when I compiled it with tcc recently, I got something different: 194 (decimal 256). What gives?
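
For anyone who wants to check the dozenal arithmetic, here is a minimal sketch. The helper dozenal_to_long is hypothetical and not part of dozdc; it leans on the standard strtol(), which in base 12 reads the letters a and b for ten and eleven rather than any dedicated dozenal glyphs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from dozdc): convert a string of dozenal
 * digits to a long.  strtol() in base 12 reads 0-9 plus a/b (or A/B)
 * for ten and eleven. */
long dozenal_to_long(const char *digits)
{
    assert(digits != NULL);
    return strtol(digits, NULL, 12);
}
```

With this, dozenal_to_long("54") comes out to 64 (5 × 12 + 4), and dozenal_to_long("194") comes out to 256 (1 × 144 + 9 × 12 + 4).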

Clearly, what's happening is that rather than returning 8 to the power of 2, it's returning 2 to the power of 8; that is, the wrong way round. What's wrong? Does the tcc compiler somehow put together my stack routines differently? No; all other operations worked fine. It was just this function. So again: what gives?

It's actually rather simple; the ^ function is implemented internally with the C math library (math.h), which provides a function called pow(x,y), which raises x to the power of y. dozdc then pops two numbers from the stack to insert into this pow() function, then pushes the result back onto the stack, pretty directly:

push(pow(pop(),pop()));

Old C hands have undoubtedly seen the problem already: the order of evaluation of function arguments is not defined by the standard (strictly speaking, the standard's term for this is "unspecified" rather than "undefined"). In other words, C does not require that the one pop() be done before the other, so different compilers might do them in different orders. That's exactly what happened here.
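
To make the two possible orders concrete, here is a rough sketch that spells each one out by hand. Everything in it is an assumption for illustration: the toy stack is not dozdc's real one, power_left_first and power_right_first are made-up names, and ipow is a simplified stand-in for pow() (whole, non-negative exponents only) so that the sketch builds without linking the math library.

```c
#include <assert.h>

#define STACK_MAX 16

/* Toy stack; dozdc's real stack routines are not shown in this post. */
static double stack[STACK_MAX];
static int depth = 0;

void push(double value)
{
    assert(depth < STACK_MAX);
    stack[depth++] = value;
}

double pop(void)
{
    assert(depth > 0);
    return stack[--depth];
}

/* Stand-in for pow(), limited to non-negative whole exponents. */
double ipow(double base, int exponent)
{
    double result = 1.0;
    while (exponent-- > 0)
        result *= base;
    return result;
}

/* What a compiler that evaluates arguments left to right effectively
 * does: the first argument receives the top of the stack. */
double power_left_first(void)
{
    double first = pop();
    double second = pop();
    return ipow(first, (int)second);
}

/* What a compiler that evaluates arguments right to left effectively
 * does: the second argument receives the top of the stack. */
double power_right_first(void)
{
    double second = pop();
    double first = pop();
    return ipow(first, (int)second);
}
```

After pushing 8 and then 2, power_left_first() computes 2 to the power of 8, while power_right_first() computes 8 to the power of 2. A compiler is free to turn the single line push(pow(pop(),pop())) into either one.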

To use our earlier example, 8 2 ^ =, gcc did just what I expected it to do, and popped the two in the order that produced the result I expected (though, as it turns out, that's actually rightmost first); tcc, on the other hand, popped them the other way around. Neither compiler is wrong; it's undefined behavior, and which they do depends on their internal implementation. But if I want all compilers to handle this the same way, I need to tell them which way I want.

So I made a minor variation on the above:

tmp = pop();
push(pow(pop(),tmp));

Now, the compiler knows that it should pop one value first; then pop another and raise it to the power of the first, and push that result. It will always happen in the right order, regardless of the compiler.
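
The same fix can be tried out in isolation with a sketch like the one below. As before, the toy stack, the function name exponentiate, and the ipow stand-in for pow() (whole, non-negative exponents only, so no math library is needed at link time) are all assumptions for illustration, not dozdc's actual code.

```c
#include <assert.h>

#define STACK_MAX 16

/* Toy stack; dozdc's real stack routines are not shown in this post. */
static double stack[STACK_MAX];
static int depth = 0;

void push(double value)
{
    assert(depth < STACK_MAX);
    stack[depth++] = value;
}

double pop(void)
{
    assert(depth > 0);
    return stack[--depth];
}

/* Stand-in for pow(), limited to non-negative whole exponents. */
double ipow(double base, int exponent)
{
    double result = 1.0;
    while (exponent-- > 0)
        result *= base;
    return result;
}

/* The two pops now sit in separate statements, so their order is
 * fixed by the language rather than left to the compiler. */
void exponentiate(void)
{
    double tmp = pop();           /* top of the stack: the exponent */
    push(ipow(pop(), (int)tmp));  /* next value down: the base */
}
```

Pushing 8 and then 2, calling exponentiate(), and popping the result should give 64 with any conforming compiler, which makes it a handy line to keep in a regression test.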

So know your undefined behavior! And make sure your regression tests are thorough; I wouldn't have caught this until much later if I hadn't had a script to notice it.

Praise be to Christ the King!