A short diatribe about floating point numbers | RF Circuits

Sunday, October 23, 2011

Floats. They're great! You can store really large or really small numbers in them with ease (provided your compiler supports them). Most processors even have built-in instructions for working with them. However, they are imprecise, which means that whatever value you try to put in may not be the value you get out.

Take for example the counter system I installed in a factory - the processor we used could only support 16-bit integers or 32-bit floats. I thought, "Great! 32-bit floats go all the way up to ~4bn, we'll never have bigger counts than that". And we won't: a 32-bit float actually tops out far higher, at around 3.4 × 10^38. The problem is, just because they can reach 4bn doesn't mean that they can count every number in between.

I knew about this problem, but I thought it only applied to small fractions after the decimal point. Unfortunately (as I discovered today with 100+ counters), there are also whole numbers a float cannot represent.

The first one of these (on a 32-bit float) is 16,777,217, which is 2^24 + 1. So whenever you try to increment the counter past 16,777,216, the processor adds 1 to the number, then stores it back in the float as the closest representable number, which is 16,777,216 again. Do you see the problem? The counter never gets past 16,777,216, no matter how many times you increment it.
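To see the stuck counter for yourself, here is a minimal sketch. Python's own floats are 64-bit, so it assumes we can imitate a 32-bit float by rounding through the standard `struct` module after every addition. The helper names `to_float32` and `increment_float32` are made up for this illustration and are not from the factory code.

```python
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def increment_float32(counter):
    """Add 1, then store the result back in a 32-bit float."""
    return to_float32(counter + 1.0)

# Start just below the limit and try to count up three times.
counter = to_float32(16_777_215.0)
for _ in range(3):
    counter = increment_float32(counter)
    print(int(counter))
# prints:
# 16777216
# 16777216
# 16777216
```

The first increment works, and every one after that rounds straight back to 16,777,216.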

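There's a very good reason for this, which I really don't want to have to explain in full because I hate working with the binary representations of floats. The short version: a 32-bit float only has 24 bits of precision for the digits of the number, so above 2^24 the gap between one representable value and the next is bigger than 1. If you have to know the rest, try the Wikipedia article on IEEE 754-2008.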
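If you'd rather poke at it than read the standard, the gap between neighbouring 32-bit floats can be measured directly. This is a rough sketch, assuming positive, finite inputs; `next_float32` and `gap_float32` are hypothetical helper names, and the trick of adding 1 to the raw bit pattern only works within those assumptions.

```python
import struct

def next_float32(x):
    """Return the next 32-bit float above a positive, finite x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # For positive floats, the next bit pattern is the next larger value.
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

def gap_float32(x):
    """Distance from x (rounded to 32-bit) to the next 32-bit float."""
    rounded = struct.unpack("<f", struct.pack("<f", x))[0]
    return next_float32(x) - rounded

print(gap_float32(8_388_608.0))      # 2^23: prints 1.0
print(gap_float32(16_777_216.0))     # 2^24: prints 2.0
print(gap_float32(4_000_000_000.0))  # ~4bn: prints 256.0
```

So by the time a counter gets anywhere near 4bn, a 32-bit float can only move in steps of 256.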

So, long story short: do not use floats for counters!
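One possible alternative on a processor limited to 16-bit integers is to split the count across two 16-bit words and carry by hand. This is only a sketch of the idea, not what was actually installed: the two-word layout, the names `increment_counter` and `counter_value`, and the use of Python in place of the controller's own language are all assumptions.

```python
WORD_MASK = 0xFFFF  # each word holds 16 bits

def increment_counter(low, high):
    """Increment a 32-bit count stored as two 16-bit words."""
    low = (low + 1) & WORD_MASK
    if low == 0:
        # The low word wrapped around, so carry into the high word.
        high = (high + 1) & WORD_MASK
    return low, high

def counter_value(low, high):
    """Combine the two words into one whole number for display."""
    return (high << 16) | low

# Start at 16,777,215, just below where the float counter got stuck.
low, high = 0xFFFF, 0x00FF
for _ in range(3):
    low, high = increment_counter(low, high)
print(counter_value(low, high))  # prints 16777218
```

Every whole number up to 4,294,967,295 is reachable this way, at the cost of a little extra bookkeeping.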
