Saturday, May 24, 2008

A day in the life

You know what the difference is between a professional blogger and amateurs like us? They write about the community and we are the community. We can write about things they will never be able to cover properly: our own experiences. A view from the inside. Usually, it doesn't take too much effort to write a blog entry like this, because I love writing about what I do.

I'm the proud maintainer of the 4tH compiler, which I designed almost fifteen years ago. I became part of the community when I decided to release it as Open Source. At that moment I realized that the Internet had changed the world of software development and shareware was simply 'not done' anymore. A few years later, I switched from MS-DOS to Linux. I had never really liked MS-Windows.

4tH is a very small, very portable Forth bytecode compiler. It is written in the sweetest vanilla C that you can imagine. Even the ancient K&R C compiler of the now forgotten Unix clone Coherent is able to compile it cleanly. 4tH itself produces bytecode that you can run unchanged on virtually every platform available, from MS-DOS to AIX.

There have been many Forth standards. It all started with Forth-78 and the newest iteration is called ANS-Forth. The ANS-Forth standard consists of so-called 'wordsets', which are families of related functions. There are wordsets for 64 bit integer operations, floating point operations, local variables, etc. You can have a standard ANS-Forth compiler which supports only certain wordsets. There is no obligation to support them all as long as you document it.

If your compiler does not support all wordsets the chances are you are unable to compile all ANS Forth compliant programs, which is something you want to avoid as developer. But sometimes the design objectives of your project make it very hard to support certain features. Floating point support is one of them.

If you thought 64 bit operations are hard, because you have to take the carry into account, reconsider. Floating point is much harder. A floating point number consists of two parts: an exponent and a mantissa and is in essence an approximation and not a true representation. During floating point operations floating point numbers are constantly rounded and renormalized, so you may end up with 0.9999999999 instead of 1. Most modern CPUs have a floating point unit, but that is not of much use when you go for ultra portability. In short, I had long given up on floating point support.

That is, until I ran into Brad Eckert's floating point library, which was written in high level ANS-Forth. I had just added 64 bit support - 4tH natively only uses signed 32 bit numbers - so adding his library to 4tH was slowly entering the realm of possibilities. But first, I have to tell you something about Forth and its community. Making a Forth compiler is dead easy, so most Forth programmers have rolled their own. Forth is very easy to extend, so most Forth compilers have their own pet extensions or deviations from the standard. And since I'm a Forth programmer 4tH is no different, so I had to convert the library to make it compile under 4tH. During that process I found that my 64 bit library had some serious flaws, which had to be fixed first. But one afternoon when I was least expecting it 4tH properly divided 1 by 7. And that was it.

Or was it? Brad's library supported only the most basic of operations, that is division, addition, multiplication and square root. No equivalents for LOG, LN, SIN, COS, TAN and friends. That was a shame, because I wanted to port Krishna Myneni's "Star Trek" program to 4tH and that needed some trigonometric functions. From Usenet I learned that when Brad had presented his library to the community he had asked for some support to implement them, but for one reason or another it had never come to that. But still, I wanted those functions. Google learned me that no such functions - which are called "words" in Forth - had ever been published. That meant I had to go to work myself. First question, how do you determine the sine?

That proved to be more complex than I thought. There is no simple formula to calculate the sine, but there are several possible approaches. First, you can use the CORDIC (COordinate Rotation DIgital Computer) method, which requires a table. Second, you can use a table straight away and interpolate the intermediate values. Third, you can calculate an approximation by using the so-called Taylor series. With each iteration the error becomes smaller and smaller until you decide that good is good enough. I went for the Taylor series. However, the Taylor series method has one important limitation: it only delivers good results between minus Pi and plus Pi. Albert van der Horst, one of my Forth buddies on the Internet, thought it was a nice, clean solution - "good enough for government work" - but urged me to include range reduction. I wasted a lot of time on that, but Albert was kind enough to help me out. Thank you, Albert! After that COS and TAN were easy.

Now I became greedy. The oldest surviving program I have written is "TEONW", which stands for "The Effects of Nuclear Weapons". It is based on a report with the same name by the U.S. Energy commission. I had written it in Basic on a PHP-11 when I was a student. At the time, the Reagan administration was planning to station a handful of nuclear cruise missiles on Western European soil in response to the Soviet nuclear threat and like most of my fellow countrymen, I didn't agree. I had written the program to demonstrate what devastation a nuclear missile would cause and ridicule Reagan at the same time. Later, I ported the program to my Sinclair ZX Spectrum and that was it. If I ever wanted to port it to 4tH I at least needed the EXP and LOG functions.

That took quite some research. There were Taylor series for EXP and LOG, but these were only reliable within a limited range of values. By accident I ran into the Henry Briggs method, which allowed me to calculate the logarithm of any base with arbitrary accuracy. I solved the EXP challenge by splitting up the exponent in an integer part and a fraction, approximate the fraction part by using the Taylor series and multiply the results. The Taylor series provide a good result in the zero-to-one range, you see. After that the hyperbolic functions (SINH, COSH, TANH) and inverse hyperbolic functions (ASINH, ACOSH and ATANH) were easy.

Finally, I took a shot at the inverse trigonometric functions (ASIN, ACOS, ATAN). They key function here is the arctangent (ATAN). I used the Taylor series again and applied range reduction. By using the the tenth degree Taylor series I got a good approximation, although the error increases when you move further away from zero. With the arctangent the other functions (ASIN, ACOS) were easy. Hell, let's throw in FATAN2 as well.

Now I had a full set of high level ANS-Forth floating point words, exactly what I had set out to do. They were reasonably accurate, short and comprehensible and would form a nice addition to an already good compiler. Furthermore, I had completed what Brad Eckert had planned to do in the first place. Maybe this was not quite what he had in mind, but still. All source code is covered by the GPL, so the community had benefited as well.

In 1997 Bill McCarthy had asked me to add floating point support. I turned him down for the reasons I explained at the beginning of this blog. In 2002 John Paravantis requested the same and got the same answer. But hey, this is Open Source. If you get turned down, that doesn't mean a feature will never be added. You may have to wait a little while and if you do not want to wait for that long, do it yourself and submit your changes. Or make a fork for all I care.

We, as a development community, resemble the scientific community where people can built upon the work of others. That is why we are able to develop ourselves and our software faster than closed source software. And there are, like in the scientific community, different schools. I think this diversity is an asset. The most important thing is, however, that we continue to share the same ideal, because in the end we all win.


Unknown said...

I just stumbled upon your blog and I'm not familiar with Forth at all, but reading how you approximated all those functions using Taylor series I just figured I'd drop my two cents. If I recall correctly it's cheaper (computational wise) to use Chebyshev series; you usually get the same accuracy using a lower degree polynomial compared to Taylor series. And I'm positive that your Sinclair ZX Spectrum used Chebyshev series in it's math library.

Paravantis said...

Hello, this is John Paravantis, 6 years after I had requested that floating point be added to 4tH.

I read your interesting post and could not help noticing you appear to be somewhat ...proud you turned Bill McCarthy and me down in our requests to add floating point arithmetic to 4tH.

Perhaps, if you HAD done so, 4tH would be more actively sought out and used instead of being committed to the dustbin of programming oddities that once were BUT have since died a peaceful and lonely death.

As a more general remark, I really liked Forth as a language BUT the lack of ANY serious tools pretty much forced my hand in switching to more "modern" languages.

Whatever. This is all water under the bridge now.

I leave you with friendly regards from the world of C++, Java and Visual Basic here at the Department of Digital Systems of the University of Piraeus in Greece.

John Paravantis

The Beez' said...

I'm sorry John, I reread what I've written and I still remember writing it, I can't find any pride in turning people down. That's not where an Open Source programmer gets his pride from.

I am proud that I didn't forget your request and implemented it when the time was right. Note that 4tH was quite another beast at that time and there were other things to solve, like include files, multiple open files and 64 bit support.

I'm proud I spent days researching Taylor series and Briggs algorithms, since those weren't in the Eckert sources. I'm proud I fixed several bugs in a source matter that I'm not familiar with. I'm proud of every single user that uses the tool and is happy with it. I'm proud of every user that tells me I've done a good job.

I regret every user in the past that saw 4tH was not quite ready and moved away from it. I regret I can't spent all the time to 4tH that I want, so its progress is limited.

I can understand your bitterness a little bit, having to work with the likes of Visual Basic, Java and C++. That wouldn't be my choice either ;-)

Note a programmer recently added FP himself with ease, preferring native FP for speed considerations. It might have been a option to you at the time and I would have supported you as I am supporting him. That's Open source, mate.