Saturday, May 24, 2008

A day in the life

You know what the difference is between a professional blogger and amateurs like us? They write about the community and we are the community. We can write about things they will never be able to cover properly: our own experiences. A view from the inside. Usually, it doesn't take too much effort to write a blog entry like this, because I love writing about what I do.

I'm the proud maintainer of the 4tH compiler, which I designed almost fifteen years ago. I became part of the community when I decided to release it as Open Source. At that moment I realized that the Internet had changed the world of software development and shareware was simply 'not done' anymore. A few years later, I switched from MS-DOS to Linux. I had never really liked MS-Windows.

4tH is a very small, very portable Forth bytecode compiler. It is written in the sweetest vanilla C that you can imagine. Even the ancient K&R C compiler of the now forgotten Unix clone Coherent is able to compile it cleanly. 4tH itself produces bytecode that you can run unchanged on virtually every platform available, from MS-DOS to AIX.

There have been many Forth standards. It all started with Forth-78 and the newest iteration is called ANS-Forth. The ANS-Forth standard consists of so-called 'wordsets', which are families of related functions. There are wordsets for 64 bit integer operations, floating point operations, local variables, etc. You can have a standard ANS-Forth compiler which supports only certain wordsets. There is no obligation to support them all as long as you document it.

If your compiler does not support all wordsets, chances are you will be unable to compile all ANS-Forth compliant programs, which is something you want to avoid as a developer. But sometimes the design objectives of your project make it very hard to support certain features. Floating point support is one of them.

If you thought 64 bit operations are hard, because you have to take the carry into account, reconsider. Floating point is much harder. A floating point number consists of two parts, an exponent and a mantissa, and is in essence an approximation rather than an exact representation. During floating point operations the numbers are constantly rounded and renormalized, so you may end up with 0.9999999999 instead of 1. Most modern CPUs have a floating point unit, but that is not of much use when you go for ultra portability. In short, I had long given up on floating point support.
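To make the mantissa/exponent split and the rounding problem concrete, here is a small sketch in Python. It uses Python's built-in IEEE 754 doubles, which is an assumption made purely for illustration; it is not the number format of 4tH or of any Forth floating point library.

```python
import math

# A float is stored as mantissa * 2**exponent.
# math.frexp returns that pair, with 0.5 <= |mantissa| < 1.
mantissa, exponent = math.frexp(10.0)   # 10.0 == 0.625 * 2**4

# 0.1 has no exact binary representation, so every addition rounds a little.
total = 0.0
for _ in range(10):
    total += 0.1

print(mantissa, exponent)   # 0.625 4
print(total)                # 0.9999999999999999
print(total == 1.0)         # False
```

The last line is the reason floating point results are normally compared with a tolerance instead of with a plain equality test.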

That is, until I ran into Brad Eckert's floating point library, which was written in high level ANS-Forth. I had just added 64 bit support - 4tH natively only uses signed 32 bit numbers - so adding his library to 4tH was slowly entering the realm of possibilities. But first, I have to tell you something about Forth and its community. Making a Forth compiler is dead easy, so most Forth programmers have rolled their own. Forth is very easy to extend, so most Forth compilers have their own pet extensions or deviations from the standard. And since I'm a Forth programmer 4tH is no different, so I had to convert the library to make it compile under 4tH. During that process I found that my 64 bit library had some serious flaws, which had to be fixed first. But one afternoon when I was least expecting it 4tH properly divided 1 by 7. And that was it.

Or was it? Brad's library supported only the most basic of operations, that is division, addition, multiplication and square root. No equivalents for LOG, LN, SIN, COS, TAN and friends. That was a shame, because I wanted to port Krishna Myneni's "Star Trek" program to 4tH and that needed some trigonometric functions. From Usenet I learned that when Brad had presented his library to the community he had asked for some support to implement them, but for one reason or another it had never come to that. But still, I wanted those functions. Google taught me that no such functions - which are called "words" in Forth - had ever been published. That meant I had to go to work myself. First question, how do you determine the sine?

That proved to be more complex than I thought. There is no simple formula to calculate the sine, but there are several possible approaches. First, you can use the CORDIC (COordinate Rotation DIgital Computer) method, which requires a table. Second, you can use a table straight away and interpolate the intermediate values. Third, you can calculate an approximation by using the so-called Taylor series. With each iteration the error becomes smaller and smaller until you decide that good is good enough. I went for the Taylor series. However, the Taylor series method has one important limitation: it only delivers good results between minus Pi and plus Pi. Albert van der Horst, one of my Forth buddies on the Internet, thought it was a nice, clean solution - "good enough for government work" - but urged me to include range reduction. I wasted a lot of time on that, but Albert was kind enough to help me out. Thank you, Albert! After that COS and TAN were easy.
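For readers who want to see the idea, here is a minimal sketch in Python of a Taylor series sine with range reduction. The function names, the number of terms and the way the reduction is done are assumptions chosen for illustration; this is not the 4tH source, which is written in Forth.

```python
import math

def taylor_sin(x, terms=12):
    """Approximate sin(x) with the Taylor series x - x^3/3! + x^5/5! - ..."""
    # Range reduction: subtract the nearest multiple of 2*pi,
    # which brings x into the interval [-pi, pi].
    x = x - 2.0 * math.pi * round(x / (2.0 * math.pi))
    term = x       # current term of the series
    total = x      # running sum
    for n in range(1, terms):
        # Each term follows from the previous one:
        # multiply by -x^2 and divide by (2n)(2n+1).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total

def taylor_cos(x):
    # cos(x) is the sine shifted by a quarter turn.
    return taylor_sin(x + math.pi / 2.0)

def taylor_tan(x):
    return taylor_sin(x) / taylor_cos(x)
```

Without the reduction step the series still converges in theory, but for large arguments the terms first grow enormous before they shrink, and the rounding errors swamp the result.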

Now I became greedy. The oldest surviving program I have written is "TEONW", which stands for "The Effects of Nuclear Weapons". It is based on a report with the same name by the U.S. Energy commission. I had written it in Basic on a PDP-11 when I was a student. At the time, the Reagan administration was planning to station a handful of nuclear cruise missiles on Western European soil in response to the Soviet nuclear threat and like most of my fellow countrymen, I didn't agree. I had written the program to demonstrate what devastation a nuclear missile would cause and ridicule Reagan at the same time. Later, I ported the program to my Sinclair ZX Spectrum and that was it. If I ever wanted to port it to 4tH I at least needed the EXP and LOG functions.

That took quite some research. There were Taylor series for EXP and LOG, but these were only reliable within a limited range of values. By accident I ran into the Henry Briggs method, which allowed me to calculate the logarithm of any base with arbitrary accuracy. I solved the EXP challenge by splitting the exponent into an integer part and a fraction, approximating the fractional part with the Taylor series and multiplying the results. The Taylor series provides a good result in the zero-to-one range, you see. After that the hyperbolic functions (SINH, COSH, TANH) and inverse hyperbolic functions (ASINH, ACOSH and ATANH) were easy.
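The EXP trick can be sketched in a few lines of Python. The names, the constant E and the number of terms are assumptions for illustration only; the sketch shows the split into an integer and a fractional part, not the actual Forth code.

```python
import math

E = 2.718281828459045   # Euler's number as a plain constant

def taylor_exp_fraction(f, terms=20):
    """Taylor series 1 + f + f^2/2! + ..., intended for 0 <= f < 1."""
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= f / n       # next term: f^n / n!
        total += term
    return total

def split_exp(x):
    """e^x computed as e^n * e^f with n an integer and 0 <= f < 1."""
    n = math.floor(x)       # integer part (rounds down, also for negative x)
    f = x - n               # fractional part, always in [0, 1)
    return E ** n * taylor_exp_fraction(f)

# With EXP available the hyperbolic functions follow directly.
def split_sinh(x):
    return (split_exp(x) - split_exp(-x)) / 2.0

def split_cosh(x):
    return (split_exp(x) + split_exp(-x)) / 2.0

def split_tanh(x):
    return split_sinh(x) / split_cosh(x)
```

Because the series only ever sees an argument between zero and one, a fixed and fairly small number of terms is enough for any input.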

Finally, I took a shot at the inverse trigonometric functions (ASIN, ACOS, ATAN). The key function here is the arctangent (ATAN). I used the Taylor series again and applied range reduction. By using the tenth degree Taylor series I got a good approximation, although the error increases when you move further away from zero. With the arctangent the other functions (ASIN, ACOS) were easy. Hell, let's throw in FATAN2 as well.
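As an illustration of why the arctangent is the key, here is a Python sketch that builds ASIN, ACOS and a two-argument arctangent on top of a Taylor series ATAN. The particular range reduction (the reciprocal identity plus two angle halvings) and the number of terms are assumptions picked to keep the example accurate and short; they are not necessarily what the 4tH library does.

```python
import math

def taylor_atan(x, terms=12):
    """atan(x) via the series x - x^3/3 + x^5/5 - ... after range reduction."""
    if x < 0.0:
        return -taylor_atan(-x, terms)                    # atan is odd
    if x > 1.0:
        return math.pi / 2.0 - taylor_atan(1.0 / x, terms)
    # Halve the angle twice: atan(x) = 2 * atan(x / (1 + sqrt(1 + x^2))).
    # This keeps the series argument small, where it converges quickly.
    for _ in range(2):
        x = x / (1.0 + math.sqrt(1.0 + x * x))
    power = x
    total = x
    for n in range(1, terms):
        power *= -x * x
        total += power / (2 * n + 1)
    return 4.0 * total                                    # undo the two halvings

def taylor_asin(x):
    if abs(x) == 1.0:
        return math.copysign(math.pi / 2.0, x)            # avoid dividing by zero
    return taylor_atan(x / math.sqrt(1.0 - x * x))

def taylor_acos(x):
    return math.pi / 2.0 - taylor_asin(x)

def taylor_atan2(y, x):
    """Quadrant-aware arctangent of y/x."""
    if x > 0.0:
        return taylor_atan(y / x)
    if x < 0.0:
        return taylor_atan(y / x) + (math.pi if y >= 0.0 else -math.pi)
    if y > 0.0:
        return math.pi / 2.0
    if y < 0.0:
        return -math.pi / 2.0
    return 0.0
```

The series on its own converges slowly near one, which is consistent with the error growing as you move away from zero; shrinking the argument first is one common way around that.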

Now I had a full set of high level ANS-Forth floating point words, exactly what I had set out to do. They were reasonably accurate, short and comprehensible and would form a nice addition to an already good compiler. Furthermore, I had completed what Brad Eckert had planned to do in the first place. Maybe this was not quite what he had in mind, but still. All source code is covered by the GPL, so the community had benefited as well.

In 1997 Bill McCarthy had asked me to add floating point support. I turned him down for the reasons I explained at the beginning of this blog entry. In 2002 John Paravantis requested the same and got the same answer. But hey, this is Open Source. If you get turned down, that doesn't mean a feature will never be added. You may have to wait a little while and if you do not want to wait that long, do it yourself and submit your changes. Or make a fork for all I care.

We, as a development community, resemble the scientific community where people can build upon the work of others. That is why we are able to develop ourselves and our software faster than closed source software. And there are, like in the scientific community, different schools. I think this diversity is an asset. The most important thing is, however, that we continue to share the same ideal, because in the end we all win.

Tuesday, May 6, 2008

The Grand Unification Theory

New comment on 'I like my bazaar!':
"As you said the writer is only partially invalid, having such a huge amount of distro's as the GNU/Linux do creates too many incompatibilities at some point, such as the package management systems. I find it stupid to have such a variety, the major distro's could have agreed on a single one or at least create a new one to suit everyones tastes and optimize it.

Also I'm sure some distro's could merge, not only because they could have similar goals but also because bigger developing teams mean faster and better development. The big number of distributions could mean that something may be developed in many distributions at the same time yet the developers are unaware of the fact thus wasting time by doing twice the work they could have done."

Dear Anonymous,

Let me tell you a little story, before you try to explain "The Grand Unification Theory" to me again. 386BSD was written mainly by Berkeley alumni Lynne Jolitz and William Jolitz. After the release of 386BSD 0.1, a group of users began collecting bug fixes and enhancements, releasing them as an unofficial patchkit. Due to differences of opinion between the Jolitzes and the patchkit maintainers over the future direction and release schedule of 386BSD, the maintainers of the patchkit founded the FreeBSD project in 1993 to continue their work.

Around the same time, the NetBSD project was founded by a different group of 386BSD users, with the aim of unifying 386BSD with other strands of BSD development into one multi-platform system. The project began as a result of frustration within the 386BSD developer community with the pace and direction of the operating system's development. The four founders of the NetBSD project, Chris Demetriou, Theo de Raadt, Adam Glass and Charles Hannum, felt that a more open development model would be beneficial to the project.

In December 1994, NetBSD co-founder Theo de Raadt was asked to resign his position as a senior developer and member of the NetBSD core team, and his access to the source code repository was revoked. The reason for this is not wholly clear, although there are claims that it was due to personality clashes within the NetBSD project and on its mailing lists. In October 1995, de Raadt founded OpenBSD, a new project forked from NetBSD 1.0. After all these years, the three flavors of *BSD are still alive. Some users like one, other users prefer the other for reasons only known to them.

Okay, got that? Now let me tell you another story before you go to bed. It is fairly easy to make a Forth compiler. Hence, virtually every serious Forth programmer has written his own. There are fat, tiny, portable, assembler based, meta-, bytecode, native, standalone, embedded, closed source, FOSS and lots and lots of other Forth compilers. They are standard (Forth-78, Forth-79, FIG-Forth, Forth-83, ANS-Forth) or non-standard. There are so many Forth compilers for every imaginable platform, you'd have a hard time to invent another variation. But I did just that. I didn't find a Forth compiler that was just right for me. So I developed my own, back in 1994. And what do you think? After me, others went through the very same process and invented their own.

It is a natural process. Whenever groups are formed, factions will emerge. And when those factions unify for one reason or another, there are others who won't agree, stay behind and found new groups. Well, it doesn't happen to closed source companies, you say? Right, but what keeps those companies together? Power. Money. They own you, you know. In Open Source, nobody owns anybody. If you can't find what you need, if you don't agree with somebody, you make your own. There is nothing or nobody stopping you as long as you are willing to comply with the license.

And that is exactly what is happening in the bazaar. Every merchant has its own product and the users decide. Some of us are both merchant and user, so we've seen both sides. Yes, I won't argue that there is a certain rationality in unification, but (human) nature just doesn't work that way. Would you prefer only one kind of car, one kind of television, one radio channel and one kind of cheese, the kind of cheese your neighbor likes and you detest? Of course not! That's why there is a bazaar. And that is why there is more than one cathedral ;-)

Sunday, May 4, 2008

I like my bazaar!

In his article "Why the Linux world should embrace the BSD's", Steve Lake proposed a closer cooperation between Linux and BSD. Although I have the utmost respect for BSD and what its developers have accomplished, I don't see what good it would do. I think his reasoning is flawed and the arguments he uses are - at least partially - invalid.

First, I don't agree that the cathedral is the best development method. There are many good programmers out there and they should not be denied the privilege to submit code. Note that Linus does not blindly insert all submissions. He or one of his lieutenants judges the code on its merits and decides whether to include it or not. Since many programmers can work on the code it is obvious that development can take place at a much faster pace. Note how the development of schedulers took place. Several different varieties were made, a lot of testing was done and in the end Linux gained overall. That is a far cry from the hand grenade method which Steve suggests Linus uses.

On the other hand, how many ports of Linux were done? It runs on everything from mobile phones to mainframes. I don't see cathedral-developed software doing that (I was proven wrong here; there are 58 ports of NetBSD). From a philosophical point of view the bazaar is more democratic, allowing users to participate on every level and largely determine where development is going (Linus has acknowledged that on several occasions). You may call BSD a meritocracy, but you may also view it as an oligarchy.

Second, to me the BSD license amounts to software theft. It is well known that BSD software enabled Microsoft to "steal" several key components, without doing anything in return for the community that developed it. Speaking of "sleeping with the enemy"... To use an analogy, the BSD license is like a naked woman standing in the middle of skid row at night screaming: "Rape me! Rape me!". I don't mind anyone using my code (including Microsoft), but return the improvements that were made to the community or individual that developed it. My software was used in at least two different commercial products and the developers always submitted their modifications, which resulted in several key improvements. BTW, I use the LGPL - I'm not a Stallman groupie.

Third, I have nothing against cooperation between both projects, but I do see legal issues. E.g. swapping code can be beneficial to both projects. Maybe the BSD group can live with the fact that Linus will use the GPLv2 for that code, but I'm not so sure that Linus can live with the fact that his code is published under a BSD license. That is what it boils down to in the end, even after accepting that the BSD and GPL communities have very different philosophies concerning development and licensing.

Finally, I'm desperately trying to see what he is actually proposing. What should this "partnership" do? Should it end in a complete merger of both projects? And why? Simply because "there can be only one"? Why not a merger between Microsoft and the FOSS world? Hell, let's turn over all the code we got! Then there is only one that (should) fit all. So, why not stop this silly game and let there be only Microsoft Vista? Aero isn't that bad..

Your answer will tell you why Linux and BSD should exist beside each other, why there are KDE, GNOME and Enlightenment and why the Tiny C compiler was developed (although a perfectly good GCC already existed). It is the classical error of cathedral proponents. A bazaar means choice, shopping malls, not the bleak shops of the Soviet era and - most of all - no high priests and Politburos. Being someone who has seen with his own eyes what dictatorship and elitism can do to people in particular and society in general, I like my bazaar.