Saturday, February 6, 2010

The data cruncher rides again

As you may remember from my previous post, I simply wanted to import three spreadsheets into an MS-Access equivalent, use the "Query By Example" (QBE) mode to create a simple report and export the result to another spreadsheet. I did the whole thing in MS-Access as well and it took me about twenty minutes. The spreadsheets are relatively small: 2367 rows, 5149 rows and 27804 rows. Last time we saw how OOo Base 2.4.0 fell through. Not that it lacked functionality, but because it was so badly executed.

This time I decided to give Kexi 1.1.3 and Knoda 0.8.3 a go. Not that I really needed to, but just to see if they were up to it if I needed to execute another odd job like this. The first part concerned reading the spreadsheets. Both packages do not support MS-Excel files out of the box, so I needed to convert those to CSV. That wasn't too hard. OOo Calc did the job. Reading them was quite similar in both packages, since both provide beautiful wizards which make it quite easy. However, I was unable to set the primary key and I knew I had to pay for that once I executed the query.

In Kexi, I decided to open up the design table window and set the primary key. The program warned me however, when I did that I would lose all my data. Well, that was not an option. So I decided to skip that and get right to designing the query using the QBE window. I was unable to specify an INNER JOIN or OUTER JOIN, so that was a bummer. When I opened up the SQL window I saw why: it had created an old skool JOIN. These kinds of JOIN create a Carthesian product and use the WHERE clause to select the appropriate rows. Well, it could work. As expected, it ran for a while and then came back with 2690 rows. That was not quite what I expected. I expected 5149 rows. I decided to leave it there and continue with Knoda.

It is a shame that Knoda hasn't been developed any further, because it was a nice package. Nobody seems to know what has happened to its creator Horst Knorr. The last modifications are over 900 days ago, so consequently Knoda has not been ported to KDE 4.x. Anyway, I decided to use the dBase driver and read in the CSV files. I quicky found out that was not quite what I wanted, since every field with a name longer than 10 characters was ignored. Then I picked the SQLite3 driver. Reading in the data was much slower now. The largest table was progressing very slowly. I got nervous. In half an hour I had to catch the tramway to go to work.

(I know I have to explain this to my American audience. If you use public transport in Europe, you're not necessarily poor. I don't even have a drivers license. That doesn't mean I'm physically or mentally challenged in any way. I just means I never needed a car that badly that I thought the investment in time and money was worth it.)

So I decided to leave the thing be, take a shower and come back afterwards. Dripping wet I checked the screen. 67%. I dressed while keeping an eye on the progressbar. 96%, 97%, 98%. Come on! 99%, 100%! I quickly shut down the computer and ran outside.

Several hours later, I fired it up again and continued. The QBE was nice, I could even set the JOIN properties. I could not switch to the SQL window, though. That was only possible after I saved the query - and then I couldn't get back to the QBE window. Neither could I select multiple fields and drag them to the query bar. There was no way to make aggregations by using GROUP BY or COUNT(). Still, it did the job. Since I was unable to set primary keys, it took a while, but Ok. It came back with 5156 rows. That were 7 more than expected, but wait a minute: I was working with the uncorrected datasets. I knew that there were 7 duplications I had corrected in the report I had created yesterday. Alright!

(Yes, by this time I had already reported these inconsistencies to the Configuration Administrators and told them to figure out what was wrong and correct the appropriate entries in the Configuration Management System. I know what an "exception" is and I've been in this ballgame for twelve straight years, thank you.)

Then I had to export the result to an MS-Excel file. Knoda doesn't support this format, so I decided to export it to CSV. It took a bit longer than expected, but there it was in all its 1.2 MB beauty. I fired up OOo Calc and opened the file. OOo Calc showed a wizard, I filled in the correct values and off we went. FAIL!! OOo Calc quickly started to eat through my 2 GB RAM and then decided to consume all swap space. I frantically hit the KSysguard icon and then went for the terminal. I had only seconds to spare!
killall soffice
soffice: no process killed
ps -eaf

Help! Fortunately, KSysguard came up and although it was slow I was able to kill OOo Calc. Pfff! That was close!

I fired up KSpread 1.6.3 and decided to repeat the procedure. It didn't even blink. Duh? I saved the spreadsheet in .SXC format, fired up OOo Calc again and read the .SXC file. No problem. It was on the screen in mere seconds. WT.. is that?

The verdict
OOo Base has a horrible way to import and export data. I also think that choosing a Java SQL engine wasn't a good idea. But all in all, it offers all the functionality you need for this kind of job. It's beautifully designed and could be a real replacement for MS-Access, if only it wasn't so badly executed. Yes, it's so badly executed that it becomes unworkable. I'm not prepared to risk a freeze for doing an odd job on OOo Base that I can painlessly execute on MS-Access.

Kexi is nice, but seriously lacking. The QBE does not provide several key functionalities, ALTERing tables is virtually impossible, so it's not really a replacement for MS-Access. I've also taken a look at Kexi 2.0, but I don't see too much improvements in this area.

Knoda lacks several minor features, has its quirks and is not particularly fast (I would call it slow, but that could also be the fault of SQLite3), but it gets the job done! If you don't have MS-Access at your disposal and want something similar to do the job on your Linux machine, this would be my pick. Unfortunately, Knoda is slowly dying and that's a shame! Isn't there somebody who can pick this project up and continue it - or at least salvage the parts that can be integrated in Kexi?

Some final thoughts
What I thought to be a minor job turned into a three day quest to find the best MS-Access replacement. I automatically assumed that there would be a viable FOSS replacement for what I consider to be one of the least interesting and most buggy parts of the MS-Office suite. I was kind of shocked that we don't have it. That doesn't mean you can't get the job done - there is always the CLI of course - but it will take more time than the MS-Access way.

I use the QBE to generate most of the SQL and then edit it later. I find SQL rather verbose and don't particularly like the syntax - it reminds me of COBOL. But since SQL is the standard, it is hard to avoid.

Several readers pointed me to a bunch of major data crunching suites. But I just want to hang up a painting - not drill the Tokio subway. And I have a dislike for Java programs - don't ask me why, it's just a feeling.

I also have to admit I haven't used OpenOffice too much until now. I write my documents with LyX, hardly ever use spreadsheets and didn't have much use for OOo Base. To me OpenOffice was a way to view and modify MS-Office documents - keep a link with the madding MS-Office crowd, if you prefer. This was the first time I gave it a full workout and frankly: I was shocked. I can't have a program (any program) that brings my rather reliable machine to its knees - for whatever reason. In this small quest OpenOffice did that almost twice. That's disturbing. I find that disturbing, especially since the basic functionality is good and sometimes unequaled in FOSS.

On the other hand, I was pleasantly surprised by KOffice. It may not provide all the compatibility you need with the MS-Office world, but the functionality it has is well-executed and reliable. Next time I need a spreadsheet, but don't need absolute MS-Office compatibility, I definitely will use KSpread.


Anonymous said...

Ok fair enough...

But have you considered reaching through the developers of OpenOffice and the like? This is open source and if you fill a bug report....maybe you can help mankind:-)

Doug Gack said...

I make a living doing work in Access. I keep trying to find a reasonable Linux/FOSS replacement, but not much luck, so far. Gambas is probably the best candidate, but it really is more like Visual Basic (by design), but it works. Harbour/Xharbour ought to be a candidate, except for a total lack of introductory documentation. If there's no way to test it for a newbie, it'll not work for me.

Keep trying, there's bound to be someone who sees the need for a complete RAD environment toat can replace Access.


Anonymous said...

Still wondering why you used outdated OOo 2.4.0 when 3.1.1 is the latest and 3.2.0 is on it's way?