Wednesday, February 3, 2010

A data cruncher bites the dust

I know this post is gonna get me into trouble, but frankly - I don't care. I've been called a zealot, a bigot and lots of other names I didn't know about. I thank the people who threw those curses to my head, because without them I wouldn't have been able to expand my vocabulary with all these lofty insults.

Ok, why this lengthy introduction. Well, it's easy: FOSS programs didn't work for me. Ok, now I've said it. But it's the truth. You have to know that I'm a Configuration Manager and those guys do a lot of data crunching because their main job is to produce reports. And like it or not, but the main format they're processing and producing is MS-Excel. Users that want to have their data imported provide MS-Excel sheets and that's also the format they want to have in return.

The backend is an entirely different story. I've worked with Oracle, MySQL and SQL server backends. I use my own PHP toolkit and changing the backend is pretty trivial. I don't even care whether they're running IIS or Apache, Linux or Windows. Hell, if it doesn't perform there's always more hardware. Not my problem. I'm very particular about PHP for obvious reasons, since I don't want to throw away my code nor want to be tied to a single platform - don't get me started on Mono.

Anyway, where I'm currently working I have an entire LAMP stack at my disposal. Today, my schedule was empty so I decided to work from the comfort of my home. There were several odd jobs I had to do. The first one was to create three Change Requests on the CMDB datamodel. I always use LyX, so that job was quickly done. The next one was to produce two reports from four tables. I had dumped those into MS-Excel using PHPMyAdmin the previous day. PHPMyAdmin may have its quirks, but it works fast and reliable. No problems there.

Now I usually use MS-Access to produce those reports, but I don't have that one on my Linux box. So I decided to give OpenOffice Base 2.4.0 a go. It wanted me to load the spreadsheets in OpenOffice Calc and then paste the sheets into OOo Base. I found that a rather peculiar and somewhat slow procedure, but it worked alright. Making the reports was a little awkward, because I was not too familiar with the program, but went very well. When I was satisfied, I wanted to dump the reports into an MS-Excel sheet. And there it went terribly wrong..

My KDE 3.5 desktop froze. It froze even so bad that I had no control whatsoever. I waited for several minutes. Nothing happened. You have to know that I was a Configuration Manager at a major bank in the Netherlands for several years. I processed datasets with hundreds of thousands of records. This was a tiny dataset in comparison. Just a few thousand records. that's nothing. In the end I killed it - I still have an old skool Wyse serial terminal connected to my machine - and did something I never wanted to do: I put the OOo database on a memory stick and booted my employers WinXP laptop. That's the truth.

I hoped that maybe OOo on the WinXP desktop was a little better. No way. It froze again. I still saw some activity and waited. I waited a long time. It came back with 4096 rows. That's not enough, I needed 5150 rows (including the header). This would not do. In the end, I started MS-Access, read in the tables and produced the report. Nuff said. 20 minutes later I was done. It was infuriating!

First of all, if you spend so much time imitating MS-Access why can't you produce something that doesn't need a DDE server to export its files. It's easy! Even my 4tH compiler offers simple native MS-Excel export functions! Make it CVS, like Kexi or Knoda, I don't care! But just that? Impossible! Who produced that? Sun? IBM? Aren't they ashamed? If I were to work for government agencies that only offer OOo I couldn't get my work done! My goodness.. Microsoft saved the day.. I thought I would never see the day..

In the end, WinXP wouldn't release the memory stick. I found out why when I turned off the machine: the DDE server was still running for some reason. That's Microsoft behavior, folks! No, I won't blame Microsoft for that, I'm sure it was OOo which left that thing running.

On the other hand, I also had to create a presentation and used OOo Impress for that. It worked pretty nice, I exported the thing to PDF (I don't like presentation programs anyway, Acrobat Reader is so much easier) and I was done.

The verdict? I won't touch OOo Base with a pole anymore. It looks very nice, but it doesn't handle real world workloads. I have to try Kexi and Knoda and give this post a followup. I sincerely hope it will be much better than my latest experiences, because I can't allow myself to lose valuable time over this. When I'm working at home, I simply don't have the time for experiments: I need things that just work. Like LyX. Like OOo Impress. The only way to beat the system is to do better than the system. OOo Base just falls short.

Ok, it's true, I admit: I'm a Microsoft cshill. Sigh..


Unknown said...

You really should wash your mouth... ;-)
No honestly - I think you're quite right. The right tool for the job is the only approach that makes sense.
When you have a job to do you often don't have the luxury of taking your time and anyway - who needs frustrations like these?
Personally I promote FOSS whenever appropriate, but as with all things in life, fundamentalism is dangerous. We need to realize that FOSS are tools and if they are broken we should get them fixed and in the meantime use something that works. We need software that fits our needs - to adapt our needs to the software we use is pointless.

BTW - I like the fact that you don't take it out on the whole OOo package just because Base doesn't do the job.
Thumbs up!

"If two men are to compete, it should be in archery. As they face the same direction, trying to accomplish the same goal, it is the most civilized of sports." Confucius

S. Rajagopal said...

Perhaps you should try with Sun report builder add-on

Steve And Chriss said...

Yes the last time I used base I was also frustrated. Just did not do its job.

I work with all sorts of data and interfaces. Many of the data sets are larger then the 65,000 rows limited by the old versions of excel so have not had much luck with Microsoft's products historically either.

Like you I am a great fan of PHP, with its vast capabilities. But for this sort of task I tend to use just unix/linux/cygwin command line tools for quick data mash-ups. cut,join,awk,sed,sort etc are your friends. Then import the crunched data into excel or other products as required.

Non experienced uses would find my methods unattainable though. So we relay do need good tools that work. But Base is just not one of them :-(

Anonymous said...

Stop crying and a bug-report.

You should absolutely send a bug-report to the developers. Send your data too! That's the best way to improve the product.

Ruurd said...

Shill. Not chill. Chill is good. You know what? Maybe you should put some effort into using Kettle if you're doing data crunching.

DiBosco said...

knoda's actually pretty neat. I have used it for a couple of years and find it to be very useful. The ability to write Python scripts to do queries rather than having to learn arcane sql syntax is nice.

The down side is that it has no active development and it's been like that for some time now.

We really do need a nice Access style database front end for Linux. I'm trying to do one myself, but it's not a trivial task despite what Qt4 has in place.

Unknown said...

I'm not a huge fan of oobase or ms access. I'll never use access for anything important because I've seen data become corrupted one too many times. I am using base for one client and it is OK if a bit slow. Have not had it lock or lose data yet.

I want to make sure I understand what you were doing.

Get data from mysql database and place it into an excel spreadsheet.
Take the excel spreadsheet, convert it to oocalc and import into oobase.
Produce report from data in oobase and export it to an excel spreadsheet
That seems like too many steps and too many formats.

The Beez' said...

You're basically right. But sometimes corporate policy offers you little choice. You're also spot-on where MS-Access is concerned and apart from some Q&D data manipulation I stay far from it.

ClintG said...

I have to agree on Base. Every so often I try base again and every time I find it really worthless for what I want to do. I have traditionally used Access, and I rarely have problems doing what I want to do. But now, I have chosen not to buy MS Office and so that is not an option. So I generally turn to a SQL option like MYSQL or Postgres, neither of which is design for a single user desktop system.

I AM happy with OOO overall and I use it almost exclusively on Vista, XP, and Linux. But I don't use Base and wish OOO would integrate a better database.

The Beez' said...

HOW! I can only query or get reports from the issue database. I'm willing, but unless these guys offer an easier or more obvious way to enter bugreports I only have this blog at my disposal. BTW, it's not my task to find out whether it is already listed. I spend a little of my time, you spend a little of your time.

Anonymous said...

I dropped Oo Base like a hot potato because I found it slow and clumsy. On the same machine, Kexi (with its embedded SQLite3) just flies. It has a simple SQL window for non-basic operations. Do try it, and be aware that the current vintage 2007 model (1.1.3) is soon to be replaced with Kexi 2.

Anonymous said...

Beez the wall you run into with openoffice base I have hit a few different ways. If base is using it internal java based engine it can appear locked up. Note the word appear 40 to 50 min it can magically unfreeze nothing lost.

Base support using a mysql and quite lot of other thrid party backends. Using those it is a little more stable.

Kexi has always been my selection for rapid desktop userspace interface create for databases. Really looking forward to being able to use it on windows.

Report generations I don't depend on Kexi Base or any other front end tool.

Instead depend on software like , and others they are decanted to the task. Can be hosted web-side or location. So gives me flexibility on report deployment.

Base is one of the least worked on sections of open office and it shows.

Access is not a good option either it has the habit of remembering typos and processing the typos instead of what it is displaying you. What is the point of generating reports if they are incorrect.

At least Base locks up and dies not deceives you.

Anonymous said...

I agree: OOBase is just not up to scratch. I use Python + SQLite for small jobs and MySQL for bigger ones.