handyfloss

Because FLOSS is handy, isn’t it?

Archive for the ‘Free software and related beasts’ Category

A small point for Arsys

Posted by isilanes on February 17, 2008

Let me say up front that I know nothing about Arsys, and that (for now) I have nothing to do with them. I simply wanted to share the fact that I visited their website (fantasizing about getting a domain of my own), and saw this:

arsys_ff.png

Nothing odd? Well, notice that, like any good Internet-related service, it shows a little picture of a man with a web browser open… Internet Explorer? I think not…


Posted in Free software and related beasts | 2 Comments »

Python: speed vs. memory tradeoff reading files

Posted by isilanes on February 15, 2008

I was writing a script to process a log file, and I basically wanted to go line by line, acting upon each line if some condition was met. For reading files I generally use readlines(), so my first try was:

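f = open(filename, 'r')
for line in f.readlines():  # builds a list holding every line of the file
    if condition(line):     # placeholder for whatever test you need
        process(line)       # placeholder for the action to take
f.close()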

However, I realized that as the size of the file read increased, the memory footprint of my script increased too, to the point of almost halting my computer when the size of the file was comparable to the available RAM (1GB).

Of course, Python hackers will frown at me, and say that I was doing something stupid… Probably so. I decided to try a different thing to reduce the memory usage, and did the following:

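f = open(filename, 'r')
for line in f:              # the file object yields one line at a time
    if condition(line):     # placeholder for whatever test you need
        process(line)       # placeholder for the action to take
f.close()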

Both pieces of code look very similar, but pay a bit of attention and you’ll see the difference.

The problem with “f.readlines()” is that it reads the whole file and stores its lines as the elements of an (anonymous, in this case) list. Then the for loops through that list, which is in memory. This leads to faster execution, because the file is read once and then forgotten, but requires more memory, because a list of the size of the file has to be created in RAM.

fileread_memory

Fig. 1: Memory vs file size for both methods of reading the file

When you do “for line in f:“, you are effectively reading the lines one by one, one per cycle of the loop. Hence, the memory use is effectively constant, and very low, although the disk is accessed more often, which usually leads to slower execution of the code.

fileread_time.png

Fig. 2: Execution time vs file size for both methods of reading the file
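To reproduce the memory comparison on your own machine, something like the following minimal sketch could be used. It is not the script behind the figures above: the helper names, the synthetic log file and the “ERROR” search string are all assumptions made for illustration, and tracemalloc only counts memory allocated by Python itself.

```python
import os
import tempfile
import tracemalloc


def count_with_readlines(path, needle):
    """Count matching lines after loading every line into a list."""
    with open(path, "r") as f:
        lines = f.readlines()  # the whole file now lives in memory
    return sum(1 for line in lines if needle in line)


def count_with_iteration(path, needle):
    """Count matching lines while holding only one line at a time."""
    with open(path, "r") as f:
        return sum(1 for line in f if needle in line)


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak bytes allocated by Python)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def make_sample_log(n_lines):
    """Write a throwaway log file (hypothetical data); every tenth line is an ERROR."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            level = "ERROR" if i % 10 == 0 else "INFO"
            f.write(f"{level} request {i} handled\n")
    return path


if __name__ == "__main__":
    path = make_sample_log(200_000)
    try:
        for func in (count_with_readlines, count_with_iteration):
            hits, peak = peak_memory(func, path, "ERROR")
            print(f"{func.__name__}: {hits} hits, peak {peak / 1e6:.1f} MB")
    finally:
        os.remove(path)
```

Both functions return the same count; only the peak memory differs. Timings are deliberately left out of the sketch, since they depend on the Python version and on how much of the file the operating system has cached, so they are best measured on your own files.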

Posted in Free software and related beasts | 3 Comments »

Igalia on Telecinco

Posted by isilanes on January 31, 2008

This morning, La Mirada Crítica on Telecinco covered work-life balance and “teleworking” (working from home).

As an example they mentioned Igalia, and interviewed a couple of its employees on site. Why do I mention it? Because Igalia is a company devoted to free software (a fact the interviewees mentioned twice in the short interview), and because T5 said that Igalia has a turnover of one million euros a year (that is, it is doing well).

The description of the perks (flexible hours, childcare allowances, etc.) that Igalia offers its employees reminded me, on a much smaller scale, of Google, which is once again the best US company to work for, according to Fortune.

The presenter closed by saying: “[…] of course, not every company works in a sector where this can be done”. He meant IT, obviously, but it extends to free software in particular. Work with free software: life is better that way!

Posted in Free software and related beasts | Leave a Comment »

Graphical = good and command line = bad?

Posted by isilanes on January 15, 2008

It is not uncommon to hear (mostly from Windows users berating Linux and its “useless console”) that one of the benefits of Windows is that everything can be done through a GUI. After all, clicking on icons and finding stuff in menus is more intuitive, and everything is easier that way. In contrast, with Linux you have to “type an awful lot of things, which is boring, slow, and difficult. And ugly”.

Now, don’t get me wrong, GUIs are great. I quite like them. What annoys me is the lack of a command-line interface for some tasks. Both GUIs and CLIs have their place in computer use, and the wise should use each when appropriate. In this post I will try to illustrate a case where the automation allowed by the CLI and some scripting is sorely missed. The user (me) is forced to use an “intuitive” GUI, with the result that my patience takes a direct hit below the waterline.

The first task I faced was to plot some orbitals of a molecule. The data for each orbital is saved in one file, and I am running a program that can read them and plot the given orbital (Molekel).

The following YouTube video, which I recorded, shows the process of plotting 2 orbitals (I had to plot 17). Notice that, because the program runs so slowly, the process takes around 1 minute per orbital!

Notice also that all the previous work has been done: choosing the colors of the background, atoms and orbitals, choosing the orientation, opening the atomic geometry… The comprehensive list of what to do for each orbital follows, with each line preceded by the point in time (seconds) when it happens:

  1. 00.00 – Click on “Delete surface” to remove previous orbital
  2. 07.80 – Click on “Load” to load a new orbital
  3. 12.67 – Choose a file from the dialog window, and click on “Accept” to load it
  4. 30.27 – Click on “Both signs”, because we want both positive and negative part of the orbital
  5. 31.33 – Enter a value for the isosurface (0.05) in the “cutoff” box
  6. 33.13 – Click on “Create surface” to have Molekel render the isosurface
  7. 37.33 – Isosurface appears
  8. 37.93 – From a drop-down menu (called with right-click of the mouse), choose Snapshot -> RGB
  9. 52.66 – “Save as” dialog box appears
  10. 62.73 – Enter a filename for the snapshot, and click “Accept”
  11. 65.00 – We’re done, and can repeat the process for the next orbital

One can’t help but notice that 65 seconds are needed to make eight clicks and type a short text in two boxes! The issue is that human attention is necessary during the whole 65 seconds, because the time between actions is too short to do something else in between (although long enough to get on your nerves, like the full 15 seconds it takes for the “Save as” dialog to appear).

Another obvious point is that of the two short texts entered by the user, one (the value of the isosurface) is always the same, and only the other (the name of the file to save the snapshot as) varies. Also, only one of the 8 clicks is ever different (the choice of orbital file to read). It would be nice to have a robot do this task, the only data we would have to feed it being a list of orbitals (to read, and then to save a snapshot of). But we can’t. We are stuck with this sluggish process!

In contrast, I will next show a case where some automation was possible. The process is that of cropping the snapshots taken in the previous step (the Molekel thing). Sure, we could use GIMP, or some other GUI tool, but applying exactly the same process to a list of 17 images (and this is a short list, it could have been 1000) is the kind of thing that cries out for automation.

The following video shows the process:

Note that it takes 4 minutes to process ALL the images. This may not sound like a huge improvement over the 18:25 that the process above would (in principle) take (17 x 65 sec). However, the time spent with Molekel scales linearly with the number of orbitals: 100 orbitals would need almost 2 h. The automated cropping would take more than 4 minutes for 100 images, but only slightly more: maybe 5 or 6.

Also notice that the 4 minutes are full of decisions, and there is no repetitive, unnecessary task (except for the mistakes I made). Let’s take a look at the actions taken during the 4 minutes:

  1. 00:11.00 – Open a Perl script I had half-done (another benefit of automation: you can reuse old stuff)
  2. 00:17.87 – Shade window to take a look at the number and name of files to process
  3. 00:21.33 – Change script accordingly
  4. 00:47.93 – Save changes
  5. 00:51.60 – Back to the CLI, and run the script
  6. 00:55.00 – Oops, nothing happened!
  7. 00:58.27 – Reopen the script, and look for the error
  8. 01:05.53 – Found it. Fix it.
  9. 01:07.33 – Save and execute
  10. 01:06.87 – It works!
  11. 01:13.00 – Finished running (0.36 sec per picture)
  12. 01:21.73 – Open a cropped image in viewer
  13. 01:22.93 – Realized the crop is wrong!
  14. 01:30.73 – Alt-Tab to script file, to modify it
  15. 01:55.67 – Save and execute again
  16. 02:11.73 – Open the cropped images. The first one seems to be OK!
  17. 02:29.80 – We reach one that is wrong
  18. 02:34.20 – Back to the script, and fix it
  19. 02:45.07 – Save, and back to CLI to re-run
  20. 02:53.00 – Reopen in image viewer
  21. 02:56.00 – Cropped part is not centered!
  22. 03:02.00 – Back to the script, and fix it
  23. 03:12.33 – Save and re-run
  24. 03:19.73 – Reopen in image viewer
  25. 03:27.87 – Yet another error: an image could have been cropped more, to hide an unwanted part
  26. 03:32.40 – Back to the script
  27. 03:40.80 – Rerun
  28. 03:47.13 – Reopen images
  29. 03:59.93 – See that all of them are correct. Stop and rest

Note also that if I were to repeat both processes tomorrow, the image cropping would simply require running the script again (0.36 seconds per image, and you can do something else in the meantime, if you have 1000 images and don’t want to waste time waiting). The creation of the orbitals, on the other hand, would require repeating the whole process again!! (65 seconds per orbital, plus you have to spend that time paying attention to the process. You cannot run something and go away). And the whole problem with the creation of the orbitals is that there is no command-line way of doing it, so it cannot be automated.
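The Perl script itself is not shown in this post, but the idea of batch cropping can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: that ImageMagick’s “convert” command is installed, the crop geometry, the “*.png” pattern and the directory names.

```python
import subprocess
from pathlib import Path

# Hypothetical crop window: width x height + x offset + y offset, in pixels.
CROP_GEOMETRY = "400x300+120+80"


def build_crop_command(src, dst, geometry=CROP_GEOMETRY):
    """Return the ImageMagick command line that crops src into dst."""
    # +repage discards the leftover canvas offset after cropping.
    return ["convert", str(src), "-crop", geometry, "+repage", str(dst)]


def crop_all(src_dir, dst_dir, pattern="*.png", geometry=CROP_GEOMETRY, dry_run=False):
    """Crop every image matching pattern in src_dir, writing results to dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    commands = []
    for src in sorted(src_dir.glob(pattern)):
        commands.append(build_crop_command(src, dst_dir / src.name, geometry))
    if not dry_run:
        dst_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True stops at the first image that fails to convert.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "snapshots" and "cropped" are hypothetical directory names.
    for cmd in crop_all("snapshots", "cropped", dry_run=True):
        print(" ".join(cmd))
```

Running with dry_run=True only prints the commands, which is handy while tuning the geometry. Fixing a wrong crop then means changing one string and re-running, whatever the number of images.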

Posted in Free software and related beasts | Leave a Comment »

rip2ogg released

Posted by isilanes on December 19, 2007

I have released (how pretentious!) rip2ogg.py, the wonderful CD ripper everyone was waiting for. You can check its “home page” at www.ehu.es/isilanes.

Why did I do it? Well, one of the wonderful tools GNU/Linux provides to rip CDs is KAudioCreator, which is very neat. However, it has some shortcomings I wanted to overcome (again, how pretentious!):

  1. It’s slow. It rips the CD to WAV and encodes the ripped WAVs to Ogg in parallel, while rip2ogg.py does both things sequentially. Yet rip2ogg.py is 40% faster! I have ripped a whole CD in 14 minutes with KAC, and in 10 minutes with r2o.
  2. You cannot have arbitrary character substitution, just one, and the interface for that is horrible. For example, with KAC it’s very simple to substitute every blank in the track name with an underscore. BUT I have found no way to provide KAC with two lists, so that it substitutes every character in the first list with the corresponding character(s) in the second list.
  3. You can change the track title to get a “nice” filename for the Ogg, but the change is also reflected in the “track title” tag. You cannot tell KAC to substitute an “ñ” in the title for an “n” in the filename, but keep the “ñ” in the “track title” tag.
  4. KAC is not able to rip all CDs. It sometimes chokes on DRM‘d CDs, and copes horribly with scratched surfaces. In contrast, the programs rip2ogg.py uses to rip have never failed for me. More than once command-line was my only way of ripping some rogue CDs. KAC simply couldn’t.
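The two-list substitution described in points 2 and 3 can be sketched in Python as follows. This is not the actual rip2ogg.py code: the function names and the example lists are hypothetical, and only the idea (map each character in one list to the string at the same position in the other, and leave the tag untouched) comes from the text above.

```python
# Hypothetical substitution lists: SOURCES[i] is replaced by TARGETS[i].
SOURCES = [" ", "ñ", "á", "ß", "/"]
TARGETS = ["_", "n", "a", "ss", "-"]


def build_table(sources, targets):
    """Build a translation table from two parallel lists."""
    if len(sources) != len(targets):
        raise ValueError("both lists must have the same length")
    # Keys must be single characters; values may be strings of any length.
    return str.maketrans(dict(zip(sources, targets)))


def filename_for(title, table, extension=".ogg"):
    """Return a file-system-friendly name; the title itself is not modified."""
    return title.translate(table) + extension


if __name__ == "__main__":
    table = build_table(SOURCES, TARGETS)
    title = "El niño / Straße"      # this goes into the track title tag as-is
    print(filename_for(title, table))
```

Because strings are immutable, the original title is still available for the tag after the filename has been derived from it.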

In the end, it all boils down to being able to control what the ripper is doing. To do so, I decided to make this simple script.

Obviously it is FLOSS (GPLv2), so use, modify and redistribute to your heart’s content!

Posted in Free software and related beasts | 9 Comments »

Labeled breaks in Python

Posted by isilanes on December 19, 2007

I am a recent fan of Python, a very neat scripting language.

One thing I miss from Perl is the availability of labeled breaks. What are those? Suppose you have two nested loops. When a condition is met in the inner loop, you want to exit both loops. With Python there is no straightforward way of doing it. Imagine we are reading an array of data, line by line and column by column, and we want to exit when meeting the first zero value. With Perl:


LINELOOP: foreach my $i (0..$lines)
{
  COLLOOP: foreach my $j (0..$columns)
  {
     # "last" is Perl's loop-exit keyword; the label picks which loop to leave
     last LINELOOP unless $val[$i][$j];
  };
};

A plain “last” (Perl’s counterpart of “break”) exits the innermost loop, but with a label we can exit a specific loop. In Python, however, there is no such thing as a labeled loop, as explained in this PEP.
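For reference, two common ways of getting the same effect in Python are sketched below: wrapping the loops in a function and returning, or using the for/else construct. The function names and the sample data are made up for this illustration; neither is as direct as a label.

```python
def first_zero(values):
    """Return (line, column) of the first falsy value, or None if there is none."""
    for i, row in enumerate(values):
        for j, val in enumerate(row):
            if not val:
                return i, j  # "return" leaves both loops at once
    return None


def first_zero_for_else(values):
    """Same search without a helper function, using for/else."""
    found = None
    for i, row in enumerate(values):
        for j, val in enumerate(row):
            if not val:
                found = (i, j)
                break          # leaves the inner loop only
        else:
            continue           # inner loop ended normally: next line
        break                  # inner loop was broken: leave the outer loop too
    return found


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 0, 6], [0, 8, 9]]
    print(first_zero(data), first_zero_for_else(data))
```

The else branch of a for loop runs only when the loop finishes without hitting break, which is what lets the second version tell “no zero in this line” apart from “zero found”.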

My gripe is with the reasons given by van Rossum himself on the Python mailing list for rejecting the change:

1. The complexity added to the language, permanently.
2. My expectation that the feature will be abused more than it will be used right.

Wow! Incredible reasons!

The first one is silly: other languages have it, and it has worked fine. Adding complexity to a tool for the sake of it is really stupid, I agree. But the fact is labeled breaks would be tremendously useful, so the increase in complexity would be justified. Surely a language that can only print “Hello world” would be less complex, yet of little use.

The second reason flatly looks down on users. So now good old Guido must guide his sheep along the “correct” path, lest we get lost! He is punishing the good programmers by not giving them a useful tool, so that bad programmers are protected from their own stupidity. It’s like not selling cars at all because some people drive while drunk.

Just my 2 cents…

Posted in Free software and related beasts | 3 Comments »

Linux PDA in the MediaMarkt leaflet

Posted by isilanes on December 16, 2007

The MediaMarkt advertising that came with today’s newspaper surprised me quite a bit. As can be seen in the image (taken from their website), not only do they offer a PDA running Linux, but the word “Linux” appears inside the little white arrow they use to highlight the “strong points” of the product.

Click to view the full leaflet

This does not mean that they have turned “good”, but until now (like all retailers) they have been unconditionally sold out to the Dark Enemy. I know that if they sell gadgets with Linux, or advertise it, it is purely for marketing: it is a fashionable word and it sells more. What fills me with joy is precisely that “Linux” is a word that sells more and can be used for marketing purposes. That means there is an ever wider base of consumers who take it into account, doesn’t it?

Posted in Free software and related beasts | 1 Comment »

Compiz Fusion on an integrated Intel 865G graphics chip under Debian Lenny

Posted by isilanes on December 14, 2007

Blog moved to: handyfloss.net

Entry available at: http://handyfloss.net/2007.12/compiz-fusion-on-an-integrated-intel-865g-graphics-chip/

This YouTube video shows Compiz Fusion running on my work computer. It has a fairly decent CPU (P4 3.00GHz), but no “useless” extras such as a sound card or (more relevant here) a graphics card. All it has is an Intel 82865G graphics chip integrated in the motherboard: an integrated chip (not a dedicated graphics card) released in May 2003.

Judge the performance for yourself (take into account that the actual performance is higher, since the recording program to make the video also uses up some resources):

Posted in Free software and related beasts | 5 Comments »

Unicode in the command line

Posted by isilanes on November 20, 2007

This is a short HowTo for making Unicode work in Linux, specifically in the command line. Yet more specifically, in the Konsole terminal. This is useful if you want to be able to use characters like ‘ñ’ or accented letters like ‘á’ and ‘ö’.

1 – Modify your shell locale variables

You need locale settings that support UTF-8 (for example en_US.UTF-8). For that, you can add the following lines to .tcshrc or whichever script runs at login:


setenv mylang   en_US.UTF-8
setenv LANG     $mylang
setenv LC_CTYPE $mylang
setenv LANGUAGE $mylang
setenv LC_ALL   $mylang

The ‘$mylang’ thing is just because I’m lazy, and I might want to change them all in the future, and I don’t want to type too much.

2 – Modify your global locales

I don’t know if this is needed, but it doesn’t hurt. In Debian, as root:

% dpkg-reconfigure locales

and follow the instructions, using en_US.UTF-8 or something similar as default.

3 – Modify the encoding of Konsole

In the menus:

Settings->Encoding->Unicode (utf8)

Make this permanent with:

Settings->Save as Default

Then choose xterm, not linux, as the keyboard setting:

Settings->Keyboard->Xterm (XFree 4.x.x)

You can make this permanent in the Session tab of:

Settings->Configure Konsole

namely by entering “xterm” in the box labeled “$TERM”.

If you follow these instructions, you will be able to type non-ASCII text in the terminal, and use non-ASCII filenames without problems.
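To check that the variables from step 1 took effect, a small script can inspect the environment. The sketch below is only an illustration with made-up function names; it relies on the standard precedence for character handling, where LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs character handling."""
    # Precedence: LC_ALL, then LC_CTYPE, then LANG; empty values are ignored.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # fallback when nothing is set


def is_utf8_locale(env):
    """True if the effective locale names a UTF-8 codeset (e.g. en_US.UTF-8)."""
    codeset = effective_ctype(env).partition(".")[2]
    codeset = codeset.partition("@")[0]  # drop modifiers such as @euro
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    print(effective_ctype(os.environ), is_utf8_locale(os.environ))
```

If this reports a non-UTF-8 locale in a fresh Konsole session, the login script from step 1 is probably not being read by your shell.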

Posted in Free software and related beasts | Leave a Comment »

File compression: gzip vs. bzip2

Posted by isilanes on November 5, 2007

I just found out that my regular backups on a couple of computers are filling up the corresponding disks (for the Spanish readers: ¡están petaos!), and I realized that it is because I was keeping a bunch of 200MB files uncompressed. Since the files are ASCII, full of numbers, most of which are actually zeros, they are perfect candidates for compression with tools like gzip or bzip2. Everybody knows that the latter compresses better, but is slower, so I made a small comparison:

Original file: 211MB
gzip: 4.5MB in 11 s (compress), 6.5 s (uncompress)
bzip2: 2.4MB in 1323 s (compress), 27 s (uncompress)

Yes, the compression with bzip2 is impressive: 88x, where gzip gets 47x (almost 90% better compression). But the timing is poor: bzip2 is 120 times slower than gzip when compressing. For uncompression, bzip2 fares better: “only” 4 times slower than gzip. Where gzip uncompresses a file in about half the time it took to compress it, bzip2 uncompresses almost 50 times faster than it compresses (because compressing was soooo slow).

This case is anecdotal, but it nicely illustrates my experience in general.
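A comparison of this kind can be repeated with Python’s standard gzip and bz2 modules, as in the minimal sketch below. It does not use the command-line tools or my actual files: the synthetic data (ASCII columns of numbers, mostly zeros) and the helper names are assumptions, so the ratios and timings will not match the figures above.

```python
import bz2
import gzip
import time


def make_sample(n_lines=200_000):
    """Hypothetical stand-in for the backup files: ASCII numbers, mostly zeros."""
    lines = []
    for i in range(n_lines):
        value = float(i) if i % 50 == 0 else 0.0
        lines.append(f"{value:12.6f} {0.0:12.6f} {0.0:12.6f}\n")
    return "".join(lines).encode("ascii")


def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the compression ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise RuntimeError(f"{name}: round trip did not restore the data")
    return {
        "name": name,
        "ratio": len(data) / len(packed),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


if __name__ == "__main__":
    data = make_sample()
    for name, comp, decomp in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        r = benchmark(name, comp, decomp, data)
        print(f"{r['name']}: {r['ratio']:.0f}x, "
              f"{r['compress_s']:.2f} s (compress), "
              f"{r['decompress_s']:.2f} s (uncompress)")
```

The sketch checks that each round trip restores the original bytes before reporting anything, so a reported ratio always refers to a lossless result.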

Posted in Free software and related beasts | Leave a Comment »