handyfloss

Because FLOSS is handy, isn’t it?

Posts Tagged ‘Software’

Some more tweaks to my Python script

Posted by isilanes on February 19, 2008

All the comments to my previous post have provided me with hints to increase further the efficiency of a script I am working on. Here I present the advices I have followed, and the speed gain they provided me. I will speak of “speedup”, instead of timing, because this second set of tests has been made in a different computer. The “base” speed will be the last value of my previous test set (1.5 sec in that computer, 1.66 in this one). A speedup of “2″ will thus mean half an execution time (0.83 s in this computer).

Version 6: Andrew Dalke suggested the substitution of:

line = re.sub('>','<',line)

with:

line   = line.replace('>','<')

Avoiding the re module seems to speed up things, if we are searching for fixed strings, so the additional features of the re module are not needed.

This is true, and I got a speedup of 1.37.

Version 7: Andrew Dalke also suggested substituting:

search_cre = re.compile(r'total_credit').search
if search_cre(line):

with:

if 'total_credit' in line:

This is more readable, more concise, and apparently faster. Doing it increases the speedup to 1.50.

Version 8: Andrew Dalke also proposed flattening some variables, and specifically avoiding dictionary search inside loops. I went further than his advice, even, and substituted:

stat['win'] = [0,0]

loop
  stat['win'][0] = something
  stat['win'][1] = somethingelse

with:

win_stat_0 = 0
win_stat_1 = 0

loop
  win_stat_0 = something
  win_stat_1 = somethingelse

This pushed the speedup futher up, to 1.54.

Version 9: Justin proposed reducing the number of times some patterns were matched, and extract some info more directly. I attained that by substituting:

loop:
  if 'total_credit' in line:
    line   = line.replace('>','<')
    aline  = line.split('<')
    credit = float(aline[2])

with:

pattern    = r'total_credit>([^<]+)<';
search_cre = re.compile(pattern).search

loop:
  if 'total_credit' in line:
    cre    = search_cre(line)
    credit = float(cre.group(1))

This trick saved enough to increase the speedup to 1.62.

Version 10: The next tweak was an idea of mine. I was diggesting a huge log file with zcat and grep, to produce a smaller intermediate file, which Python would process. The structure of this intermediate file is of alternating lines with “total_credit” then “os_name” then “total_credit”, and so on. When processing this file with Python, I was searching the line for “total_credit” to differentiate between these two lines, like this:

for line in f:
  if 'total_credit' in line:
    do something
  else:
    do somethingelse

But the alternating structure of my input would allow me to do:

odd = True
for line in f:
  if odd:
    do something
    odd = False
  else:
    do somethingelse
    odd = True

Presumably, checking falsity of a boolean is faster than matching a pattern, although in this case the gain was not huge: the speedup went up to 1.63.

Version 11: Another clever suggestion by Andrew Dalke was to avoid using the intermediate file, and use os.popen to connect to and read from the zcat/grep command directly. Thus, I substituted:

os.system('zcat host.gz | grep -F -e total_credit -e os_name > '+tmp)

f = open(tmp)
for line in f:
  do something

with:

f = os.popen('zcat host.gz | grep -F -e total_credit -e os_name')

for line in f:
  do something

This saves disk I/O time, and the performance is increased accordingly. The speedup goes up to 1.98.

All the values I have given are for a sample log (from MalariaControl.net) with 7 MB of gzipped info (49 MB uncompressed). I also tested my scripts with a 267 MB gzipped (1.8 GB uncompressed) log (from SETI@home), and a plot of speedups vs. versions follows:

versions2.png

Execution speedup vs. version
(click to enlarge)

Notice how the last modification (avoiding the temporary file) is of much more importance for the bigger file than for the smaller one. Recall also that the odd/even modification (version 10) is of very little importance for the small file, but quite efficient for the big file (compare it with Version 9).

The plot doesn’t tell (it compares versions with the same input, not one input with the other), but my eleventh version of the script runs the 267 MB log faster than the 7 MB one with Version 1! For the 7 MB input, the overall speedup from Version 1 to Version 11 is above 50.

Posted in howto | Tagged: , , , , | 7 Comments »

Filelight makes my day

Posted by isilanes on February 7, 2008

First of all: yes, this could have been made with du. Filelight is just more visual.

The thing is that yesterday I noticed that my root partition was a bit on the crowded side (90+%). I though it could be because of /var/cache/apt/archives/, where all the installed .deb files reside, and started purging some unneeded installed packages (very few… I only install what I need). However, I decided to double check, and Filelight has given me the clue:

Filelight_root

(click to enlarge)

Some utter disaster in a printing job filled the /var/spool/cups/tmp/ with 1.5GB of crap! After deleting it, my root partition is back to 69% full, which is normal (I partitioned my disk with 3 roots of 7.5GB (for three simultaneous OS installations, if need be), a /home of 55GB, and a secondary disk of 250GB).

Simple problem, simple solution.

Posted in my ego and me | Tagged: , , , , | Leave a Comment »

App of the week: digiKam

Posted by isilanes on February 6, 2008

As digital cameras get more and more common, and personal photo collections grow bigger, solutions for managing all these images are more and more needed.

I bought my first digital camera (a Nikon CoolPix 2500) almost 4 years ago (now I see the model was 1 year old when I bought my unit), and now I own a Panasonic Lumix DMC FX10 I’m so happy with. I obviously have the need outlined above, plus the desire to sometimes share some pictures over the web. I didn’t want to go for something like Picasa, and made a lengthy Perl/Tk script to generate HTML albums from some info I would introduce.

When I later discovered digiKam, I realized it had all the features I wanted. It is incredibly useful to tag your pictures, so that you can later on retrieve, say, “all the pictures in which my father appears”. It also has many other features, like easy access to image manipulation (of which I only use the rotation for photos requiring it), or ordering of the pictures by date, so you can see how many pictures were taken each month. The humble, but for me killer, features is that you can automatically generate HTML albums from a list of pictures, which can be selected e.g. by their tags.

Give it a try, and you’ll love it.

Posted in Application of the Week | Tagged: , , , , , | Leave a Comment »

Reflexión repentina y aleatoria

Posted by isilanes on February 1, 2008

Hay muy pocos problemas que los ordenadores no puedan solucionar. Y casi ninguno que no puedan crear.

Posted in my ego and me | Tagged: , , , | 1 Comment »

SpyPig: another annoyance against your privacy

Posted by isilanes on January 27, 2008

I’ve read in a post in Genbeta [es], about a “service” for e-mail senders called SpyPig. It basically boils down to sending a notification to the sender of an e-mail, when the recipient opens it. This way, the recipient can not say that she hasn’t read it.

I will deal with two issues: moral and technological. Morally, I think this kind of things suck. I have received these e-mails asking for confirmation of having been read, and I never found appealing to answer. But at least you were asked politely. What these pigs SpyPigs do is provide a sneaky way of doing it without the recipient knowing. Would you consider someone doing it on you a friend? Not me.

Now, technologically, the system is more than simple, and anyone with access to a web server could do it. The idea is that the sender writes the e-mail in HTML mode, and inserts a picture (can be a blank image) hosted at some SpyPig server. When the recipient opens the HTML message, the image is loaded from the server, and the logs of the server will reflect when the image was loaded, and hence the e-mail opened. When this happens, the server notifies the sender.

The bottom line of this story is that HTML IS BAD for e-mails. My e-mail readers never allow displaying HTML messages, and show me the source HTML code instead (of course, I can allow HTML, but why would I?). So this SpyPig thing will never work for against me. And this SpyPig story is just one more reason not to allow displaying HTML in the messages you read. Of course, for the e-mails you send, consider sending them in plain text. Your recipients will be a bit happier.

For more tips on what NOT to do on web/e-mail issues, check the e-mail/web tips section in this blog.

Posted in Evil software | Tagged: , , , , , | 7 Comments »

German government caught buying malware to intercept Skype calls

Posted by isilanes on January 26, 2008

I’ll parrot here the news shaking the blogosphere today: apparently the German government intended to obtain software to spy on Skype users. What’s next?

Links:

Posted in This evil world | Tagged: , , , , , , | Leave a Comment »

Windows 2000 Server on a NAS? No, thanks.

Posted by isilanes on January 24, 2008

You would think that, as a researcher in a serious center like the DIPC, one would hardly ever encounter a MS product, at least in the server/cluster section (more than one fellow here has Windows in his/her computer, but don’t tell anyone, it’s a secret).

However, we do have some server running Windows, and its presence is almost transparent for the user (which is good). And I say “almost”, because it stumbled upon one of its “features”, and the sysadmin ended up confessing :^)

The thing is that I happened to try to create a directory with “CO” (carbon monoxide) in its name (it was a dir for a SIESTA calculation), when a dir with the same name already existed, except it had “Co” (cobalt) in the name. Well, the filesystem complained that a dir by the same name already existed! I could not believe my eyes!!

Basically the filesystem would not make any difference at all for different capitalizations. For example:

% mkdir testdir
% mkdir TESTDIR
mkdir: cannot create directory `TESTDIR': File exists

% touch testfile
% rm TESTfile
rm: remove regular empty file `TESTfile'?

The explanation? The directory I was in was exported from a NAS running… ta-da: Windows 2000 Server!

How incredibly stupid and annoying is it to have a filesystem that ignores character case altogether? And how error prone? Because if you are not aware of that, you might delete a file you didn’t intend to!! Someone could try to excuse MS by saying that, OK, that was in 2000. But, look, Linux could tell upper case from lower case since its inception in 1991, and Unix since the seventies! The root of the problem is the filesystem used by the OS, of course. But it so happens that the filesystems used by Linux since 1991 (beginning in ext and then many others) had this capability (and many more), and are free. All that MS had to do was to use them, instead of FAT or NTFS. But instead they chose to develop those (inferior) filesystems in parallel for almost 20 years now. I call that stupidity.

No need to say that the sysadmin of the DIPC absolutely regrets having been naive enough to ever buy that MS crap.

Posted in Windows no-nos | Tagged: , , , , , , , | Leave a Comment »

Graphical = good and command line = bad?

Posted by isilanes on January 15, 2008

It is not uncommon to hear (mostly from Windows users berating Linux and its “useless console”) that one of the benefits of Windows is that everything can be done through a GUI. After all, clicking on icons and finding stuff in menus is more intuitive, and everything is easier that way. In contrast, with Linux you have to “type an awful lot of things, which is boring, slow, and difficult. And ugly”.

Now, don’t get me wrong, GUIs are great. I quite like them. What annoys me is the lack of command-line interface for some tasks. Both GUIs and CLIs have their place in computer use, and the wise should use each when appropriate. In this post I will try to illustrate a case where the automation allowed by using the CLI and some scripting is largely missed. The user (me) is forced to use an “intuitive” GUI, with the result that my patience takes a direct hit below the flotation line.

The first task I faced was to plot some orbitals of a molecule. The data for each orbital is saved in one file, and I am running a program that can read them and plot the given orbital (Molekel).

The following YouTube video, made by myself, shows the process of plotting 2 orbitals (I had to plot 17). Notice that, due to the program running so slow, the process takes around 1 minute per orbital!

Notice also that all the previous work has been done: choosing the colors of the background, atoms and orbitals, choosing the orientation, opening the atomic geometry… The comprehensive list of what to do for each orbital follows, with each line preceded by the point in time (seconds) when it happens:

  1. 00.00 – Click on “Delete surface” to remove previous orbital
  2. 07.80 – Click on “Load” to load a new orbital
  3. 12.67 – Choose a file from the dialog window, and click on “Accept” to load it
  4. 30.27 – Click on “Both signs”, because we want both positive and negative part of the orbital
  5. 31.33 – Introduce a value for the isosurface (0.05) in the “cutoff” box
  6. 33.13 – Click on “Create surface” to have Molekel render the isosurface
  7. 37.33 – Isosurface appears
  8. 37.93 – From a drop-down menu (called with right-click of the mouse), choose Snapshot -> RGB
  9. 52.66 – “Save as” dialog box appears
  10. 62.73 – Introduce filename for snapshot, and click “Accept”
  11. 65.00 – We’re done, and can repeat the process for the next orbital

One can’t help but notice that 65 seconds are needed to make eight clicks and introduce a short text in two boxes! The issue is that human attention is necessary during the whole 65 seconds, because the time between actions is too short to do something else in between (although long enough to get on your nerves, like the full 15 seconds to have the “Save as” dialog appear).

Another obvious point is that from the two short texts introduced by the user, one (the value of the isosurface) is always the same, and only the other (the name of the file to save the snapshot as) varies. Also, only one click of the 8 we do is ever different (the choice of orbital file to read). It would be nice to have a robot do this task, the only data we would have to feed it being a list of orbitals (to read, and then to save a snapshot). But we can’t. We are stuck with this sluggish process!

In contrast, I will next show a case where some automation was made. The process is that of cropping the snapshots taken in the previous step (the Molekel thing). Sure, we could use GIMP, or some other GUI tool, but applying exactly the same process to a list of 17 images (and this is a short list, it could have been 1000) is the kind of thing that cringes for automation.

The following video shows the process:

Recall that it takes 4 minutes to process ALL the images. This may not sound like a huge improvement over the 18:25 that it (in principle) took the process above (17 x 65 sec). However, the time spent with Molekel scales linearly with the number of orbitals. 100 orbitals would need almost 2h. The automated cropping process would have taken more than 4 minutes, but only slightly more: maybe 5 or 6.

Also notice that the 4 minutes are full of decisions, and there is no repetitive, unnecessary task (except the fact of committing errors). Let’s take a look at the actions taken during the 4 minutes:

  1. 00:11.00 – Open a Perl script I had half-done (another benefit of automation: you can reuse old stuff)
  2. 00:17.87 – Shade window to take a look at the number and name of files to process
  3. 00:21.33 – Change script accordingly
  4. 00:47.93 – Save changes
  5. 00:51.60 – Back to the CLI, and run the script
  6. 00:55.00 – Ups, nothing happened!
  7. 00:58.27 – Reopen the script, and look for the error
  8. 01:05.53 – Found it. Fix it.
  9. 01:07.33 – Save and execute
  10. 01:06.87 – It works!
  11. 01:13.00 – Finished running (0.36 sec per picture)
  12. 01:21.73 – Open a cropped image in viewer
  13. 01:22.93 – Realized the crop is wrong!
  14. 01:30.73 – Alt-Tab to script file, to modify it
  15. 01:55.67 – Save and execute again
  16. 02:11.73 – Open the cropped images. The first one seems to be OK!
  17. 02:29.80 – We reach one that is wrong
  18. 02:34.20 – Back to the script, and fix it
  19. 02:45.07 – Save, and back to CLI to re-run
  20. 02:53.00 – Reopen in image viewer
  21. 02:56.00 – Cropped part is not centered!
  22. 03:02.00 – Back to the script, and fix it
  23. 03:12.33 – Save and re-run
  24. 03:19.73 – Reopen in image viewer
  25. 03:27.87 – Yet another error: an image could have been cropped more, to hide an unwanted part
  26. 03:32.40 – Back to the script
  27. 03:40.80 – Rerun
  28. 03:47.13 – Reopen images
  29. 03:59.93 – See that all of them are correct. Stop and rest

Recall also that if I were to repeat both processes tomorrow, the image cropping would simply require to run the script again (0.36 seconds per image, and you can do something else in between, if you have 1000 images and don’t want to waste time waiting). The creation of the orbitals, on the other hand, would require to repeat the whole process again!! (65 seconds per orbital, plus you have to spend that time paying attention to the process. You can not run something and go away). And the whole problem with the creation of the orbitals is that there is no command-line way of doing it, to be able to automate it.

Posted in Free software and related beasts | Tagged: , , , , , , , | Leave a Comment »

Re-partitioning a disk infected with Vista to dual-boot with Linux

Posted by isilanes on January 9, 2008

Some time ago I helped a friend to install Linux into a Vista laptop (incidentally, another friend asked me about the subject today). The only aspect I’m covering in this post is the re-partitioning of the disk, which is a wee bit trickier than with XP and previous Windows versions.

With my laptop (one with XP preinstalled), I just inserted my favorite Linux CD, rebooted, and used the built-in partition utility that all Linux installation CDs have to downsize the Windows partition, and then make the Linux partitions in the remaining disk space. With Vista this is not the case. You have to be very careful, because Linux can not resize the Vista partitions (at least at the time of writing these lines). The problem is that Vista uses a modified NTFS format, and Linux can not cope with it yet (read more at my source for this info: pronetworks.org).

You can also find at pronetworks.org a detailed HowTo for making the resizing of a partition. In summary (e.g., for shrinking a partition to make room for Linux):

  1. Go to Control Panel -> Administrative Tools -> Computer Management
  2. Click on Disk Management (under Storage in left hand panel)
  3. Locate partition to shrink, right click on it, and from the context menu choose Shrink Volume
  4. Fill in the self-explanatory dialog box. Basically, enter amount of MB you want the partition to be reduced by.

You will thus end up with a smaller Vista partition, and some empty space. Now, you can insert the Linux CD, reboot, and install Linux in that empty space, without touching the Vista partition.

Posted in howto | Tagged: , , , , , | Leave a Comment »

Extracting audio from a YouTube video

Posted by isilanes on January 7, 2008

This HowTo really has two parts:

1 – How to download a video from YouTube
2 – How to extract audio from any video

The second step is not limited to videos obtained in the first step, and the first step can obviously be made for the sake of it.

How to download a video from YouTube

When you play a video on YouTube, the contents of a FLV file are streamed to your screen. Downloading this FLV file is a bit more tricky than it should be, because there is no direct indication of the URL of this file in the code of the page of the video.

Apparently some guys got over it, and they made the software I use to do the job: a Firefox extension called DownloadHelper. Using it is so easy: a three-sphere icon appears to the right of the URL bar of Firefox. When a page contains material that can be downloaded with DownloadHelper (such as a YouTube video), the spheres in the icon are colorful and move (otherwise they are grayed-out, and still). You can then click on the icon to see a list of items to download, and choose the one you want (usually the .flv file).

How to extract audio from any video

It is so easy to do from the command line. First we use MPlayer to extract the audio in PCM/WAV format:


% mplayer filename.flv -vo null -ao pcm:fast:file=filename.wav

Then, we make use of oggenc to encode the WAV into Ogg Vorbis. For example, to encode with quality level 7 (a reasonable tradeoff between quality and size):


% oggenc filename.wav -q 7 -o filename.ogg

And that’s all to it!

Posted in howto | Tagged: , , , , , , | 2 Comments »

 
Follow

Get every new post delivered to your Inbox.