handyfloss

Because FLOSS is handy, isn’t it?

Python: speed vs. memory tradeoff reading files

Posted by isilanes on February 15, 2008

I was making a script to process some log file, and I basically wanted to go line by line, and act upon each line if some condition was met. For the task of reading files, I generally use readlines(), so my first try was:

f = open(filename,'r')
for line in f.readlines():
  if condition:
    do something
f.close()

However, I realized that as the size of the file read increased, the memory footprint of my script increased too, to the point of almost halting my computer when the size of the file was comparable to the available RAM (1GB).

Of course, Python hackers will frown at me, and say that I was doing something stupid… Probably so. I decided to try a different thing to reduce the memory usage, and did the following:

f = open(filename,'r')
for line in f:
  if condition:
    do something
f.close()

Both pieces of code look very similar, but pay a bit of attention and you’ll see the difference.

The problem with “f.readlines()” is that it reads the whole file and assigns lines to the elements of an (anonymous, in this case) array. Then, the for loops through the array, which is in memory. This leads to faster execution, because the file is read once and then forgotten, but requires more memory, because an array of the size of the file has to be created in the RAM.

fileread_memory

Fig. 1: Memory vs file size for both methods of reading the file

When you do “for line in f:“, you are effectively reading the lines one by one when you do each cycle of the loop. Hence, the memory use is effectively constant, and very low, albeit the disk is accessed more offten, and this usually leads to slower execution of the code.

fileread_time.png

Fig. 2: Execution time vs file size for both methods of reading the file

About these ads

2 Responses to “Python: speed vs. memory tradeoff reading files”

  1. paddy3118 said

    Try looking at the memory mapping feature amongst others, mentioned here.

    - Paddy.

  2. isilanes said

    Nice reading! Thanks a lot for the link, Paddy.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: