<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Summary of my Python optimization adventures</title>
	<atom:link href="http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/feed/" rel="self" type="application/rss+xml" />
	<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/</link>
	<description>Because FLOSS is handy, isn't it?</description>
	<lastBuildDate>Mon, 19 Oct 2009 13:26:33 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Top Posts &#171; WordPress.com</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-590</link>
		<dc:creator>Top Posts &#171; WordPress.com</dc:creator>
		<pubDate>Mon, 18 Feb 2008 23:59:23 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-590</guid>
		<description>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</description>
		<content:encoded><![CDATA[<p>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lorenzo E. Danielsson</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-589</link>
		<dc:creator>Lorenzo E. Danielsson</dc:creator>
		<pubDate>Mon, 18 Feb 2008 20:43:32 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-589</guid>
		<description>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn&#039;t make the program itself any less &quot;python&quot;. That&#039;s the UNIX way: let each tool do what it&#039;s best at.</description>
		<content:encoded><![CDATA[<p>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn&#8217;t make the program itself any less &#8220;python&#8221;. That&#8217;s the UNIX way: let each tool do what it&#8217;s best at.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-588</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:39:07 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-588</guid>
		<description>Oops, missed the parens:  execfile(&quot;your_filename.py&quot;)</description>
		<content:encoded><![CDATA[<p>Oops, missed the parens:  execfile("your_filename.py")</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-587</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:23:43 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-587</guid>
		<description>While I find it annoying to use, try the &quot;profile&quot; module to get an idea of where time is spent in your program.  Because your code isn&#039;t written as a module, the easiest way to do the profiling is using the command &quot;execfile &#039;filename&#039;&quot;.

This should tell you which line consumes the most time.</description>
		<content:encoded><![CDATA[<p>While I find it annoying to use, try the &#8220;profile&#8221; module to get an idea of where time is spent in your program.  Because your code isn&#8217;t written as a module, the easiest way to do the profiling is using the command &#8220;execfile &#8216;filename&#8217;&#8221;.</p>
<p>This should tell you which line consumes the most time.</p>
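<p>A minimal sketch of the profiling suggestion above, assuming Python 3, where execfile no longer exists and cProfile is the lower-overhead sibling of the profile module. The function count_matches and the sample data are hypothetical stand-ins for the real script. Note that the report is broken down per function rather than per line.</p>

```python
import cProfile
import io
import pstats
import re


def count_matches(lines, pattern="Linux"):
    """Hypothetical stand-in for the script's main loop."""
    search = re.compile(pattern).search
    return sum(1 for line in lines if search(line))


def profile_call(func, *args):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    sample = ["Uses Linux", "Uses Darwin"] * 50000
    matches, report = profile_call(count_matches, sample)
    print(matches)
    print(report)
```

<p>Wrapping the work in a function like this is only one option; an unmodified script can also be profiled from the shell with python -m cProfile followed by the script name.</p>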
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-586</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Mon, 18 Feb 2008 15:07:52 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-586</guid>
		<description>If you are going to do this:
&lt;code&gt;
if search_cre(line):
    line = re.sub(&#039;&gt;&#039;,&#039;&lt;&#039;,line)
    aline = line.split(&#039;&lt;&#039;)
    credit = float(aline[2])
&lt;/code&gt;

you should just change the regular expression to be
&lt;code&gt;
&quot;total_credit &gt;(?P[^&lt;]+)&lt;&quot;
&lt;/code&gt;

or such, and then just pull out the credit if it matched.  The way you&#039;re doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.

&lt;code&gt;
f = os.popen(&#039;zcat host.gz&#039;)
&lt;/code&gt;

Will be a lot faster than the gzip module though.</description>
		<content:encoded><![CDATA[<p>If you are going to do this:<br />
<code><br />
if search_cre(line):<br />
    line = re.sub('&gt;','&lt;',line)<br />
    aline = line.split('&lt;')<br />
    credit = float(aline[2])<br />
</code></p>
<p>you should just change the regular expression to be<br />
<code><br />
"total_credit &gt;(?P[^&lt;]+)&lt;"<br />
</code></p>
<p>or such, and then just pull out the credit if it matched.  The way you&#8217;re doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.</p>
<p><code><br />
f = os.popen('zcat host.gz')<br />
</code></p>
<p>Will be a lot faster than the gzip module though.</p>
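<p>A hedged sketch of the single-pass regular expression idea above. The line shape, the exact pattern and the group name credit are assumptions made for illustration (the group name in the pattern above appears to have been lost in formatting), so adjust them to the real file.</p>

```python
import re

# Assumed line shape: "<total_credit>1234.5</total_credit>".
# The group name "credit" is illustrative, not taken from the original post.
CREDIT_RE = re.compile(r"total_credit>(?P<credit>[^<]+)<")


def extract_credit(line):
    """Return the credit as a float, or None when the line does not match."""
    match = CREDIT_RE.search(line)
    if match is None:
        return None
    # One search both tests the line and captures the value, replacing the
    # separate search, re.sub and split passes.
    return float(match.group("credit"))


if __name__ == "__main__":
    print(extract_credit("    <total_credit>1234.5</total_credit>"))
    print(extract_credit("    <os_name>Linux</os_name>"))
```

<p>The same approach extends to the other fields: one compiled pattern with several alternatives or groups, applied once per line.</p>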
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-585</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:34:05 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-585</guid>
		<description>Andrew, I am not doubting that the &quot;in&quot; construct is faster (I repeated your test, and here it&#039;s 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in the measured time (I always measured walltime with /usr/bin/time -f %e &lt;i&gt;command&lt;/i&gt;) is of the order of the difference between using one or the other, so I can&#039;t differentiate them. I&#039;ll keep testing... (the faster the script, the more noticeable the subtle differences).</description>
		<content:encoded><![CDATA[<p>Andrew, I am not doubting that the &#8220;in&#8221; construct is faster (I repeated your test, and here it&#8217;s 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in the measured time (I always measured walltime with /usr/bin/time -f %e <i>command</i>) is of the order of the difference between using one or the other, so I can&#8217;t differentiate them. I&#8217;ll keep testing&#8230; (the faster the script, the more noticeable the subtle differences).</p>
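<p>One possible way to get below that measurement noise, sketched under the assumption that the comparison can be isolated from the rest of the script: repeat the timing several times with the timeit module and keep the fastest run. The helper name best_time and the repeat counts are arbitrary choices for this example.</p>

```python
import timeit


def best_time(stmt, setup="pass", repeat=5, number=100000):
    """Return the fastest per-call time in seconds over several repeats."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(runs) / number


if __name__ == "__main__":
    in_time = best_time('"Linux" in line', setup='line = "Uses Linux"')
    re_time = best_time(
        "search(line)",
        setup='import re; line = "Uses Linux"; search = re.compile("Linux").search',
    )
    print(f"in: {in_time * 1e6:.3f} usec, re search: {re_time * 1e6:.3f} usec")
```

<p>This only compares the two constructs in isolation; whether the difference matters for the whole script still depends on where the time is actually spent.</p>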
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-584</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:21:41 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-584</guid>
		<description>When I benchmark with

python -m timeit -s &#039;import re&#039; -s &#039;search = re.compile(&quot;Linux&quot;).match&#039; &#039;search(&quot;Uses Linux&quot;)&#039;

I get &quot;0.413 usec per loop&quot;.  When I benchmark with

python -m timeit &#039;&quot;Linux&quot; in &quot;Uses Linux&quot;&#039;

I get &quot;0.141 usec per loop&quot;.  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has a couple extra function calls overhead and a cache check.)

Are you sure you timed what you think you timed?  Every time I&#039;ve done the comparison the &quot;in&quot; test is faster, and I know the underlying implementation well enough that I can&#039;t think of how it can be slower than the re code.

Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older and harder to use os.popen call.  (Harder to use because it&#039;s harder to deal with errors.)  That should also give you some performance increase because you aren&#039;t doing a full read/write through the disk.</description>
		<content:encoded><![CDATA[<p>When I benchmark with</p>
<p>python -m timeit -s 'import re' -s 'search = re.compile("Linux").match' 'search("Uses Linux")'</p>
<p>I get &#8220;0.413 usec per loop&#8221;.  When I benchmark with</p>
<p>python -m timeit '"Linux" in "Uses Linux"'</p>
<p>I get &#8220;0.141 usec per loop&#8221;.  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has a couple extra function calls overhead and a cache check.)</p>
<p>Are you sure you timed what you think you timed?  Every time I&#8217;ve done the comparison the &#8220;in&#8221; test is faster, and I know the underlying implementation well enough that I can&#8217;t think of how it can be slower than the re code.</p>
<p>Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older and harder to use os.popen call.  (Harder to use because it&#8217;s harder to deal with errors.)  That should also give you some performance increase because you aren&#8217;t doing a full read/write through the disk.</p>
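<p>A rough sketch of the subprocess suggestion above: read the child's output through a pipe instead of a temporary file. The helper name stream_lines is made up for this example, and a small Python child process stands in for zcat so the sketch runs on systems without it.</p>

```python
import subprocess
import sys


def stream_lines(command):
    """Yield the stdout of `command` line by line, without a temporary file."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


if __name__ == "__main__":
    # Hypothetical real usage: stream_lines(["zcat", "host.gz"]).
    demo = [sys.executable, "-c", "print('Uses Linux'); print('Uses Darwin')"]
    print(sum(1 for line in stream_lines(demo) if "Linux" in line))
```

<p>Checking the return code is what makes errors easier to handle here than with os.popen: a failing child raises an exception instead of silently producing truncated output.</p>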
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-583</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 12:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-583</guid>
		<description>James, you&#039;re absolutely right. What I meant was &quot;optimizing a script&quot;, by using the best tools I could get access to (plus my limited knowledge).

Anyway, my first attempt was to do it all in Python, as efficiently as possible, and I give some hints on what to do and what not to do... so does this qualify as &quot;Python optimization&quot;? :^)</description>
		<content:encoded><![CDATA[<p>James, you&#8217;re absolutely right. What I meant was &#8220;optimizing a script&#8221;, by using the best tools I could get access to (plus my limited knowledge).</p>
<p>Anyway, my first attempt was to do it all in Python, as efficiently as possible, and I give some hints on what to do and what not to do&#8230; so does this qualify as &#8220;Python optimization&#8221;? :^)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-582</link>
		<dc:creator>James</dc:creator>
		<pubDate>Mon, 18 Feb 2008 11:41:00 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-582</guid>
		<description>Is this really Python optimization? Haven&#039;t you just offloaded most of the program to grep, effectively &quot;rewriting&quot; it in C?</description>
		<content:encoded><![CDATA[<p>Is this really Python optimization? Haven&#8217;t you just offloaded most of the program to grep, effectively &#8220;rewriting&#8221; it in C?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.wordpress.com/2008/02/17/summary-of-my-python-optimization-adventures/#comment-581</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 10:42:31 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.wordpress.com/?p=303#comment-581</guid>
		<description>Thanks Paolo and Andrew for your recommendations. The fact is that I am by no means a hacker, just a lame hobbyist :^)

I post my experience so that others can benefit, but my ignorance is huuuuge. Thanks to interaction with others, such as your comments, I keep learning every day... thanks!

&lt;b&gt;@Paolo:&lt;/b&gt; If you mean using match instead of search, I did it, and saw little or no gain (although I expected to see it).

&lt;b&gt;@Andrew:&lt;/b&gt; I have read that &quot;if pattern in line:&quot; is faster than &quot;if re.search(pattern, line):&quot;, but I have tested it, and if &quot;pattern&quot; is pre-compiled, I see no real advantage (actually it was slightly slower).

Avoiding the dictionary lookups is good advice; I will try it. Still, I am reluctant to optimize much further here, because most of the computation time is spent in the system calls to zcat and grep, so it doesn&#039;t make much sense to make a big effort to reduce the time spent in the &quot;Python part&quot;. I will eventually do it anyway, just for fun :^)</description>
		<content:encoded><![CDATA[<p>Thanks Paolo and Andrew for your recommendations. The fact is that I am by no means a hacker, just a lame hobbyist :^)</p>
<p>I post my experience so that others can benefit, but my ignorance is huuuuge. Thanks to interaction with others, such as your comments, I keep learning every day&#8230; thanks!</p>
<p><b>@Paolo:</b> If you mean using match instead of search, I did it, and saw little or no gain (although I expected to see it).</p>
<p><b>@Andrew:</b> I have read that &#8220;if pattern in line:&#8221; is faster than &#8220;if re.search(pattern, line):&#8221;, but I have tested it, and if &#8220;pattern&#8221; is pre-compiled, I see no real advantage (actually it was slightly slower).</p>
<p>Avoiding the dictionary lookups is good advice; I will try it. Still, I am reluctant to optimize much further here, because most of the computation time is spent in the system calls to zcat and grep, so it doesn&#8217;t make much sense to make a big effort to reduce the time spent in the &#8220;Python part&#8221;. I will eventually do it anyway, just for fun :^)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
