Python Gotcha

From: Charlie Strauss <cems_at_earthlink_dot_net>
Date: Mon Oct 02 2006 - 10:45:37 CDT

The fabulous pythonistas that program for OVC probably are well aware
of this one, but I thought I'd share this just in case it was

I wrote a toy program to simulate creating or reading in ballots into
memory and then processing them en masse, and got subtly bitten by
Garbage Collection in a place that I'd guess would affect any similar

To be specific, I had a structure that looked like this:
Each race-object held ten integers.
Each ballot-object contained 100 race-objects.
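
A minimal sketch of that layout, for illustration only. The class and
attribute names (Race, Ballot, counts, races) are made up here; the
original toy program is not shown in this message.

```python
class Race:
    """Hypothetical race-object holding ten integers."""
    def __init__(self):
        self.counts = [0] * 10


class Ballot:
    """Hypothetical ballot-object holding 100 race-objects."""
    def __init__(self):
        self.races = [Race() for _ in range(100)]


# Serially instantiate a handful of ballots, as the toy program did at scale.
ballots = [Ballot() for _ in range(20)]
```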

After I had serially instantiated about 1500 ballot-objects, the
computer would pause for a moment after every further 350 ballots I
created.
The duration of the pauses grew from half a second up to almost ten

After looking into this I found this is normal behavior in any Python
program, caused by the garbage collection.

So anyhow, the long and the short is that every 350 ballots I created,
GC fired and then scanned every single object. This is a very slow
process when you have a lot of these. In this case, each ballot had
100 races each holding ten objects, so that's 1 million objects for
every 1000 ballots.
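
The count above is just the product of the three figures in the text.
The variable names are only for illustration:

```python
ballots = 1000
races_per_ballot = 100
objects_per_race = 10

# Objects held in memory for 1000 ballots, by the figures given above.
total = ballots * races_per_ballot * objects_per_race
print(total)  # 1000000
```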

The GC was orders of magnitude slower than all the operations I
wanted to do on the ballots! Indeed, it was scanning 100% of the
memory, so it got slower and slower every 350 ballots.
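
One way to look for the effect is to time each batch of 350 ballots as
it is built. This is only a sketch: Race, Ballot and time_batches are
hypothetical names, and the numbers depend heavily on the interpreter
version, since later CPython releases changed when full collections
are triggered, so the pauses may not show up at all.

```python
import time


class Race:
    """Hypothetical race-object holding ten integers."""
    def __init__(self):
        self.counts = [0] * 10


class Ballot:
    """Hypothetical ballot-object holding 100 race-objects."""
    def __init__(self):
        self.races = [Race() for _ in range(100)]


def time_batches(n_ballots, batch=350):
    """Build n_ballots ballots; return (ballots_so_far, seconds_for_batch) pairs."""
    ballots, timings = [], []
    start = time.perf_counter()
    for i in range(1, n_ballots + 1):
        ballots.append(Ballot())   # everything stays alive, as in the toy program
        if i % batch == 0:
            now = time.perf_counter()
            timings.append((i, now - start))
            start = now
    return timings


timings = time_batches(1050)
for count, seconds in timings:
    print(count, round(seconds, 4))
```

If the collector is the culprit, the per-batch time should climb as the
total number of live ballots grows, rather than staying flat.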

The solution was simple: disable GC.
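
In code this is gc.disable() from the standard gc module. A cautious
sketch that switches the collector off only while the bulk load runs
and then restores the previous state (build_without_gc is a
hypothetical helper, not something from the toy program):

```python
import gc


def build_without_gc(factory, n):
    """Create n objects with the cyclic collector paused, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return [factory() for _ in range(n)]
    finally:
        # Only re-enable if the collector was on when we started.
        if was_enabled:
            gc.enable()


before = gc.isenabled()
items = build_without_gc(dict, 1000)
```

Reference counting still frees objects while the collector is off; only
unreachable reference cycles would pile up.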

Sorry if this is too much of a Python-newbie observation, but I was
really surprised that I hit this problem at such a low number of
ballots. What was subtle about it was that since GC is silent, you
might easily overlook it, since it's not a noticeable time lag at
low ballot numbers.

Note: in its default setting, GC works like this: every time
allocations exceed deallocations by 700 objects, the GC scans all the
new objects to see if they contain cyclic references and are otherwise
unreachable. The survivors are promoted to older objects, and after
ten such scans GC fires again on the older objects and promotes the
survivors to "oldest" objects, which in turn get scanned after ten
scans of the older objects.
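
The trip points and the current counters can be read from the gc
module. gc.get_threshold() and gc.get_count() are standard-library
calls; the values they report depend on the interpreter version, so
treat the 700/10/10 figures as the defaults I saw rather than a
guarantee.

```python
import gc

# One threshold per generation: (youngest, older, oldest).
thresholds = gc.get_threshold()

# Current counters that are compared against those thresholds.
counts = gc.get_count()

print("thresholds:", thresholds)
print("counts:    ", counts)
```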

Those trip points can be moved, but that's a band-aid. Unless you have
a data structure that can contain cyclic references, I don't think GC
should ever be needed.
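
A small sketch of why that is, assuming CPython, where reference
counting frees acyclic objects on its own. Node is a hypothetical class
used only for this demonstration; weakref.ref lets us check whether an
object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical object that may point at another Node."""
    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()
try:
    # Acyclic case: freed by reference counting as soon as the name goes away.
    a = Node()
    a_ref = weakref.ref(a)
    del a
    acyclic_freed = a_ref() is None

    # Cyclic case: two nodes pointing at each other keep each other alive
    # until a collection is run.
    x, y = Node(), Node()
    x.other, y.other = y, x
    x_ref = weakref.ref(x)
    del x, y
    cycle_alive_before = x_ref() is not None
    gc.collect()                      # an explicit collection still works
    cycle_freed_after = x_ref() is None
finally:
    if was_enabled:
        gc.enable()

print(acyclic_freed, cycle_alive_before, cycle_freed_after)
```

So with GC disabled, only structures like the second case leak, and an
occasional explicit gc.collect() can clean those up.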

OVC-discuss mailing list
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Tue Oct 31 23:17:02 2006
