Re: Python Gotcha

From: Reg. Charney <charney_at_charneyday_dot_com>
Date: Mon Oct 02 2006 - 19:46:11 CDT


I work with Python and your analysis is technically correct. However, it
is unusual to intentionally create millions of objects without deleting
any of them. For one thing, it eats up memory that can be used for other
things. In your case, creating millions of objects is logically correct,
and you do want to disable automatic garbage collection (GC) -- so this
is not a band-aid.
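
A minimal sketch of what switching GC off around the bulk load might look
like. The Race and Ballot classes and the load_ballots helper are
hypothetical stand-ins for the structure you describe, not your actual
code:

```python
import gc


class Race:
    """Hypothetical race-object holding ten integers."""

    def __init__(self):
        self.counts = [0] * 10


class Ballot:
    """Hypothetical ballot-object holding 100 race-objects."""

    def __init__(self):
        self.races = [Race() for _ in range(100)]


def load_ballots(n):
    """Create n ballots with the cyclic collector switched off."""
    was_enabled = gc.isenabled()
    gc.disable()  # only the cycle detector stops; refcounting still frees objects
    try:
        return [Ballot() for _ in range(n)]
    finally:
        # Restore whatever state the caller had before the bulk load.
        if was_enabled:
            gc.enable()
```

Restoring the previous state in the finally clause means the rest of the
program keeps cycle detection if it was relying on it.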

Having said that, I am concerned that you are keeping all of that
information in memory. If there is a power failure or a software bug,
all the information that you have collected could be lost. I would have
thought that you would want to write each ballot out to non-volatile
memory or backing storage, so that only one ballot, at most, would be
lost due to a power outage or some component failure.
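
As a rough illustration of what I mean, here is a sketch that appends each
ballot to a file and forces it to disk before moving on. The JSON-lines
format, the function names, and the ballot layout (a list of races, each a
list of integers) are all assumptions on my part:

```python
import json
import os


def append_ballot(path, ballot):
    """Append one ballot as a JSON line and force it out to the device."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(ballot) + "\n")
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to commit the data to storage


def read_ballots(path):
    """Read back every ballot that was written to the file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Syncing after every ballot is slow, but it limits the loss from a crash to
the ballot being written at that moment.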

If you wish, you can send a zip/tarball of your source directly to me,
rather than to the whole list, and I will take a look at it.

Reg. Charney

Charlie Strauss wrote:
> The fabulous pythonistas that program for OVC probably are well aware
> of this one, but I thought I'd share this just in case it was
> overlooked.
> I wrote a toy program to simulate creating or reading in ballots into
> memory and then processing them en masse and got subtly bitten by
> Garbage Collection in a place that I'd guess would affect any similar
> program.
> To be specific, I had a structure that looked like this:
> each race-object held ten integers.
> each ballot-object contained 100 race objects.
> After I had serially instantiated about 1500 ballot-objects, the
> computer would pause for a moment every 350 ballots I created.
> The duration of the pauses grew from half a second up to almost ten
> seconds.
> After looking into this I found this is normal behavior in any Python
> program, caused by the garbage collection.
> So anyhow the long and the short is that every 350 ballots I created
> GC fired and then scanned every single object. This is a very slow
> process when you have a lot of these. In this case, for every
> thousand ballots I had 100 races each holding ten objects. So that's
> 1 million objects for 1000 ballots.
> The GC was orders of magnitude slower than all the operations I
> wanted to do on the ballots! Indeed, it was scanning 100% of the
> memory, so it got slower and slower every 350 ballots.
> The solution was simple: disable GC.
> Sorry if this is too much of a Python-newbie observation, but I was
> really surprised that I hit this problem at such a low number of
> ballots. What was subtle about it was that since GC is silent, you
> might easily overlook it, since it's not a noticeable time lag at
> low ballot numbers.
> Note: in its default setting, GC works like this: every time you
> create 700 new objects without deleting any, the GC scans all the new
> objects to see if they contain cyclic references and are otherwise
> unreachable. These are promoted to older objects, and every time you
> create ten new older objects GC fires again and promotes these to
> "oldest" objects, which also get scanned when the allocation minus
> deallocation exceeds ten.
> Those trip points can be moved, but that's a band-aid. Unless you have
> a data structure that can contain cyclic references, I don't think GC
> should ever be needed.
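
For reference, the trip points mentioned in the quoted note can be read and
moved through the standard gc module. The helper name below is
hypothetical, and the values printed depend on the Python version in use:

```python
import gc

# Three thresholds, one per generation, that decide when a collection fires.
print(gc.get_threshold())

# Current per-generation counters that are compared against the thresholds.
print(gc.get_count())


def with_raised_threshold(threshold0, work):
    """Run work() with a higher generation-0 threshold, then restore it."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, *old[1:])
    try:
        return work()
    finally:
        gc.set_threshold(*old)
```

Raising the first threshold only makes collections rarer during the call;
it does not switch the collector off.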

OVC-discuss mailing list
= The content of this message, with the exception of any external
= quotations under fair use, are released to the Public Domain
Received on Tue Oct 31 23:17:03 2006
