Final Report for the Google Summer of Code project
Robert Schuppenies
robert.schuppenies@gmail.com
August 15th, 2008
1 Synopsis
This Google Summer of Code project aimed to consolidate existing Python
memory usage profiler approaches and to add interpreter support for retrieving
object size information.
It started on May 28th, 2008 and finished on August 18th. First, the
sys.getsizeof function was implemented, modeled on the C sizeof operator.
Afterwards, muppy, a Memory Usage Profiler for Python, was implemented; it
provides tools to identify memory leaks.
2 Deliverables
This project delivers two pieces of work: the sys.getsizeof() implementation,
as well as muppy, a memory leak tool set for Python.
2.1 sys.getsizeof
Let's start with what the documentation says:
sys.getsizeof(object[, default])
Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this does not
have to hold true for third-party extensions as it is implementation
specific.
The default argument allows to define a value which will be returned if the
object type does not provide means to retrieve the size and would cause a
TypeError.
getsizeof calls the object’s __sizeof__ method and adds an additional
garbage collector overhead if the object is managed by the garbage collector.
sys.getsizeof has a default implementation, which is used if the type of the
object that is passed to sys.getsizeof does not have its own
implementation. Some built-in types (e.g. dict) have their own implementation
which incorporates special implementation details of each type.
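As a brief illustration of the behaviour described above, the following sketch calls sys.getsizeof on a few built-in objects and compares its result with the value returned by __sizeof__. The exact numbers depend on the interpreter version and platform, so none are shown; the variable names are chosen for illustration only.

```python
import sys

# Size of a few built-in objects in bytes (values vary by version and platform).
for obj in (1, "spam", [1, 2, 3], {"a": 1}):
    print("%-12r %d" % (obj, sys.getsizeof(obj)))

# The optional second argument is returned instead of raising a TypeError
# when an object's type provides no way to retrieve its size.
size_or_default = sys.getsizeof([1, 2, 3], -1)

# getsizeof() is __sizeof__() plus any garbage collector overhead.
numbers = [1, 2, 3]
list_overhead = sys.getsizeof(numbers) - numbers.__sizeof__()
int_overhead = sys.getsizeof(1) - (1).__sizeof__()
print("GC overhead for a list: %d, for an int: %d" % (list_overhead, int_overhead))
```

Integers are not managed by the garbage collector, so no overhead is added for them.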
An important decision was to only include the size of the memory which was
required by the object itself, not any referenced objects. This gives a clear
guideline on what should be included in an object's size. For example, unicode
objects cache a string representation of themselves. Should this object be
included? No, because it is a new object merely referenced by the unicode
object.
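The consequence of this decision can be seen with a container. In the following sketch (names are illustrative), two lists that each hold a single reference report the same size, regardless of how large the referenced object is.

```python
import sys

short_text = "a"
long_text = "a" * 1000000

# Each list holds exactly one reference, so both report the same size,
# no matter how large the referenced string is.
holds_short = [short_text]
holds_long = [long_text]
same_shallow_size = sys.getsizeof(holds_short) == sys.getsizeof(holds_long)

# The string's own size has to be queried separately.
long_text_size = sys.getsizeof(long_text)
print(same_shallow_size, long_text_size > 1000000)
```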
Also, sys.getsizeof is only guaranteed to work for objects of built-in types and
types which adhere to the conventions. If a third-party extension provides a
new C-implemented type which allocates memory beyond the size defined in
basicsize and itemsize [1], that memory will not be reported by
sys.getsizeof(). Such extensions will need to implement their own __sizeof__
method. Usually though, this should not be necessary.
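The same hook is available from Python code. The following is a minimal sketch, not taken from the project: the hypothetical class NativeBuffer merely pretends to own extra_bytes of memory allocated outside the object, as a C extension type might, and reports it through __sizeof__.

```python
import sys

class NativeBuffer(object):
    """Hypothetical stand-in for an extension type that owns extra memory."""

    def __init__(self, extra_bytes):
        # Assumption: in a real extension this would be memory allocated in C.
        self.extra_bytes = extra_bytes

    def __sizeof__(self):
        # Default size of the instance plus the additionally owned memory.
        return object.__sizeof__(self) + self.extra_bytes

empty = NativeBuffer(0)
filled = NativeBuffer(1024)
reported_difference = sys.getsizeof(filled) - sys.getsizeof(empty)
print(reported_difference)
```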
The code created for sys.getsizeof is now integrated into the CPython code base.
2.2 muppy
Although muppy started as a consolidation of existing memory profiler approaches,
it quickly turned into a memory leak detection toolset.
To be useful as a leak finder, basic operations must be supported. These are:
- retrieve all existing objects
- filter objects by type and size
- do diffs on object sets
- get referents (objects which another object is referring to) of objects up
to a certain level
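The four operations listed above can be approximated with the standard gc and sys modules. The following is a sketch under that assumption and not muppy's implementation; all function names are hypothetical. Note that gc.get_objects only returns objects tracked by the garbage collector, which is a subset of all existing objects.

```python
import gc
import sys

def get_objects():
    """Return the objects currently tracked by the garbage collector."""
    return gc.get_objects()

def filter_objects(objects, type_=None, min_size=0):
    """Keep objects of the given type whose size is at least min_size bytes."""
    result = []
    for obj in objects:
        if type_ is not None and not issubclass(type(obj), type_):
            continue
        if sys.getsizeof(obj, 0) < min_size:
            continue
        result.append(obj)
    return result

def diff(before, after):
    """Return the objects in 'after' that are not in 'before' (by identity)."""
    seen = set(id(obj) for obj in before)
    return [obj for obj in after if id(obj) not in seen]

def get_referents(obj, level=1):
    """Return the objects referenced by obj, following references 'level' deep."""
    result = []
    frontier = [obj]
    for _ in range(level):
        if not frontier:
            break
        frontier = gc.get_referents(*frontier)
        result.extend(frontier)
    return result

sample = [[1, 2, 3], "text", {"key": [4]}]
only_lists = filter_objects(sample, type_=list)
added = diff(sample[:1], sample)
two_levels = get_referents(sample[2], level=2)
print(len(get_objects()), only_lists, added, two_levels)
```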
Because it is often not useful to work with entire object sets, but sufficient
to work with summaries of those, a summary module is provided. It lets you view
existing objects grouped by type, with the number and size of the objects of
each type. The following features are provided:
- summarize a set of objects
- print summaries as tables
- do diffs on summaries.
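To make the idea of a summary concrete, here is a minimal sketch using only the standard library. It is not muppy's summary module; the function names and the row layout (type name, number of objects, total size) are assumptions made for illustration.

```python
import sys

def summarize(objects):
    """Return rows of [type name, number of objects, total size in bytes]."""
    stats = {}
    for obj in objects:
        name = type(obj).__name__
        count, size = stats.get(name, (0, 0))
        stats[name] = (count + 1, size + sys.getsizeof(obj, 0))
    return [[name, count, size] for name, (count, size) in sorted(stats.items())]

def print_table(rows):
    """Print summary rows as a simple table."""
    print("%-20s %10s %12s" % ("type", "# objects", "total size"))
    for name, count, size in rows:
        print("%-20s %10d %12d" % (name, count, size))

def diff_summaries(old, new):
    """Return rows describing how 'new' differs from 'old'."""
    old_rows = dict((row[0], row) for row in old)
    new_rows = dict((row[0], row) for row in new)
    result = []
    for name in sorted(set(old_rows) | set(new_rows)):
        _, old_count, old_size = old_rows.get(name, (name, 0, 0))
        _, new_count, new_size = new_rows.get(name, (name, 0, 0))
        if (new_count, new_size) != (old_count, old_size):
            result.append([name, new_count - old_count, new_size - old_size])
    return result

before = summarize([1, "a"])
after = summarize([1, "a", "b", []])
changes = diff_summaries(before, after)
print_table(changes)
```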
Diffing summaries is especially useful if you want to monitor memory usage
over time. To further ease this tracking, the tracker module can be
used. It lets you
- retrieve differences between a time t1 and a time t2
- print those diffs
Users could implement this themselves, but tracker instances account for the
summaries they have stored previously and deduct them from the returned result.
If a summary is too coarse-grained, it is also possible to use the
ObjectTracker, which returns object instances that were created since the last
invocation.
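The tracking idea can be sketched as a small class that remembers the last snapshot and reports what changed since then. This is an illustration only: the class name CountTracker is hypothetical, it counts objects per type name via gc.get_objects, and unlike the tracker described above it makes no attempt to deduct its own bookkeeping objects from the result.

```python
import gc

class CountTracker(object):
    """Hypothetical tracker reporting per-type object count changes."""

    def __init__(self):
        self._last = self._count()

    def _count(self):
        # Number of garbage-collector-tracked objects per type name.
        counts = {}
        for obj in gc.get_objects():
            name = type(obj).__name__
            counts[name] = counts.get(name, 0) + 1
        return counts

    def diff(self):
        """Return {type name: change in count} since the previous call."""
        current = self._count()
        delta = {}
        for name in set(current) | set(self._last):
            change = current.get(name, 0) - self._last.get(name, 0)
            if change:
                delta[name] = change
        self._last = current
        return delta

class Leak(object):
    """Hypothetical object type that accumulates over time."""

tracker = CountTracker()
kept_alive = [Leak() for _ in range(3)]
first_diff = tracker.diff()
second_diff = tracker.diff()
print(first_diff.get("Leak"), second_diff.get("Leak"))
```

The first call reports the three new Leak instances; the second call reports no change for that type because nothing was created in between.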
Last but not least, muppy can identify where objects are referenced. This
is useful when objects are leaking, which is often the case when objects are
unintentionally still referenced somewhere in the application. The
refbrowser module provides reference browsing for the console, output
into a file, and interactive browsing through a graphical user interface.
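The underlying question, which objects still refer to a given object, can be asked with the standard gc module. The following is a minimal sketch and not the refbrowser module; the helper names, the Window class, and the forgotten_cache dictionary are hypothetical.

```python
import gc

def find_referrers(obj):
    """Return the containers known to the garbage collector that refer to obj."""
    return gc.get_referrers(obj)

def describe(referrers):
    """Return a short, printable description (type name and id) of each referrer."""
    return ["%s at 0x%x" % (type(ref).__name__, id(ref)) for ref in referrers]

class Window(object):
    """Hypothetical object that should have been released."""

window = Window()
forgotten_cache = {"last": window}  # unintended reference keeping it alive

referrers = find_referrers(window)
for line in describe(referrers):
    print(line)
```

The output typically also lists the namespace in which window itself is defined, so results need some filtering before the unintended referrer stands out.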
When available, muppy uses the sys.getsizeof function to retrieve an object's
size. If this is not the case, the asizeof module from Jean Brouwers is
used. This provides backward compatibility of muppy for Python versions prior to
2.6.
Muppy is now hosted on the Python cheese shop and Google code. The cheese shop has
the documentation [2] as well as the package download [3], and Google code
provides the development infrastructure [4].
2.3 Memory leak in Tkinter
With the help of muppy I was able to identify a memory leak in Tkinter. I was
asked to check IDLE [5] for any memory leaks. In this process, I discovered that
memory was indeed leaking whenever a new window was opened and closed again. The
reason was an implementation issue in Tkinter's handling of Menu entries, which
is now fixed [6].
3 Time line
I started working on the sys.getsizeof function in May, with a first proposal
posted on bugs.python.org on May 17th [7]. After discussions, the first patch
was applied on June 1st [8]. The initial
patch included special implementations of getsizeof for dict, list, byte, and
long objects. Later on, unicode, tuple, set, byte array, and frame object
implementations were added. Some tests failed on Windows 64-bit systems due to
the special 64-bit model used in this architecture. This turned out to be
helpful, as it pointed to errors in the test implementation which were not
noticed on other architectures. The getsizeof implementation was correct for the
most part, but needed an additional change to deal with type polymorphism and
old style classes. The last patch regarding sys.getsizeof was committed on July
14th.
At about the same time I started working on muppy (see above). At first, basic
functionality was implemented, then the summary as well as the tracker
module. A week later I started analyzing the IDLE application. With the tracker
I could see that some objects were leaking every time a window was opened and
closed, but I was not able to identify the referrers. Thus, the refbrowser
modules were implemented. Now I could trace the leaking objects back to the
Tkinter module. By the beginning of August a patch was proposed, and it was
checked in a week later.
4 Last words
This project was a great experience for me and I would like to thank all
involved participants.
First of all, I would like to thank Martin von Loewis, who has been a great
mentor, was always there to answer my questions, invisibly guided my first steps
in the Python community, and led me through the depths of CPython.
Next, I would like to thank everybody from the Python developer community who
discussed issues with me, provided the necessary insights and pointed to the
resources which helped to get the job done. I'd like to mention Facundo Batista,
who offered his knowledge as a co-mentor, and Georg Brandl, who quickly jumped in
when all the buildbots turned red on me.
On the organizational side, many thanks to Leslie Hawthorn from Google and James
Tauber from the Python Software Foundation for organizing these three months and
making it work so smoothly.
Last but not least, special thanks to Jean Brouwers, whose implementation of the
asizeof script inspired my work and who shared his thoughts with me throughout
my project and beyond.
Finally, an incomplete and unordered list of things I have started to understand
and make use of during the last three months: CPython code base, garbage
collection in Python, ReST, distutils, IDLE, Tkinter, googlecode hosting,
Python's cheeseshop, lots on the decision process in Python, serious bug
tracking and fixing, unicode transformation format, implementing object
orientation in a procedural programming language, 64-bit programming models,
memory alignment, breaking backwards compatibility in a computer language (with
all implications on the user side).
Footnotes:
[1] http://docs.python.org/api/type-structs.html
[2] http://packages.python.org/muppy/
[3] http://pypi.python.org/pypi/muppy
[4] muppy.googlecode.com
[5] http://en.wikipedia.org/wiki/IDLE_(Python)
[6] http://bugs.python.org/issue1342811
[7] http://bugs.python.org/issue2898
[8] http://svn.python.org/view?rev=63856&view=rev