Final Report for the Google Summer of Code project

Robert Schuppenies
robert.schuppenies@gmail.com

August 15th, 2008

1  Synopsis

This Google Summer of Code project aimed to consolidate existing approaches to memory usage profiling in Python and to add interpreter support for retrieving object size information. It started on May 28th, 2008 and finished on August 18th. First, the sys.getsizeof function was implemented, imitating the C sizeof operator. Afterwards, muppy, a Memory Usage Profiler for Python, was implemented, which provides tools to identify memory leaks.

2  Deliverables

This project delivers two pieces of work: the sys.getsizeof() implementation, as well as muppy, a memory leak tool set for Python.

2.1  sys.getsizeof

Let's start with what the documentation says:
sys.getsizeof(object[, default])

    Return the size of an object in bytes. The object can be any type of
    object. All built-in objects will return correct results, but this does not
    have to hold true for third-party extensions as it is implementation
    specific.

    The default argument allows to define a value which will be returned if the
    object type does not provide means to retrieve the size and would cause a
    TypeError.

    getsizeof calls the object’s __sizeof__ method and adds an additional
    garbage collector overhead if the object is managed by the garbage collector.

sys.getsizeof has a default implementation, which is used if the type of the object that is passed to sys.getsizeof does not have its own implementation. Some built-in types (e.g. dict) have their own implementation which incorporates special implementation details of each type.
An important decision was to only include the size of the memory which was required by the object itself, not any referenced objects. This gives a clear guideline on what should be included in an object's size. For example, unicode objects cache a string representation of themselves. Should this object be included? No, because it is a new object merely referenced by the unicode object.
Also, sys.getsizeof is only guaranteed to work for objects of built-in types and of types which adhere to the conventions. If a third-party extension provides a new C-implemented type which allocates other memory besides the size defined in basicsize and itemsize (see [1]), this memory will not be reported by sys.getsizeof(). Such extensions will need to implement their own __sizeof__ method. Usually, though, this should not be necessary.
The code created for sys.getsizeof is now integrated into the CPython code base.
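The behaviour described above can be tried out with a short sketch. The classes Buffer and BrokenSizeof are hypothetical and exist only for illustration. That a non-integer __sizeof__ result leads to a TypeError (and thus to the default being returned) is how current CPython behaves, not something the documentation above promises.

```python
import sys

# Shallow size only: two one-element lists report the same size,
# no matter how large the referenced element is.
big_string = "x" * 10000
small = [None]
large = [big_string]
print(sys.getsizeof(small) == sys.getsizeof(large))


class Buffer(object):
    """Hypothetical type owning extra memory the interpreter cannot see."""

    def __init__(self, nbytes):
        self.nbytes = nbytes

    def __sizeof__(self):
        # Default size of the instance plus the memory we manage ourselves.
        return object.__sizeof__(self) + self.nbytes


# The custom __sizeof__ is picked up by sys.getsizeof.
print(sys.getsizeof(Buffer(1000)) - sys.getsizeof(Buffer(0)))


class BrokenSizeof(object):
    """Hypothetical type that cannot report a usable size."""

    def __sizeof__(self):
        return None  # not an integer, so sys.getsizeof raises TypeError


# With a default argument the TypeError is swallowed and the default returned.
print(sys.getsizeof(BrokenSizeof(), -1))
```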

2.2  muppy

Although muppy started as a consolidation of existing memory profiler approaches, it quickly turned into a memory leak detection toolset.
To be useful as a leak finder, basic operations must be supported. These are
Because it is often not useful to work with entire object sets, but sufficient to work with summaries of them, a summary module is provided. It allows you to view existing objects grouped by type, number, and size. The following features are provided:
The last feature is especially useful if you want to monitor memory usage over time. To further ease this tracking, the tracker module can be used. It allows to Users could implement this themselves, but tracker instances consider previously stored summaries and deduct them from the returned result. If a summary is too coarse-grained, it is also possible to use the ObjectTracker, which returns the object instances that were created since the last invocation.
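The idea of summaries and of comparing them over time can be illustrated with the standard library alone. This is only a sketch of the concept, not muppy's implementation. The functions summarize and diff and the class Leaky are hypothetical names chosen for this example.

```python
import gc
import sys


def summarize(objects):
    """Group objects by type name: {name: [count, total shallow size]}."""
    summary = {}
    for obj in objects:
        entry = summary.setdefault(type(obj).__name__, [0, 0])
        entry[0] += 1
        entry[1] += sys.getsizeof(obj, 0)  # count 0 if no size is reported
    return summary


def diff(before, after):
    """Return per-type changes in count and size between two summaries."""
    changes = {}
    for name in set(before) | set(after):
        count_a, size_a = before.get(name, (0, 0))
        count_b, size_b = after.get(name, (0, 0))
        if (count_a, size_a) != (count_b, size_b):
            changes[name] = (count_b - count_a, size_b - size_a)
    return changes


class Leaky(object):
    """Hypothetical class whose instances pile up."""


# Take a summary, create some objects, and compare with a second summary.
before = summarize(gc.get_objects())
leaked = [Leaky() for _ in range(100)]
after = summarize(gc.get_objects())
print(diff(before, after).get("Leaky"))
```

Only objects tracked by the garbage collector show up in gc.get_objects(), so this sketch misses simple objects such as integers and strings.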
Last but not least, muppy can identify where objects are referenced. This is useful when objects are leaking, which is often the case when objects are unintentionally still referenced somewhere in the application. The refbrowser module provides reference browsing for the console, output into a file, and interactive browsing through a graphical user interface.
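The underlying question, who still holds a reference to an object, can be asked with the gc module directly. The following is a minimal one-level sketch and not the refbrowser API. Window and registry are hypothetical stand-ins for a leaked object and the forgotten container that keeps it alive.

```python
import gc


class Window(object):
    """Hypothetical object that should have been freed."""


registry = {}              # forgotten cache: the source of the "leak"
window = Window()
registry["main"] = window

# List the types of all objects that directly reference the window.
referrers = gc.get_referrers(window)
for ref in referrers:
    print(type(ref).__name__)

# The forgotten cache is among the direct referrers.
found = any(ref is registry for ref in referrers)
print(found)
```

Applying gc.get_referrers to each referrer in turn yields the kind of reference tree that a browser can display.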
When available, muppy uses the sys.getsizeof function to retrieve an object's size. If it is not available, the asizeof module from Jean Brouwers is used. This provides backward compatibility of muppy for Python versions prior to 2.6.
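Such a fallback could be sketched as follows. The function crude_size is a hypothetical and deliberately rough stand-in for this example. It is neither the asizeof algorithm nor the code muppy uses.

```python
import sys


def crude_size(obj):
    """Rough estimate from the type's basicsize and itemsize (assumption:
    good enough for illustration, ignores GC overhead and extra buffers)."""
    cls = type(obj)
    size = cls.__basicsize__
    if cls.__itemsize__:
        try:
            size += cls.__itemsize__ * len(obj)
        except TypeError:
            pass  # variable-sized type without a length, e.g. int
    return size


# Prefer the interpreter's own function; fall back where it does not exist.
size_of = getattr(sys, "getsizeof", crude_size)

print(size_of((1, 2, 3)))
```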
Muppy is now hosted on the Python Cheese Shop and Google Code. The Cheese Shop has the documentation [2] as well as the package download [3], and Google Code provides the development infrastructure [4].

2.3  Memory leak in Tkinter

With the help of muppy I was able to identify a memory leak in Tkinter. I was asked to check IDLE [5] for memory leaks. In the process, I discovered that memory was indeed leaking whenever a new window was opened and closed again. The cause was an implementation issue in Tkinter's handling of Menu entries, which is now fixed [6].

3  Time line

I started working on the sys.getsizeof function in May, with a first proposal posted on bugs.python.org on May 17th [7]. After discussions, the first patch was applied on June 1st [8]. The initial patch included special implementations of getsizeof for dict, list, byte, and long objects. Later on, unicode, tuple, set, byte array, and frame object implementations were added. Some tests failed on Windows 64-bit systems due to the special 64-bit model used on this platform. This turned out to be helpful, as it pointed to errors in the test implementation which had not been noticed on other architectures. The getsizeof implementation was correct for the most part, but needed an additional change to deal with type polymorphism and old-style classes. The last patch regarding sys.getsizeof was committed on July 14th.
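The 64-bit model in question can be made visible with the struct module. On 64-bit Windows a C long stays at 4 bytes while a pointer has 8, whereas on most 64-bit Unix systems both have 8. A test which assumes that the two sizes are equal will therefore only fail on Windows. The snippet is an illustration and not taken from the actual test code.

```python
import struct

long_size = struct.calcsize("l")     # native C long
pointer_size = struct.calcsize("P")  # native void *
print("long: %d bytes, pointer: %d bytes" % (long_size, pointer_size))
```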
At about the same time I started working on muppy (see above). At first, basic functionality was implemented, then the summary as well as the tracker module. A week later I started analyzing the IDLE application. With the tracker I could see that some objects were leaking every time a window was opened and closed, but I was not able to identify the referrers. Thus, the refbrowser modules were implemented. Now I could trace the leaking objects back to the Tkinter module. By the beginning of August a patch was proposed, and it was checked in a week later.

4  Last words

This project was a great experience for me and I would like to thank all involved participants.
First of all, I would like to thank Martin von Loewis, who has been a great mentor, was always there to answer my questions, invisibly guided my first steps in the Python community and led me through the depths of CPython.
Next, I would like to thank everybody from the Python developer community who discussed issues with me, provided the necessary insights and pointed to the resources which helped to get the job done. I'd like to mention Facundo Batista, who offered his knowledge as a co-mentor, and Georg Brandl, who quickly jumped in when all the buildbots turned red on me.
On the organizational side, many thanks to Leslie Hawthorn from Google and James Tauber from the Python Software Foundation for organizing these three months and making it work so smoothly.
Last but not least, special thanks to Jean Brouwers, whose implementation of the asizeof script inspired my work and who shared his thoughts with me throughout my project and beyond.
Finally, an incomplete and unordered list of things I have started to understand and make use of during the last three months: the CPython code base, garbage collection in Python, ReST, distutils, IDLE, Tkinter, Google Code hosting, Python's Cheese Shop, a lot about the decision process in Python, serious bug tracking and fixing, Unicode transformation formats, implementing object orientation in a procedural programming language, 64-bit programming models, memory alignment, and breaking backwards compatibility in a computer language (with all its implications on the user side).

Footnotes:

[1] http://docs.python.org/api/type-structs.html
[2] http://packages.python.org/muppy/
[3] http://pypi.python.org/pypi/muppy
[4] muppy.googlecode.com
[5] http://en.wikipedia.org/wiki/IDLE_(Python)
[6] http://bugs.python.org/issue1342811
[7] http://bugs.python.org/issue2898
[8] http://svn.python.org/view?rev=63856&view=rev

