refcount balancer
Contact:
Chris Waterson (waterson@netscape.com)
overview
One of the things that sucks about XPCOM is that you have to deal with
reference counting. It's hard and prone to errors, and Mozilla leaks
like a sieve because of it. Unlike the good old fashioned
malloc() and free() model where memory gets
allocated in exactly one place and freed in exactly one other place,
reference counting is distributed all over. There may be twenty
different spots in the code where a single object is
AddRef()-ed. And if just one of those
AddRef()-ers forgets to Release(), well,
you're screwed.
Traditional leak tracking tools like
Purify
don't help much either. They'll tell you that you leaked an object,
but they won't help you track down the twenty different clients that
AddRef()-ed it, let alone the joker that forgot to
Release() it.
This crude set of tools attempts to address that problem. It's not a
panacea, but it at least gives some insight into who is
AddRef()-ing whom.
From 50,000 feet, here's what happens.
-
You discover that your FooImpl object is leaking, maybe Bruce
Mitchener tells you, maybe you notice on your own because your
destructor is never called. You cringe and moan and later the bug for
3 or 4 milestones. But since you know about this tool, you eventually
roll up your sleeves and start working on it.
-
You set a couple of environment variables in a debug build.
-
As you run, you notice piles and piles of information starting to
spew out to the console. Specifically, as your object is
AddRef()-ed and Release()-ed, a stack
trace is generated, along with the operation (AddRef or
Release), this (i.e., the object that just got
operated on), and the current reference count of your object. This
mountain of information, although impressive, is useless in its
current form.
-
You next run Perl script #1 over the resulting log file. This Perl
script will pick out the instances of objects that leaked. You choose
one of the objects that's particularly interesting to you.
-
You now run Perl script #2 over the log file. This script is the Fancy
Magic. It takes each stack trace and strings it together into a call
graph. Each node in the graph represents a call site, and has a
"balance factor" which is the total number of AddRef()
operations that it has been included in minus the total number
of Release() operations that it has been included in. (I
told you it was Fancy Magic.)
So what does all that mean? The cool part -- you were waiting for the
cool part -- is that you can look at this graph and see what subtrees
are "balanced"; i.e., total number of AddRef()s equals
total number of Release()-es. You know you don't
need to worry about those trees because no evil leakage happened
there.
For trees that are out of balance, you need to dig a little bit
deeper. Subtrees get out of balance when one code path
AddRef()s the object, and a code path somewhere else does
the corresponding Release().
Like I said, it's not a panacea, but you can start to play Mah Jongg
with the out-of-balance trees, proving to yourself in each case that
the AddRef() from one tree matches with the
Release() in another. In short, it does a decent job of
directing you to the places you need to verify in your code.
details
Enabling Runtime Logging. You need to set a couple of runtime
environment variables to produce output.
setenv XPCOM_MEM_REFCNT_LOG log-file.dat
setenv XPCOM_MEM_LOG_CLASSES MyLeakyObjectImpl
setenv XPCOM_MEM_LOG_OBJECTS MyLeakyObjectSerialNumber (optional)
for Windows
set XPCOM_MEM_REFCNT_LOG=log-file.dat
set XPCOM_MEM_LOG_CLASSES=MyLeakyObjectImpl
set XPCOM_MEM_LOG_OBJECTS=MyLeakyObjectSerialNumber (optional)
for Mac
TBD
(Note that case is important.) These variables are described
in more detail in the
Memory Tools
documentation.
Now when you run, you should see lots of information dumped to your
log-file.dat (which defaults to the console, if not
set). Specifically, each time an object is AddRef()-ed and
Release()-ed, several lines will get added to the file. So
make sure you have plenty of disk space.
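If you would rather drive the run from a script than type the variables by hand, here is a minimal sketch. The helper name refcount_logging_env and its parameters are made up for illustration; only the three variable names come from the settings above.

```python
import os


def refcount_logging_env(log_file, class_names, serial_numbers=None, base_env=None):
    """Return a copy of an environment with the refcount-logging variables set.

    Hypothetical helper: only the variable names come from the document.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XPCOM_MEM_REFCNT_LOG"] = log_file
    env["XPCOM_MEM_LOG_CLASSES"] = class_names
    if serial_numbers is not None:
        # Optional: restrict logging to particular objects of the class.
        env["XPCOM_MEM_LOG_OBJECTS"] = serial_numbers
    return env


# Start from an empty base so the printout shows only the logging variables.
env = refcount_logging_env("log-file.dat", "MyLeakyObjectImpl", base_env={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

To launch a debug build with these settings, leave base_env at its default so the rest of your environment is inherited, then pass the result as the env argument of subprocess.run(). The path to the binary depends on your build and is not shown here.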
Postprocessing Step 1: Finding the Leakers. First you have to
figure out which objects leaked. There's a Perl script that does
this. It grovels through the log file and figures out which objects
got allocated (it knows an object was just allocated because it
got AddRef()-ed and its refcount became 1). It adds
them to a list. When it finds an object that got freed (it knows
because its refcount goes to 0), it removes it from the
list. Anything left over is leaked.
The script is called
find-leakers.pl.
So, depending on your platform, do something like:
% perl -w find-leakers.pl my-leaks.log
(Replace my-leaks.log with your logfile.) This will print out
a list of pointers:
0x00253ab0 (1)
0x00253ae0 (2)
0x00253bd0 (4)
The number in parentheses is the order in which the object was allocated, if
you care. Pick one for use with Step 2.
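The bookkeeping described above is simple enough to sketch. This is not find-leakers.pl itself, and it does not parse the real log format. It assumes the log has already been boiled down to hypothetical (address, operation, refcount) tuples, where refcount is the count after the operation.

```python
def find_leakers(events):
    """Return (address, allocation_order) pairs for objects that were never freed.

    events: iterable of (address, operation, refcount_after) tuples in log
    order. This tuple shape is an assumption made for the sketch.
    """
    live = {}        # address -> order in which the object was allocated
    allocations = 0
    for address, operation, refcount in events:
        if operation == "AddRef" and refcount == 1:
            # First reference: treat this as the allocation.
            allocations += 1
            live[address] = allocations
        elif operation == "Release" and refcount == 0:
            # Last reference dropped: the object was freed.
            live.pop(address, None)
    return sorted(live.items(), key=lambda item: item[1])


sample = [
    ("0x00253ab0", "AddRef", 1),
    ("0x00253ae0", "AddRef", 1),
    ("0x00253b40", "AddRef", 1),
    ("0x00253b40", "Release", 0),   # allocated third, freed properly
    ("0x00253bd0", "AddRef", 1),
    ("0x00253ab0", "AddRef", 2),
    ("0x00253ab0", "Release", 1),   # still holds one reference
]

for address, order in find_leakers(sample):
    print(f"{address} ({order})")
```

With this made-up sample, the third object is freed and drops off the list, which is why the survivors are numbered 1, 2 and 4.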
Postprocessing Step 2: Building the Balance Tree. Now that
you've picked an object that leaked, you can build a "balance tree"
(anyone who can think of a better name feel free to let me know). This
process takes all the AddRef() and Release()
stack traces and munges them into a call graph. Each node in the graph
represents a call site. Each call site has a "balance factor", which
is positive if more AddRef()s than Release()-es have
happened at the site, zero if the number of AddRef()s and
Release()-es are equal, and negative if more
Release()-es than AddRef()s have happened at the
site.
To build the balance tree, run
make-tree.pl; e.g.,
% perl -w make-tree.pl --object 0x00253ab0 < my-leaks.log
Note that you specify the object that you want make-tree.pl
to examine. This will build an indented tree that looks something like
this (except probably a lot larger and leafier):
.root: bal=1
  main: bal=1
    DoSomethingWithFooAndReturnItToo: bal=2
      NS_NewFoo: bal=1
Let's pretend in our toy example that NS_NewFoo() is a
factory method that makes a new foo and returns
it. DoSomethingWithFooAndReturnItToo() is a method that
munges the foo before returning it to main(), the main
program.
What this little tree is telling you is that you leak one
refcount overall on object 0x00253ab0. But, more
specifically, it shows you that:
-
NS_NewFoo() "leaks" a refcount. This is probably "okay"
because it's a factory method that creates an AddRef()-ed
object.
-
DoSomethingWithFooAndReturnItToo() leaks two
refcounts. Hmm...this probably isn't okay, especially because...
-
main() is back down to leaking one refcount.
So from this, we can deduce that main() is correctly
releasing the refcount that it got on the object returned from
DoSomethingWithFooAndReturnItToo(), so the leak must be
somewhere in that function.
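To make the arithmetic concrete, here is a minimal sketch of how stack traces can be folded into a balance tree. It is not make-tree.pl. The (operation, stack) event shape, the Node class, and the three toy events are assumptions chosen so that the result matches the toy tree above.

```python
class Node:
    """One call site in the balance tree."""

    def __init__(self, name):
        self.name = name
        self.balance = 0      # AddRef count minus Release count
        self.children = {}    # callee name -> Node, in first-seen order


def build_balance_tree(events):
    """events: iterable of (operation, stack); stack is outermost call first."""
    root = Node(".root")
    for operation, stack in events:
        delta = 1 if operation == "AddRef" else -1
        node = root
        node.balance += delta
        # Every call site on the stack took part in this operation.
        for site in stack:
            child = node.children.get(site)
            if child is None:
                child = node.children[site] = Node(site)
            child.balance += delta
            node = child
    return root


def format_tree(node, depth=0):
    """Return the tree as a list of indented 'name: bal=N' lines."""
    lines = ["  " * depth + f"{node.name}: bal={node.balance}"]
    for child in node.children.values():
        lines.extend(format_tree(child, depth + 1))
    return lines


toy = [
    ("AddRef", ["main", "DoSomethingWithFooAndReturnItToo", "NS_NewFoo"]),
    ("AddRef", ["main", "DoSomethingWithFooAndReturnItToo"]),  # the extra one
    ("Release", ["main"]),
]

print("\n".join(format_tree(build_balance_tree(toy))))
# .root: bal=1
#   main: bal=1
#     DoSomethingWithFooAndReturnItToo: bal=2
#       NS_NewFoo: bal=1
```

Dropping the second event from toy (the stray AddRef() inside DoSomethingWithFooAndReturnItToo) brings .root and main to zero while the two inner nodes stay at one.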
So now say we go fix the leak in
DoSomethingWithFooAndReturnItToo(), re-run our trace, grovel
through the log "by hand" to find the object that corresponds to
0x00253ab0 in the new run, and run
make-tree.pl. What we'd hope to see is a tree that looks
like:
.root: bal=0
  main: bal=0
    DoSomethingWithFooAndReturnItToo: bal=1
      NS_NewFoo: bal=1
That is, NS_NewFoo() "leaks" a single reference count; this
leak is "inherited" by DoSomethingWithFooAndReturnItToo(),
but is finally balanced by a Release() in main().
hints
Clearly, this is an iterative and analytical process. Maybe somebody
smarter than me can figure out ways to automate parts of it. To date,
I've figured out some tricks.
Ignoring balanced trees. The make-tree.pl script
accepts an option --ignore-balanced, which tells it
not to bother printing out the children of a node whose balance
factor is zero. This can help remove some of the clutter from an
otherwise noisy tree.
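A rough sketch of what that pruning amounts to, assuming a tree held as nested (name, balance, children) tuples. The tuple layout, the render function, and the call-site names are invented for illustration and say nothing about how make-tree.pl does it internally.

```python
def render(node, depth=0, ignore_balanced=False):
    """Return indented lines for a (name, balance, children) tree."""
    name, balance, children = node
    lines = ["  " * depth + f"{name}: bal={balance}"]
    if ignore_balanced and balance == 0:
        # The balanced node is still printed, but its children are skipped.
        return lines
    for child in children:
        lines.extend(render(child, depth + 1, ignore_balanced))
    return lines


# Hypothetical tree: one balanced subtree and one leaky call site.
tree = (".root", 1, [
    ("main", 1, [
        ("BalancedCaller", 0, [
            ("TakesReference", 1, []),
            ("DropsReference", -1, []),
        ]),
        ("LeakyCaller", 1, []),
    ]),
])

print("\n".join(render(tree, ignore_balanced=True)))
# .root: bal=1
#   main: bal=1
#     BalancedCaller: bal=0
#     LeakyCaller: bal=1
```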
Playing Mah Jongg. An unbalanced tree is not necessarily an
evil thing. More likely, it indicates that one AddRef() is
cancelled by another Release() somewhere else in the code. So
the game is to try to match them with one another.
Excluding Functions. To aid in this process, you can create an
"excludes file" that lists the names of functions that you want to
exclude from the tree building process (presumably because you've
matched them). make-tree.pl accepts the option --exclude
[file], where [file] is a newline-separated list of
function names that will be excluded from consideration while
building the tree. Specifically, any call stack that contains one of
those functions will not contribute to the computation of balance
factors in the tree.
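The effect of an excludes file can be sketched as a filter applied before any balance factors are computed. The (operation, stack) event shape, both function names, and the Holder::* call sites are hypothetical, and exact name matching is an assumption about how the comparison is done.

```python
def load_excludes(text):
    """Parse newline-separated function names, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_excluded(events, excluded):
    """Keep only events whose stack contains no excluded function.

    events: iterable of (operation, stack) pairs (an assumed shape).
    """
    return [(operation, stack) for operation, stack in events
            if not excluded.intersection(stack)]


events = [
    ("AddRef", ["main", "Holder::Set"]),       # already matched by hand...
    ("Release", ["main", "Holder::Clear"]),    # ...with this one
    ("AddRef", ["main", "DoSomethingWithFooAndReturnItToo"]),
]

excluded = load_excludes("Holder::Set\nHolder::Clear\n\n")
kept = drop_excluded(events, excluded)
print(kept)
```

Once the matched pair is excluded, only the stack that still needs explaining is left to contribute to the tree.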
pricing & availability
As of this writing, the stack tracing code is implemented for Win32
and i386 Linux (compiled with egcs and glibc 2.0 and
2.1). Donations gladly accepted; Bourbon preferred over other
currencies.
The Perl scripts, of course, require only Larry Wall's finest (5.00504
seems to work for me).
find-leakers.pl
make-tree.pl
credits
I stole the stack walking code from
Kipp Hickman
and Matt Pietrek (see
this article).
For Linux,
Mike Shaver,
Bruce Mitchener, and
Ramiro Estrugo
all helped me get things right. Mucho gusto.
Waldemar Horwat and
Jim Roskind
helped to improve the post-processing scripts.
$Id: refcnt-balancer.html,v 1.7 1999/11/16 02:14:49 waterson%netscape.com Exp $