Sam's Blog

Monkey-patching Benchmark.pm to auto-correct custom controls

Date: Thursday, 18 March 2010, 13:41.

Categories: perl, ironman, benchmarking, analysis, monkey-patching, hack, tutorial, advanced.

Part of the series: Coercing Modules Into Doing What You Want.

In "Advanced Benchmark Analysis I: Yet more white-space trimming", I mentioned that you could automatically take into account the cost of your control benchmark and eliminate it from the rest of your results.

This blog entry shows you how to monkey-patch Benchmark.pm to let you do just that.

Firstly, let me say that monkey-patching is evil. It's a dirty, nasty, dangerous hack that sooner-or-later will come back and bite you on the ass. Probably sooner.

However, sometimes it's a quick-n-dirty way to get your one-time script running, when you can't afford the time to Do It Right.

Just remember that one-time scripts have a habit of becoming permanent parts of your codebase, and when that happens you're just storing up trouble against a future date.

Dire warnings out of the way, let's move on.

In "Advanced Benchmark Analysis I: Yet more white-space trimming", we covered the need to write a control benchmark to establish the overhead of your benchmark setup, we then established a simple rule-of-thumb to determine if the effect of the overhead was significant, and merrily continued. We were lucky enough in that example that the overhead had minimal impact: it either had insignificant impact on the timings, or it had impact on insignificant timings, and we could safely dismiss it.

But what if it had a significant impact on all our timings?

Would we want to jump through mental hoops to translate all those figures, with the risk of human error for each one?

Not really. We'd want to automatically remove the impact of the control function and just return the already-compensated results, so we could analyse those directly without all the mental contortions.

As it happens, Benchmark.pm already subtracts the cost of a control function when it produces your benchmark timings, so let's take a look at the relevant code from Benchmark.pm 1.11, the version that came with perl 5.10.1.

Code:
sub timeit {
    my($n, $code) = @_;
    my($wn, $wc, $wd);

    die usage unless defined $code and
                     (!ref $code or ref $code eq 'CODE');

    printf STDERR "timeit $n $code\n" if $Debug;
    my $cache_key = $n . ( ref( $code ) ? 'c' : 's' );
    if ($Do_Cache && exists $Cache{$cache_key} ) {
        $wn = $Cache{$cache_key};
    } else {
        $wn = &runloop($n, ref( $code ) ? sub { } : '' );
        # Can't let our baseline have any iterations, or they get subtracted
        # out of the result.
        $wn->[5] = 0;
        $Cache{$cache_key} = $wn;
    }

    $wc = &runloop($n, $code);

    $wd = timediff($wc, $wn);
    timedebug("timeit: ",$wc);
    timedebug("      - ",$wn);
    timedebug("      = ",$wd);

    $wd;
}

The sub timeit() is called with the number of iterations, $n, and the code snippet, $code, and it returns the resulting timing data.

First thing it does, ignoring all the caching junk, is work out $wn, which is the timing for running the "null code", sub { }.

Next up it works out $wc, the timing of the supplied code.

Then it calculates $wd, the difference of the two timings, and returns that.

Or, to be brief, it's using the sub { } timing as a control, and removing its influence from the results.
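
To see what that means from the caller's side, here's a minimal direct use of timeit(); the iteration count and the trivial snippet are arbitrary placeholders of mine, and the Benchmark object it returns already has the sub { } control timing subtracted out.

use strict;
use warnings;

use Benchmark qw( timeit timestr );

#  Time 100,000 iterations of a trivial snippet; the result already has
#  the cost of timing an empty sub { } removed from it.
my $t = timeit( 100_000, sub { my $x = 2 ** 10 } );

print timestr( $t ), "\n";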

This is just what we want, but there's one snag: we don't want the timing of sub { }, we want the timing of our own control function.

Unfortunately the use of sub { } is hard-coded, and there's no way to change it.

Or is there?

This is where we begin our slippery descent into monkey-patching.

The Right Way to fix this problem is to write a patch for Benchmark.pm that lets you specify a custom control sub, then submit that patch to the maintainer, wait for them to apply it, test it and release it, and sit back and bask in the warm glow of having contributed to the Open Source community.

Benchmark.pm is a core module though, which complicates matters; it also means that the turnaround isn't likely to happen on "While-U-Wait" timescales.

You want results now don't you?

Well, you could still write your own patched version of Benchmark.pm, but that seems a bit too much work for some instant gratification, and your fork will only drift out of sync with any changes to the original module anyway...

So, we do a quick-n-dirty monkey-patch by inserting the following block of code before we run our benchmarks.

{
    no warnings 'redefine';
    no strict 'vars';

    package Benchmark;

    our $Control_Sub = sub { };
    our $Control_Str = '';

    sub timeit {
        my($n, $code) = @_;
        my($wn, $wc, $wd);

        die usage unless defined $code and
                         (!ref $code or ref $code eq 'CODE');

        printf STDERR "timeit $n $code\n" if $Debug;
        my $cache_key = $n . ( ref( $code ) ? 'c' : 's' );
        if ($Do_Cache && exists $Cache{$cache_key} ) {
            $wn = $Cache{$cache_key};
        } else {
            $wn = &runloop($n, ref( $code ) ? $Control_Sub : $Control_Str );
            # Can't let our baseline have any iterations, or they get subtracted
            # out of the result.
            $wn->[5] = 0;
            $Cache{$cache_key} = $wn;
        }

        $wc = &runloop($n, $code);

        $wd = timediff($wc, $wn);
        timedebug("timeit: ",$wc);
        timedebug("      - ",$wn);
        timedebug("      = ",$wd);

        $wd;
    }
}

What we're doing here is redefining someone else's function, by temporarily pretending we're inside their package and overwriting it with our own code.

That code is a copy of their code, with just the hard-coded controls (the empty sub { } and the empty string '') replaced by package globals.
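
If you've not seen the trick in isolation before, here's a minimal self-contained sketch of the same technique, using a made-up Greeter package instead of Benchmark.pm:

#!/usr/bin/perl

use strict;
use warnings;

#  A stand-in for somebody else's module (in real life this would be
#  loaded from its own .pm file).
package Greeter;
sub hello { return "hello from the original" }

package main;

{
    no warnings 'redefine';   #  Silence the subroutine-redefined warning.

    package Greeter;          #  Temporarily pretend we're inside their package...

    sub hello {               #  ...and overwrite their sub with our own version.
        return "hello from the monkey-patch";
    }
}

print Greeter::hello(), "\n";   #  Prints "hello from the monkey-patch".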

I'll repeat again that this is a nasty hack: here be dragons, proceed at your own risk, on your head be it, and other dire pronouncements.

Some of the more obvious drawbacks:

  • If the internals of their module change in a new version, we'll be replacing the guts of that new version with code that's quite likely to be incompatible with it, unless we also rebase our copy on the new version.

  • If anything else in the codebase is attempting to monkey-patch the same code, we'll trash each other's changes. Possibly the last to apply its changes will "work", but it's just as likely that they'll only partly overwrite each other and much confusion will abound.

  • People will hate you for selfishly withholding your improved code, rather than donating it to the community at large as a patch.

Continuing, when we want to run our benchmarks we do something like:

$Benchmark::Control_Sub = sub { ... my control stuff goes here ... };
cmpthese( -10, \%benchmarks );
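
If your benchmarks are string snippets rather than coderefs, the patched timeit() uses $Benchmark::Control_Str instead, so in that case you'd set the string version (again with your own control code in place of the placeholder):

$Benchmark::Control_Str = '... my control stuff goes here ...';
cmpthese( -10, \%benchmarks );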

However, unlike our previous benchmarks, don't include the control function as one of the benchmarks any more; you'll cause big problems if you do:

  • If you're doing a fixed number of iterations it'll give you huge floating-point values that will mess up the result table columns.

  • Worse, if you're doing a number of iterations based on CPU time, the CPU time limit is based on the corrected benchmark, which in the case of the control function will be as near to zero as statistical error allows. This will cause Benchmark.pm to try with ever larger numbers of iterations, to make the near-zero corrected time be over the CPU time limit, effectively an infinite loop.

These also hold true if the overhead is a disproportionately high portion (90+%) of any of the benchmarks.

Watch your benchmarks: if they seem to be taking much, much longer than usual, Ctrl-C out and try again with a fixed number of iterations.
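
For reference, switching to a fixed number of iterations is just a matter of passing a positive count rather than a negative CPU-time limit (the count here is an arbitrary choice of mine):

cmpthese( 100_000, \%benchmarks );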

OK, let's throw the bits of code together into an example benchmark script:

#!/usr/bin/perl -wT

use warnings;
use strict;

#  Attempt to protect ourselves from version changes.
use Benchmark 1.11;
die "Fatal error:\n" .
    "  This script was written with a monkey-patch for Benchmark.pm 1.11.\n" .
    "  It has been detected that you are using $Benchmark::VERSION.\n" .
    "  You should manually check and alter the monkey-patch and this " .
      "version check before proceeding.\n"
    if $Benchmark::VERSION ne '1.11';

{
    no warnings 'redefine';
    no strict 'vars';

    package Benchmark;

    our $Control_Sub = sub { };
    our $Control_Str = '';

    sub timeit {
        my($n, $code) = @_;
        my($wn, $wc, $wd);

        die usage unless defined $code and
                         (!ref $code or ref $code eq 'CODE');

        printf STDERR "timeit $n $code\n" if $Debug;
        my $cache_key = $n . ( ref( $code ) ? 'c' : 's' );
        if ($Do_Cache && exists $Cache{$cache_key} ) {
            $wn = $Cache{$cache_key};
        } else {
            $wn = &runloop($n, ref( $code ) ? $Control_Sub : $Control_Str );
            # Can't let our baseline have any iterations, or they get subtracted
            # out of the result.
            $wn->[5] = 0;
            $Cache{$cache_key} = $wn;
        }

        $wc = &runloop($n, $code);

        $wd = timediff($wc, $wn);
        timedebug("timeit: ",$wc);
        timedebug("      - ",$wn);
        timedebug("      = ",$wd);

        $wd;
    }
}

sub setup
{
    #  We're pretending here that we're doing something vital
    #  to setting up the benchmark, in reality we're just wasting
    #  as much time as for benchmark ONE to do its "real work".
    for( my $i = 0; $i < 100_000; $i++ )
    {
    }
}

my %h = (
    ONE => sub { setup(); for( my $i = 0; $i < 100_000; $i++ ) { } },
    TWO => sub { setup(); for( my $i = 0; $i < 200_000; $i++ ) { } },
    );

print "Testing with default control.\n";
{
    local $Benchmark::Control_Sub = sub { };
    Benchmark::cmpthese( -5, \%h );
}

print "\nTesting with our custom control.\n";
{
    local $Benchmark::Control_Sub = sub { setup(); };
    Benchmark::cmpthese( -5, \%h );
}

This is a highly-contrived benchmark of the subs ONE and TWO. TWO does twice the work of ONE in a for-loop, and each calls setup() to simulate setup overhead for each iteration of the benchmark sub.

Unfortunately for the quality of their results, setup() takes as much time to run as the entire rest of ONE.

We run two benchmarks, one with the default control function and one with a control function that calls setup() to gauge its cost.

We set the control function with local so that we're only setting it for that block of code; we don't want to alter the control function for anyone else outside that block.

Now let's run the script.

Output:
Testing with default control.
       Rate  TWO  ONE
TWO  20.8/s   -- -33%
ONE  31.2/s  50%   --

Testing with our custom control.
       Rate  TWO  ONE
TWO  31.0/s   -- -50%
ONE  61.6/s  99%   --

As we can see, with the default Benchmark.pm control function, we end up with ONE only having a 50% performance advantage over TWO.

This is a perfectly valid result in one sense: it does report the relative times to call the two subs. (With setup() included, each call to ONE runs roughly 200,000 loop iterations against TWO's 300,000, so ONE being about 50% faster is exactly what you'd expect.) But we know that what we're really after is the relative performance with the setup() cost stripped out.

If we look at the second set of results, with our custom control, we can see that ONE now reports a much healthier 99% performance advantage over TWO. With the setup() cost stripped out, ONE's 100,000 iterations of "real work" against TWO's 200,000 should give ONE a 100% performance advantage, so 99% is close enough.

We've successfully stripped out our overhead cost.

So, let's recap:

  • Benchmark.pm almost does what we want, but doesn't let us define a custom control function.

  • Submitting a patch to allow this would be the right thing to do.

  • Sometimes we don't have time, inclination or ability to produce a patch of sufficient quality to be accepted: "good enough" to work for your situation is seldom "good enough" for the general case.

  • As a responsible adult, in full awareness of the short-comings and dangers of the technique, a quick-n-dirty monkey-patch lets us get our results and move on.

  • We take the code we want from Benchmark.pm, modify it, and shove it back into Benchmark.pm by placing it within a block set to the same package.

  • We try to safeguard ourselves from version changes to the underlying code, so we don't run our one-shot code after its sell-by date. (And long after we've forgotten it was supposed to be one-shot.)

  • We use local-scoped variables to set the control functions, to minimise the chances of polluting other benchmark calls.

Now the alert among you might be thinking that this all seems a bit unnecessary: can't we just massage the returned results by writing a wrapper around cmpthese() and avoid the whole mess of monkey-patching entirely?

To some degree we can and we can't.

Benchmark.pm's timethese() could be wrapped to compensate for a control by modifying the returned results. Unfortunately, you can't tell cmpthese() to use your new timethese(), so you'd need to wrap that too.

In fact you'd possibly need to wrap the entire public API of Benchmark.pm to try to convince it to use the corrected version of timethese(), and you'd need to modify the calling code to use your wrapper module.

It's a sad fact that wrapping a procedural module is a lot more work than wrapping an OO module, because you can't subclass it.

It might not even be possible in some cases.

In our specific case, it is possible, and we'd actually only need to wrap two functions, because we're only using one part of the API - if the rest isn't supported, we don't care because we're not using it.
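
For the curious, here's a very rough sketch of what such a wrapper might look like. The package and sub names are mine, it assumes a fixed iteration count to keep the per-iteration arithmetic simple, and it still has to poke at a Benchmark.pm internal (the iteration slot, index 5, that the original timeit() zeroes) to keep the subtraction honest:

package My::BenchmarkWrapper;

use strict;
use warnings;

use Benchmark qw( timeit timethese timediff cmpthese );

#  Like cmpthese( COUNT, CODEHASHREF ), but subtracts the cost of a
#  supplied control sub from each result before comparing them.
sub cmpthese_with_control
{
    my ( $count, $benchmarks, $control ) = @_;

    #  Time the control over the same fixed number of iterations.
    my $control_t = timeit( $count, $control );

    #  Zero the control's iteration count, for the same reason timeit()
    #  zeroes $wn->[5]: timediff() subtracts iteration counts too, and
    #  we mustn't lose the real ones.
    $control_t->[5] = 0;

    #  Time each benchmark as normal, then strip the control's cost out.
    my $results = timethese( $count, $benchmarks, 'none' );
    $results->{$_} = timediff( $results->{$_}, $control_t )
        for keys %{$results};

    #  cmpthese() happily accepts the hashref of results from timethese().
    return cmpthese( $results );
}

1;

The calling code would then use My::BenchmarkWrapper::cmpthese_with_control() instead of Benchmark::cmpthese(), which is exactly the "modify the calling code" cost mentioned above.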

Monkey-patching, on the other hand, allows the entire API to continue working as before, via the same Benchmark.pm package, rather than needing to refactor code to point at a wrapper module.

As with most things, you need to balance the trade-offs and dangers to fit your needs.

Now, if only I could rewrite my patch to remove the danger of the infinite-loop, I might be able to submit it as a proper patch... oh well, a project for a rainy day.

This blog entry is part of the series: Coercing Modules Into Doing What You Want.

  1. Monkey-patching Benchmark.pm to auto-correct custom controls
  2. Wrapping Benchmark.pm to auto-correct custom controls
