Sam's Blog


Regression Benchmarks with Template::Benchmark

Date: Wednesday, 7 April 2010, 11:04.

Categories: perl, ironman, template-benchmark, benchmarking, regression, qa, development-environment.

As part of my development environment for Template::Sandbox, I maintain a suite of regression benchmarks, using Template::Benchmark against all previous versions of the distribution.

It's a crude tool, but I find it useful, so I thought I'd use this week's column to share how I automated away as much of the pain as I could.

First of all, Adam Kennedy makes a good argument for the use of regression benchmarking. I suggest you go read that, because he sums up nearly everything I'd say if I were going to explain it here.

My setup is fairly simple: I have a directory full of files, one for each benchmark run, each containing a JSON data-structure with the benchmark details inside:

$ ls ~/projects/Template-Sandbox/benchmarks/
Template-Sandbox-1.00-full.json         Template-Sandbox-1.01_07-full.json
Template-Sandbox-1.00-standard.json     Template-Sandbox-1.01_07-standard.json
Template-Sandbox-1.00_01-full.json      Template-Sandbox-1.01_08-full.json
Template-Sandbox-1.00_01-standard.json  Template-Sandbox-1.01_08-standard.json
Template-Sandbox-1.00_02-full.json      Template-Sandbox-1.01_09-full.json
Template-Sandbox-1.00_02-standard.json  Template-Sandbox-1.01_09-standard.json
Template-Sandbox-1.00_03-full.json      Template-Sandbox-1.01_10-full.json
Template-Sandbox-1.00_03-standard.json  Template-Sandbox-1.01_10-standard.json
Template-Sandbox-1.01-full.json         Template-Sandbox-1.01_11-full.json
Template-Sandbox-1.01-standard.json     Template-Sandbox-1.01_11-standard.json
Template-Sandbox-1.01_01-full.json      Template-Sandbox-1.02-full.json
Template-Sandbox-1.01_01-standard.json  Template-Sandbox-1.02-standard.json
Template-Sandbox-1.01_02-full.json      Template-Sandbox-1.02_01-full.json
Template-Sandbox-1.01_02-standard.json  Template-Sandbox-1.02_01-standard.json
Template-Sandbox-1.01_03-full.json      Template-Sandbox-1.02_02-full.json
Template-Sandbox-1.01_03-standard.json  Template-Sandbox-1.02_02-standard.json
Template-Sandbox-1.01_04-full.json      Template-Sandbox-1.03-full.json
Template-Sandbox-1.01_04-standard.json  Template-Sandbox-1.03-standard.json
Template-Sandbox-1.01_05-full.json      Template-Sandbox-backdev-full.json
Template-Sandbox-1.01_05-standard.json  Template-Sandbox-backdev-standard.json
Template-Sandbox-1.01_06-full.json      Template-Sandbox-dev-full.json
Template-Sandbox-1.01_06-standard.json  Template-Sandbox-dev-standard.json

The naming convention should be fairly obvious: "dev" refers to my active development copy, and "backdev" is the previous dev benchmark. It's oddly named so that it sorts alphabetically before dev. Yes, I was too lazy to Do It Right. Sue me.
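
For the curious, here's a quick way to check that sort order. It's only a sketch: the labels are taken from the listing above, sort_labels is a made-up helper name, and forcing LC_ALL=C is my assumption to get plain byte-order sorting (other locales may order things differently).

```shell
#!/bin/sh
# Sort version labels in plain byte order. Digits sort before lowercase
# letters, and "backdev" sorts before "dev", so the newest results come last.
# sort_labels is a hypothetical helper used only for this illustration.
sort_labels() {
    LC_ALL=C sort
}

printf '%s\n' dev 1.03 backdev 1.00_01 1.00 | sort_labels
```

The released versions come out first, then backdev, then dev, which is the order the report further down relies on.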

"Full" versions include all supported template features (ie, fancy syntax stuff: hash loops, expressions, functions, and so on), whereas "standard" is just the default "lowest common denominator" features enabled by Template::Benchmark (ie, token replacement, array loops and the other basic features implemented by nearly all template engines).

I generate this spew of files with a ts_full_regression_benchmark script:

$ ts_full_regression_benchmark
Running benchmarks for Template-Sandbox-1.00.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00-standard.json exists.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00-full.json exists.
Running benchmarks for Template-Sandbox-1.00_01.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00_01-standard.json exists.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00_01-full.json exists.
Running benchmarks for Template-Sandbox-1.00_02.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00_02-standard.json exists.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.00_02-full.json exists.

... much MUCH more of this ...

Running benchmarks for Template-Sandbox-1.02_02.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.02_02-standard.json exists.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.02_02-full.json exists.
Running benchmarks for Template-Sandbox-1.03.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.03-standard.json exists.
  Skipping: /home/illusori/projects/Template-Sandbox/benchmarks/Template-Sandbox-1.03-full.json exists.
Running benchmarks for devel version.

As you can see, it skips generation of any of the "proper" distribution benchmarks if they already exist, but always runs a new benchmark for the dev copy (renaming the old one to backdev).

This means I only take time to run the benchmarks I actually need, and if for some reason I want to regenerate the lot, it's as easy as an rm *.json.

The ts_full_regression_benchmark script to generate the files:

#!/bin/sh

PROJECTS='/home/illusori/projects'

BENCHMARK_DIR="${PROJECTS}/Template-Sandbox/benchmarks"
BENCHMARK_SCRIPT="${PROJECTS}/Template-Benchmark/src/script/benchmark_template_engines"

BENCHMARK_DURATION="-d 60"
BENCHMARK_TYPES="--notypes --uncached_string --memory_cache --instance_reuse"
BENCHMARK_PLUGINS="--onlyplugin TemplateSandbox"

COMMON_SWITCHES="--json $BENCHMARK_DURATION $BENCHMARK_TYPES $BENCHMARK_PLUGINS"

mkdir -p /tmp/ts_full_regression

#  Unpack every released tarball into a scratch directory.
for dist_file in "${PROJECTS}"/released/Template-Sandbox*.tar.gz;
do
    tar -xzf "$dist_file" -C /tmp/ts_full_regression
done

for dist_dir in /tmp/ts_full_regression/*;
do
    dist_name=`basename "$dist_dir"`
    echo "Running benchmarks for $dist_name."
    #  -s: only skip if the output file exists and is non-empty.
    out="${BENCHMARK_DIR}/${dist_name}-standard.json"
    if [ -s "$out" ]; then
       echo "  Skipping: $out exists."
    else
       $BENCHMARK_SCRIPT $COMMON_SWITCHES -I "$dist_dir/lib" >"$out"
    fi
    out="${BENCHMARK_DIR}/${dist_name}-full.json"
    if [ -s "$out" ]; then
       echo "  Skipping: $out exists."
    else
       $BENCHMARK_SCRIPT $COMMON_SWITCHES --allfeatures -I "$dist_dir/lib" >"$out"
    fi
done

rm -rf /tmp/ts_full_regression

#  Always overwrite the devel benchmarks.
echo "Running benchmarks for devel version."
out="${BENCHMARK_DIR}/Template-Sandbox-dev-standard.json"
if [ -s "$out" ]; then
   mv -f "$out" "${BENCHMARK_DIR}/Template-Sandbox-backdev-standard.json"
fi
$BENCHMARK_SCRIPT $COMMON_SWITCHES -I "$PROJECTS/Template-Sandbox/src/lib" >"$out"
out="${BENCHMARK_DIR}/Template-Sandbox-dev-full.json"
if [ -s "$out" ]; then
   mv -f "$out" "${BENCHMARK_DIR}/Template-Sandbox-backdev-full.json"
fi
$BENCHMARK_SCRIPT $COMMON_SWITCHES --allfeatures -I "$PROJECTS/Template-Sandbox/src/lib" >"$out"

Yes it's a shell script. Yes I know this is a perl blog. Get over it.

This script loops through the directory where I keep my CPAN release tarballs, looking for Template::Sandbox dists, and extracts them. It then runs benchmarks using the script provided with Template::Benchmark, which has a handy JSON mode, and dumps the output into an appropriately named file.

Yep, it extracts the tarballs even for versions whose benchmarks it's going to skip. It's a quick-n-dirty hack.

Of course, having all that benchmark data in near-unreadable JSON files isn't much good without a way to display it neatly. This is where ts_old_benchmarks comes in:
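
If the wasted extraction ever bothered me, something along these lines might avoid it. This is a sketch rather than part of the real script: needs_benchmark is a hypothetical helper, and it assumes each tarball unpacks into a directory named after the tarball minus its .tar.gz suffix, so the JSON filenames can be worked out without unpacking anything.

```shell
#!/bin/sh
# BENCHMARK_DIR mirrors the variable in the script above; the default is
# only used if the caller has not set it already.
BENCHMARK_DIR="${BENCHMARK_DIR:-/home/illusori/projects/Template-Sandbox/benchmarks}"

# needs_benchmark TARBALL
# Hypothetical helper: succeeds (exit 0) if either the standard or the full
# result file for this dist is missing or empty, fails (exit 1) if both exist.
needs_benchmark() {
    # Assumes Template-Sandbox-1.03.tar.gz unpacks to Template-Sandbox-1.03.
    dist_name=$(basename "$1" .tar.gz)
    for kind in standard full; do
        # -s is true only for files that exist and are non-empty.
        [ -s "${BENCHMARK_DIR}/${dist_name}-${kind}.json" ] || return 0
    done
    return 1
}
```

In the extraction loop, putting needs_benchmark "$dist_file" || continue just before the tar line would then skip any dist that already has both result files.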

$ ts_old_benchmarks
1.00     full     uncached_string  4.46 memory_cache    13.50
1.00_01  full     uncached_string  4.42 memory_cache    13.70
1.00_02  full     uncached_string  4.50 memory_cache    13.50
1.00_03  full     uncached_string  4.44 memory_cache    13.50
1.01     full     uncached_string  4.49 memory_cache    13.70
1.01_01  full     uncached_string  4.53 memory_cache    13.50
1.01_02  full     uncached_string  3.54 memory_cache    13.60
1.01_03  full     uncached_string  3.53 memory_cache    13.50
1.01_04  full     uncached_string  3.54 memory_cache    16.00
1.01_05  full     uncached_string  3.56 memory_cache    16.30
1.01_06  full     uncached_string  3.64 memory_cache    16.40
1.01_07  full     uncached_string  4.19 memory_cache    19.50
1.01_08  full     uncached_string  4.20 memory_cache    19.60
1.01_09  full     uncached_string  4.18 memory_cache    19.60
1.01_10  full     uncached_string  4.16 memory_cache    19.50
1.01_11  full     uncached_string  4.20 memory_cache    19.70 instance_reuse  28.40
1.02     full     uncached_string  4.19 memory_cache    19.70 instance_reuse  28.40
1.02_01  full     uncached_string  4.11 memory_cache    19.70 instance_reuse  28.30
1.02_02  full     uncached_string  4.46 memory_cache    20.20 instance_reuse  29.70
1.03     full     uncached_string  4.47 memory_cache    20.10 instance_reuse  29.70
backdev  full     uncached_string  4.49 memory_cache    20.30 instance_reuse  30.00
dev      full     uncached_string  4.47 memory_cache    20.20 instance_reuse  30.00
1.00     standard uncached_string  2.75 memory_cache    42.20
1.00_01  standard uncached_string  2.68 memory_cache    42.10
1.00_02  standard uncached_string  2.72 memory_cache    41.00
1.00_03  standard uncached_string  2.66 memory_cache    42.10
1.01     standard uncached_string  2.66 memory_cache    40.90
1.01_01  standard uncached_string  2.50 memory_cache    41.00
1.01_02  standard uncached_string 13.20 memory_cache    40.70
1.01_03  standard uncached_string 13.20 memory_cache    41.50
1.01_04  standard uncached_string 13.30 memory_cache    52.20
1.01_05  standard uncached_string 13.30 memory_cache    52.10
1.01_06  standard uncached_string 13.70 memory_cache    53.00
1.01_07  standard uncached_string 14.60 memory_cache    62.10
1.01_08  standard uncached_string 14.60 memory_cache    62.10
1.01_09  standard uncached_string 14.70 memory_cache    61.70
1.01_10  standard uncached_string 14.50 memory_cache    61.60
1.01_11  standard uncached_string 14.50 memory_cache    62.00 instance_reuse  85.30
1.02     standard uncached_string 14.40 memory_cache    60.10 instance_reuse  83.30
1.02_01  standard uncached_string 14.20 memory_cache    61.70 instance_reuse  86.30
1.02_02  standard uncached_string 15.10 memory_cache    62.60 instance_reuse  88.40
1.03     standard uncached_string 15.20 memory_cache    62.30 instance_reuse  88.60
backdev  standard uncached_string 15.20 memory_cache    62.80 instance_reuse  87.70
dev      standard uncached_string 15.10 memory_cache    62.10 instance_reuse  88.60

ts_old_benchmarks kinda sucks for a name, but I think I've mentioned before that this is a quick-n-dirty hack.

"instance_reuse" benchmarks only show up late, because previous versions of Template::Sandbox had undefined behaviour (ie, they probably broke horribly) if you reused an instance.

Please don't ask how uncached_string performance on the standard benchmark increased five-fold between versions 1.01_01 and 1.01_02. It's deeply embarrassing.
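
To put a number on that jump: in the table above, the standard uncached_string rate goes from 2.50 at 1.01_01 to 13.20 at 1.01_02. A throwaway helper like this works out the ratio. The name speedup is made up for this illustration.

```shell
#!/bin/sh
# speedup OLD NEW
# Hypothetical helper: prints NEW divided by OLD to two decimal places.
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

# Standard uncached_string rates for 1.01_01 and 1.01_02, from the table.
speedup 2.50 13.20    # prints 5.28
```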

And here's the script to generate that output:

#!/usr/bin/perl -w

use strict;
use warnings;

use JSON::Any;
use File::Slurp;

my $archive_dir = '/home/illusori/projects/Template-Sandbox/benchmarks';

my ( @entries, $json, @full, @standard );

opendir( my $dh, $archive_dir ) or
    die "Unable to opendir '$archive_dir': $!";
@entries = grep /^Template-Sandbox.*\.json$/, readdir( $dh );
closedir( $dh );

$json = JSON::Any->new();

@full = @standard = ();
foreach my $file ( sort( @entries ) )
{
    my ( $content, $result, $line, $name, $type );

    eval { $content = read_file( $archive_dir . '/' . $file ); };
    if( $@ )
    {
        warn "Unable to read $archive_dir/$file: $@";
        next;
    }
        
    next unless $content;

    eval { $result = $json->decode( $content ); };
    if( $@ or not $result )
    {
        warn "Unable to decode content of $file: " . ( $@ || 'empty result' );
        next;
    }

    ( $name, $type ) = $file =~ /^Template\-Sandbox\-(.*)\-(.*)\.json$/;
    $line = sprintf( '%-8s %-8s', $name, $type );
    foreach my $benchmark ( @{$result->{ benchmarks }} )
    {
        my ( $timing );

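        #  Need at least two rows in the chart: the rate we want is in row 1.
        next unless @{$benchmark->{ comparison }} > 1;
        #  TODO: grab from timings when working.

        #  The rate is a string such as "4.46/s", so strip the "/s" suffix
        #  to leave a plain number for sprintf.
        $timing = $benchmark->{ comparison }->[ 1 ]->[ 1 ];
        $timing =~ s/\/s$//;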
        $line .= sprintf( ' %-15s %5.2f', $benchmark->{ type },
            $timing );
    }
    if( $type eq 'full' )
    {
        push @full, "$line\n";
    }
    else
    {
        push @standard, "$line\n";
    }
}

print @full, @standard;

Yes, this one's perl. Yes, I know the other was a shell script. Sue me. Again.

This script scrapes the timing data out of the human-readable comparison chart. It should really pull it from the timings section of the data-structure, but because allow_blessed seems to be inconsistently supported by JSON back-ends, even with JSON::Any, a bunch of my old benchmarks have big blanks there.

Version 0.99_11 of Template::Benchmark, which should be hitting a mirror near you about the time this column is published, fixes this (or hacks around it anyway), but I've not recreated my benchmark "database" yet.

That's basically it.

When I'm done working on a revision, I run ts_full_regression_benchmark just after I've checked my ./Build test, then I compare the results with ts_old_benchmarks to make sure I've not caused a hideous performance regression.

Because it only needs to run the benchmarks for the current version, it lets me run a longer benchmark and get a less variable result.

I should point out that, like a test suite, this only partially helps you in fixing any problems that occur. It does however, like a test suite, provide a mostly-automated way to see whether something has gone wrong or not.
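
As a rough idea of how the "has something gone wrong?" check could itself be automated, here's a sketch that compares the dev rows against the backdev rows in ts_old_benchmarks-style output. The check_regression name, the 5% default threshold and the GOOD/BAD labels are all my assumptions for illustration, not something the existing scripts do.

```shell
#!/bin/sh
# check_regression [THRESHOLD_PERCENT]
# Hypothetical helper: reads ts_old_benchmarks-style lines on stdin and
# compares each "dev" rate with the matching "backdev" rate. A drop larger
# than the threshold (default 5%) is labelled BAD, anything else GOOD.
# Relies on backdev rows appearing before dev rows, as in the report above.
check_regression() {
    awk -v limit="${1:-5}" '
        $1 == "backdev" {
            # Remember each backdev rate, keyed by "type benchmark".
            for (i = 3; i < NF; i += 2) prev[$2 " " $i] = $(i + 1)
        }
        $1 == "dev" {
            for (i = 3; i < NF; i += 2) {
                key = $2 " " $i
                if (!(key in prev) || prev[key] == 0) continue
                change = ($(i + 1) - prev[key]) / prev[key] * 100
                printf "%-4s %-8s %-15s %+6.2f%%\n", (change < -limit ? "BAD" : "GOOD"), $2, $i, change
            }
        }
    '
}

# Demo using the backdev/dev rows for the standard benchmark from the table.
check_regression <<'EOF'
backdev  standard uncached_string 15.20 memory_cache    62.80 instance_reuse  87.70
dev      standard uncached_string 15.10 memory_cache    62.10 instance_reuse  88.60
EOF
```

In normal use it would sit on the end of a pipe, as in ts_old_benchmarks | check_regression 2. Run-to-run noise matters when picking the threshold, so the 5% default is a guess, not a recommendation.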

Now there's definite scope for improvement here. Some items on my "when I get around to it" list:

Build benchmarks for each template feature option individually, to specifically spot regressions in each area of functionality.
Command-line tool to just present the recent most-relevant results rather than the whole spam.
Run with nightly build/regression tests.
Pretty HTML reports, with GOOD/BAD colour coding so I don't need to engage brain to read them.
Run a build of each dist first and benchmark from the build dir. At the moment I'm running the perl code from the dists directly from their source dir. Thankfully Template::Sandbox is pure-perl so it still works, but if I wanted to run this on other template engines I'd probably have problems with the XS code.
Bundle all that together, and there are the glimmerings of some manner of potentially-useful community website similar to www.cpantesters.org (on a rather more modest scale).
World Domination.
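
The "recent results only" item from that list is probably the cheapest to knock together. A sketch, assuming input in the ts_old_benchmarks format shown earlier; recent_results is a hypothetical name and the default of three rows per type is arbitrary.

```shell
#!/bin/sh
# recent_results [N]
# Hypothetical helper: keeps only the last N rows (default 3) for each
# benchmark type (column 2) from ts_old_benchmarks-style input, preserving
# the original row order.
recent_results() {
    awk -v keep="${1:-3}" '
        # First pass: buffer every line and count the rows per type.
        { line[NR] = $0; type[NR] = $2; count[$2]++ }
        END {
            # Second pass: print a row only if it is among the last "keep"
            # rows seen for its type.
            for (n = 1; n <= NR; n++) {
                seen[type[n]]++
                if (seen[type[n]] > count[type[n]] - keep) print line[n]
            }
        }
    '
}
```

With the report above, ts_old_benchmarks | recent_results would show just the 1.03, backdev and dev rows for each of full and standard.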

I strongly doubt I'll get further than pretty HTML reports. For me this is a useful tool, but still just a means to an end rather than an end in itself. Maybe someone out there has had a light bulb go off above their head while reading this, though.
