Faster DBIC Schemata

Last week I did a talk for the Milwaukee Perl Mongers and this week I did it again for the Los Angeles Perl Mongers. I will do it one more time for DFW Perl Mongers soon, hopefully, if only to get the best version recorded.

In the talk somehow near the end I discussed with Steve Nolte how to make a large DBIC schema faster. The best way to solve this is to make Core DBIC lazy load it’s results, but that has historically been hard to get working. I hear that work is under way again, which is good, but I figured it wouldn’t hurt to share how I’d resolved this issue at MTSI.

Manual Lazy Loading

The first thing is to not load results if you can avoid it. This makes the most sense in the context of CLI (ie tests) and CGI. I don’t have the code in front of me, but the idea is this: you have some tables that are autogenerated or something like that, so they tend to have lots of columns and few relationships. You can store them in their own namespace and load them when they are accessed.

So for example, a typical DBIC schema namespace might look like MyApp::Schema::Result:: and then for stuff that is generated you make a MyApp::Schema::AuxResult::. In our setup I think we prefixed all of the results with Rpt as well, since they were reporting tables, but that’s not actually that important. So then, after you have a structure like that, you can make your schema load those auxiliary results on demand:

 package MyApp::Schema;

 ...


 sub source ($self, $moniker) {
    if ($moniker =~ m/^Rpt/ && grep { $moniker ne $_ } $self->sources) {
       $self->register_class($moniker, "MyApp::Schema::AuxResult::$moniker")
    }
    return $self->next::method($moniker)
 }

Again, I don’t have the code in front of me, but I’m pretty sure the above works. Feel free to mention in the comments if there are better ways.

(Note from the future, I already blogged about this many years ago!)

Skip Column Generation

Interestingly, for “wide” results (ie that have a lot of columns) a non-trivial amount of time is spent in creating accessors for all of the columns. This may have gotten better since I used this trick to resolve the problem. Also, again, I don’t have the code handy but something like the following is what we did:

 package MyApp::Schema::Result::WideTable;

 ...

 our $AUTOLOAD;
 sub AUTOLOAD ($self, @args = ()) {
   my $accessor = $AUTOLOAD =~ s/.*:://r;
   my %data = $self->get_columns;

   if (exists $data{$accessor}) {
      if (@args) {
         $self->set_column($accessor, $args[0])
      } else {
         return $data{$accessor}
      }
   } else {
      die "no such column!"
   }
 }

There are almost surely bugs in the above code, especially since I wrote it “blind,” but again, if you try this and figure out something that works well please comment and I’ll update the post appropriately.

Inherit from a Solid Base Class

This is advice from so far back in my DBIC career that it may actually be superstition, but the idea goes like this: it used to be common for results to be defined along the lines of:

 package MyApp::Schema::Result::Foo;

 use base 'DBIx::Class';

 __PACKAGE__->load_components(qw(DBIx::Class::Core));

I actually think that it was often longer, like including ::PK::Auto, but it’s been too long and I don’t feel like looking at my old old git repos. So this is not as efficient as just use base 'DBIx::Class::Core'. Additionally, if you have commonly used helpers like maybe ::TimeStamp, those should go in a base class and all your classes should extend that base class. While it is undoubtedly nice to have all extensions directly in the result that is using them, like you are supposed to do with Moose roles, that has a real cost, especially for a large schema.

So I hope that helps those of you out there with a giant schema. Sadly nothing will work perfectly in all situations, but the above really helped us. If any of you know other tricks to speed up the loading of a large schema please post info!

Posted Fri, Mar 27, 2015