A new Join Prune in DBIx::Class

At work a coworker and I recently went on a rampage cleaning up our git branches. Part of that means I need to clean up my own small pile of unmerged work. One of those branches is an unmerged change to our subclass of the DBIx::Class Storage Layer to add a new kind of join prune.

If you didn’t know, good databases can avoid doing joins at all by looking at the query and seeing where (or if) the joined in table was used at all. DBIx::Class does the same thing, for databases that do not have such tooling built in. In fact there was a time when it could prune certain kinds of joins that even the lauded PostgreSQL could not. That may no longer be the case though.

The rest of what follows in this blog post is a very slightly tidied up commit message of the original branch. Enjoy!

Recently Craig Glendenning found a query in the ZR codebase that was using significant resources; the main problem was that it included a relationship but didn’t need to. We fixed the query, but I was confused because DBIx::Class has a built in join pruner and I expected it to have transparently solved this issue.

It turns out we found a new case where the join pruner can apply!

If you have a query that matches all of the following conditions:

  • a relationship is joined with a LEFT JOIN
  • that relationship is not in the WHERE
  • that relationship is not in the SELECT
  • the query is limited to one row

You can remove the matching relationship. The WHERE and SELECT conditions should be obvious: if a relationship is used in the WHERE clause, you need it to be joined for the WHERE clause to be able to match against the column. Similarly, for the SELECT clause the relationship must be included so that the column can actually be referenced in the SELECT clause.

The one row and LEFT JOIN conditions are more subtle; but basically consider this case:

You have a query with a limit of 2 and you join in a relationship that has zero or more related rows. If you get back zero rows for all of the relationships, the root table will basically be returned and you’ll just get the first two rows from that table. But consider if you got back two related rows for each row in the root table: you would only get back the first row from the root table.

Similarly, the reason that LEFT is specified is that if it were a standard INNER JOIN, the relationship will filter the root table based on relationship.

If you specify a single row, when a relationship is LEFT it is not filtering the root table, and the “exploding” nature of relationships does not apply, so you will always get the same row.

I’ve pushed the change that adds the new join prune to GitHub, and notified the current maintainer of DBIx::Class in the hopes that it can get merged in for everyone to enjoy.

Posted Fri, Apr 29, 2016

Receive Blog Posts in Your Email