fREWdiculous!
6 Aug
The project I am working on right now is rewriting a large, mostly CRUD application. The current app (second generation) is all VB6 and Stored Procedures. We are making the app entirely web based with DBIx::Class for the brunt of the backend and ExtJS for the UI. There are a few other technologies involved, but they should remain fairly light and unobtrusive.
As we’ve designed our code I’ve made an effort to only look at the inputs and outputs of the original code, to avoid using any existing design mistakes that have already been made. Generally this methodology works well. But sometimes the very format of the input/output leads me astray. Here’s an example that I encountered today.
Our customer typically uses composite primary keys to allow for public facing id’s. That makes sense. Typically serial numbers follow actual reason and composite primary keys work for that use case. In some places these keys are also the natural ordering for a set of items. For example, the company will have a list of operations that were performed to fix a part. Those operations are listed in order (for obvious reasons) and that order must be maintained. The id makes sense initially. Yet sometimes people need to change the order of some of these things. The most specific part of the composite pk always starts at one and increments by one. So when the user reorders the items in the list we ddo something like this (from memory):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | method set_id ($new_id) { my $old_id = $self->id; my $siblings_to_increment = $self->siblings->search({ id => { '<' => $old_id, '>=' => $new_id, }, }); my $siblings_to_decrement = $self->siblings->search({ id => { '>' => $old_id, '<=' => $new_id, }, }); $self->result_source->schema->txn_do(sub { $self->update({ id => 0 }); $siblings_to_increment->update({ id => \'id + 1' }); $siblings_to_decrement->update({ id => \'id - 1' }); $self->update({ id => $new_id }); }); } |
Works great!
But today I was considering what this would be like if I were to remove the composite pk’s from the system. How would I order the items? I would no longer change the id because they would be completely unique and reordering would be a big hassle. Solution? Real numbers! If you have 1 and 2 and you want 3 to be between them, you set it to 2.5! or if you want to displace 2 with 5 you set 2 to 1.5 (or 2.5) and just set 5 to 2. Of course you’d need some code to find the midpoint ((x+y)/2). But that’s no big deal.
Now, I’d like to think that I would have come up with that (simpler) solution originally if I hadn’t already assumed the database format that we have. Although the first form was fun to come up with it is inferior to this. Id’s should really be forever, and ordering shouldn’t change the id of a thing. Anyway, keep that in mind when you do your rewriting and reverse engineering. Think of the data as it would be if you’d made it originally, and simpler solutions may come to you out of the blue.
3 Responses for "On Rewrites, or Why One Should Read as Little Code as Possible"
Two thoughts:
1) I’ve found that using anything other than a database-generated serial number for a primary key always leads to sadness in the end. Even if the data model has something that looks like a primary key, some day they’ll want to change the value of that key, and you’ll want to update the row without updating every other thing in the db that points to it. Or maybe they’ll want to have a duplicate in the db for some absurd reason. Or maybe they’ll stop using that serial number.
2) I went through all kinds of insanity trying to come up with a good ordering scheme for the taxonomy on Birdstack. I was using ints, and if I ever wanted to insert something into the existing order, I’d have to renumber everything (very expensive). So, I started doing the numbering with big gaps between everything, but then things got really hard when I filled up the gaps. I tried to make intelligent redistribution algorithms and make guesses about where to place new items in the gaps so I wouldn’t run out of room as fast, but the code turned into a total nightmare.
My eventual solution was to layer a singly-linked list on top of the database. So, every record has an id that points to the previous record in the list. I can do all my reordering very easily and quickly, and then when I’m finished, I just regenerate the order id as an int with no gaps or anything. This has made taxonomy management hugely more efficient than anything else I’ve tried.
The real number scheme sounds nice, but I’d be worried about loss of precision with those floating point numbers. If you want something between 2 and 3, it’s 2.5. But then, 2.75, 2.875, 2.9375, 2.96875, etc. It doesn’t take too many reordering operations before roundoff errors become a real concern (depending, of course, on how your db and your app store these things). It’s essentially the same as my integers with gaps solution: you’ll eventually have to reorder.
But, maybe your problem is different enough that it doesn’t matter. Let me know what you think!
@Fjord: as for your number 1 issue, I agree in full. The serial number as composite key issue is what got me thinking about this, because it’s one of the things I would do differently if I were to do this project over again: no more composite pk’s and serial numbers should be allowed to be arbitrary (yet unique).
I see that your solution is more efficient when it comes to updates, and that my issue ultimately is very similar to yours where you just put large gaps between initial lists. I think that we ultimately may have different use-cases though, so our implementations need to differ.
I do an order by. I will typically have lists of around 50 items, and the ultimate regression would be if a user were to move the first item to the second to last, 50+ times. Users are fully allowed to do this but it is so rare that I think “optimizing” for this case would be a poor use of my time and the computer’s time.
Your dataset is much larger and theoretically could get too dense too quickly. Hence the linked list. So you either select all of your items into memory and rebuild the linked list, or select them one at a time based on “next_id”, to get them in order. (Is that correct? It’s the only way I can imagine doing it.)
So, when I get the chance I plan on doing the floating point implementation to see how much work it is. I imagine it will be the same amount of SQL statements (4) but they will change a maximum of three items, instead of a maximum of n + 1 where n is the size of the list.
Yeah, I just did a select one at a time based on prev_id. I suppose the big difference between our problems is that I tend to reorder hundreds of items in my list all at once every few months. So, I want to do a bunch of reorder operations and then do the renumbering, and I don’t mind if the renumbering takes a few minutes. But with yours, the renumbering happens interactively.
If you do go for the floating point, you might research where roundoff will happen and add a rule that says something like: if the distance between the two numbers I want to insert something between is < n, then just do a quick renumbering of everything. I don't know what your exact application is, but I can totally see a crazy user deciding to renumber everything by hand.
So I was walking down U Street in DC last night on my way to a Fiery Furnaces concert (awesome!) and noticed that someone decided to throw "9 1/2 St" in between "9th St" and "10th St." You been into city planning lately?
Leave a reply