fREWdiculous!
22 Jul
I am just getting through chapter four of the Catalyst book and there are already a whole lot of things worth mentioning. My internet is currently at 50% packet loss because our wifi router is busted so this is pretty painful for me. So we’ll keep it short.
The book has a nice (very short) introduction to Moose. Not only is this good because Catalyst is now based on Moose, but also I would say you probably want your OO code to be based on Moose. There are times when you probably don’t want to use Moose, but there are also times when you don’t want to use strict. As a rule, use Moose for OO code.
There is also a very good introduction to usage of CPAN. A lot of us think that CPAN is our programming platform. Knowing how to use it is extremely important. It includes not just finding stuff on CPAN, but also ascertaining the quality of those modules, and how to install them. Very good information for a perl programmer.
In chapter 4 mst discusses how he writes tests (which can be slightly supplemented with his latest blog post) and it’s actually quite helpful. Some people write tests after writing their code and run the risk of forgetting to test at all (that’s me!). Other people are all hardcore TDD and write tests first, but that assumes that they have already thought through the interface for what they are writing. mst posits that it’s better to write your code and “test” it from test files as you go. And test in this case means warns, Data::Dumper dumps, whatever. After it works how you expect, you then take those warns and whatnot and translate them into ok’s, is’s, and cmp_deeply’s. It’s really much nicer than the alternative: build it all and see if it works. Try it!
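As a rough illustration of that workflow (this is not an example from the book), assume a hypothetical function `parse_pair` that is still being developed. While exploring, the test file just dumps what comes back; once the output looks right, the dumps become assertions. `is_deeply` from core Test::More stands in for Test::Deep's `cmp_deeply` here so that the sketch has no CPAN dependencies.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under development; in a real project it would
# live in its own module and be loaded with `use`.
sub parse_pair {
    my ($string) = @_;
    my ($key, $value) = split /=/, $string, 2;
    return { key => $key, value => $value };
}

my $pair = parse_pair('color=blue');

# Step 1, while exploring: just look at what comes back.
# use Data::Dumper;
# warn Dumper($pair);

# Step 2, once the output looks right: turn the warns into assertions.
ok( defined $pair, 'parse_pair returns something' );
is( $pair->{key}, 'color', 'key is the part before the =' );
is_deeply(
    $pair,
    { key => 'color', value => 'blue' },
    'whole structure matches',
);

done_testing;
```

The nice part is that the test file exists from the very first run, so there is nothing extra to remember to write afterwards.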
Lastly, I really like how they represent code as diffs instead of monolithic code. Writing large swaths of code doesn’t work that well in real life. It works much better to do tiny changes and make sure they still compile, do what you want, etc.
But the book certainly isn’t perfect! There are some weird code layout issues (p34, 36, 39, etc) and I am pretty sure I saw at least one syntax error (END__ instead of __END__).
So far though, I would say that the book is better than most programming books. Really, a lot of programming books need to be more like this: instead of focusing entirely on the arcana of one framework, they should help you be a better programmer overall.
17 Jul
Yesterday I asked “Module::Build? EU::MM?”. Turns out that was a false dichotomy! Almost everyone who responded to my post recommended Module::Install, which is cool since it’s what we use at work because of the Catalyst swap. We never had any kind of install method before.
Although I would also point out that I hope this choice ends up moot for personal projects, as I hope to use Dist::Zilla.
Have a nice weekend!
17 Jul
So this week, as previously alluded to, I convinced my boss to let me switch my current app from CGI::Application to Catalyst. I had gotten the book in the mail and I showed it to him to make the point that it’s a serious framework. Fortunately the switch has been mostly painless. The first reason being that our controller is pretty bare right now aside from validation, which took about a day to get entirely ironed out.
The interesting thing for me, most of all, is that I have gotten pretty good at writing regular expressions with vim to search and replace for CGIApp-isms to replace with Cat-isms.
Here is a list of some of the big ones:
Simple replacements:
```vim
:%s/return/$c->stash->{json} =
:%s/$self->query->Vars/$c->request->params
```
More complex stuff
```vim
:%s/$self->query->param(\(.\{-}\))/$c->request->params->{\1}
:%s/method \(.\{-}\) : Runmode/method \1($c) : Local
:%s/$self->schema->resultset(\s*'\(.\{-}\)'\s*)/$c->model('DB::\1')
```
If you know anything about regular expressions you know that the \1 means the first back reference. Now, vim’s regex flavor is a little strange because it is optimized for searching for plain text, so *most* characters default to literals. That’s why I have to escape the parentheses to make a matching group. Also note the following unusual construct: .\{-} . That’s the same as .*? in Perl. That’s actually surprisingly important.
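To see why the non-greedy match matters, here is a small Perl sketch of the first "complex" substitution run against a made-up line that contains two `param(...)` calls. The sample line and variable names are only for illustration.

```perl
use strict;
use warnings;

# A made-up line of CGIApp-style code with two param() calls on it.
my $line = q{my $id = $self->query->param('id'); my $name = $self->query->param('name');};

# Non-greedy: (.*?) stops at the first closing paren, so each call is
# rewritten on its own. This is what .\{-} does in vim.
(my $lazy = $line) =~ s/\$self->query->param\((.*?)\)/\$c->request->params->{$1}/g;

# Greedy: (.*) runs to the LAST closing paren on the line, so everything
# between the first "param(" and the final ")" ends up inside the braces.
(my $greedy = $line) =~ s/\$self->query->param\((.*)\)/\$c->request->params->{$1}/g;

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```

The lazy version turns both calls into `$c->request->params->{...}` lookups; the greedy version mangles the line into a single lookup with the second call stuck inside the braces.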
Anyway, this switch has been fairly fun and exciting. The best part is the structure of a Catalyst application. For example, having a dev server that needs little setup and comes with lots of affordances for (duh) developers is great, and built-in config file reading is something that I have always wanted. We always ended up rolling our own solution in other projects, but this is much better since it’s in one place and not just Perl code.
And there are lots of pleasant things, like how easy it is for our app to have both JSON and TT support. This will be really good later on when we start to do pdf printouts and whatnot. Instead of adding methods for those things into the controller, like in CGIApp, we will just add another View module.
The main thing that has weirded me out so far is that in CGIApp the App is the controller. In Catalyst you have an App, which also seems to be an instance variable, with accessors for CGI parameters and whatnot, and you also have Controllers. Anyway, I need to wrap my head around all that. Hopefully reading through the book will help with some of these issues.
How about you? Are you still happy with CGIApp? Are you adventurous enough to use Reaction?
16 Jul
Some developers say to use ExtUtils::MakeMaker, some say to use Module::Build. MB is supposed to supplant EU::MM, but people complain that it’s too chatty. Thoughts? Hopes? Dreams? Inquiring minds want to know.
15 Jul
At work we have a certain customer who has a database with something like 250 report tables. They are generated and maintained purely in code and if you ever touch one manually it’s for a one-off script or something. Anyway, we recently started using DBIx::Class at work and part of that meant accessing those report tables with DBIC.
The first step was to use DBIx::Class::Schema::Loader, which looks at the table structure and generates a bunch of Perl files. Then we just use DBIC as normal. Unfortunately this is in a CGI environment, without mod_perl, or FastCGI or any of that stuff. That means not only is this loading all 250 files (each ~25K in size), but also parsing them etc. on every request. Just to be clear, we have a 15 second startup time. Have fun telling your customer that that’s better in an AJAX context.
So that was just Not Okay. I asked in #dbix-class and robkinyon suggested that I make a YAML file that would represent all of the tables. He couldn’t give me code and it was Friday, but I did get my code to add columns on the fly, so it couldn’t be much harder to go from there, could it?
Of course it could! It always will in such a context. So I asked again, what would be the best way to generate in memory classes of a single data structure, in #dbix-class. castaway recommended subclassing DBIx::Class::Schema::Loader to do what I wanted. So that took a few hours to get to work, including figuring out how everything worked. That was really pretty exciting because it was a Good Way to do what I wanted. Too bad there are some Schema::Loader implementation issues.
Turns out that after making our full data structure it took longer to load the classes into memory than to leave them on the hard drive. I should have realized this would be the case, but for some reason I blocked it out: S::L works by writing temporary files and having perl include them, so really we were reading just as much data but also writing it too. At this point I have spent about 10 hours total on this project and it’s absurdly slow. My boss was not very happy. The irony was that I had used the initial success of the subclass of S::L for leverage in a certain bargain, which I hope to post about soon.
I spoke with ilmari, the person who wrote S::L and he was telling me how to make S::L do everything in memory, but I couldn’t get it to work and my boss (quite reasonably) was breathing down my neck.
So pure, unadulterated Black Magic it was. I would write all the code as a string and then include that with strange require tricks. I can’t take credit for this really, as I got a lot of help from people on StackOverflow. Anyway, here is how that could be done:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature ':5.10';

my $data_struct = [{
    table   => 'Foo',
    columns => [qw{foobar foobaz}],
},{
    table   => 'Rpt1',
    columns => [1..20],
}];

# Prefix every table name with the Result namespace.
$data_struct = [map { {
    table   => "EPMS::Schema::Result::".$_->{table},
    columns => $_->{columns}
} } @{$data_struct} ];

my $tables  = [ map { $_->{table} } @{$data_struct} ];
my $columns = { map { $_->{table} => $_->{columns} } @{$data_struct}};

foreach my $class (@{$tables}) {
    no strict 'refs';
    # Install an INC method so that an object of this class can act as an
    # @INC hook and hand back the generated source for its own package.
    *{ "${class}::INC" } = sub {
        my ($self, $req) = @_;
        return unless $req eq $class;
        my $data = qq!
            package $class;
            use feature ':5.10';
            sub foo {
                my \$self = shift;
                say "\$self:".\$self->columns;
            }
            \$columns = [!.join(',', map { qq['$_'] } @{ $columns->{$class} } ).qq!];
            sub columns {
                my \$self = shift;
                return join ',', \@{\$columns};
            }
            1;
        !;
        # Open a filehandle on the in-memory string; require reads from it.
        open my $fh, '<', \$data;
        return $fh;
    };
    my $foo = bless { }, $class;
    unshift @INC, $foo;
    require $class;
}
```
That works, but it was still actually pretty slow, surprisingly.
I also tried concatenating all of the files into a single file and it was still more or less just as slow.
So finally I broke down and did the unthinkable: I RTFM’d on DBIx::Class::Schema to see if there were any clues. The clue that I got out of it was the following bit:
register_class
…
You will only need this method if you have your Result classes in files which are not named after the packages (or all in the same file). You may also need it to register classes at runtime.
So what I could do is generate all the classes with code, but really simply, without all the column metadata since the DB is the single point of truth in this context, and then load the ones we’d need on the fly!
It was easy as pie. Use a template to generate the classes and write them to files (we used the namespace EPMS::Schema::NonDefaultResult, so that it’s clear that it’s a result class, but not one loaded by the load_namespaces method of the schema). Then I just added a method to our Schema that would do the following (from memory):
```perl
sub load_report {
    my ($self, $report_num) = @_;
    my $class = "EPMS::Schema::NonDefaultResult::Rpt$report_num";
    # String eval so that require sees a package name; die if loading fails.
    eval "require $class; 1" or die $@;
    $self->register_class("Rpt$report_num", $class);
}
```
And that was basically it. I also wrote a little bit of code to short circuit if the report is already loaded. Anyway, it works reasonably quickly and isn’t too ghetto! So the moral of the story is probably to RTFM before you try crazy stuff.
14 Jul
Today we had another P6M meeting. There were seven of us despite the fact that three of the regulars were gone at a birthday party, so that was fairly heartening.
As you may already know from the Iron Man Feed, s1n did a talk on .WALK, which is a selector based system for introspecting the methods of a class. One really interesting thing about it is that it (apparently?) isn’t actually for dealing with inherited/overridden methods as much as it is for manually tweaking the multiple dispatch that Perl 6 supports.
Just to be clear, multiple dispatch is how Perl 6 chooses what method to run based on the parameters (and invocant) of a method. So you can do something like this:
```perl6
class Frew {
    multi method foo($self: Int $foo, Str $bar) { ... }
    multi method foo($self: Str $baz) { ... }
}
```
And when you call the method it will call the right one based on the params passed to the method. You can even dispatch based on the value of the parameter.
Cool stuff!
13 Jul
So a couple perl giants I have already heard of responded to my previous post regarding NULL’s in the database.
NULL means “this piece of information exists but is unknown to us”. Follow this simple rule when deciding whether to allow things to be NULL or not and you’re basically sorted – and the standard SQL logic will suddenly work with you rather than against.
Until you do a LEFT JOIN and discover that it uses NULLs for “doesn’t exist” in there … but anyway …
–mst
I’ve a blog entry about this. Basically, NULLs can lead to queries which are logically impossible to get correct answers for. They’re rare, but I’ve hit them on larger queries and they’re a nightmare to debug.
There’s also the problem of what a NULL is supposed to represent. Is the data unknown? Is it not applicable? Is it something else entirely? I often see NULL values in a database where people have tried to overload the meaning of NULL and it’s done on an ad hoc basis. For example, consider a “salary” field in a database. Why would it be NULL? Are they unemployed? Are they a volunteer? Do you simply not know it? Are they hourly and therefore not salaried? A NULL value could potentially have four different meanings.
–Ovid
I personally think that they both make good points. I lean the direction of mst, which is that NULL’s are ok, but all they mean is that you don’t know that piece of information. Treating them as more information than that is probably a bad idea. Normally I’d just make a bit field to represent other information about the field, like why it’s NULL or something like that. In general fields should only be NULL when they are optional, which should probably be rare.
Although, Ovid links to an article (from the article he wrote) that advocates the removal of *all* NULL’s which I think is relatively extreme. But it resonates with the coder inside of me. The same coder who thinks it’s a good idea to make a new class for everything and do everything with method dispatching instead of if-else’s. I’d like to point out that this part of me has never won out against pragmatism, but I’m sure it will happen someday.
Anyway, I present to you two options from the luminaries above. I find both of the options very attractive and I will probably take mst’s route in general, but I think that the link Ovid gave is surprisingly compelling. It would make the data very consistent, but the cost would be lots of JOIN’s, tables, and classes representing those tables.
The answer may be some place in the middle; I don’t know. One way or the other, ponder the path of your feet; then all your ways will be sure. No one ever got to be a good programmer by blindly following some random blogger.
12 Jul
So recently I made a post regarding NULL’s and '' with respect to numeric fields in a database. I asked questions on a couple different mailing lists for help and one of the interesting responses I got was that You Shouldn’t Have NULL’s In Your Database Unless Required.
Now, I totally understand that for strings, which is all the noted article actually discusses. But my issue wasn’t with a string, it was with a number. I’d say that 0 is not the same as NULL when it comes to data. How many kids do you have? Oh, you didn’t answer. That must mean none. Seriously?
I think for non-text fields, converting '' to NULL makes perfect sense. After all, when someone does a submit from the browser it must be a string, so unless you are doing something special you have to convert and filter and validate the input as a string anyway.
So tell me I am wrong. Show me a good reason why non-string fields shouldn’t be NULL. I am certainly not as smart as other programmers, but I haven’t seen a good argument yet.
(Note: I meant to post this before but I apparently forgot to press publish :-/)
8 Jul
This is just a rant.
I am so sick of validating forms. I do all that I can to make it easy and whatnot, but it still comes back to spite me! Here are two examples of things that are dumb:
So html checkboxes are SO DUMB. If they are checked, the value is set to ‘on.’ That’s annoying alone, but if the checkbox is not set it doesn’t even get submitted! Anyway, that’s pretty annoying. I made a little utility function that lets me just do something like this:
1 |
That works ok I guess. It just feels ghetto.
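For a sense of what such a helper could look like, here is a minimal sketch. It is not the original utility function: the name `checkbox_value` and the plain hashref of submitted parameters are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize a checkbox to 1 or 0.
# An unchecked box is simply absent from the submitted parameters,
# so presence of the key is the only signal available.
sub checkbox_value {
    my ($params, $name) = @_;
    return exists $params->{$name} ? 1 : 0;
}

# "newsletter" was checked, "terms" was left unchecked.
my $submitted = { newsletter => 'on' };

my $newsletter = checkbox_value($submitted, 'newsletter');
my $terms      = checkbox_value($submitted, 'terms');

print "newsletter=$newsletter terms=$terms\n";
```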
This is less about forms and more just about how sick I am of this stupid stuff. So let’s say you have some numeric fields in your db. If a person wants to leave the number blank, it gets submitted as ''. Unfortunately that is not a valid number. So you have to convert the '' to undef to get it to store into the database as a NULL. That’s annoying right? No?
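The conversion itself is tiny; the annoyance is having to remember it for every numeric field. A minimal sketch, assuming a hypothetical helper named `blank_to_undef` and made-up field names:

```perl
use strict;
use warnings;

# Hypothetical helper: blank (or whitespace-only) input becomes undef,
# which DBI binds as NULL. Everything else passes through untouched,
# including a literal '0'.
sub blank_to_undef {
    my ($value) = @_;
    # Explicit "return undef" (not a bare return) so that the map below
    # still gets exactly one value per key in list context.
    return undef unless defined $value && $value =~ /\S/;
    return $value;
}

my %form  = ( kids => '', age => '30', pets => '0' );
my %clean = map { $_ => blank_to_undef( $form{$_} ) } keys %form;

printf "%s => %s\n", $_, ( defined $clean{$_} ? $clean{$_} : 'NULL' )
    for sort keys %clean;
```

Note that '0' survives as a real zero, which is the whole point: "no kids" and "didn't answer" stay distinguishable.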
Blah. I’m done.
7 Jul
So this is probably old hat to those people who are already big on architecture or know a lot about design patterns, but I thought it was a pretty clever implementation of data security. Anyway, first I’ll start off with how I actually did it, and then maybe talk about it in the abstract.
So here’s the idea, I have a user, and that user should only be able to view a certain set of messages. The messages are linked to groups which the users are linked to. So users have groups, and then groups have messages. So to display the messages we do something like this:
```perl
my $to_display = $user->groups->related_resultset('messages');
```
And then you can use that kind of code to limit other things which would more easily cause security issues:
```perl
my $message = $user->groups->related_resultset('messages')
    ->find($id);
```
The fact that DBIC allows you to chain your searches is really what allows this kind of thing to happen. Of course, it could be emulated with most data structure based ORM’s by modifying the data structure that gets passed to the search or find method.
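As a sketch of that emulation idea, the security constraint can be merged into whatever condition the caller passes before it reaches the ORM. Everything here is hypothetical: the helper name `restrict_to_user`, the `group_id` column, and the `group_ids` list on the user. The `{ -in => [...] }` form is SQL::Abstract-style syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the caller's search condition so that it can
# only ever match rows belonging to one of the user's groups.
sub restrict_to_user {
    my ($user, $cond) = @_;
    return {
        %{ $cond || {} },
        # Listed last, so a caller-supplied group_id cannot override it.
        group_id => { -in => $user->{group_ids} },
    };
}

my $user = { name => 'frew', group_ids => [ 3, 7 ] };

# The caller asks for message 42; the helper adds the group restriction.
my $cond = restrict_to_user( $user, { id => 42 } );

print join( ', ', sort keys %{$cond} ), "\n";
```

The resulting hashref would then be handed to the ORM's search or find method in place of the original condition.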
(I am pretty sure that you could do this just as easily with DBIx::Class::Schema::RestrictWithObject, but chaining off of user makes a lot of sense to me, so for now that’s how I’ll pull that off.)
Now before we get into a more general discussion I’d like to point out that because of DBIC’s implementation (and the possible emulation of it mentioned above) this shouldn’t really be too much of a performance hit. Of course, the more related_resultset based chaining you do the more tables you are joining into the query, and that’s where you will start seeing performance issues.
Ok, so the general approach:
It seems to me that it wouldn’t be too hard to make a Highlander (Singleton) that would basically have methods for all of your ResultSet’s (or tables in SQL-talk.) It would contain any user credentials that are needed to get at any data. The idea would be to have it throw an exception if you were to try to instantiate it without all of the data needed to do your security stuff. Really that’s just good OO; any instantiated object should be complete.
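A bare-bones sketch of such a Highlander in plain Perl follows. The class name `SecureData`, its `init`/`instance` methods, and the string returned by `messages` are all placeholders; in a DBIC app the `messages` method would presumably return something like the chained resultset shown earlier.

```perl
use strict;
use warnings;

package SecureData;    # hypothetical class name

my $instance;          # there can be only one

# Create the single instance; refuse to build an incomplete object.
sub init {
    my ($class, %args) = @_;
    die "SecureData needs a user\n" unless defined $args{user};
    die "SecureData is already initialized\n" if $instance;
    $instance = bless { user => $args{user} }, $class;
    return $instance;
}

# Fetch the single instance from anywhere in the app.
sub instance {
    die "SecureData has not been initialized\n" unless $instance;
    return $instance;
}

# One method per ResultSet; this placeholder just returns a string.
sub messages {
    my ($self) = @_;
    return "messages visible to $self->{user}";
}

package main;

# Instantiating without credentials throws.
my $ok = eval { SecureData->init(); 1 };
print "refused: $@" unless $ok;

SecureData->init( user => 'frew' );
my $data = SecureData->instance;
print $data->messages, "\n";
```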
Now I have to point out that this really isn’t a complete solution. My friend Fjord works on Birdstack and they need to support the hiding of specific columns, of specific rows, depending on a number of criteria. It’s possible that he could do this for Birdstack, but that would end up making each optional column a join table, which would be slow and cumbersome. I don’t remember how he solved the issue, but I imagine that the best way to pull it off would be with a Highlander class that filtered each Result (row) coming from each ResultSet. I guess it would need to return specialized read-only classes or something.
One way or another, I think that no matter what, this fine grained control of public vs private data is going to be hard to manage and slow in a regular RDBMS. An object database might be able to handle it better, but I haven’t really thought much in that vein yet.
So with that I say to you peace be within your walls and security within your towers and racks!