fREWdiculous!
8 Jun
One of the common issues I hear about CPAN is that it’s so sprawling that people do not know which modules to use and which not to use. Hopefully part of that issue will be solved by the Enlightened Perl Core, but that will only go so far. Recently there were a couple posts regarding this issue. (Note: They are in reference to a post I made and they are from the same guy.) I even recently had a discussion regarding this with my boss recently because we needed some barcode generation code. (We ended up using Barcode::Code128 but we spent a lot of time trying to get GD::Barcode to do what we wanted.) Furthermore I chatted with the EPCore guys regarding this and they all helped me think through a lot of these issues; I have a muddled mind
I think a solution to this problem is feasible. I imagine a web service that will help recommend various packages for given tasks. I have the following (pie in the sky) goals in mind:
Here are some possible sources of data to make this all work:
CPAN Testers is obvious. It has massive amounts of data and it can at least tell you if a module is good by it’s own measure. It might be worthwhile to look into some kind of scaling based on tests (configurable of course.) The idea there is that if a module has never failed because it has no tests that shouldn’t count.
CPAN Deps isn’t even completed. I’ve only heard this name dropped, but the idea is clear. With it you could find out what modules are effectively core in that lots of people depend on them. You could use this in a PageRank style way in that modules that have a high score help add to the score of modules they depend on.
The Github watches link that I posted is where I originally got the idea for this. I’m not really sold on it, but mst liked it so much I figured I’d keep it in the list; I wish I could give you a link to the actual conversation. He hated the idea of using “failhub”
I do like the idea, but I am certainly not as smart or motivated as mst.
And last but certainly not least, CPAN Ratings. CPAN Ratings is an excellent idea, but it needs some love. Part of that has to do with it’s actual implementation (at the very least it’s ugly,) but the real issue is the use of it. More people need to use it. I don’t know how to do that other than to use it myself. I think it might be good if, after using a module for at least an iteration, I rated it. If one were to rate a module too soon the results could become inflated. And as a side-note, I personally think we should use OpenID instead of BitCard, but it’s not worth changing a bunch of stuff just for that.
And then I was thinking that we could use a combination of module name searching, tags added to META.yml, and tags added manually. So DBIx::Class would theoretically add the ORM tag (and others possibly) to their META.yml, and then someone would manually add the tag to Class::DBI. Then when people search for ORM they would at least find those two. They would then get a score based on the previous five metrics. I would say have anything with a score beneath a certain number not even displayed, but have a link that would allow the display; and maybe a user option that would permanently display hidden items.
I think this is something that would certainly be worth attempting. It wouldn’t be easy, and the stuff I’ve said above is certainly riddled with errors, but that shouldn’t stop us. What do you guys think? Other ideas for data sources? Implementation ideas? Tuits?
1 Jun
Since the beginning of my serious webcomic journey with xkcd, I think that was four years ago, I’ve been writing little scripts to help me get started. The first type of script is to grab integer-based, monotonically increasing files. Very easy. Done in Ruby.
1 2 3 4 5 6 7 | #!/usr/bin/ruby -w Fromat = "http://foobar.com/comics/%08d.gif" 1.upto(986) do |i| `wget #{sprintf(Fromat, i)}` sleep 1 end |
The next harder are the ones that are based on the date of publication. Usually though, they will be published Monday-Wed-Fri or something like that, so you can just increase per day and then check if it’s the correct weekday. See more Ruby.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | #!/usr/bin/ruby -w Day = 60 * 60 * 24 Fromat = "http://www.foobar.com/comics/st%Y%m%d.gif" t = Time.local(2005, 2, 5) MWF = [1,3,5] until t == Time.local(2007, 7, 9) if MWF.include? t.wday `wget #{t.strftime(Fromat)}` sleep 3 end t += Day end |
And then lastly, and hardest of all, are arbitrary files that can only be ascertained by clicking links. Perl + CPAN to the rescue!!!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | #!perl use strict; use warnings; use feature ':5.10'; use WWW::Mechanize; my $mech = WWW::Mechanize->new( autocheck => 1 ); sub process_page { my @images = $mech->find_all_images( url_abs_regex => qr{http://www\.foobar\.com/memberimages/.*\.jpg}i ); foreach (@images) { my $url = $_->url; if ($url !~ qr/banner/i) { say "downloading $url"; qx{wget $url}; } } } $mech->get( 'http://www.foobar.com/foo/bar/series.php?view=single&ID=72709' ); process_page; while ( $mech->follow_link( # third link on page matching regex n => 3, url_abs_regex => qr{http://www\.webcomicsnation\.com/dmeconis/familyman/series\.php\?view=single&ID=\d+}i ) ) { sleep 1; process_page; } |
This last one should be checked on every now and then as it is easy for it to get stuck in an infinite loop on the last couple comics.
Anyway, enjoy! This set of scripts should take care of all of your webcomic scraping needs
Note: these are not to avoid ads, but to speed up the initial reading process as speed is an issue when reading 400 or more strips.
31 May
I recently got a new car stereo due to the other one being stolen. I am almost entirely happy with the model that I ended up purchasing, but one thing that it does, which is really obnoxious, is that it doesn’t sort the files correctly unless the track number is early on in the file name. Even if all tracks are “FooBarBaz 01 – name.mp3″ it seems to ignore the number unless it’s the very beginning of the file name. Anyway, easy fix:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | #!perl use strict; use warnings; use feature ':5.10'; use Music::Tag; use File::Find::Rule; use File::Basename "fileparse"; use File::Copy "move"; use File::Spec; my $directory = shift || '.'; my @songs = File::Find::Rule->file()->name( '*.mp3' ) ->in( $directory ); foreach my $song (@songs) { my $info = Music::Tag->new($song); $info->get_tag; my $track = $info->track; my $title = $info->title; my (undef, $dir, $suffix) = fileparse($song, qr/\.[^.]*/); $info->close; if ($track and $title) { my $newfilename = File::Spec->catfile( $dir, sprintf "%02d %s%s", $track, $title, $suffix ); if ($song ne $newfilename) { say "renaming $song to $newfilename"; move $song, $newfilename; } } } |
It doesn’t really deal with illegal characters, it just doesn’t rename those files. Eventually I’ll get around to doing that. Anyway, just figured someone might be interested/want to copy paste this.
18 May
With DBIx::Class we typically have a TO_JSON method which returns a hashref of the data you want in your json. Here’s an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Here’s the shorter version mst inspired me to write:
1 2 3 4 5 6 7 8 |
Anyway, not very complex, but still awesome.
15 May
My coworker Wes asked me if there could be a nice refactor for the following function which checks CAS Numbers to ensure their validity. After struggling for 30 minutes I gave up trying to make it a little bit nicer with reduce.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Let’s take a look at this and figure it out. The crunchy bit is the for loop, so I’ll go through that. Basically we are summing each item times a weight that is inversely proportional to it’s location in the list. Or to be more explicit, let’s do an example on the board (7732-18-5.) 5 is the check digit.
| $_ | $count | $_ * $count | $sum |
|---|---|---|---|
| 7 | 6 | 42 | 42 |
| 7 | 5 | 35 | 77 |
| 3 | 4 | 12 | 89 |
| 2 | 3 | 6 | 95 |
| 1 | 2 | 2 | 97 |
| 8 | 1 | 8 | 105 |
So basically we are making a special summation. The thing that’s unusual is that we have a decrementing counter along with it. If I had the control structure which I am about to show you in my mind already the solution might have jumped out sooner.
So I asked about it in #perl6 and it turns out there is a very nice Perl 6 version. It takes advantage of the mystical hyperoperator (>>infix op<<); that is, it takes two lists and performs an operation on each element together. Think SIMD. It also uses reduce ([infix op]) which I have mentioned before. Check it out!
1 2 3 4 5 6 7 8 9 10 | sub cas(Str $cas) { if $cas ~~ /(\d ** 1..8)\-(\d\d)\-(\d)/ { my @digits = $0~$1.split ''; my $check = $2; return ([+] @digits.reverse >>*<< (1..@digits)) % 10 == $check; } return Bool::False; } |
This does the same thing as above. Or to put it in English, we take our digits, reverse them, and then multiply each digit (hyperoperator, >>*<<) by the respective integer in the other list, that is, 1 to the size of the list. We then sum (reduce, [+]) that new list, and get the modulus 10 of it.
Very elegant, no?
Update: Turns out there is also a very elegant version in p5, according to mst. Check it out!
1 2 3 4 5 6 7 8 9 10 11 12 |
It’s very similar to the p6 version, just using fewer generalized operators, so you should be able to follow it fairly well.
14 May
We had the second Dallas.p6m on May 12, 2009. Along with me there were two of my coworkers, s1n, Graham Barr, and Patrick Michaud. We discussed a lot of things. One of which was the difference between subs and methods in Perl6. And the fact that you can’t imply self. This should explain it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | class A { sub foo { say 'foo'; } method bar($o:) { # much to s1n's chagrin, you can't # have baz() imply self.baz. These # are his options say 'bar'; self.baz; $o.baz; # great for when you have a lot # of object methods close by given self { .baz; } # this is the same as above but it's # not scoped. Good for short methods $_ = self; .baz; # not really recommended as it # doesn't seem to be for anything # but attributes. $.baz; } method baz { say 'baz'; } } my $a = A.new; A::foo; #A::bar; # dies #$a.foo; # dies $a.bar; A.bar; # note: there may be a distinction between # class and instance methods, but for now # you use the same method for both |
I also asked Patrick if he thought that Perl 5 in Perl 6 would really happen and if so how. He said it would happen, but probably not soon. There are really three options. The first is to embed Perl 5 in Parrot. This is really the “best” option as it would have 100% compatibility (except weird XS stuff,) and I think Patrick said that it had been prototyped, so that’s encouraging. The next option would be to reimplement most of Perl 5 in Perl 6. This would never get close to 100% but it would still be an option. The last option would be to have major parts of CPAN reimplemented in Perl 6, thus making compatibility far less important. Important CPAN modules would be DBI, a good templating system, a web framework (in progress), and some form of GUI toolkit.
Somehow we got into a discussion about Mojo, the framework Graham uses at $work. It is modeled after Rails and is supposed to be simple to port to Perl 6. The most important thing about it, as far as I can tell, is that it has no dependencies. Graham made it sound like a lightweight framework, but I guess he just meant the no deps thing. CGI-Application (what I use) totals to 7k lines. Mojo, on the other hand, totals to 50k. Not exactly lightweight, but low dependencies is an interesting goal.
And then two thirds of us are going to YAPC::NA, so we talked about that some. Very exciting things coming up!
13 May
This is in response to chromatic’s post Writing Perl 5’s Support Policy
I want to be able to use the support policy as a reason to convince customers with lots of Perl installs that they need to update. A big part of this means an easy upgrade.
Probably most of the people using Perl 5 are in Unix. That makes it easier for you folks. On Windows installing Perl is no simple task, either ActivePerl or Strawberry Perl.
For example, at $work we use Apache/mod_perl. (I don’t wanna hear your “FastCGI! FastCGI! FastCGI!”, none of you people have actually helped me so far!) Let’s say I want to use the shiny new Perl 5.11. I install it. Awesome. Wait! mod_perl doesn’t work! Ok, reinstall that. Oh wait! All of my XS/compiled modules don’t work! etc etc. I understand the fact that Windows is a second class citizen here, but in Unix you probably have the same issue. You just made some kind of package to install it all at once or something. That’s great, but does it help me? Is updating really supported?
On a side note a point that my boss made when we discussed this issue is the fact that the community may not support Perl 5.8 in the future, but if companies will pay for it, ActiveState will probably support it. This is good for the customers and helps the community, so I’d say that’s really a win-win.
Anyway, those are my thoughts. Take ‘em or leave ‘em.
12 May
At work one of our customers is having us revamp one of the major sections of the site. We are moving in the “Web Application” direction; that is, very little HTML, and almost all Javascript. The section of the site that my coworker was working on recently does a lot of calculation. On the old HTML page a customer would log in, ask for a certain report, and I guess go get a cup of coffee. The report took something like five minutes to run; not impossibly long, but long enough that you don’t want to leave it like that and turn the output into JSON.
So we took a look at the module that does the calculation. First off, it was a little long (four thousand lines,) so just eyeballing it wouldn’t easily point us to what was causing the slowdown. Fortunately my coworker Neil, who wrote the module, had also written a simple to execute perl script, as opposed to everything being in a cgi script. So with that in hand, we fired up NYTProf.
NYTProf is a profiler that was written for use at the (wait for it) New York Times. After installation using it was extremely easy:
1 | perl -d:NYTProf calc.pl |
That generates a database of what went down. Next we ran:
1 | nytprofhtml |
Which turned everything into nicely formatted html files. It creates profiles for all of the files used and has some nice javascript sorting built in to the tables etc. It showed us that we were calling a certain heavyweight function more than we should have (DBI::st::execute
.) And then there were other things that we did (simple caching for example: if (!$self->{foo}) { $self->{foo} = …;} return $self->{foo}) which sped things up something like 70% in the profile.
Anyway, if you find yourself hitting a speed wall, NYTProf isn’t hard to use at all. The only problem you may run into is not having a simple to run test script. This is another reason to split business logic into modules; you can easily use the modules in your controllers, test scripts, and profilers.
Have fun!
8 May
It’s Friday, so a long post is not in order. With that in mind, a simple refactor for your pattern matching skulls and skills:
before:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | my @files = File::Find::Rule->file()->name('*.t') ->maxdepth( 1 )->in( File::Spec->catdir( $self->get_directory, 't' ) ); my @total_results; foreach my $file (@files) { push @total_results, "<span class='file'>$file</span>"; push @total_results, @{ $self->test( $file ) }; } return join "\n", @total_results; |
Do you see what I see? We’re iterating over a list and generating a new list… And then we are just doing a join on that. Enjoy the nice and functional rewrite.
after:
7 May
I work at a Microsoft Company more or less. We use SQL Server, IIS (moving to Apache…), and various flavors of Windows for all of our machines. I haven’t had the cojones to install Ubunutu on my desktop yet, so I am stuck with cygwin and friends. But the perl that runs my server is not in cygwin. That means that if I want to do valid testing I have to do it with the regular perl. I’ve tried running prove from the cygwin commandline, (ie, /cygdrive/c/usr/bin/perl/prove,) but it just hangs. So I just have to suck it up and run prove from cmd.
The color highlighting doesn’t work (or at least I couldn’t get it to) so with code that sometimes throws a lot of warnings (that’s intentional, I am testing croaks often) I see way too much noise to follow what’s going on. Redirection to the rescue! Instead of just running prove, I can run
1 | prove 2> foo |
instead and it redirects all of the warnings to the file foo. Although it’s obvious, it makes running tests that much easier, and I think running tests should really be the path of least resistance. So thank you DOS makers, for implementing STDERR redirection, I didn’t think you’d have done that, but you did.