fREWdiculous!
15 Jun
This month’s Dallas.p6m was bigger than before! We had my coworkers Geoff, Neil, and Wes, myself, Graham Barr, Jason Switzer (s1n,) Patrick Michaud, and John Dlugosz. We got a domain hooked up (dallas.p6m.org, which doesn’t point at anything yet,) discussed interesting stories about rakudo optimization (and often lack thereof,) and sometimes delved into perl5 stuff.
s1n decided to mention that we need to start doing our feature expositions, which is where someone picks a feature in perl 6, does some research, does a talk on it, and then we write some code which is based on it. s1n is going to talk about .WALK, which allows you to look at the Abstract Syntax Tree. I’m pretty excited about that.
Patrick explained to us how it seems that most traits are becoming declarators. That is, in class foo is bar, class is a declarator, and bar is the trait. Of course the migration isn’t a big deal because defining a declarator is similar to defining a subroutine.
We also discussed Rakudo’s Unicode support. Patrick mentioned that they are looking for a “golf mode” character which will enable unicode characters for things like >= .
We also discussed some of the Rakudo release strategy. It’s very similar (for obvious reasons) to Parrot’s release strategy. It’s exciting that they will continue to release regularly (monthly.)
I can’t recall any other details regarding the meeting. I know we talked about more though. I should start taking notes…
11 Jun
You may remember my post from before asking about the differences between CGI::Application and Catalyst. I only got a couple of responses, but they certainly helped me to see what is up.
Basically it boils down to this (as pointed out by mst): CGI::Application is a microframework, and Catalyst is an extremely configurable MVC stack. Before you correct me, Catalyst doesn’t actually provide the Model or View code; it lets you pick whatever you want to pull that off. But nonetheless it has affordances for both model and view code.
CGI::Application, on the other hand, doesn’t even have a built in way to deal with models! I love CGI::Application, but mostly because it keeps our code way more organized than our previous framework. (Note: our previous framework was a pile of files that had use CGI; somewhere at the top…)
So you could really look at it like this: Catalyst is extremely extensible, because of their brilliant design. Maybe large frameworks have to be designed the way Catalyst is; I don’t know. I do know that at this point in my career my coding chops are not good enough to have that good of a design/API.
CGI::Application, on the other hand, is simple enough to grasp in an hour or so. It has:
And that’s it! There are plugins that give you extra features, like RESTful dispatching, authorization, and authentication, but out of the box it’s just a microframework.
Catalyst, on the other hand, is much more complex. For the ruby people out there it’s probably a mix between Rails and merb. Not quite Rails because it’s much less opinionated, but not quite merb because it has quite a few features that I don’t think merb has.
Lately I have been feeling some of the growing pains of the app that we recently started from scratch at work. It’s based on CGI::Application. The reason behind that was that my boss was hesitant to try something new (to us) like Catalyst. I had used CGI::Application in TOME and so I had at least a little experience with it, although in TOME we didn’t even come close to what we could have done.
Anyway, if you are starting a new project that will be large (for almost any value of large) you probably want Catalyst. If you are making something simple (like WebCritic,) using Catalyst is totally overkill, and CGI::App fits the bill nicely.
10 Jun
I’ve used Open Source for a little over ten years now. I’ve been sufficiently indoctrinated that Open Source (Free Software) is both morally and technically the right choice. That’s not what this post is about. If you disagree with those premises, that’s fine. The idea here is that I use all kinds of Free Software all the time. I use Vim for a text editor. I use zsh as a shell. Firefox is my browser. This blog runs on WordPress. The webserver we use at work is Apache. And of course all of our code depends on Perl and numerous libraries.
We don’t pay for any of that software! Not a dime! And that’s fine, but nothing comes for free.
So far I’ve worked on three open source projects. The first was TOME, a book sharing system we used at school. Next I wrote some of the spec tests for Perl 6. And then most recently I’ve been doing some things for DBIx::Class.
One of the excellent things about the DBIx::Class developer community is that they really do their best to help you to work on the source yourself. Recently they (or more specifically ribasushi) helped me add the full sorting capabilities to the SQL Server parts of DBIx::Class. More lately I’ve been adding things for the paging capabilities, which is great because paging with SQL Server is horrendous.
Anyway, the most important part of all of this is that I am part of something that will help me and other people. Furthermore, it really wasn’t that hard to add the code. They showed me where to add it, did a little code review to help me clean it up, and that was it! If only more communities were like that the Open Source world might be even more vibrant.
9 Jun
I recently purchased an Avatar to be created by Scott Meyer of Basic Instructions. Today he sent me the completed avatar. Here it is:

This is me!
Pretty sweet, huh? Anyway, I figured this would be cool, because I get to look cool and support an excellent webcomic/artist.
Get your own here!
Oh yeah, and maybe you want to see the original. That was done by my roommate at the time.
8 Jun
One of the common issues I hear about CPAN is that it’s so sprawling that people do not know which modules to use and which not to use. Hopefully part of that issue will be solved by the Enlightened Perl Core, but that will only go so far. Recently there were a couple posts regarding this issue. (Note: They are in reference to a post I made and they are from the same guy.) I even had a discussion regarding this with my boss recently because we needed some barcode generation code. (We ended up using Barcode::Code128 but we spent a lot of time trying to get GD::Barcode to do what we wanted.) Furthermore I chatted with the EPCore guys regarding this and they all helped me think through a lot of these issues; I have a muddled mind.
I think a solution to this problem is feasible. I imagine a web service that will help recommend various packages for given tasks. I have the following (pie in the sky) goals in mind:
Here are some possible sources of data to make this all work:
CPAN Testers is obvious. It has massive amounts of data and it can at least tell you if a module is good by its own measure. It might be worthwhile to look into some kind of scaling based on tests (configurable of course.) The idea there is that if a module has never failed because it has no tests that shouldn’t count.
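To make the scaling idea a little more concrete, here is a rough Ruby sketch. Everything in it is an assumption for illustration: the tester_score name, the inputs, and the cutoff of 50 tests for full credit are made up, and none of it reflects how CPAN Testers actually exposes its data.

```ruby
# Hypothetical helper: scale a module's pass rate by how many tests it ships.
# full_credit_tests is an arbitrary, configurable cutoff (an assumption).
def tester_score(passes:, fails:, test_count:, full_credit_tests: 50)
  reports = passes + fails
  return 0.0 if reports.zero?

  pass_rate = passes.to_f / reports
  # A module with no tests gets no credit, however clean its reports look.
  confidence = [test_count.to_f / full_credit_tests, 1.0].min
  pass_rate * confidence
end
```

With this, a module that has a perfect record only because it has zero tests scores 0.0 instead of 1.0.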
CPAN Deps isn’t even completed. I’ve only heard this name dropped, but the idea is clear. With it you could find out what modules are effectively core in that lots of people depend on them. You could use this in a PageRank style way in that modules that have a high score help add to the score of modules they depend on.
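Here is a minimal sketch of what that PageRank style scoring could look like, in Ruby. The dependency_rank function, the damping factor of 0.85, and the toy module names are all assumptions; this is the textbook iteration applied to a made-up dependency graph, not anything CPAN Deps provides.

```ruby
# deps maps a module name to the list of modules it depends on.
# Score flows from a module to its dependencies, so widely used
# modules end up with the highest rank.
def dependency_rank(deps, damping: 0.85, iterations: 50)
  modules = (deps.keys + deps.values.flatten).uniq
  n = modules.size
  rank = modules.to_h { |m| [m, 1.0 / n] }

  iterations.times do
    nxt = modules.to_h { |m| [m, (1.0 - damping) / n] }
    modules.each do |m|
      targets = deps.fetch(m, [])
      if targets.empty?
        # No dependencies: spread this module's score evenly over everything.
        modules.each { |t| nxt[t] += damping * rank[m] / n }
      else
        targets.each { |t| nxt[t] += damping * rank[m] / targets.size }
      end
    end
    rank = nxt
  end
  rank
end

# Toy data with invented module names.
toy_deps = {
  "App::One" => ["Lib::Core"],
  "App::Two" => ["Lib::Core", "Lib::Extra"]
}
ranks = dependency_rank(toy_deps)
```

In the toy graph Lib::Core comes out on top because both apps depend on it, which is the "effectively core" signal described above.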
The Github watches link that I posted is where I originally got the idea for this. I’m not really sold on it, but mst liked it so much I figured I’d keep it in the list; I wish I could give you a link to the actual conversation. He hated the idea of using “failhub.”
I do like the idea, but I am certainly not as smart or motivated as mst.
And last but certainly not least, CPAN Ratings. CPAN Ratings is an excellent idea, but it needs some love. Part of that has to do with its actual implementation (at the very least it’s ugly,) but the real issue is the use of it. More people need to use it. I don’t know how to do that other than to use it myself. I think it might be good if, after using a module for at least an iteration, I rated it. If one were to rate a module too soon the results could become inflated. And as a side-note, I personally think we should use OpenID instead of BitCard, but it’s not worth changing a bunch of stuff just for that.
And then I was thinking that we could use a combination of module name searching, tags added to META.yml, and tags added manually. So DBIx::Class would theoretically add the ORM tag (and others possibly) to their META.yml, and then someone would manually add the tag to Class::DBI. Then when people search for ORM they would at least find those two. They would then get a score based on the previous five metrics. I would say have anything with a score beneath a certain number not even displayed, but have a link that would allow the display; and maybe a user option that would permanently display hidden items.
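The search-then-hide behaviour could be sketched like this in Ruby. The Mod struct, the search_by_tag function, the 0.5 cutoff, and the example module names and scores are all invented for illustration; real scores would come from combining the metrics above.

```ruby
# A module record: name, list of tags, and an already computed score.
Mod = Struct.new(:name, :tags, :score)

# Return modules carrying the tag, best score first. Anything under
# min_score is hidden unless the user has opted in with show_hidden.
def search_by_tag(modules, tag, min_score: 0.5, show_hidden: false)
  matches = modules.select { |m| m.tags.include?(tag) }
  visible, hidden = matches.partition { |m| show_hidden || m.score >= min_score }
  {
    visible: visible.sort_by { |m| -m.score }.map(&:name),
    hidden_count: hidden.size # enough to render a "show N hidden" link
  }
end

# Made-up catalogue; the names and scores are placeholders.
catalogue = [
  Mod.new("Example::ORM::Good", ["orm"], 0.9),
  Mod.new("Example::ORM::Stale", ["orm"], 0.2),
  Mod.new("Example::Barcode", ["barcode"], 0.8)
]
default_view = search_by_tag(catalogue, "orm")
full_view = search_by_tag(catalogue, "orm", show_hidden: true)
```

The hidden_count is what would drive the link that reveals the low scoring results, and show_hidden stands in for the permanent user option.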
I think this is something that would certainly be worth attempting. It wouldn’t be easy, and the stuff I’ve said above is certainly riddled with errors, but that shouldn’t stop us. What do you guys think? Other ideas for data sources? Implementation ideas? Tuits?
8 Jun
Have you ever highlighted something in vim, yanked it, and then realized you wanted to yank it to a different register, often + or *? Well, try the command gv. It will reselect whatever you had previously selected. I probably use it at least once a day.
Enjoy!
5 Jun
I just completed World of Goo (or buy direct, here). Very fun game!
I like video games, but I tend to not play them very much because I do all kinds of other things (lots of programming if you can’t tell) but recently I’ve found that they help me clear my mind when I am trying to figure stuff out. Like, I’ll be coding and I will usually get stuck on a design issue. It’s rarely a question of how to get it done, but more, what’s the best way to get it done. I certainly don’t always choose the right answer, but I try to go back and correct wrong answers as soon and as often as possible.
Furthermore, I really appreciate these indie games. They have something in common with what I call art (a post for another day.) Here’s an interesting factoid: World of Goo was made by four people, only two of whom were coders! That blows me away.
Anyway, the game was great; it installed and ran without a hitch on Ubuntu; so try out the demo, and if you dig it, purchase it.
5 Jun
So I’d like to do a post on CGIApp and Catalyst. People on IRC keep telling me that using CGIApp is wrong (mostly because they’ve never used it) and that I should switch to Catalyst.
Catalyst may be great, but I haven’t seen any solid posts about how Catalyst is great. So help me out. Ignoring the fact that Catalyst is what everyone uses (so there are lots of plugins for it) what makes it so good?
2 Jun
At $work I manage the subversion repositories for all of the software that we develop. It’s certainly not something that I’m great at, but I’ve used it longer than most so I am the most qualified to deal with it.
Furthermore, at work we use this tool (Freescale?) which, when it creates a project, creates a Boot directory and a Con directory. Ok, so I had helped our head honcho EE create a repository to store his project data and versions. He’s puttering along and he thinks, “Hey, I want to ‘save’ this version so I can go back to it later!” So I explain to him tags and how to set it up and all this jazz. Well, it turns out that when we made the repo initially we did not make tags, trunk, and branches, like we should have. We just put everything in the root of the repo. Foolish! So anyway, I tell him that we can reorganize it fairly easily and we do that. So we make the changes, delete the old directory, and recheckout from trunk…
It failed. It could not check out the directory! Some of you may be able to guess why: in Windows you cannot (easily) create a directory named “con” (or com, or a few others.) So we are having the hardest time getting it to check out. Meanwhile he has to make a release for the customer and I am under the gun. So he pulls up a copy he made (how?) and gets back to work and I try to figure out how to deal with this in my office. At this point he has asked me to just revert the changes.
So I go back to my office, try checking it out a few different ways and have no luck. So finally I get an idea, I figure I’ll check it out in a virtual machine with Linux installed! So I do that, I run the reverse merge to undo all of our changes, and I check everything back in. It worked!
So the moral of the story? Don’t name folders “con.”
1 Jun
Since the beginning of my serious webcomic journey with xkcd, I think that was four years ago, I’ve been writing little scripts to help me get started. The first type of script is to grab integer-based, monotonically increasing files. Very easy. Done in Ruby.
#!/usr/bin/ruby -w

Format = "http://foobar.com/comics/%08d.gif"

1.upto(986) do |i|
  `wget #{sprintf(Format, i)}`
  sleep 1
end
The next harder are the ones that are based on the date of publication. Usually though, they will be published Monday-Wed-Fri or something like that, so you can just increase per day and then check if it’s the correct weekday. See more Ruby.
#!/usr/bin/ruby -w

Day = 60 * 60 * 24
Format = "http://www.foobar.com/comics/st%Y%m%d.gif"

t = Time.local(2005, 2, 5)

MWF = [1,3,5]

# >= rather than == so a daylight-saving shift cannot step past the end date
until t >= Time.local(2007, 7, 9)
  if MWF.include? t.wday
    `wget #{t.strftime(Format)}`
    sleep 3
  end
  t += Day
end
And then lastly, and hardest of all, are arbitrary files that can only be ascertained by clicking links. Perl + CPAN to the rescue!!!
#!perl
use strict;
use warnings;
use feature ':5.10';
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

sub process_page {
    my @images = $mech->find_all_images(
        url_abs_regex => qr{http://www\.foobar\.com/memberimages/.*\.jpg}i
    );
    foreach (@images) {
        my $url = $_->url;
        if ($url !~ qr/banner/i) {
            say "downloading $url";
            qx{wget $url};
        }
    }
}

$mech->get( 'http://www.foobar.com/foo/bar/series.php?view=single&ID=72709' );
process_page;
while (
    $mech->follow_link(
        # third link on page matching regex
        n => 3,
        url_abs_regex => qr{http://www\.webcomicsnation\.com/dmeconis/familyman/series\.php\?view=single&ID=\d+}i
    )
) {
    sleep 1;
    process_page;
}
This last one should be checked on every now and then as it is easy for it to get stuck in an infinite loop on the last couple comics.
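One way to guard against that loop, sketched in Ruby rather than in the Perl above: remember every URL already visited and stop as soon as one repeats. The crawl function and the next_url lambda are hypothetical stand-ins for the real page fetching and link following; nothing here touches WWW::Mechanize.

```ruby
require "set"

# Walk from start_url, asking next_url for the following page each time.
# Stops when there is no next page, when a page repeats, or after
# max_pages pages (a safety cap; the default is arbitrary).
def crawl(start_url, next_url, max_pages: 1000)
  seen = Set.new
  order = []
  url = start_url
  while url && !seen.include?(url) && order.size < max_pages
    seen << url
    order << url
    url = next_url.call(url)
  end
  order
end

# Fake site where the last two pages link to each other.
links = { "page1" => "page2", "page2" => "page3", "page3" => "page2" }
visited = crawl("page1", ->(u) { links[u] })
```

On the fake site the walk ends after page3 instead of bouncing between the last two pages forever.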
Anyway, enjoy! This set of scripts should take care of all of your webcomic scraping needs.
Note: these are not to avoid ads, but to speed up the initial reading process as speed is an issue when reading 400 or more strips.