Tagged as rails

5 Feb 2008

Extending BlueCloth with Pygments

I just spent a fun couple of days tinkering with the engine of this site and have finally emerged bruised, tired and slathered in axle grease.

I started out just adding syntax highlighting, but ended up adding page caching (to offset the performance hit of parsing code) and upgrading to Rails 2 (to get at a couple of caching-related niceties). With all those changes, there may be the odd unresolved issue here or there that I've missed. Just drop me a comment if you spot anything poking out at an awkward angle.

I figured I'd inaugurate my code highlighting with a little write-up of how I ended up doing it. I originally planned to use Ultraviolet, but found it way too slow and a bit verbose in the CSS-naming department, among other things. I also looked at various JS highlighters, but couldn't find one to suit, and decided it's a fairly resource-intensive task to be offloading onto the client anyway. Then I came back around to what I'd been looking at originally: the excellent Python library Pygments. I was put off at first by it not being Ruby-native, and wondered if I'd need something like RuPy to bridge the gap. In the end I decided a system call out to the pygmentize command-line interface would be acceptable:

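class BlueCloth

  # Wrap a string in single quotes for the shell, turning each embedded
  # single quote into '\'' (close quote, escaped quote, reopen quote)
  def escape_shell_arg(str)
    "'%s'" % str.gsub("'", "'\\\\''")
  end

  # Override BlueCloth's handler for indented code blocks
  def transform_code_blocks(str, rs)
    @log.debug " Transforming code blocks"

    str.gsub(CodeBlockRegexp) {|block|
      code, rest = $1, $2

      # Remove the @@language line and extract the language from it,
      # falling back to Pygments' plain "text" lexer if there isn't one
      regx = /(?:[ ]{4}|\t)+@@(.*)\n+/
      syntax_line = code.slice!(regx)
      lang = syntax_line ? syntax_line.slice(regx, 1).strip : 'text'
      # Call out to pygmentize to mark up the code for highlighting
      # (printf '%s' passes the code through verbatim, whereas some
      # shells' echo would interpret backslash escapes in it)
      code = `printf '%s' #{escape_shell_arg(code)} | pygmentize -f html -l #{escape_shell_arg(lang)}`
      # Remove the extraneous wrapper markup that we don't need
      code.sub!(/<div class="highlight"><pre>(.*)<\/pre><\/div>/m, '\1')

      # Generate the codeblock
      %{\n\n<pre class="code"><code>%s\n</code></pre>\n\n%s} % [ outdent(code).rstrip, rest ]
    }
  end

end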
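If you'd rather not hand the code to a shell at all, one option is to feed it to pygmentize on standard input directly, which makes the quoting done by `escape_shell_arg` unnecessary. The sketch below assumes a newer Ruby (1.9.2 or later) where `Open3.capture2` accepts a `stdin_data:` option; the `highlight` helper name and its plain-text fallback are hypothetical, not part of BlueCloth or Pygments.

```ruby
require 'open3'
require 'cgi'

# Hypothetical helper: highlight `code` as `lang` by running pygmentize
# directly, with no shell in between, so no argument quoting is needed.
def highlight(code, lang = 'text')
  html, status = Open3.capture2('pygmentize', '-f', 'html', '-l', lang,
                                stdin_data: code)
  # Fall back to HTML-escaped plain text if pygmentize reported an error
  return CGI.escapeHTML(code) unless status.success?

  # Strip the wrapper markup Pygments puts around the highlighted code
  html.sub(%r{<div class="highlight"><pre>(.*)</pre></div>}m, '\1')
rescue Errno::ENOENT
  # pygmentize isn't installed or isn't on the PATH
  CGI.escapeHTML(code)
end

puts highlight("puts 'hello'\n", 'ruby')
```

Because the arguments go straight to the child process rather than through `/bin/sh`, quotes, backslashes and `$(...)` in the code pass through untouched.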

As you can probably see, I'm using BlueCloth to format my posts, so I'm overriding the method it calls to deal with indented code blocks. It doesn't provide any more graceful hooks, unfortunately. I've also added an @@language line, inspired by an article at Warpspire, so that Pygments doesn't have to play the language-guessing game. In fact, if you're using a JS highlighter and that's all you need, you can get away with something like this:

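class BlueCloth

  # Keep a reference to the original handler under a new name
  alias old_transform_code_blocks transform_code_blocks

  # Let BlueCloth build the code blocks as usual, then turn each leading
  # @@language line into a class attribute on the <code> tag
  def transform_code_blocks(str, rs)
    str = old_transform_code_blocks(str, rs)
    str.gsub(/<code>@@(.*)\n+/, '<code class="\1">')
  end

end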

This aliases the existing handler method to another name, so that we can write a wrapper around it rather than overwriting it completely.
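On newer Rubies (2.0 and later), `Module#prepend` offers an alternative to alias chaining: the wrapper lives in a module and calls `super` to reach the original method, so no renamed copy is left lying around. A minimal sketch, using a hypothetical stand-in class `Formatter` in place of BlueCloth so that it runs on its own:

```ruby
# Hypothetical stand-in for BlueCloth: wraps the whole input in a code block
class Formatter
  def transform_code_blocks(str)
    "<pre><code>#{str}</code></pre>"
  end
end

# The wrapper: runs the original method via super, then turns a leading
# @@language line into a class attribute on the <code> tag
module LanguageClass
  def transform_code_blocks(str)
    super.sub(/<code>@@(.*)\n+/, '<code class="\1">')
  end
end

class Formatter
  prepend LanguageClass
end

puts Formatter.new.transform_code_blocks("@@ruby\nputs 1\n")
```

Since the module sits ahead of the class in the lookup chain, the wrapper runs first and `super` falls through to the original method.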

It probably bears repeating that the first option isn't the speediest thing in the world, and might be best avoided unless you're prepared to implement some sort of caching. I should also mention that I finished putting this in place and then immediately found mention elsewhere of CodeRay, a self-professed "fast" Ruby-native highlighter. I haven't tried it out yet, but it looks fairly young and recommends Pygments itself right there on the front page anyway. Worth a look, though.
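As a rough illustration of the caching idea, the expensive highlighter call can be memoised on a digest of the language and the code, so an unchanged block is only highlighted once per process. This is a minimal in-memory sketch; the `HighlightCache` class is hypothetical, and the highlighter is passed in as a block so any implementation (such as a pygmentize call) can sit behind it.

```ruby
require 'digest/sha1'

# Hypothetical memoising wrapper around an expensive highlighter.
# Results are keyed on a digest of the language and the code itself,
# so identical blocks are only highlighted once per process.
class HighlightCache
  def initialize(&highlighter)
    @highlighter = highlighter
    @store = {}
  end

  def fetch(code, lang)
    key = Digest::SHA1.hexdigest("#{lang}\0#{code}")
    @store[key] ||= @highlighter.call(code, lang)
  end

  def size
    @store.size
  end
end

cache = HighlightCache.new { |code, lang| "<#{lang}>#{code}" }
cache.fetch("puts 1", "ruby")
cache.fetch("puts 1", "ruby") # served from the cache, no second call
```

A cache like this only lives as long as the process; page caching, by contrast, skips the formatting step entirely on repeat requests.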

Oh, one last thing. I've used JavaScript (and Prototype) for my line numbering, so it doesn't get in the way in the source and stays out of CSS-unfriendly formats like RSS:

function codeLineNumbering() {
    $$('pre.code code').each(function(code){
        var count = code.innerHTML.split("\n").length - 1;
        var lines = $A($R(1,count)).join("\n");
        code.insert({before:'<pre class="line"><code>'+lines+'</code></pre>'});
    });
}

Short and sweet!

Update: Seems I was a little hasty with that line-numbering function; here's a version that should also work in IE.


12 Jan 2008

Deploying Web Frameworks

There was a lot of chitter-chatter this last week about the new web frameworks, their performance and how easily they can be deployed in shared hosting environments:

  • A Dreamhost employee bemoans the lack of support for shared hosting in Rails.
  • David Heinemeier Hansson responds with his standard "scratch your own itch" riposte.
  • Alex Payne chips in with the view that shared hosts are toy environments anyway.
  • James Bennett points out that this is indicative of a broader issue which also affects Django and the like.
  • GNU Vince adds some particulars about Django's deployment situation. He also hints at the problem of game-changing technologies making it difficult for n00bs to climb aboard.

I've been waiting a while for the hype to die down so we could have a sensible conversation about all of this. It seems to me that the vast majority of the Rails naysaying has hinged on a misguided comparison between it and PHP, and on complaints about its performance. There's clearly an issue there, but it doesn't seem to me to be with Rails in particular. The "it's hard" and "it's slow" complaints both seem to miss the fact that Rails, Django et al are solving a much larger problem than PHP.

The new web-development technologies are insta-frameworks: they come with all your professionalism and all your infrastructure in place from the outset, and the necessary baggage of memory overhead and complexity comes along for the ride. The old familiars we've been used to (PHP, Perl and the like) come from the direction of shell-scripting-land, where the most common case is the most basic: there are 100 hello worlds for every 1 webapp with all mod-cons. In contrast, the new web frameworks have the giants of Java-land in their sights. They say: we expect you to use URL rewriting, ORM, granular caching, TDD, asset servers etc. They'll make it as easy as humanly possible to do so, but as part of the bargain they'll expect you to step up and fulfill your side of the contract as a modern web professional.

And therein lies the crux of the problem, I think. The new web frameworks shine a harsh critical light on the yawning gap between how we all started out and where we all should be. It seems like the majority of "I tried it, I didn't like it" stories are from people who burnt their tongue on the first taste and realised their skills weren't up to it. And the cries of "deployment! performance!" are from those who are aghast that they should have to understand anything about the servers and environments their apps are running in.

When a framework solves all the problems you were familiar with, it's humbling to find that the problems you're left with are those you know the least about.

Yes, there's more to it than that, and there's the odd complaint that's valid and well considered. There's certainly room for improvement, but these technologies are very, very young. If we can just support new ideas, give them a little breathing room, help investigate the new bottlenecks and understand that no technology can be all things to all men, then we might just see some progress.



Where am I?

This site belongs to Matthew J. Tarbit, esquire: a tired old web developer holed up in a hideaway somewhere in the depths of Leeds, England.

Along with being a home for my ramblings and linkings, it's also a resting place for the bones of a shared blog by the name of Pixelised, now long departed.

If you feel the need, you may add my outpourings to the deluge that is your already overflowing info drip feed.

Or why not rest a while and dig through my entries like a corpulent pig in search of all that is truffle-icious.