Guardian PDF newspaper downloader

A script for automatically downloading PDFs from the Guardian's digital subscription service (guardian.newspaperdirect.com, formerly digital.guardian.co.uk).

Help Forum - use this to report problems and/or request features

Requirements

Currently works only on OS X; Linux and other Unix systems should need only a bug fix or two. Windows is currently unsupported. Requires dos2unix, Netcat and Python (used for MD5 hashing).

It's just a big fat Bash script (only because I'm more comfortable with Bash than with Perl or Python), which I plan to tidy up and eventually rewrite in another language (see TODOs below).
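
MD5 is a one-way hash, not encryption. The sketch below shows the kind of small helper a Bash script might call out to Python for. What the downloader actually hashes (for example, a login credential) is not documented here, so treat the input as an assumption, and the file name md5hex.py as hypothetical.

```python
import hashlib
import sys


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical Bash usage:  digest=$(python md5hex.py "$value")
    value = sys.argv[1] if len(sys.argv) > 1 else ""
    print(md5_hex(value))
```

Calling Python this way avoids depending on md5 (OS X) versus md5sum (Linux), whose output formats differ.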

History

The Guardian's website is probably the most popular newspaper website in the UK. As well as online news, the Guardian has long offered a digital subscription to the full print edition. This service recently underwent a significant facelift, but, fancy as the redesign is, it doesn't change the basic form of the service: an online JavaScript viewer/reader running in a web browser. This approach has several drawbacks (see below). However, there has always been an option to download individual pages as PDFs, which solves all of these issues.

Problems with web-browser approach

web browsers' font rendering is awful compared to a printed newspaper
no support for e-readers and other devices without web access
cannot archive newspapers or go back to an issue that's a few months old
cannot make decent-quality screenshots or printouts for the reader's own use
active Internet connection required to read
...

There are some subjective issues too, such as the general clumsiness, awkwardness and sluggishness of web browsers, and a preference for desktop applications such as a decent PDF viewer.

Development

When I signed up in early 2009, I could use Bash-scripted Lynx to log in and download the PDFs for a whole issue of the Guardian. This worked OK until one day in summer 2009, when the Guardian changed the website and introduced s*** loads of JavaScript that made Lynx-based solutions impossible.

I was generally pissed off with that, so after discovering that the PDF option still existed on the redesigned website, I sat down, painfully worked through the page sources and traffic dumps of logging in and downloading a PDF, and wrote a very-very-very-very crude Netcat-based downloader, which still works to my 100% satisfaction as of writing this on 29 Jun 2009.
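
A Netcat-based downloader essentially writes a hand-built HTTP request to a TCP socket and saves whatever comes back. The sketch below does the same in Python. The host, path and cookie are hypothetical placeholders; the real login flow and URLs are not described here, and the sketch assumes plain HTTP on port 80, which Netcat alone can speak (it cannot do HTTPS).

```python
import socket


def build_get_request(host: str, path: str, cookie: str = "") -> bytes:
    """Assemble a raw HTTP/1.0 GET request, as one might pipe into nc."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if cookie:
        # Session cookie from an earlier login request (hypothetical).
        lines.append(f"Cookie: {cookie}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_response(raw: bytes) -> tuple[bytes, bytes]:
    """Separate the header block from the body (e.g. the PDF bytes)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker in response")
    return head, body


def fetch(host: str, path: str, cookie: str = "", port: int = 80) -> bytes:
    """Send the request over plain TCP and read until the server closes."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(build_get_request(host, path, cookie))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    head, body = split_response(b"".join(chunks))
    status_line = head.split(b"\r\n", 1)[0]
    if b" 200 " not in status_line:
        raise RuntimeError(status_line.decode("latin-1"))
    return body
```

HTTP/1.0 is used deliberately: the server closes the connection after the response and does not use chunked encoding, so "read until EOF, then split at the blank line" is enough.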

Recently I have also written an automated script that periodically checks the date and uses the downloader script to fetch the newspaper every day. So with this running in the background, as long as I switch on my computer for a while every day, I will have a newspaper to read whenever I want.
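
The wrapper's logic ("wake up periodically, and download today's issue if it isn't there yet") can be sketched as follows. The file naming scheme and the download callable are hypothetical stand-ins for however the real script stores issues and invokes the downloader.

```python
import datetime
import os
import time
from typing import Callable


def issue_path(directory: str, day: datetime.date) -> str:
    """Hypothetical naming scheme: one file per issue, named by its date."""
    return os.path.join(directory, f"guardian-{day.isoformat()}.pdf")


def needs_download(directory: str, day: datetime.date) -> bool:
    """True if no file for that day's issue has been saved yet."""
    return not os.path.exists(issue_path(directory, day))


def run_forever(directory: str,
                download: Callable[[datetime.date, str], None],
                interval_s: int = 3600) -> None:
    """Wake up every interval_s seconds; fetch today's issue if it is missing."""
    while True:
        today = datetime.date.today()
        if needs_download(directory, today):
            try:
                download(today, issue_path(directory, today))
            except Exception as exc:
                # Keep the loop alive and retry on the next wake-up.
                print(f"download of {today} failed: {exc}")
        time.sleep(interval_s)
```

Because the check is "does today's file exist?", it doesn't matter when the computer is switched on. The download callable should write to a temporary name and rename on success, so that a half-finished file is not mistaken for a completed issue.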

TODOs

fix Linux issues
tidy up output and make the script user-friendly
add the ability to download specific issues (not only today's)
bad logins and other common errors should produce a helpful message
come up with the most suitable form for the wrapper script (a cron job? a GUI?) and publish it too
grand rewrite in Python so that Windows can be supported

Hosted by SourceForge.