Using Google Apps Script for automation of small tasks

19 May 2013

I've been manually tracking the number of .NET/C# jobs in Melbourne for some time now. I could've automated it, probably, but it was easier to do it just by hand - it took only a couple of minutes every day anyway. So I thought it wasn't worth it. Although I love automating things like that (sometimes even overly so), I still hesitated.

But recently I learnt about Google Apps Script service, and after taking a quick look at the docs I decided to implement a small tool for gathering job market statistics.

My goal was simple: get a search results page from seek.com.au, load it as a DOM, run a CSS select to find the element that I need and write the value of that element into a spreadsheet that I use for tracking/analysis.

Apart from small changes to the original plan (I had to use semicolons, apparently Apps Script hates you for not doing so, throwing random errors at you when debugging; I also decided not to load the whole page as XML, and only load a tiny bit), everything worked amazingly well.

I must say Apps Script's API is quite comprehensive; API docs are great too - it was quite easy to find the stuff I needed.

Below is a simple code sample, demonstrating parsing of some HTML:

function getJobsCount() {
    //
    // Download the actual page
    //
    resp = UrlFetchApp.fetch("http://www.seek.com.au/...");
    if (resp.getResponseCode() !== 200) {
    return null;
    }

    //
    // Get contents of an element in HTML
    //
    var xml = Xml.parse(xmlText, true);
    if (xml === null) {
    return null;
    }

    var root = xml.getElement();
    if (root === null) {
    return null;
    }

    // return inner text of HTML element, containing number of jobs
    return root.body.h1.strong.getText();
}

//
// Add a row to the spreadsheet
//
var doc = SpreadsheetApp.openByUrl('https://docs.google.com/spreadsheet/...');
if (doc === null) {
    return;
}
doc.setActiveSheet(doc.getSheetByName('Seek Scraper Test'));

var now = new Date();
var dateStr = [now.getDate(),
               now.getMonth()+1,
               now.getFullYear()].join('.')
            + ' '
            + [now.getHours(),
               now.getMinutes(),
               now.getSeconds()].join(':');

var jobsCount = getJobsCount();

if (jobsCount !== null)
    doc.appendRow([dateStr, jobsCount]);

You can schedule a recurring execution of your scripts - for example to set certain function to run automatically every hour, on Google's side, without you having to maintain any infrastructure.

Here's a neat example of some data this simple script has collected over a period of time:

Overall, I am quite pleased with the result. Although some people seem to be raising concerns about the stability of the platform and API, I think it's great for little tools like this one.

Maybe you'll have to invest a bit more time into error reporting and logging, but overall the platform looks promising.

Subscribe and never miss a new article

If you liked this article and want to get more helpful updates - sign up to my mailing list below.


comments powered by Disqus