19 May 2013
I've been manually tracking the number of .NET/C# jobs in Melbourne for some time now. I could've automated it, probably, but it was easier to do it just by hand - it took only a couple of minutes every day anyway. So I thought it wasn't worth it. Although I love automating things like that (sometimes even overly so), I still hesitated.
But recently I learnt about Google Apps Script service, and after taking a quick look at the docs I decided to implement a small tool for gathering job market statistics.
My goal was simple: get a search results page from seek.com.au, load it as a DOM, run a CSS select to find the element that I need and write the value of that element into a spreadsheet that I use for tracking/analysis.
Apart from small changes to the original plan (I had to use semicolons, apparently Apps Script hates you for not doing so, throwing random errors at you when debugging; I also decided not to load the whole page as XML, and only load a tiny bit), everything worked amazingly well.
I must say Apps Script's API is quite comprehensive; API docs are great too - it was quite easy to find the stuff I needed.
Below is a simple code sample, demonstrating parsing of some HTML:
function getJobsCount() {
//
// Download the actual page
//
resp = UrlFetchApp.fetch("http://www.seek.com.au/...");
if (resp.getResponseCode() !== 200) {
return null;
}
//
// Get contents of an element in HTML
//
var xml = Xml.parse(xmlText, true);
if (xml === null) {
return null;
}
var root = xml.getElement();
if (root === null) {
return null;
}
// return inner text of HTML element, containing number of jobs
return root.body.h1.strong.getText();
}
//
// Add a row to the spreadsheet
//
var doc = SpreadsheetApp.openByUrl('https://docs.google.com/spreadsheet/...');
if (doc === null) {
return;
}
doc.setActiveSheet(doc.getSheetByName('Seek Scraper Test'));
var now = new Date();
var dateStr = [now.getDate(),
now.getMonth()+1,
now.getFullYear()].join('.')
+ ' '
+ [now.getHours(),
now.getMinutes(),
now.getSeconds()].join(':');
var jobsCount = getJobsCount();
if (jobsCount !== null)
doc.appendRow([dateStr, jobsCount]);
You can schedule a recurring execution of your scripts - for example to set certain function to run automatically every hour, on Google's side, without you having to maintain any infrastructure.
Here's a neat example of some data this simple script has collected over a period of time:
Overall, I am quite pleased with the result. Although some people seem to be raising concerns about the stability of the platform and API, I think it's great for little tools like this one.
Maybe you'll have to invest a bit more time into error reporting and logging, but overall the platform looks promising.
If you liked this article and want to get more helpful updates - sign up to my mailing list below.