Following on from my earlier post on the raw data related to company totals and the raw combined spreadsheet, I am publishing two further data sets.
Last month I attended the ScraperWiki Hacks and Hackers event. During the day, thanks to the sterling efforts of two Python developers, a scraper was written. It does something relatively simple (though it was a little complex to write): it enters each company's name into the EI website here, retrieves the Development Advisor (if any) for that company, and writes the result to a .csv file. This data is raw and incomplete (many companies returned no DA). If people want the code for the scraper, do leave a comment.
Enterprise Ireland Development Advisors
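For readers curious about the shape of that scraper, here is a minimal sketch of the loop described above. It is not the code written on the day. The function names, the column headings and the `fake_lookup` stand-in are all hypothetical; the real scraper would replace `fake_lookup` with a query against the EI website, which is not reproduced here.

```python
import csv
import io
from typing import Callable, Iterable, Optional


def write_advisors(
    companies: Iterable[str],
    lookup: Callable[[str], Optional[str]],
    out,
) -> int:
    """Write one (company, advisor) row per company.

    Returns the number of companies for which no advisor was found.
    """
    writer = csv.writer(out)
    # Column names are an assumption made for this sketch.
    writer.writerow(["company", "development_advisor"])
    missing = 0
    for name in companies:
        advisor = lookup(name)
        if not advisor:
            # Keep the row so the gap is visible in the output.
            missing += 1
            advisor = ""
        writer.writerow([name, advisor])
    return missing


def fake_lookup(name: str) -> Optional[str]:
    """Hypothetical stand-in for the website query; uses made-up data."""
    return {"Acme Ltd": "J. Murphy"}.get(name)


buf = io.StringIO()
missing = write_advisors(["Acme Ltd", "Widgets Teo"], fake_lookup, buf)
```

Companies with no advisor are written with an empty second column rather than skipped, so the gaps stay visible in the output.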
The second scraper run was Joe Drumgoole’s CRO scraper. A reader ran it for us and sent us the result, which I am now publishing too. I ran the result through a batch geocoder for Google Maps, and it was largely successful. But I am sure there are people reading this who can do cool mapping with this data, and do it better than I can.
Enterprise Ireland company addresses
I again emphasise: we are publishing the raw data. Because the process was automated, we cannot guarantee the results are 100% accurate. The data does not purport to be a full representation, and if you want to use it you might have to spend a bit of time cleaning it up.
I can think of some nice research or visualisations that could result from it though.
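As a starting point for that clean-up, here is a rough sketch of a first pass over a raw .csv. It assumes the company name is in the first column and that two rows with the same name (ignoring case) are duplicates; neither assumption has been checked against the published files. The sample rows are made up.

```python
import csv
import io


def clean_rows(rows):
    """Trim whitespace, drop rows with no company name, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        row = [cell.strip() for cell in row]
        # Skip empty rows and rows with a blank first column.
        if not row or not row[0]:
            continue
        # Assumption: the company name (first column) identifies a row.
        key = row[0].lower()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Made-up sample data standing in for a raw export.
raw = "Acme Ltd , J. Murphy\nacme ltd,J. Murphy\n,\nWidgets Teo,\n"
rows = clean_rows(csv.reader(io.StringIO(raw)))
```

Rows with a company but no advisor are kept, since a missing DA is itself information.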
Hey,
Nice use of technology. If it’s ok I’d like to take you up on your offer to share the scraper code.
Thanks.
Would be really interested to hear more about this Gav…
Could you make the geocoded data available? I have a set of base maps in my GIS that I could overlay the locations onto.
Tried out the croscraper. The CRO site has changed, and Joe confirmed for me that it doesn’t work anymore. A coder could fix it, but it is probably best to look at using ScraperWiki if you want to scrape some data, as you’ve recommended (it’s still in invite alpha).
Gavin, any chance I could take a look at the scraper you were using, please? I’d like to see if I could use it on some other gov’t sites for data mining. Thanks!