Enterprise Ireland – two more data sets

Following on from my earlier post on the raw data related to company totals and the raw combined spreadsheet, I am publishing two further data sets.

Last month I attended the ScraperWiki Hacks and Hackers event. During the day, thanks to the sterling efforts of two Python developers, a scraper was written. The scraper does something relatively simple, though it was a little complex to write: it enters each company name into the EI website and returns the Development Advisor (DA), if any, for that company. It then writes that data to a .csv file. The data is raw and incomplete, because many companies returned no DA. If anyone wants the code for the scraper, do leave a comment.

Enterprise Ireland Development Advisors
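The overall shape of that scraper (look up each company, write one CSV row per company, keep the rows where no DA was found) can be sketched as below. This is not the code written at the event. The `lookup` function is a hypothetical stand-in for the step that queried the EI website, which is not reproduced here, and the column names are assumptions.

```python
import csv
import io
from typing import Callable, Iterable, Optional

# Hypothetical lookup: takes a company name and returns the Development
# Advisor's name, or None if there isn't one. The real scraper queried the
# EI website at this step; that part is deliberately left out.
LookupFn = Callable[[str], Optional[str]]


def write_advisors_csv(companies: Iterable[str], lookup: LookupFn, out) -> int:
    """Look up each company and write one CSV row per company to `out`.

    Companies with no advisor get an empty cell instead of being dropped,
    so every input company appears in the output. Returns the number of
    companies for which no advisor was found.
    """
    writer = csv.writer(out)
    # Assumed column names, for illustration only.
    writer.writerow(["company", "development_advisor"])
    missing = 0
    for name in companies:
        advisor = lookup(name)
        if not advisor:
            missing += 1
        writer.writerow([name, advisor or ""])
    return missing


if __name__ == "__main__":
    # Made-up stand-in results; a dict's .get works as a lookup function.
    fake_results = {"Acme Ltd": "J. Murphy"}
    buf = io.StringIO()
    misses = write_advisors_csv(["Acme Ltd", "Beta Ltd"], fake_results.get, buf)
    print(buf.getvalue())
    print("no advisor found for", misses, "companies")
```

Keeping empty rows rather than skipping them makes it easy to see how incomplete the result is. A real run against a live website would also want a pause between requests.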

The second scraper run was Joe Drumgoole’s CRO scraper. A reader ran it for us and sent us the results, which I am now publishing too. I ran the results through a batch geocoding tool for Google Maps, and it was largely successful. But I am sure some people reading this can do much better mapping with this data than I can.

Enterprise Ireland company addresses
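For anyone who wants to geocode the addresses themselves rather than use a batch tool, one option is Google's Geocoding API. Below is a minimal sketch that only builds the request URLs for each address, without sending them. It assumes the CSV has a column called `address` (a hypothetical name; check the actual file) and that you supply your own API key. Appending ", Ireland" to each address is also an assumption, meant to stop short addresses matching places abroad.

```python
import csv
import io
from typing import List
from urllib.parse import urlencode

# JSON endpoint of Google's Geocoding API.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_request_url(address: str, api_key: str) -> str:
    """Build a Geocoding API request URL for a single address."""
    query = address.strip()
    # Assumption: add the country so that partial addresses resolve in Ireland.
    if not query.lower().endswith("ireland"):
        query += ", Ireland"
    return GEOCODE_URL + "?" + urlencode({"address": query, "key": api_key})


def request_urls(csv_text: str, api_key: str, column: str = "address") -> List[str]:
    """Return one request URL per row, skipping rows with a blank address."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        geocode_request_url(row[column], api_key)
        for row in rows
        if (row.get(column) or "").strip()
    ]


if __name__ == "__main__":
    sample = "company,address\nAcme Ltd,\"1 Main St, Dublin\"\nBeta Ltd,\n"
    for url in request_urls(sample, "YOUR_API_KEY"):
        print(url)
```

Each URL returns JSON whose results include a latitude and longitude when the address is matched. Rows that fail to match are likely to be the ones most in need of cleaning.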

I again emphasise: we are publishing the raw data. Because the process was automated, we cannot guarantee that the results are 100% accurate. The data does not purport to be a complete picture, and if you want to use it you may have to spend some time cleaning it up.

I can think of some nice research or visualisations that could result from it though.
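As a starting point for the clean-up mentioned above, here is a small sketch of a first pass over one of the CSV files. It collapses stray whitespace in every cell, drops rows that are identical after that tidying, and counts blank cells per column so you can see how much is missing. The column names in the example are hypothetical, and real clean-up will almost certainly need more than this.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, List, Tuple


def clean_rows(csv_text: str) -> Tuple[List[Dict[str, str]], Counter]:
    """First-pass tidy of a raw CSV export.

    Returns the de-duplicated rows and a Counter of blank cells per column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned: List[Dict[str, str]] = []
    blanks: Counter = Counter()
    for row in reader:
        # Collapse runs of whitespace and trim each cell; ignore any extra
        # unnamed fields (DictReader stores those under the key None).
        tidy = {
            k: re.sub(r"\s+", " ", v or "").strip()
            for k, v in row.items()
            if k is not None
        }
        key = tuple(tidy.values())
        if key in seen:
            continue  # duplicate of a row we already kept
        seen.add(key)
        cleaned.append(tidy)
        for col, val in tidy.items():
            if not val:
                blanks[col] += 1
    return cleaned, blanks


if __name__ == "__main__":
    raw = "company,development_advisor\n  Acme   Ltd ,J. Murphy\nAcme Ltd,J. Murphy\nBeta Ltd,\n"
    rows, blanks = clean_rows(raw)
    print(rows)
    print(dict(blanks))
```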

5 thoughts on “Enterprise Ireland – two more data sets”

  1. Tried out the croscraper. The CRO site has changed, and Joe confirmed for me that it doesn’t work any more. A coder could fix it, but it’s probably best to look at using ScraperWiki if you want to scrape some data, as you’ve recommended (it’s still in invite-only alpha).

  2. Gavin, any chance I could take a look at the scraper you were using, please? I’d like to see if I could use it on some other gov’t sites for data mining. Thanks!

