I met up with Louise Crow after work yesterday – a great opportunity to geek at each other about NaPTAN and CIF data, amongst other things.
She reminded me about GitHub – and so I took a few minutes this morning to push TSDB Explorer up there. If you want to have a look around the code, it’s all here. I’ll try to come up with some sample data in the next week or two, and maybe a working demo if I get time.
Author: Peter Hicks
CIF parsing
Wow, what a lot of data. And what an absolute dog ActiveRecord is for inserting data en masse! Still, I have a CIF extract of all London Overground services being imported on my laptop as I write this.
I’m excited. A working proof-of-concept is not far off…
Open Rail Data
Jonathan Raper and I gave presentations on Open Rail Data – Jonathan from a more political angle, and me from a decidedly technical angle.
The material went down really well – there’s plenty of scope for us to show what can be done if timetable, real-time running and fares data are made openly available. I thoroughly enjoyed delivering the presentation – I haven’t done that since Berlin in 2006, and I’d forgotten how easily I slip into “presenter mode”.
Here is a copy of my OpenTech 2011 presentation in PDF format if you’re interested. Or, if you simply want to get in touch, peter.hicks@opentraintimes.com.
I’m celebrating this evening with a curry.
Google Maps' Data Quality
Harry Wood pointed out that Google Maps has removed Camden Town tube station from its map. Whilst I doubt Google have done this intentionally, it has set me thinking about data quality.
When developing TransportHacker (which isn’t live yet, there aren’t enough hours in the day!), I noticed the M25 was named “Autoroute Britannique M25”. It’s been corrected now, but how on earth did that one slip by?
More data quality issues (which may have been fixed by the time you read this):
- Upper Holloway station has three icons – the Underground roundel, the Overground roundel, and the National Rail symbol. Click the Underground/Overground (Wombling Free?) icon, and you see it’s actually from the bus stop outside the station
- Hop down to Highbury Corner, and you can see that Highbury and Islington station has the Underground and Overground roundels, but no National Rail symbol. Click on the roundels, and you’ll see that – yes – National Rail trains do serve the station
- Examine, if you will, The Famous Cock. On Google Maps, it’s between Starbucks and Flight Centre. Google Streetview shows no Famous Cock there – in fact, it’s right next to Highbury and Islington station
- Finally, what is White Stadt? I think it should be White City…
Here lies the danger with processing large sets of data – do you know they’re correct?
Speeding up Ruby on Rails' ActiveRecord INSERT rate
A project that I’m working on (OK, it’s TSDBExplorer) generates a metric shedload of database rows. For a record that says “This train runs between 01-01-2011 and 31-05-2011 on Mondays – Fridays”, the code generates a timetable for each day. It takes an age to import, and I hope it’s going to be fantastically quick at querying data.
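As a rough sketch of that expansion (not the actual TSDBExplorer code), the date range and days of the week can be turned into individual running dates like this. The `running_dates` helper and the seven-character, Monday-first `days_run` string are assumptions made for illustration.

```ruby
require 'date'

# Hypothetical helper: expand a validity period into the dates a train runs.
# days_run is assumed to be a 7-character string, Monday first, where '1'
# means the train runs on that day of the week.
def running_dates(start_date, end_date, days_run)
  (start_date..end_date).select do |date|
    # Date#cwday returns 1 for Monday through 7 for Sunday
    days_run[date.cwday - 1] == '1'
  end
end

# "Runs between 01-01-2011 and 31-05-2011 on Mondays to Fridays"
dates = running_dates(Date.new(2011, 1, 1), Date.new(2011, 5, 31), '1111100')
puts "#{dates.size} running dates, from #{dates.first} to #{dates.last}"
```

One schedule record becomes one row per running date, which is why the row count grows so quickly.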
There’s a big downside with ActiveRecord out-of-the-box – it takes a long time to INSERT a record into a MySQL database. I left some INSERTs going at 9.50am, and they’d just about finished when I got back from the gym five hours later. 1.2 million rows in five hours is shockingly poor.
activerecord-import appears to solve the problem with the least impact on existing code. To group up your INSERTs, you create new instances of a model object – say, Association. You push these into an array, and then use the new ‘import’ method to do a mass INSERT.
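To illustrate why grouping helps, here is a minimal pure-Ruby sketch of the general idea: collect rows into batches and send one multi-row INSERT per batch instead of one statement per row. This is not how activerecord-import is implemented. The `multi_row_insert` helper, the `associations` table and its columns are all hypothetical, and the quoting is deliberately naive.

```ruby
# Hypothetical helper: build one multi-row INSERT statement for a batch.
# The quoting is for illustration only; real code should rely on the
# database adapter or on a library such as activerecord-import.
def multi_row_insert(table, columns, rows)
  tuples = rows.map do |row|
    quoted = row.map do |value|
      if value.is_a?(Numeric)
        value.to_s
      else
        escaped = value.to_s.gsub("'", "''")
        "'#{escaped}'"
      end
    end
    "(#{quoted.join(', ')})"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

rows = [[1, 'a'], [2, 'b'], [3, 'c']]

# Two rows per statement here; a real import would use a far larger batch.
statements = rows.each_slice(2).map do |batch|
  multi_row_insert('associations', %w[id name], batch)
end

statements.each { |sql| puts sql }
```

With the gem itself, the array of model instances is handed to the model's `import` method rather than building SQL by hand.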
I am quite happy at 62 minutes to insert 1.2 million rows, including processing, considering it’s an activity that only needs to be done twice, maybe three times a year.
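For a sense of scale, the figures above work out roughly as follows. This is back-of-the-envelope arithmetic that treats "five hours" as exact, which it was not.

```ruby
rows = 1_200_000

before_seconds = 5 * 60 * 60 # roughly five hours with plain ActiveRecord
after_seconds  = 62 * 60     # 62 minutes with batched INSERTs

before_rate = rows / before_seconds.to_f
after_rate  = rows / after_seconds.to_f
speedup     = before_seconds / after_seconds.to_f

puts format('Before: %.0f rows/second', before_rate)
puts format('After:  %.0f rows/second', after_rate)
puts format('Speed-up: about %.1fx', speedup)
```

That is an improvement from around 67 rows a second to around 323, a little under five times faster.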
National Fail Enquiries
Whilst I wholeheartedly support National Rail Enquiries’ aggregation of live train running data and disruption information, sometimes it can be wholly inaccurate and present a misleading picture.
Suppose I am travelling from Highbury and Islington to Shoreditch High Street today. I know these stations are on the same line, so I visit the Live Departure Boards site. I am presented with a warning saying there are no train services from this station on Sunday 3rd April.
What? But there’s a list of trains to West Croydon and Crystal Palace that all stop at Shoreditch. I visit the link in the warning and find that, actually, there are no trains between Stratford and Acton Central. The map linked to is very helpful actually, and it shows the route with the disrupted section in red. But what’s missing? The link from Dalston Junction to Highbury and Islington. So, do I need to go to Dalston Junction to take my train now?
The answer is actually quite straightforward – the website is wrong, and I know this because I’ve looked up the departures from Shoreditch High Street and seen that they’ve all departed Highbury and Islington.
What on earth is Joe Public going to do when presented with conflicting and incorrect information? It’s no wonder a number of people I know get aggravated at the quality of disruption information.
TransportHacker and DATEX II
I’ve spent a couple of weeks wrestling with Nokogiri to parse a tonne of DATEX II data into some usable format. Previously I had a mash of libxml and REXML, and the code was either ‘fast’ or ‘pretty’, but not both.
Nokogiri is good – it’s very good, in my opinion. The only trouble is, documentation and examples are a little thin on the ground, which slows everybody down. Here’s the dilemma – do I spend time writing poor documentation based on my limited understanding of part of Nokogiri, or leave it to somebody else?
Double-Sided Printing
Having run out of blank A4 paper and needing to print something, I decided “Hey, I’ve got 24 sheets, I’ll print this 40 page document double-sided!”
Why is it so difficult for me to get my head around how to do this? There are many ways I could screw up – pages back to front (printing odd and even pages on the same side), pages upside down (printing odd pages in one direction, and even upside down on the back), printing the odd pages in order, but the even pages in reverse order (the first sheet having page 1 on the front, and page 39 on the back), offsetting the pages by one sheet…
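One way to keep it straight is to work out which page belongs on each side of each sheet before printing anything. A minimal sketch follows; `duplex_sheets` is a hypothetical helper, and it assumes odd pages go on the front and even pages on the back.

```ruby
# Hypothetical helper: pair up pages so each sheet knows its front and back.
# A document with an odd number of pages leaves the last back side blank (nil).
def duplex_sheets(page_count)
  (1..page_count).each_slice(2).map do |front, back|
    { front: front, back: back }
  end
end

sheets = duplex_sheets(40)

# First pass: print the fronts. Second pass: print the backs.
odd_pass  = sheets.map { |sheet| sheet[:front] }
even_pass = sheets.map { |sheet| sheet[:back] }.compact

puts "Sheets needed: #{sheets.size}"
puts "First pass:  #{odd_pass.join(', ')}"
puts "Second pass: #{even_pass.join(', ')}"
```

Whether the second pass needs to be sent in reverse order, or the stack rotated, depends on how the printer feeds and stacks paper, so it is worth testing with a couple of sheets first.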
I only misprinted four pages. I call this a success 🙂
Virgin Media in stealth Ofcom marketing tactic
The BBC are reporting that “Ofcom wants to ban misleading broadband speed ads”.
All well and good, but marketing a service as “up to 24Mbps” makes many people believe they will get about 24Mbps. In reality, there is a negative correlation between the length of a phone line and the speed you’ll get from ADSL. Is there a widespread user perception problem, or is it just down to marketing? (Remember the “Up to 50% off!” adverts you see on the high street – you may not realise that everything may be 10% off apart from one item which is discounted by 50%…)
What bothers me is that Ofcom claim Virgin Media – who provide a much different service using a wholly different technology – provide speeds much closer to their “up to” figures. This is wholly wrong – Ofcom are not comparing apples with apples. Virgin Media’s service is dissimilar to ADSL, with more equipment closer to users’ locations, and fewer limits on the amount of power they can shove down a piece of co-axial cable.
The BBC article goes further to make it look like your ISP has a lot to do with your “broadband speed”. The reality is that all ISPs using BT Wholesale’s DSL infrastructure for a phone line will get about the same speeds across that line to the exchange, but depending on the level of oversubscription and contention in their network, their customers may not reach that speed.
Is this clever stealth marketing from Virgin Media? Have Ofcom forgotten that Virgin are advertising their service as “fibre-optic broadband”, when it’s really a fibre-optic backbone and copper cabling to your house? Fibre is not broadband – it’s a very narrow, specific range of frequencies. Have BT, who are advertising their “fibre to the cabinet” (FTTC) service as “Infinity”, forgotten that there is no number greater than infinity, and they’ve just shot themselves in the foot marketing-wise?
Seven days of 3
A little over a week ago, feeling horribly overcharged and dissatisfied with Orange, I terminated my contract and moved over to 3.
The process was incredibly smooth. I ordered a 3 SIM, which arrived the next day along with a free PAYG SIM. Orange sent me my PAC. I called 3, gave them my PAC – but had the usual overjoyed offshore callcentre guy thank me far too much for moving – and two days later, my number moved.
I have had one problem with data in the last week, when I was in a low-signal area and had to restart my phone. I could get a 3G signal when in the middle of nowhere in Kent yesterday, and browsing when in London is really very quick and responsive.
I made a good choice, and it’s £30/month cheaper than Orange.
I am on a train at the moment, updating my blog using my 3 PAYG data connection, which is equally fast and a very usable tool for when I’m out and about.
3, you’ve made me happy.