Big Data for London traffic analysis

Employees of Datatonic, a Europe-based data analytics consultancy, recently participated in a week-long hackathon (“Data in Motion Hack Week”) organized by Transport for London (TfL), the city’s official transport authority. As you might expect, the hackathon aimed to stimulate developer creativity in tackling high-priority TfL challenges, such as limited overall transport capacity, endemic road congestion and air-quality degradation, through innovative use of public-cloud infrastructure and open data. (Whether you’re a resident of London, the San Francisco Bay Area, Rome, Sao Paulo, or Beijing, you can probably relate to these challenges.)

Most of the other teams chose to focus on data mashups or visualizations that give London residents information for making better route decisions during their commutes. The Datatonic hackers, in contrast, looked to machine learning (ML). By augmenting real-time data visualization with an ML model, they found they could predict areas of congestion during the morning and evening commutes. Journey volumes currently stand at 30 million per day, with more than 1 million net-new journeys expected by 2018. Their solution uses Google Cloud Platform (GCP) for storage and data processing and provides insights based on 3 months of data from 14,000 traffic sensors across London, amounting to well over 100 billion rows. (Of this dataset, 8 days of data from 300 sensors were used to train the model.)

For details about the Datatonic team’s methodology, architecture and results, see these blog posts:


