Big data versus traditional data sets

There are numerous questions over the validity and robustness of mobile phone data at the mesoscopic and microscopic level. These questions aren’t just being raised in the conference rooms of the UK, but in the meeting rooms of many of our clients.

Paul O'Neill
27 April 2016
Data possibilities galore, but the only real way to use any data set is to take a pragmatic approach. Understand what you are collecting. This thought came to mind as I was sitting in a taxi in London, behind a bus and in front of a car, travelling at the same speed as a cycle and a pedestrian on the pavement. How would you know which mode I was using, and for what journey purpose? Photo Eddie Cloud
Traditional data collection methods have their challenges, but mobile phone data is unlikely to surpass traditional data in the modelling process. It most likely possesses the power to replace some elements of O-D surveys, but it should be viewed as one part of the model build process

Intelligent Data has observed the change in the use of mobile phone data over the past few years with interest, wondering how this is going to 'change the industry forever', yet no one seems to be sticking their heads (and/or shoulders) above the parapet and suggesting the best way to harness these vast data sets. 

Yes, there are early adopters using mobile phone data but, as with any new technology, there appear to have been a few burnt fingers, judging by our understanding of the market and our conversations within it. This is not uncommon: many technology products over the years have promised much but delivered less, either because the end user cannot robustly validate the algorithms or hypotheses underpinning them, or because the products themselves cannot deliver.

We have a number of very astute, quizzical minds in our industry who love to ask questions, some of which still seem to be unanswered: What exactly is being sampled? How do you expand this data? Can you really triangulate a piece of data accurately to a given section of road network, bearing in mind that this data is being derived over long periods of time? Can the data truly be factored using traditional data sets, or do we go back to a 'finger in the air' approach using our gut feelings?

Mobile phone data performance

Based on the feedback from Modelling World last year, and from various meetings I have had this year, at a more local level the data sets do not perform well. This thought came to mind as I was sitting in a taxi in London, behind a bus and in front of a car, travelling at the same speed as a cycle and a pedestrian on the pavement. How would you know which mode I was using, and for what journey purpose? 

In the meantime, Intelligent Data has pioneered real-time ANPR data to allow data layering with big data sets, adding accuracy and robustness to those data sets. When I asked a well-known mobile provider if it could provide me with data to calibrate with ANPR at a local level, the answer was a flat no! That rejection aside, there is definitely a place for big data sets and the industry must embrace their power, but (and there is a large but) only if they provide something useable, robust and of value. We all remember the Klondike-style gold rush during the birth of Bluetooth data a few years ago; there were quite a few prospectors but, sadly, not a lot of gold, even lower sample rates and much confusion over what this equipment or data actually provides.

The only real way to use any data set is to take a pragmatic approach. Understand what you are collecting. By understand, I mean really understand it: play around with the source (if you can) and ask yourself the question, is this fit for purpose? Yes, it may give you 300,000,000 records, but will it tell you the capacity of a junction? Extremely unlikely. Give you accurate O-D movements to MSOA level? Possibly. Can you derive a sample rate for it and relate it to a given section of roadspace? Very unlikely. But the data may give you a good prior matrix in a model build, if used correctly with other data sources.

Data layering

The best way to use this data is to form a data layering system which can be linked to known robust data sets such as ANPR, which addresses the issues at the meso and microscopic level and enables full understanding and validation of the data you are intending to use.

Intelligent Data has taken this data layering approach one step further and developed an ANPR capability and back-office software to provide a direct link to the DVLA database, enabling a fully audited Government agency database to be the known quantum in the data layer system.

Intelligent Data recently collected, processed and fully classified 10 million plates daily for a month, at sample rates above 90 per cent. Using the data layer approach will enable ANPR, as a known data source, to validate mobile phone data against given events (VRN/time/date) between matching points. Not only will this enable validation of mobile data, it will infill gaps in the data at a local level. Rather than relying on an algorithm, or a finger in the air, this is 100 per cent evidence-based data which will be used to give greater clarity to mobile data.
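
To make the idea of matching events (VRN/time/date) between points concrete, here is a minimal sketch in Python. It is an illustration only, not Intelligent Data's back-office software: the function name match_plates, the 30-minute matching window, the plates and the timestamps are all assumptions made up for this example.

```python
from datetime import datetime, timedelta


def match_plates(site_a, site_b, max_gap=timedelta(minutes=30)):
    """Pair each read at site A with the first later read of the same VRN at site B.

    Each read is a (vrn, timestamp) tuple. The 30-minute window is an
    assumed cut-off for this sketch, not a recommended value.
    """
    # Group site B timestamps by VRN, in time order.
    reads_b = {}
    for vrn, ts in sorted(site_b, key=lambda r: r[1]):
        reads_b.setdefault(vrn, []).append(ts)

    matches = []
    for vrn, ts_a in sorted(site_a, key=lambda r: r[1]):
        candidates = reads_b.get(vrn, [])
        for ts_b in candidates:
            gap = ts_b - ts_a
            if timedelta(0) < gap <= max_gap:
                matches.append((vrn, ts_a, ts_b, gap))
                candidates.remove(ts_b)  # each site B read is used only once
                break
    return matches


def at(hour, minute):
    """Hypothetical survey day, used only to build the sample reads."""
    return datetime(2016, 1, 4, hour, minute)


# Made-up reads at two camera sites (hypothetical plates).
site_a = [("AB12CDE", at(8, 0)), ("XY34ZZZ", at(8, 5)), ("LM56NOP", at(8, 10))]
site_b = [("AB12CDE", at(8, 12)), ("QR78STU", at(8, 20)), ("LM56NOP", at(9, 30))]

matches = match_plates(site_a, site_b)
for vrn, ts_a, ts_b, gap in matches:
    print(vrn, ts_a.time(), ts_b.time(), gap)
```

In this sample only one vehicle is seen at both sites within the window. Each matched pair is a known movement with a known journey time, which is the kind of evidence a mobile-derived movement between the same two points could be checked against.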

The ANPR data locks a data layer in the matrix build process as a known quantity, and enables confidence to be given to the wider data sets being used on any study.

Mobile phone data is unlikely to surpass traditional data in the modelling process. It most likely possesses the power to replace some elements of O-D surveys, but it should be viewed as one part of the model build process. The data sets need to work together. When you have a mobile data set with a 30 per cent sample of users, ANPR with a >90 per cent sample at local level, and other localised traffic data sets with sample rates close to 100 per cent, the data needs to be integrated, and traditional data sets have the power to unlock the potential of mobile phone data. Although, when you have 10,000,000 ANPR records in a survey, perhaps traditional data sets can also be seen as Big Data.
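
The arithmetic behind comparing data sets with different sample rates can be sketched in a few lines of Python. This is a deliberately simple illustration: it assumes uniform expansion (observed count divided by sample rate), which real studies would refine, and the observed counts of 1,200 and 3,780 are invented. Only the 30 per cent and 90 per cent sample rates come from the figures quoted above.

```python
def expand(observed, sample_rate):
    """Scale an observed count up to an estimated total, assuming a uniform sample."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in the range (0, 1]")
    return observed / sample_rate


# Hypothetical counts for one movement between the same two points.
mobile_observed = 1200   # invented: trips seen in the mobile data
anpr_observed = 3780     # invented: matched plates seen by ANPR

mobile_estimate = expand(mobile_observed, 0.30)  # 30 per cent sample of users
anpr_estimate = expand(anpr_observed, 0.90)      # 90 per cent local sample

# Ratio above 1 means the ANPR layer implies more trips than the mobile data.
ratio = anpr_estimate / mobile_estimate

print(f"mobile estimate: {mobile_estimate:.0f}")
print(f"ANPR estimate:   {anpr_estimate:.0f}")
print(f"ANPR / mobile:   {ratio:.2f}")
```

The point of the comparison is that the ANPR estimate rests on a much smaller expansion, so where the two disagree it is the layer to trust at a local level.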

The road to new technology markets is littered with the corpses of great (untested) ideas. Hopefully mobile data won’t end up the same way. 

Paul O’Neill, Managing Director, Intelligent Data Collection Limited

For more information on data layering, real time ANPR or data integration please visit our stand at Modelling World or email: paul.oneill@intelligent-data-collection.com

TransportXtra is part of Landor LINKS

© 2024 TransportXtra | Landor LINKS Ltd | All Rights Reserved
