Spatiotemporal Variation in Travel Regularity

It seems life and work are less than compatible with #content generation for this blog, so ‘why don’t I just’, I figure, ‘write a short piece about the research I have been doing in all this time’. I also figured this was a good idea, so here is the first in what I’m going to tentatively call a series of short descriptions of recent publications I have been involved with.

Disclaimer: I think transportation studies, at the risk of vastly overgeneralising matters, is a bit too interested in measuring ‘usual’ conditions. I can’t blame people who do that sort of research – we are still suffering with strained transportation systems, congested, crowded, and polluted, that are damaging to economies, to health, and to wellbeing. But in focusing too much on ‘usual’ patterns, we miss the effect of disruption and rapid change, and once we start considering those, maybe there is no ‘usual’ after all.

This particular piece of research focused on the spatial and temporal appearance of ‘usual’ and ‘unusual’ travel behaviour on the London public transport system. Through a long-term collaboration with Transport for London we analysed around 1 billion Oyster Card transactions, and constructed patterns of ‘usual’ travel behaviour at the (anonymous) individual scale. These ‘usual’ patterns of behaviour are constructed through a simple DBSCAN clustering of data points over space and time, with the resulting clusters indicative of regular activity. We make a simple assumption that if a person appears at a given place and time on a regular basis (as defined by the clustering algorithm), then that is a ‘usual’ activity for that individual. Most importantly, perhaps, the classification of ‘usual’ activity at an individual scale allows us to identify the spatial and temporal occurrence of ‘unusual’ activity. This differentiation enables us to understand travel behaviour from a different point of view.
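To give a flavour of the clustering step, here is a minimal sketch of DBSCAN applied to one card's tap-ins. It is not the code used in the paper: the feature layout (x and y in kilometres plus hour of day), the plain Euclidean distance, and the `eps` and `min_pts` values are all assumptions made for illustration, and the tap-ins are invented.

```python
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 means noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

# Invented tap-ins for one card: (x_km, y_km, hour_of_day)
taps = [
    (0.0, 0.0, 8.1), (0.0, 0.0, 8.2), (0.0, 0.0, 8.0),
    (0.0, 0.0, 8.3), (0.0, 0.0, 8.25),             # regular morning trips
    (10.0, 3.0, 17.6), (10.0, 3.0, 17.5),
    (10.0, 3.0, 17.8), (10.0, 3.0, 17.7),          # regular evening trips
    (4.0, 7.0, 13.0),                              # one-off lunchtime trip
    (0.0, 0.0, 22.0),                              # one-off late trip
]

labels = dbscan(taps, eps=0.5, min_pts=4)  # parameter values are assumptions
is_usual = [label != -1 for label in labels]
```

Anything that lands in a cluster is treated as ‘usual’ and anything labelled as noise as ‘unusual’, which mirrors the assumption described above. How space and time should be scaled against each other is a real modelling choice that this sketch glosses over.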

There are various findings from this work, and these can be better explored in the paper itself, but here are a few highlights.

For me the most interesting snapshot is the spatial variation in ‘usual’ Underground journeys. The maps show us the proportions of ‘usual’ journeys occurring at each Underground station, relative to ‘unusual’ journeys. And, in fact, we see quite a lot of variation in this behaviour. Commuter belt zones in outer London, such as Sudbury and Roding Valley, demonstrate high proportions (55% to 75%) of regular journeys. This suggests that people are more regularly undertaking journeys to work from these locations, as opposed to other activity, and perhaps work in industries with strong ties to defined working hours. These areas, I would expect, have fewer attractions that draw ad hoc journeys. At the opposite end of the spectrum are places like Covent Garden, which demonstrate low levels (below 30%) of regular activity. This is in part due to these areas being full of tourist attractions, restaurants, and other reasons for trips to be made there at a variety of times of day. Also ranking low are the airports, Heathrow and City. This is no surprise, since relatively few people work regularly at these locations, but it provides us with some validation of the approach used here.
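The per-station proportions behind a map like this come down to a simple ratio. Here is a small sketch, assuming each journey has already been flagged as ‘usual’ or not; the function name and the journey counts are invented for illustration and are not figures from the paper.

```python
from collections import defaultdict

def usual_share_by_station(journeys):
    """Return {station: fraction of its journeys flagged as 'usual'}."""
    totals = defaultdict(int)
    usual = defaultdict(int)
    for station, is_usual in journeys:
        totals[station] += 1
        usual[station] += int(is_usual)
    return {station: usual[station] / totals[station] for station in totals}

# Invented journeys: (station, was the journey flagged as 'usual'?)
journeys = (
    [("Roding Valley", True)] * 7 + [("Roding Valley", False)] * 3
    + [("Covent Garden", True)] * 1 + [("Covent Garden", False)] * 4
)

shares = usual_share_by_station(journeys)
```

With these made-up counts the commuter station comes out at 70% ‘usual’ and the central one at 20%, which is the kind of contrast described above.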

In the paper, we also explore temporal variation by mode, exposing some quite diverse uses of different travel modes. As expected, each mode shows higher regularity during peak times; however, the extent to which ‘unusual’ travel fills that gap varies by mode. On the bus, for example, regular journeys are observed at a steady rate (around 75%) throughout the day, whereas on the Underground we see a drop to below 40% during the same period, as shown in the chart below.

There is an extent to which this work exposes some of the trends we already sort of understood. But I think this sort of analysis goes much further than this, and allows for significantly more nuance in the way we manage transportation systems. If we’re able to better understand how people are using the transportation systems at finer spatial and temporal detail, then we can potentially develop policies and management strategies that match that granularity. Through this deeper understanding, our transportation systems can build in more depth and resilience to disruption and change.

The paper was published in Transportation in 2018 (it was, err, actually completed quite a long time before that…), and is available open access from this link if you want to find out more.

Understanding Cities through Individual-Level Data – Opportunities and Challenges

As it’s been a while since I last posted, I thought I’d put up something I prepared for a Royal Society Smart Cities and Transportation workshop next week. I’ve focussed on data collected at the individual level, the opportunities these data present for better understanding cities, and the challenges faced in making the most of these resources. There are no doubt alternative perspectives, arguments that go deeper than this very short piece, and methodological issues to contend with too. Feel free to add your thoughts in the comments at the end.

 

As the creation, capture and accumulation of granular datasets become increasingly engrained within the urban environment, the potential for analysing urban processes in finer and finer detail increases. New forms of data are being generated at spatial, temporal and individual-level scales that surpass all that have gone before. These data transcend the boundaries that were previously imposed on analyses of cities – traffic flow can be captured on a second-by-second basis road-by-road, crime incidents are habitually recorded with a longitude and latitude, and commuting patterns can be captured live through the movements of mobile phones. Through the development of a wealth of new methods, machine learning approaches are able to derive deeper insight from these data, revealing patterns and an understanding of cities that were not available before. It is, however, the increasing granularity of data on individual behaviours that offers the greatest promise, and poses the biggest challenges, for future urban data analysis.

Data-derived insights around the individual offer a chance to better understand the behavioural heterogeneity within the population across a range of domains, as well as revealing the complex interconnectivity of urban systems. Capturing these details at a finer level could allow us to better measure and model cities, and so improve our current conceptions of how we understand, manage and organise them.

The opportunities presented by individual-level analyses are plentiful. Longitudinal data allow us to learn how individuals adjust behaviour over different periods of time and under different conditions, and how they adapt to longer-term changes to the city. Within domains such as transportation, conventional models lack strong behavioural insights, failing to capture behavioural heterogeneity or measure how individual experiences and perceptions influence behaviour. The new lessons we can potentially learn from these data can not only aid our longer-term models of urban futures, but also contribute towards our management of cities on a day-to-day basis.

The individual-oriented nature of these analyses allows them to transcend the disciplinary boundaries through which cities have previously been understood and managed. At present, we lack a deep knowledge of how different urban systems integrate, and of the influence of the urban realm upon these connections. We might, for example, be interested in the influence of travel on shopping behaviour, or on health, or crime patterns, but the potential interconnections extend far and wide. While conventional surveys provide good localised insight into these behaviours and systems, only through large-scale data collection can these interconnectivities be observed across the whole population and entire urban area. The improved understanding of the people and systems that make up the urban realm offers considerable potential for those operating and optimising cities.

Despite the promise, there are considerable challenges to capitalising on these opportunities – underlined primarily by the fact that many of the datasets that could advance our understanding of cities already exist. At the individual scale, longitudinal travel behaviour can be captured by smart card transactions, many retail transactions are captured via loyalty cards, and mobile phones are tracked from cell tower to cell tower. There is, however, little opportunity for joined-up thinking, as many of these datasets exist within silos, accessible to interested parties only in exchange for a considerable fee. The potential for asking new questions, discovering new insights, and crossing urban systems and disciplines is restricted by commercial confidentiality. Crossing these boundaries requires leadership and openness from business and government, where too often, siloed within their own priorities, perspectives and worldview, a wider vision or motivation for an improved city is lacking.

Beyond structural challenges, however, there are questions of morality, and of how far data collection and analysis should be deployed for the purpose of urban development. When one starts to generate data at the individual level, the risk of de-anonymising individuals becomes very real. Data analysts have already proven this in various contexts, using datasets cleared for public release – from the identification of individuals from the movements of their mobile phones, to the identification of Netflix users from their viewing habits, to establishing whether celebrities tipped their taxi driver or not. These analyses may have been conducted for benign reasons, but they illustrate the point that the opportunities for revealing identities from data traces sharply increase as data collection reaches individual-level granularity. The questions therefore become how far these analyses should extend, what constraints (if any) should be placed on data collection and analysis to ensure anonymity, and how methods and results should be communicated to the public. At present, there is little guidance from government and seemingly little leadership beyond. Without due consideration given to the treatment of these issues, there is a risk that public trust in data collectors and analysts will be eroded, risking the imposition of limiting constraints on how these data are exploited in future.

Mapping Connected Places on London’s Public Transport Network

I haven’t written much on this blog about the work I’m currently doing at UCL CASA. As a Research Associate working on the Mechanicity project with Mike Batty, I’m tasked with drawing meaning out of a massive dataset of Oyster Card tap ins and tap outs across London’s public transport network. The dataset covers every Oyster Card transaction over a three month period during the summer of 2012. It’s worth checking out some of the great stuff that my colleague Jon Reades has already produced using this fantastic source of data.

There are a number of research themes that we are currently pursuing with this dataset, but today I’ll write about just one of these – what the Oyster Card data can tell us about how strongly different areas of London are connected to each other.

Most Popular Destinations

For this initial exploration I just want to keep it simple, and use quite a basic metric for assessing how associated two places are.  What we do here is look at the most popular destination station for each origin location.  So, using the big dataset of Oyster Card transactions, we pull out the most likely end point for any traveller beginning their journey at any given station on London’s public transport network.

We are focussing here on only Underground, Overground and rail travel in London, obviously by Oyster Card alone. Bus trips are unfortunately not covered because of the way the Oyster Card works: passengers tap in on boarding a bus but do not tap out, so no destination is recorded. Within this dataset I have extracted only the most popular destinations for each origin between 7am and 10am on weekday mornings. The dataset covers a total of 48.9 million journeys over 49 weekdays, so averaging at around 1 million morning peak trips per day. In focussing only on the morning commuter influx into London, we exclude any ambiguity that might come with including bidirectional flows of travellers.
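The "bit of data crunching" behind this metric is straightforward to sketch. The snippet below is an illustration only, not the code actually used: the record layout (origin, destination, entry time), the function name, and the choice to filter on entry time rather than exit time are all assumptions, and the trips are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime

def top_destinations(trips, start_hour=7, end_hour=10):
    """Most popular destination per origin for weekday morning-peak trips."""
    counts = defaultdict(Counter)
    for origin, destination, entry_time in trips:
        is_weekday = entry_time.weekday() < 5            # Monday=0 ... Friday=4
        in_peak = start_hour <= entry_time.hour < end_hour
        if is_weekday and in_peak:
            counts[origin][destination] += 1
    # Keep only the single most frequent destination for each origin
    return {origin: c.most_common(1)[0][0] for origin, c in counts.items()}

# Invented trips: (origin, destination, entry time)
trips = [
    ("Wimbledon", "Waterloo", datetime(2012, 7, 2, 8, 5)),
    ("Wimbledon", "Waterloo", datetime(2012, 7, 3, 8, 10)),
    ("Wimbledon", "Victoria", datetime(2012, 7, 3, 9, 0)),
    ("Wimbledon", "Victoria", datetime(2012, 7, 2, 18, 0)),   # evening: excluded
    ("Wimbledon", "Victoria", datetime(2012, 7, 7, 8, 0)),    # Saturday: excluded
    ("Upminster", "Fenchurch Street", datetime(2012, 7, 2, 7, 30)),
]

links = top_destinations(trips)
```

Each origin and destination pair in `links` corresponds to one link on the map. Note that with these invented trips the time filter matters: counted across all times, Wimbledon's top destination would be Victoria rather than Waterloo.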

The map below shows the connections formed between all London stations and their most popular destinations.  A link has been drawn between the two places, and the link and points coloured according to the destination.  Each destination is given a unique colour.  If you click on the image below you’ll get a full screen version, and be able to switch to an annotated version of the map.

Map showing the most popular destinations by origin, derived from a large dataset of morning peak Oyster Card trips

The map itself is made using Gephi – an open-source network analysis package with some excellent visualisation capabilities – and is supported with a bit of good old data crunching to get at these popular destination figures.

What Does The Map Show?

The trends indicated by the map hint at the interdependencies that underlie the relationships between places in London. It is clear, for example, that much of the travel from south London is focussed on just three end points – Waterloo, Victoria, and London Bridge. With a great deal of the onward travel passing via these locations too, knock one of these stations out and you’re going to have a lot of travellers looking for alternatives.

While south London’s dependency on these core rail termini is clear, perhaps greater intrigue is found in the footprints of Bank and Fenchurch Street stations. These two stations are at the centre of the City and so are the end point for many commuters working in the financial services industry. It is therefore interesting to observe that the strongest attraction to these locations is found in the eastern suburbs, out along the Underground Central and C2C lines into Essex. There are indications, as such, that the individuals choosing to live in those areas are more likely to work in the City, providing hints about the demographics of those origin regions.

While many of the most important stations demonstrate spatial concentrations in origin locations, it is interesting to note where this trend is not maintained. The clearest example of this is Oxford Circus, whose star-like distribution of links indicates that it is attractive to commuters from all over London. Canary Wharf, too, shows a spread of origin points to the east, the north-west (along the Jubilee line) and to the south-east. These trends may be indicative of the accessibility of these respective stations, which are served by multiple routes and so are within easy reach from all across the city.

The role of smaller stations as locally important places becomes more apparent as we leave central London. Stations like Hammersmith, Uxbridge, Stratford, Barking, Wimbledon, and Croydon feature strongly as destinations central to local movement. These trends highlight these locations as local centres of employment, attracting commuters from nearby locations, but not from much further away.

Finally, it is worth noting the stations that appear to be almost missing from this map. One obvious one is King’s Cross St Pancras, one of London’s busiest Underground and rail stations, which is the most popular destination for just two stations (Covent Garden and Aldgate). The reason for this is that this may not be where people end their trips. They may well pass through King’s Cross St Pancras – indeed, a failure at King’s Cross could be catastrophic for many travellers – but it is not where they leave the system. In this sense, King’s Cross is an important point on the network but not a place where many people actually get off (except maybe for Guardian journalists and future Google workers).

 

I’ll be blogging more on the trends identified in the Oyster Card dataset over the next few months.  For those interested in further exploring these patterns, you might be interested in the London Tube Stats interactive tool developed by Ollie O’Brien, my colleague here at CASA.  Ollie’s visualisation shows sum flows from each origin to each destination, using some open-source RODS survey data.

 

Smart London and Future Data

Since my last blog post back in February 2013, I have written, submitted and defended (!) a PhD thesis, and moved jobs.  It’s been a busy year, but hopefully 2014 will see a revisit of the heady days of 2012, where blog posts were fresh and a-plenty.  In case you possibly want to talk to me, I’m now installed as a Research Associate at UCL CASA working on the MECHANICITY project (although still honorarily linked in with my friends and colleagues over at the UCL SpaceTimeLab).  Now onto business matters.

 

One thing I’ve been involved with since I moved over to CASA is contributing to a new UCL-led book on the future of London. Imagining the Future City: London 2062 does, as you might have gathered from its title, explore how London might look in, you guessed it, 2062. It’s been pulled together by Sarah Bell and James Paskins, and features quite a wide range of interesting contributions from all across UCL.

It’s fully open access so do check it out. Available here as a PDF, or here as an e-pub (whatever that is). Of course, the first thing to strike you will be what a beautiful front cover image they’ve selected, and you will surely remark on the skilled hand of the creator – oh yeah, that was by me.

The CASA-led contribution was mainly written by Mike Batty, but with input from Richard Milton, Jon Reades and myself. We specifically address how the inevitable growth in the volume and breadth of data might impact on how we understand, model and manage London moving into the future. Our ability to understand the intricacies of how cities work has never been greater, with larger datasets allowing us to explore patterns of behaviour at a highly granular scale. This is essentially what we spend our time doing at CASA, and I’ll try to highlight more examples of this work over the coming months.

A Big Data Backlash?

What I think is interesting to consider (and isn’t much touched upon within the chapter) is how this trend may develop, moving into the future. There is a general assumption that data will become bigger and bigger, expanding ever further our understanding, and potentially our control too, of the city. Yet I remain sceptical about the extent to which citizens will continue to accept external agencies overseeing their everyday behaviours and movements.

While the NSA PRISM debacle hasn’t prompted, as far as I can see, any significant widespread discontent, small shifts towards privacy-conscious organisations (for example, growth in DuckDuckGo use) twinned with a growing unease around the actions of larger organisations (for example, Facebook leavers) are an indication that people are at least beginning to think about how much others know about them. Whether this sentiment expands more widely remains to be seen. A perfectly valid alternative argument may be that there is an entire generation growing up now who have never not known the existence of the Internet, a factor that potentially influences their opinion of what is and isn’t considered private. Equally, many may, and probably do, consider a reduction in privacy to be acceptable given increasing functionality and service. It will be interesting to observe how far this trade-off can be pushed over the coming decades.

Video Time

These are some of the topics I tried to convey in the video interview I gave as part of the London 2062 book launch, which you can watch below. Big credit to Rob Eagle at UCL Comms for some excellent editing work, moulding my ramblings into something comprehensible!

[youtube http://www.youtube.com/watch?v=5VPwEBTBcLU]