As it’s been a while since I last posted, I thought I’d put up something I prepared for a Royal Society Smart Cities and Transportation workshop next week. I’ve focussed on data collected at the individual level, the opportunities these data present for better understanding cities, and the challenges faced in making the most of these resources. There are no doubt alternative perspectives, arguments that go deeper than this very short piece, and methodological issues to contend with too. Feel free to add your thoughts in the comments at the end.

 

As the creation, capture and accumulation of granular datasets becomes increasingly engrained within the urban environment, the potential for analysing urban processes in finer and finer detail increases. New forms of data are being generated at spatial, temporal and individual-level scales that surpass all that have gone before. These data transcend the boundaries that were previously imposed on analyses of cities – traffic flow can be captured on a second-by-second basis road-by-road, crime incidents are habitually recorded with a longitude and latitude, and commuting patterns can be captured live through the movements of mobile phones. Through the development of a wealth of new methods, machine learning approaches are able to derive deeper insight from these data, revealing patterns and an understanding of cities that have not been available before. It is, however, the increasing granularity of data on individual behaviours that offers the greatest promise, and poses the biggest challenges, for future urban data analysis.

Data-derived insights around the individual offer a chance to better understand the behavioural heterogeneity within the population across a range of domains, as well as revealing the complex interconnectivity of urban systems. Capturing these details at a finer level could allow us to better measure and model cities, and so improve how we understand, manage and organise them.

The opportunities presented by individual-level analyses are plentiful. Longitudinal data allow us to learn how individuals adjust behaviour over different periods of time and under different conditions, and how they adapt to longer-term changes to the city. Within domains such as transportation, conventional models lack strong behavioural insights, failing to capture behavioural heterogeneity or measure how individual experiences and perceptions influence behaviour. The new lessons we can potentially learn from these data can not only aid our longer term models of urban futures, but contribute towards our management of cities on a day-to-day basis.

The individual-oriented nature of these analyses allows them to transcend the disciplinary boundaries through which cities have previously been understood and managed. At present, we lack a deep knowledge of how different urban systems are integrated, and of the influence of the urban realm upon these connections. We might, for example, be interested in the influence of travel on shopping behaviour, or on health, or crime patterns, but the potential interconnections extend far and wide. While conventional surveys provide good localised insight into these behaviours and systems, only through large scale data collection can these interconnectivities be observed across the whole population and entire urban area. The improved understanding of the people and systems that make up the urban realm offers considerable potential for those operating and optimising cities.

Despite the promise, there are considerable challenges to capitalising on these opportunities – underlined primarily by the fact that many of the datasets that could advance our understanding of cities already exist. At the individual scale, longitudinal travel behaviour can be captured by smart card transactions, many retail transactions are captured via loyalty cards, and mobile phones are tracked from cell tower to cell tower. There is, however, little opportunity for joined-up thinking, as many of these datasets exist within silos, accessible to interested parties only in exchange for a considerable fee. The potential for asking new questions, discovering new insights, and crossing urban systems and disciplines is restricted by commercial confidentiality. Crossing these boundaries requires leadership and openness from business and government. Too often these organisations are siloed within their own priorities, perspectives and worldview, and lack a wider vision or motivation for an improved city.

Beyond structural challenges, however, there are questions of morality, and how far data collection and analysis should be deployed for the purpose of urban development. When one starts to generate data at the individual level, the risk of de-anonymising individuals becomes very real. Data analysts have already proven this in various contexts, using datasets cleared for public release – from the identification of individuals from the movements of their mobile phones, to the identification of Netflix users from their viewing habits, to establishing whether celebrities tipped their taxi driver or not. These analyses may have been conducted for benign reasons, but they illustrate the point that the opportunities for revealing identities from data traces sharply increase as data collection reaches individual-level granularity. The questions therefore become how far these analyses should extend, what constraints (if any) should be placed on data collection and analysis to ensure anonymity, and how methods and results should be communicated to the public. At present, there is little guidance from government and seemingly little leadership beyond. Without due consideration given to the treatment of these issues, there is a risk that public trust in data collectors and analysts will be eroded, risking the imposition of limiting constraints on how these data are exploited in future.

Map showing the most popular destinations by origin, derived from a large dataset of morning peak Oyster Card trips

I haven’t written much on this blog about the work I’m currently doing at UCL CASA.  As a Research Associate working on the Mechanicity project with Mike Batty, I’m tasked with drawing meaning out of a massive dataset of Oyster Card tap ins and tap outs across London’s public transport network.  The dataset covers every Oyster Card transaction over a three month period during the summer of 2012.  It’s worth checking out some of the great stuff that my colleague Jon Reades has already produced using this fantastic source of data.

There are a number of research themes that we are currently pursuing with this dataset, but today I’ll write about just one of these – what the Oyster Card data can tell us about how strongly different areas of London are connected to each other.

Most Popular Destinations

For this initial exploration I just want to keep it simple, and use quite a basic metric for assessing how associated two places are.  What we do here is look at the most popular destination station for each origin location.  So, using the big dataset of Oyster Card transactions, we pull out the most likely end point for any traveller beginning their journey at any given station on London’s public transport network.

We are focussing here on only Underground, Overground and rail travel in London, obviously by Oyster Card alone.  Bus trips are unfortunately not covered because of the way the Oyster Card works.  Within this dataset I have extracted only the most popular destinations for each origin between 7am and 10am on weekday mornings.  The dataset covers a total of 48.9 million journeys over 49 weekdays, so averaging at around 1 million morning peak trips per day.  In focussing only on the morning commuter influx into London, we exclude any ambiguity that might come with including bidirectional flows of travellers.
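The extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the `(origin, destination, entry time)` layout is hypothetical rather than the real Oyster Card schema, and "between 7am and 10am" is taken to mean journeys starting from 07:00 up to, but not including, 10:00.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical records: (origin, destination, entry time). Not real Oyster data.
journeys = [
    ("Clapham Junction", "Waterloo", datetime(2012, 7, 2, 8, 5)),
    ("Clapham Junction", "Waterloo", datetime(2012, 7, 2, 8, 40)),
    ("Clapham Junction", "Victoria", datetime(2012, 7, 3, 9, 10)),
    ("Clapham Junction", "Victoria", datetime(2012, 7, 3, 17, 30)),  # evening, excluded
    ("Clapham Junction", "Victoria", datetime(2012, 7, 7, 8, 30)),   # Saturday, excluded
    ("Leytonstone", "Bank", datetime(2012, 7, 4, 7, 45)),
]


def in_morning_peak(ts):
    """True for weekday journeys starting between 07:00 and 09:59 (assumed window)."""
    return ts.weekday() < 5 and 7 <= ts.hour < 10


def most_popular_destinations(records):
    """Map each origin to its single most frequent morning peak destination."""
    counts = defaultdict(Counter)
    for origin, destination, ts in records:
        if in_morning_peak(ts):
            counts[origin][destination] += 1
    return {origin: c.most_common(1)[0][0] for origin, c in counts.items()}


top = most_popular_destinations(journeys)
```

Each origin ends up with exactly one destination, which is what lets the map draw a single link per station.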

The map below shows the connections formed between all London stations and their most popular destinations.  A link has been drawn between the two places, and the link and points coloured according to the destination.  Each destination is given a unique colour.  If you click on the image below you’ll get a full screen version, and be able to switch to an annotated version of the map.

Map showing the most popular destinations by origin, derived from a large dataset of morning peak Oyster Card trips

The map itself is made using Gephi – an open-source network analysis package with some excellent visualisation capabilities – and is supported with a bit of good old data crunching to get at these popular destination figures.

What Does The Map Show?

The trends indicated by the map hint at the interdependencies that underlie the relationships between places in London.  It is clear, for example, that much of the travel from south London is focussed on just three end points – Waterloo, Victoria, and London Bridge.  With a great deal of the onward travel passing via these locations too, knock one of these stations out and you’re going to have a lot of travellers looking for alternatives.

While south London’s dependency on these core rail termini is clear, perhaps greater intrigue is found in the footprints of Bank and Fenchurch Street stations.  These two stations are at the centre of the City and so are the end point for many commuters working in the financial services industry.  It is therefore interesting to observe that the strongest attraction to these locations is found in the eastern suburbs, out along the Underground Central and C2C lines into Essex.  This suggests that the individuals choosing to live in those areas are more likely to work in the City, providing hints about the demographics of those origin regions.

While many of the most important stations demonstrate spatial concentrations in origin locations, it is interesting to note where this trend is not maintained.  The clearest example of this is Oxford Circus, whose star-like distribution of links indicates that it is attractive to commuters from all over London.  Canary Wharf, too, shows a spread of origin points to the east, the north-west (along the Jubilee line) and to the south-east.  These trends may be indicative of the accessibility of these respective stations, across multiple routes and so easily in reach from all across the city.

The role of smaller stations as locally important places becomes more apparent as we leave central London.  Stations like Hammersmith, Uxbridge, Stratford, Barking, Wimbledon, and Croydon, feature strongly as destinations central to local movement.  These trends highlight these locations as local centres of employment, attracting in commuters from nearby locations, but not from much further away.

Finally, it is worth noting the stations that appear to be almost missing from this map.  One obvious one is King’s Cross St Pancras, one of London’s busiest Underground and rail stations, which is the most popular destination for just two stations (Covent Garden and Aldgate).  The likely reason is that this may not be where people end their trips.  They may well pass through King’s Cross St Pancras – indeed, a failure at King’s Cross could be catastrophic for many travellers – but it is not where they leave the system.  In this sense, King’s Cross is an important point on the network but not a place where many people actually get off (except maybe for Guardian journalists and future Google workers).

 

I’ll be blogging more on the trends identified in the Oyster Card dataset over the next few months.  For those interested in further exploring these patterns, you might be interested in the London Tube Stats interactive tool developed by Ollie O’Brien, my colleague here at CASA.  Ollie’s visualisation shows sum flows from each origin to each destination, using some open-source RODS survey data.

 

Identifying Communities in Traffic Flow

One recent bit of research I have been working on has been looking at the application of community detection algorithms to traffic flow in London.

The idea is that within the traffic system exist a number of sub-systems of highly interconnected roads.  To a certain extent, these sub-systems are engineered into the system.  Transport for London, for example, specifically manage and maintain 23 key routes into and around central London, known as ‘corridors’.  However, to what extent do further systems exist outside of these defined zones?

Community detection algorithms were developed to identify clusters within a network dataset.  These methods are most often applied to examples within the social network sphere, in the identification of cliques, where a cluster demonstrates high internal connectivity, with lower connectivity with the rest of the network.  My thinking behind this bit of work was that we might be able to identify similar characteristics in traffic flow, where we can observe high coupling between clusters of nodes.

The map below visualises the modules (distinguished by colour) identified through the application of community detection methods to a topological representation of the road network.  Node connectivity is established using a dataset of 1.5 million private hire cab routes through London.

NodeModularity_GrLondon_3_1k_newcred
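To make the idea of a "module" a little more concrete, here is a minimal sketch of the modularity score that many community detection methods try to maximise. The post does not say which algorithm produced the map, so this is not a reconstruction of that method. The toy routes, the junction names and the choice to weight each road link by the number of routes crossing it are all assumptions for illustration.

```python
from collections import defaultdict


def build_graph(routes):
    """Weight each undirected link by the number of routes that traverse it."""
    weights = defaultdict(int)
    for route in routes:
        for a, b in zip(route, route[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1
    return weights


def modularity(weights, community_of):
    """Newman modularity Q of a partition of a weighted, undirected graph."""
    total = sum(weights.values())
    internal = defaultdict(float)  # link weight that stays inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for link, w in weights.items():
        a, b = tuple(link)
        degree[community_of[a]] += w
        degree[community_of[b]] += w
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Hypothetical routes: two tightly linked triangles joined by one bridge (c-d).
routes = [["a", "b", "c", "a"], ["d", "e", "f", "d"], ["c", "d"]]
graph = build_graph(routes)

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split = modularity(graph, split)    # two modules
q_merged = modularity(graph, merged)  # everything in one module
```

Splitting the toy network at the bridge scores higher than lumping everything together, which is the property a community detection algorithm searches for across the whole road network.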

The resulting visualisation, apart from being quite pretty (thank Gephi for that), reveals some interesting trends.  To a certain extent, a number of expected patterns in traffic flow are prevalent, with some of the ‘corridors’ into central London, such as the M3, M4 and A2, clearly defined as distinct clusters.  Yet the image also shows how both the M25, the ring road around London, and the North Circular, usually considered as single entities, can be segmented into modules defined by their usage.

We also see further interesting patterns in central London, where certain regions – specifically Knightsbridge, Soho, Shoreditch, the City and Hyde Park – are clearly defined as distinct modules.  These would appear to be areas of high internal movement, and thus a clear product of cab usage patterns.

These results, while presented only in their initial stages, demonstrate how measures of network characteristics can help us to understand dynamic patterns of movement in the city.

 

Edit

Thanks to all for the interest in this work!

Just by way of follow up, the image below shows a zoom in on Central London, demonstrating more clearly some of the regions mentioned above.  I’ve annotated this version for people who may not be familiar with London.

CentralLondonModularity_02_annotated

 

London 2012: Using Fear to Tame Transportation Demand

One of the biggest advantages, I feel, of studying urban transport phenomena in London is the simple ability to look out of the window and see what is actually going on.  This week, the Olympics, and its (supposed) transportation chaos, came to London.

What has struck me early on, mainly since the introduction of the Games Lanes last week, is a big reduction in the number of vehicles on the road.  There have been reports of certain inevitable problems in various parts of the capital, but my experience has been a general reduction in demand on most roads (see a couple of photos I took below).  This sentiment has been shared by a number of my colleagues.  There has been no word yet from Transport for London as to whether the data is backing this up.

Second, the big public transport problems predicted at certain stations and at certain times have not yet come to fruition.  Warnings were issued widely this morning about potential overcrowding at a number of stations, yet early reports suggest that this is far from the reality – the Guardian highlight a number of citizen reports of empty Tube seats and quiet stations this morning.

Typical fear-inducing GetAheadOfTheGames literature (copyright Transport for London 2012)

It appears that the strategy has worked.  In fact, one might even suggest that it has worked better than expected.  I would say that this is partly down to the impact of irrationality, specifically the impact of fear.  Individuals, scared of potentially having to wait considerable amounts of time at stations only to cram into packed Tube trains, or fearful of long queues on the roads, have changed their habitual plans en masse.

Social Phenomena

The effect has demonstrated, at least to me, the impact that small changes in the behaviour of many individuals can have on the nature of the city.  As individuals, we make a choice, we carry out that action, and we are mostly unaware of the impact that decision has on shaping broader phenomena.  Yet, in observing the patterns these many individuals make, we can begin to see how individual and social attitudes shape transportation flows.

This relationship, specifically the impact that fear has had in the context of the Olympics, appears to have caught some analysts on the hop.  INRIX, a big transport data provider, predicted earlier in the year a ‘perfect traffic storm’ in traffic demand during the first few days of the Games (reported in more detail here).  This patently failed to happen.  The models INRIX employed in making these predictions clearly failed to consider the part that fear would play in reducing traffic demand.  This omission is far from uncommon where transport demand modelling is concerned.

The Games have a long way to run yet, and we may well see a counter movement occur in time as people begin to realise that transportation isn’t as bad as first expected.  But I think the impact that fear has had on shaping, at least, the first few days of transportation flows makes for interesting viewing.

I’ve always had a problem with the pervasive assumption in transportation research that everyone takes the shortest metric distance path when travelling between A and B.  This idea doesn’t seem to have any solid foundations in research, and intuitively it doesn’t make much sense – how do you even know what the shortest distance path is anyway?

So a good deal of my research has looked into what people really do. I’m not going to reveal all here – journal papers are generally more important than blogs in assuring future employment – but I’ll share one interesting finding.

The data I have used relates to 700,000 taxi routes through London (you might remember I blogged about this dataset previously).  For each of these routes, between origin and destination, I have also calculated an optimum path, according to a range of metrics, one being distance.  Then, as far as this blog post goes, I have compared each route and calculated the percentage match between the real route and the optimum shortest distance journey.
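The comparison described above can be sketched as follows. This is a minimal illustration, not the method used on the real data: the toy road graph is hypothetical, the network is assumed to be connected, and "percentage match" is assumed here to mean the share of the real route’s length that lies on the shortest distance path.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm; graph maps node -> {neighbour: length in metres}."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            break
        if dist > best[node]:
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1]


def percentage_match(graph, real_route):
    """Share of the real route's length that lies on the shortest distance path."""
    optimal = shortest_path(graph, real_route[0], real_route[-1])
    optimal_links = {frozenset(pair) for pair in zip(optimal, optimal[1:])}
    total = shared = 0.0
    for a, b in zip(real_route, real_route[1:]):
        total += graph[a][b]
        if frozenset((a, b)) in optimal_links:
            shared += graph[a][b]
    return 100.0 * shared / total


# Hypothetical junctions A-D with link lengths in metres.
graph = {
    "A": {"B": 100, "C": 120},
    "B": {"A": 100, "C": 50, "D": 100},
    "C": {"A": 120, "B": 50, "D": 120},
    "D": {"B": 100, "C": 120},
}

match = percentage_match(graph, ["A", "B", "C", "D"])
```

In this toy case the driver follows the shortest path for the first link only, so the match is well under half even though the route is not much longer than the optimum.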

Realistic?

So is the shortest distance path a decent representation of reality?  No.

On average, the shortest distance path is able to estimate only 39.8% of each route.  Pretty poor when you consider that it is often used as the sole basis for predicting the behaviour of many individuals.

Not only this, the data shows that the shortest distance path is followed in its entirety only on very rare occasions.  Only 5% of real journeys show a match with 90% of their equivalent shortest distance path, with this value only rising to 13% when that threshold is dropped to 75%.

Minimising Distance

So, do people have no consideration for distance when they route through the city?  Well, no, that isn’t quite the case.

The graph below shows a scatter plot of real distances against optimal distances.  As you can see, the relationship and resulting R-square is pretty good.

DistanceVsOptimalAll.PNG.scaled1000

Note: Overly long routes (three times optimal distance) have been removed.
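For anyone wanting to try the same check on their own data, here is a minimal sketch of the filter and the R-square calculation. The distance pairs are invented for illustration, and "overly long" is assumed to mean more than three times the optimal distance.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, equal to R-square for a simple linear fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical (optimal km, real km) pairs, not values from the taxi dataset.
pairs = [(2.0, 2.3), (5.0, 5.4), (3.0, 3.9), (4.0, 13.0), (8.0, 9.1)]

# Drop routes longer than three times their optimal distance (assumed rule).
kept = [(optimal, real) for optimal, real in pairs if real <= 3 * optimal]

score = r_squared([o for o, _ in kept], [r for _, r in kept])
```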

It therefore appears that people do minimise distance – or at least do not stray extremely far from the minimum – but do not generally take the optimal shortest distance path.

This is research I’m still pulling together, but I hope this post is of interest to the wider community.  For anyone that is interested, do get in touch and I’ll let you know when the paper on this may be out.

‘Modelling Movement in the City: The Influence of Individuals’ was the title of a talk I gave at the AGILE conference in Avignon, France last week.  For the conference I actually initially prepared a poster that never ended up seeing the light of day – except for now that is.

The poster presents some recent work I carried out through agent-based simulation, demonstrating how different behavioural models influence the formation of macroscopic patterns.  As you can see from the results, even basic assumptions have a significant impact upon the unfolding network picture.

Probably now going to write this up as a journal paper, but hopefully putting the poster up here won’t mess with any copyright stuff – please let me know if it might!

London Driver Survey

March 20th, 2012 | Posted by edmanley in Cities | Transportation

As part of building a fuller understanding of the way people move around the city by car, I’ve developed a survey to start delving into some of the lesser understood issues.

The survey looks at the extent of use of GPS and similar devices, behaviour around congested areas of the network and usage of traffic information.  The results will contribute towards the building of a better model of driver behaviour.

You can find the survey here – http://goo.gl/UDrFI

Please pass it on to all of the motorists in London that you know!

253843542_200

At the upcoming AAG conference in New York, I’ll be presenting a recent prototype that links agent-based simulation with current traffic flow models.

The basic premise is that any cognitive decision associated with movement around cities should be modelled at the level of the individual.  However, it is not always necessary that all movement be represented individually.  Doing so potentially wastes limited computational power, especially important where modelling many complex agents.

Instead, my new simulation utilises traffic flow modelling to constrain the movement of individual agents.  Individuals choose where they move individually, but physical movement itself is modelled collectively.  The higher the traffic flow on a single route, the slower each agent on that route will travel.  This approach is more efficient and allows a much larger scale of complex agent-based simulation.

I’ll provide more detail at AAG next Sunday, but the basic result is as above.

The simulation demonstrates traffic flows across central London.  There are 30,000 agents of varying behavioural characteristics moving around this space.  Their movement decisions impact on the state of the network.

KEY:  The redder colours represent high traffic saturation aka queues and congestion, the blues and greens represent quiet or free flowing traffic conditions.
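The split between individual decisions and collective movement can be sketched roughly as below. The post does not name the traffic flow model used in the prototype, so this sketch swaps in the well-known BPR volume-delay curve purely as a stand-in. The link names, free-flow times, capacities and agent choices are all hypothetical.

```python
from collections import Counter


def link_travel_time(free_flow_seconds, flow, capacity, alpha=0.15, beta=4.0):
    """BPR-style volume-delay curve: travel time rises as flow nears capacity."""
    return free_flow_seconds * (1.0 + alpha * (flow / capacity) ** beta)


# Hypothetical links: name -> (free-flow time in seconds, capacity in vehicles).
links = {"Euston Road": (60.0, 100), "Back street": (90.0, 100)}

# Decision step: each agent has already picked its link individually.
agent_choices = ["Euston Road"] * 200 + ["Back street"] * 20

# Movement step: agents are only counted, and every agent on a link
# shares the same flow-dependent travel time.
flows = Counter(agent_choices)
times = {
    name: link_travel_time(free_flow, flows[name], capacity)
    for name, (free_flow, capacity) in links.items()
}
```

Because movement is computed once per link rather than once per agent, the cost of the movement step grows with the size of the network rather than with the number of agents.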

 

Mapping Taxi Routes in London

A major part of my research is spent looking into how people choose their routes around the city.  To aid me in this, I managed to acquire a massive dataset of taxi GPS data from a private hire firm in London.  I’ve spent the last few months cleaning up the data, removing errors, deriving probable routes from the point data and extracting route properties.

It’s been a big job, but worth it.  I now have the route data of over 700,000 taxi journeys, from exact origin to destination, over the months of December, January and February 2010-11.  I’m now moving on to the actual analysis of this data, and am beginning to answer some of these questions concerning real-world route choice.  In the meantime, I thought I’d share one striking image that I extracted through this work.

The image below represents an aggregate of journeys on each segment of road on the London road network.  The higher levels of flow are illustrated in red, falling to orange, yellow, then white, with the lowest flow values shown in grey.

The most popular routes are along Euston Road, Park Lane and Embankment, which may be somewhat expected, but make for a stark contrast with respect to the flow of most traffic in London.  The connection with Canary Wharf comes out strongly, an indication of the company’s portfolio, though route choice here is interesting, with The Highway proving more popular than Commercial Road.

Real insight will come with the full analysis of the route data, something that should be completed in January.  Until then, though, I’ll just leave you with this pretty something to look at.

Top 2%

At the very broadest scope, Space Syntax can be said to investigate the relationship between movement and the configuration and connectivity of space.  In the past, while much favour has been found in the approach, critics have been distrustful of the axial line concept and of the representation of road segments as nodes in a network.  The construction of the network too, the process of drawing a network of longest lines of sight, has been seen to be unscientific.  Although I personally feel this to be a weak argument against Space Syntax in general, its acceptance into the wider research community may be hampered by this fact.

By way of a response to this argument, either intentionally or otherwise, there has been a movement towards segment-to-segment angularity (known as Angular Choice) as a predictor of movement.  The method is described by Turner in this paper, but in summary it is a calculation of betweenness on each network segment using the angular deviation between segments as the weight on which to calculate a shortest path.  The higher scoring segments, therefore, are those with a larger number of shortest angular paths passing over them.
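A much simplified sketch of that calculation is given below. It is not Turner’s implementation: it treats each segment as a node, uses the raw turning angle in degrees as the weight, follows one least-angle path per pair of segments, and ignores the direction in which a segment is entered and left. The four segments and their angles are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations


def least_angle_path(turns, origin, destination):
    """Dijkstra over segments; turns[a][b] is the turning angle from a to b."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        angle, segment = heapq.heappop(queue)
        if segment == destination:
            break
        if angle > best[segment]:
            continue  # stale queue entry
        for neighbour, turn in turns[segment].items():
            candidate = angle + turn
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = segment
                heapq.heappush(queue, (candidate, neighbour))
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1]


def angular_choice(turns):
    """Count, for each segment, the least-angle paths that pass through it."""
    choice = Counter({segment: 0 for segment in turns})
    for origin, destination in combinations(sorted(turns), 2):
        for segment in least_angle_path(turns, origin, destination)[1:-1]:
            choice[segment] += 1
    return choice


# Hypothetical segments: W and E form a nearly straight high street,
# S is a side road at their junction, L is a lane off the far end of E.
turns = {
    "W": {"E": 10, "S": 90},
    "E": {"W": 10, "S": 90, "L": 90},
    "S": {"W": 90, "E": 90},
    "L": {"E": 90},
}

choice = angular_choice(turns)
```

Here segment E carries the through-movement between the lane and the rest of the network, so it is the only segment with a non-zero score.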

One implication of this approach is that it is a better fit for through-movement, that is, an indicator of the routes we’re likely to use when moving from A to B.  This fits with what has been identified in other literature (particularly spatial cognition) where least angular change is identified as a driver of choice, notably in preference to pure metric distance.

So with a view to better understanding this relationship between reality and Angular Choice, I wanted to compare the networks we find in the city and those indicated by this measure.  The first step was to draw out what traffic planners view as the most important roads on the network.  These are the roads identified in the network as ‘Motorways’ and ‘A Roads’ (i.e. the ‘main’ roads), as defined by the Department for Transport.  These were extracted and are as shown below:

The top 2% of these measures immediately draw out many of the most used and most well-known roads in London.  The M25 is prominent, as is the North Circular and various corridor roads into the city.  At 5% there is more definition of some of the other key roads, and by 10% we have a network that is quite similar to the map of ‘main’ roads in London.

By way of a statistical breakdown, the top 2% of values of the Choice measure predict 76.3% of all ‘Motorway’ segments and 28.4% of all ‘A Roads’.  By 10%, these values have risen to 87.4% and 75.4% of all segments, respectively.  It is therefore clear that there is a correlation between this network measure and the definitions applied to the network.
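The breakdown above amounts to ranking segments by their Choice value, keeping the top slice, and asking what share of each road class falls inside it. A minimal sketch follows. The ten segments, their scores and their classes are invented, and the function names are hypothetical.

```python
def top_share(scores, share):
    """Return the segments whose score falls in the top `share` of all segments."""
    keep = max(1, round(len(scores) * share))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


def coverage(selected, road_class, wanted):
    """Percentage of segments of class `wanted` that appear in `selected`."""
    members = [s for s, c in road_class.items() if c == wanted]
    return 100.0 * sum(s in selected for s in members) / len(members)


# Hypothetical Choice scores and road classes for ten segments.
scores = {"s1": 950, "s2": 900, "s3": 400, "s4": 380, "s5": 120,
          "s6": 90, "s7": 60, "s8": 30, "s9": 10, "s10": 5}
road_class = {"s1": "Motorway", "s2": "Motorway", "s3": "A Road", "s4": "Minor",
              "s5": "A Road", "s6": "Minor", "s7": "Minor", "s8": "Minor",
              "s9": "Minor", "s10": "Minor"}

top_20 = top_share(scores, 0.20)
top_40 = top_share(scores, 0.40)

motorways_in_top_20 = coverage(top_20, road_class, "Motorway")
a_roads_in_top_20 = coverage(top_20, road_class, "A Road")
a_roads_in_top_40 = coverage(top_40, road_class, "A Road")
```

As with the real figures, widening the slice picks up more of the lower-ranked road class while the highest class is already captured early on.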

I realise that this is a somewhat unrefined piece of work but I’d welcome any comments and am happy to share more on my method and results for those who are interested.