July 2018 – The Urban Genome Project

A core idea of urban genetics is that cities are constantly making new versions of themselves. The degree to which the new version resembles the previous one varies. If there is no resemblance, then there is no genetic code to speak of; it is a world without memory or inheritance. If there is absolute fidelity, there is no evolution to speak of; it is a world without variation.

Actual cities exist somewhere on this continuum between volatility and stability, ferment and fidelity. They have a basic genetic code, inscribed in their physical forms and explicit or implicit expectations for how they are to be used, and by whom. These make tomorrow tend to resemble today. However, the script is always being rewritten, sometimes in small ways that accumulate over time into a larger transformation, sometimes in big changes that dramatically threaten to unsettle established routines.

This post explores these themes via the case of Toronto, using two methods: Markov modeling and sequence analysis. These are different ways of investigating temporality, which bring different aspects to the fore in ways discussed below. There are some significant overlaps to the kinds of trajectories Piccard can uncover. However, focusing on a single city for a somewhat shorter time period allows us to use a wider array of more fine-grained variables. Overall, the analysis reveals the urban genetic codes at work in Toronto that recreate its larger patterns of behavior, even as considerable change occurs within and along their borders.

The analysis is in part inspired by the work of Elizabeth Delmelle on “mapping the DNA of urban neighborhoods.” She has a nice summary of her procedure, reproduced here:

Here we follow a similar approach, using Toronto census data harmonized and merged from 1996 to 2011 to 2001 census tract boundaries. With 2016 data becoming available we hope to update the results soon.

This exercise used the following variables: population density; proportion age 25-34; 65+; married; renters; moved in the past year; black; south Asian; Chinese; Arab; Italian; Portuguese; Greek; management occupations; business, finance, and admin occupations; natural and applied science occupations; arts, culture, sports, and recreation occupations; service occupations; blue collar occupations; auto commuting to work; BA or higher degree; non-religious; detached housing, high rises, and median household income.

One could of course experiment with different variables, and any suggestions about others to consider would be welcome.

The next step, following Delmelle, is to standardize all the variables within each year. So there is a standard score for e.g. % renter in 96, 01, 06, 11 – relative to that year.

After that follows a kmeans cluster analysis. In order to be able to discern more subtle variations, we use k = 15 clusters, but compare these to the 3 cluster solution. This allows us to see the bigger more tectonic movements within which the smaller changes are occurring.

The below radar plots summarize the results, in groups of 5

Interpreting latent dimensions revealed by cluster analysis is always tricky. Here are tentative readings, which are open to revision:

Elite suburb
Chinese predominant ethnoburb
Working class Portuguese
Arab predominant
Working class Italian
Middle class creative
Dense lower status immigrant
Young urban professional
South Asian/Black predominant ethnoburb
Bobo
Middle class suburban
Older urban professional
Mixed lower middle class
Greek
Black working class

Now consider radar plots when k = 3.

The labels of the plot show a working interpretation, which corresponds to the three major cleavages we found in the context of studying the Political Order of the City.

Cluster 1: marginality. Larger concentrations of renters, blacks, south Asians; somewhat high concentrations of Arabs, Italians and Portuguese; high unemployment; few managers and artists; more blue collar and service workers; moderate reliance on driving; low income, very low education, highly religious.

Cluster 2: creative city. Highly dense, large concentrations of youth, singles, renters, movers (churn), arts and culture occupations, university graduates, non-religious; fewer blue collar occupations, drivers; somewhat low income.

Cluster 3: establishment-suburbia. Lower density, older, married, homeowners, low churn, low unemployment, more managers, business, sciences, lots of driving, high income and education.

Overall, in line with findings in the “The Political Order of the City,” there are a few main ordering parameters in Toronto. The deepest is urban-suburban. The urban side of this equation basically runs from young/creative class to more middle-class creative professionals. The suburban goes from establishment to marginal, with the marginalized areas including more recent immigrant and visible minority groups. There is some mixing and overlap and alignment, but these are the overall parameters: urban-suburban/establishment-marginal/new-old immigrant.

The next step diverges from Delmelle, but we’ll return to her approach below. Now we generate first order Markov transition probabilities for k =15 clusters. These are based on tracts’ trajectories through these 15 clusters across the 4 years. An example of a sequence would look like this:

5350001: middle creative class/ middle creative class/ middle creative class/ middle creative class

This tract remained in the same cluster across the entire period. Combing these sequences for all tracts allows us to determine the probability of transitioning from any of the 15 states to the others at each transition. These are first order Markov processes, so they are only about the probability of a move from a given state to another at any time (i.e. they don’t capture position in a sequence, such as the difference between BAAA and AAAB).

Below is a network graph of the main result. With so many clusters it is difficult to easily visualize; this is a first rough solution. Thicker lines show higher probabilities (except for self-loops), and numbers below .05 are suppressed.

While exploring 2nd/higher order Markov chains to capture position in a sequence would be important to do before developing definitive interpretations, in any case we can make some preliminary observations now. You can readily see for instance which clusters are more stable/well defined in the self-loop probabilities. A lower probability shows a less clearly delineated cluster, or one that is relatively unlikely to re-generate/reproduce itself. Some older European immigrant groups (Greek, Portuguese) are like this (low reproduction), and mostly transition into some “creative city” form. If we think of this process as gentrification, then the most likely targets are older European immigrant neighborhoods. By contrast, newer immigrant groups are mostly transitioning in/out of middle/lower middle class suburban areas, which also show a high degree of volatility.

Some provisional reflections on this diagram:

the most stable part of the city is the “young creative urban professional” areas. We could make a contrast between different forms of stabilibity/creativity/volatility. The most “creative” neighborhoods in terms of population – young, artistic, creative class – are highly highly highly stable, pretty much locked in, without much transition into other neighborhood forms. There is a kind of stability and closed-in character to this form of creativity. By contrast, other parts of the city exhibit much more volatility, shifting back and forth and morphing into one another. So maybe we can say that there is the type of creativity that thrives within a stable type of “creative” urban context. And there is also a way in which the urban form/fabric/shape itself can be in transitional states, which is another maybe deeper and potentially challenging form of creativity from which these other “creative” areas are relatively cut off. Portugali’s reflections below are relevant here.
Related is the idea that the degree of structure itself is something that can vary. Some parts of the city exhibit fairly stable patterns of transition. But others are more volatile. So rather than assume there is an equally powerful genetic code at work everywhere, an important question is about how much of a given urban environment is encoded at all, and to what degree. Above and below we see that much of Toronto is already very very deeply encoded with particular urban genetics, so that they develop in very regular sequences. Whereas there is more volatility elsewhere.
Another issue concerns the notion of “turning points.” This relates to gentrification. Overall, we can see that gentrification is fairly rare. In Toronto, it is mostly happening in Portuguese and Greek areas. Lots if not most of the city’s marginalized communities have little to no transition probability to “creative class” type of areas. However, the fact that it is rare means that when and if it does happen, it is highly significant, and noticed as such. In other words, it is useful to try and isolate big transitions from one state to another, and even it is rare in terms of how often or how extensive it is, it may have an outsized significance in experience.

We can get another view of these relationships by breaking the graphs into sub-graphs, in particular ego-network graphs. This gives us the view from one node. Here we readily see that the big graph is really a composite of a few largely separate sub-graphs. There are parts of the city that are in interaction with one another quite a bit, but which do not interact with other parts very much.

For example, here is the ego-network graph of the “creative professionals.” Note they are largely in interaction with other “creative professional” type groups, elite suburbs, and Greek areas – and nothing else.

And here is “elite suburbs:

And Middle Creative Class:

And here are young urban professionals:

Overall, these graphs indicate that the big graph is really a composite of local processes that don’t communicate much with each other. We might think of these higher-order clusters as ecological niches, constructed in such a way that make it more likely for the more local formations to persist and replicate there. A network representation of the three-cluster solution helps to visualize this:

The three major “ecological niches” of the city each have over a 90% chance of reproducing themselves, with more movement occurring between marginal and establishment (suburban) areas than between either and the “creative city” core.

Altogether, the Markov models illustrate one of the broader points of this post: the major ordering principles of the city reproduce themselves and maintain the differences between them, even as there can remain a relatively large amount of movement to and from different local formations within them. In this way each new version of the city is at once the same and subtly different from its predecessor. This is the backdrop for urban evolution.

That said, there is movement across the city’s major ecological niches, and these interstitial zones may be particularly potent sources for emerging forms of urban life. Switching from the Markov approach to a sequence analytical point of view helps to visualize these spaces, since the sequence approach can be more readily mapped.

Before that, however, let’s pause and reflect for a moment about the different approaches. For further reading, I’d recommend the work of Andrew Abbott, who pioneered optimal matching sequence methods in sociology, and also wrote several very good essays on the more philosophical issues involved in the study of temporality.

These are worth studying in more detail, but the gist for now can be stated in brief: when we look for sequences, we take a narrative view of our subject, and make the narrative/trajectory itself as the object of interest. For instance, say we are studying life course trajectories. Then our object of interest could be two “narratives” that define different life courses:

get good grades in school, get married, get a job, have kids, retire.
Get bad grades, become an alcoholic, bounce around from job to job, multiple divorces, get sick, die young.

From the sequence/trajectory point of view, we are interested in (1) and (2) as wholes (though obviously many more are possible and these are not necessarily normative or normal), and we want to know where/who is more likely to go through one or the other. There are statistical techniques for identifying variables that distinguish them.

For cities, the corresponding approach is to treat a given neighborhood at any moment as “stretched out” in time: it is coming from and going to somewhere; it has what Louis Wirth called a “career.” We can think of this as “context,” but temporal not spatial context. Similarly, we would think of cities as a collection of narratives, so that at any moment various trajectories are unfolding. The trajectory mining approach seeks to map out the main narrative principles that are operating between and within cities.

The Markov type approach is different. The analytical question is about generative processes: what gives rise to these sequences. The sequence itself is a particular outcome of a series of transitions, e.g. from school to marriage to work. If those transitions recur in regular ways, we’ll see a recurrent trajectory, but the more fundamental question from this point of view is the rules that give rise to those patterns. Then we follow along with questions about where the rules come from, and to what/who they are more/less likely to apply.

In a way, this is another version of the classic micro-macro issue, but now in time rather than space. Trajectories are more holistic patterns; Markov processes are bottom-up generative processes. Both are valid and interesting, but the real opportunity for progress theoretically and empirically as always is to somehow join them and understand their interactions.

To that end, let us now consider Toronto from the point of view of its main trajectories. For simplicity, we’ll use the 3 cluster solution. We are working with sequences through the clusters for each tract, which for example look like this:

CTUID01 96 01 06 11

1 5350001.00 2 2 2 2

2 5350002.00 2 2 2 3

3 5350004.00 2 2 2 2

4 5350005.00 1 2 2 2

5 5350007.01 2 2 2 2

6 5350007.02 1 1 2 1

We feed this information into the TraMineR package in R, and from this you can find the most common sequences.

Once again we see overall a very high degree of stability. Most tracts stay the same through each phase.

Then we measure sequence similarity, with optimal matching, and, following Delmelle, cluster the resulting sequences. For the present exercise, we’ll look at 6 clusters of sequences.

For each cluster, there is a large majority of stable areas, and then another cluster of mostly volatile areas, moving back and forth. They don’t (at least at first glance) clearly follow one trajectory, some go one way, others go the other. If we kept following the dendogram down far enough (examining more clusters) maybe things would fall into clearer directional groups, or maybe a different clustering approach would produce that, but so far it doesn’t readily appear. So in sum the overall picture is stability-volatility, with little clear direction of movement from one to the other ecological niche.

Since these trajectory-clusters are assigned to each tract, we can map them. To be clear, the color shows the trajectory that tract is (more) on. So red areas are areas that remain “creative city” throughout, whereas the purple areas are more volatile establishment-suburban areas.

Overall, there is a very high degree of spatial clustering here. The temporality of these clusters is firmly grounded in space. The genetic code of the city replicates itself through time.

At the same time, amidst these “high fidelity” zones (creative city, marginal areas, establishment suburbs) there and smaller zones of volatility. Eyeballing it, overall the zones of volatility seem to be interstitial, i.e. at the borders between stable zones. There are some exceptions, for example in what looks to be North York Town Centre area, which was classified as “volatile creative city.” It isn’t at the border, but could indicate a more “downtown” creative city form possibly emerging in a more suburban area. There’s also some pink downtown, which indicates some potentially more suburban-leaning parts of downtown. This might point toward the urbanization of suburbia and suburbanification of the urban – that is, new crossings and rewirings, even as the city continues to “remember,” inherit, and replicate its past.

This picture of interstitial zones of transformation gains additional interest if we compare it to some of the results from the work of Juval Portugali (of course the classic Chicago School maps are also of interest). Portugali is very interested in the power of complexity theory to illuminate urban processes, and has developed some intriguing simulation models to capture the imultaneous stability and openness of cities under the banner of “self-organization.”

Consider some relevant excerpts from Self-Organization and the City:

The whole description is a typical process of self-organization by means of captivity…one can observe, first, a steady state and second, that instability is spatially captured in the boundary belts and spatially confined to the Blue areas…the high-level instability at the initial stages of the game is gradually being imprisoned until a steady state situation is reached. In the latter, instability remains spatially captive in the boundary belts, thus enabling the smooth reproduction of a stable segregative city….The global pattern of the city remains intact, and the system reproduces itself as a stable segregative city by capturing its instabilities, partly concentrated in the boundary belts and partly distributed in dots within the Blue areas.
Thus, in urban scenarios such as in Game 1, changes occur rarely, global processes such as boundary movements are slow, and zones of low-level instability are formed on the boundaries between stable-homogeneous territories. An interesting feature of the above results is that the locally unstable boundary zones exhibit global spatio-temporal stability.
…The areas in between – the boundaries – are thus the most critical areas in the city for socio-spatial changes (and in society at large).

This description of an artificially generated city is remarkably similar to the one we have drawn from real-world data on Toronto

Lastly, as an (even more) speculative enterprise, we can look at the steady state equilibrium the Markov model predicts, and compare that to the initial distribution (in 1996). In other words, if the city continues to generate itself anew as it has over these past years, this is where it is likely to settle (of course it is unlikely to ever settle!). Larger differences are highlighted in yellow to indicate clusters/formations that are predicted to change the most in the future.

To repeat, this is a far to simple model to build a reliable forecast upon. It will be very valuable to explore methods such as higher order Markov chains and Dynamic Bayesian Networks. That said, a simple model is a good place to start. We can use it for example to envision alternative scenarios for how the city might unfold, if it were to change in subtle ways.

To do this, we can simply change the probabilities in our transition matrix, run the model with these imagined generative potentialities, and compare the steady state this produces to the one based upon the probabilities from the actual data. For this experiment, I chose to increase the probability that “arab predominant immigrant,” “Asian ethnoburb,” and “Black Working” class features would appear in “creative professional,” “young urban professional,” “middle creative class,” and “elite suburb” areas.

In reality, the probability of these transitions is essentially 0 — they are in disconnected parts of the larger graph above. In the scenario, I increased the probability to 1% and reduced the probability that the three “receiving” areas reproduce themselves by 3%. In other words, the probability of a change remains very small. This is a strategic decision: it allows us to observe how a very small change might have larger consequences, if it occurs at a key threshold point. In particular, it asks whether even a tiny point of contact among highly disconnected and divided parts of the city can lead to substantial changes. In a complex dynamic interacting system, small qualitative changes at critical points can potentially make a qualitative difference.

Results are here:

The first column shows the same steady state as above, derived from the actual trends in Toronto we have observed. The second one shows what the steady state would be in the scenario we are envisioning — in other words, if we were to alter the city’s genetic code. The last shows the difference, and highlights larger differences in yellow.

There is much to discuss here, but I will highlight a few items of interest. First, note how much the young urban professional share of the population is affected: it declines from 15% to 7% of the population, dropping by over half [or, to put it differently, it would have been expected to increase from 5 to 7% rather than from 5 to 15%]; the Creative Professional area goes from 12 to 7% . A tiny change that barely connects these disconnected and divided areas drastically reduces the isolation of these parts of the city, and helps others to retain their foothold.

Note too that the Mixed Lower Middle class parts of the city increase by a substantial amount, from 10 to 13%. While the transition probabilities for this cluster weren’t changed, its presence in the city grew as result of changes happening in other parts of the city. This again is a mark of a complex system: small changes in one part reverberate in others.