I am re-digesting, like a cow, some of the intellectual food I figured out recently. I return to the specific strand of my research to be found in the unpublished manuscript ‘The Puzzle of Urban Density And Energy Consumption’, and I want to rummage a bit inside one specific issue, namely the meaning which I can attach to the neural activation function in the quantitative method I use.
Just to give a quick sketch of the landscape, I work through a general hypothesis that our human civilization is based on two factories: the factory of food in the countryside, and the factory of new social roles in cities. The latter produces new social roles by creating demographic anomalies, i.e. by packing humans tightly together, in abnormally high density. Being dense together makes us interact more with each other, which, whilst not always pleasant, stimulates our social brains and makes us figure out new interesting s**t, i.e. new social roles.
I made a metric of density in population, which is a coefficient derived from data available from the World Bank. I took the coefficient of urbanization (World Bank 1[1]) and multiplied it by the headcount of population (World Bank 4[2]). This is how I got the number of people living in cities. I divided it by the surface of urban land (World Bank 2[3]), and I got the density of population in cities, which I further label ‘DU’. Further, I gather that the social difference between cities and the countryside, hence the relative impact of cities as breeding ground for new social roles, is determined by the difference in the depth of demographic anomalies created by the urban density of population. Therefore, I took the just-calculated coefficient DU and divided it by the general density of population, or ‘DG’ (World Bank 5[4]). This is how I ended up with the coefficient ‘DU/DG’, which, mathematically, denominates the density of urban population in units of general density of population.
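The arithmetic above can be sketched in a few lines of pandas. This is a minimal illustration, not my actual data pipeline; the column names and the two sample rows are hypothetical placeholders standing in for the four World Bank indicators.

```python
import pandas as pd

# Hypothetical sample of two country-year observations; real values come
# from the World Bank indicators cited in the footnotes.
df = pd.DataFrame({
    "urbanization_share": [0.82, 0.34],    # SP.URB.TOTL.IN.ZS, as a fraction
    "population":         [325e6, 1.3e9],  # SP.POP.TOTL, headcount
    "urban_land_km2":     [1.1e5, 1.6e5],  # AG.LND.TOTL.UR.K2
    "general_density":    [35.0, 440.0],   # EN.POP.DNST, people per km2
})

# Urban headcount = urbanization coefficient * total population
df["urban_population"] = df["urbanization_share"] * df["population"]
# DU = density of population in cities
df["DU"] = df["urban_population"] / df["urban_land_km2"]
# DU/DG = urban density expressed in units of general density
df["DU_DG"] = df["DU"] / df["general_density"]
```

The resulting `DU_DG` column is the coefficient the perceptrons are pegged on.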
I simulate an artificial reality, where we, humans, optimize the coefficient ‘DU/DG’ as our chief collective orientation. We just want to get it right. Enough human density in cities to be creative, and yet enough space for each human to be able to practice mindfulness when taking a #2 in the toilet. We optimize ourselves being dense together in cities on the basis of 7 input characteristics of ours, namely:
- Population – this is a typical scale variable. The intuition behind it is that size matters, and that’s why in most socio-economic research, when we really mean business in quantitative terms, we add such variables, pertinent to the size of the social entity studied. Urbanization occurring in a small country, like Belgium (with all due respect for Belgians), is likely to occur differently from urbanization in India or in the U.S. In this specific case, I assume that a big population, like hundreds of millions of people, has to move more resources around to accommodate people in cities, as compared to a population counted in dozens of millions.
- Urban population absolute – same tune, a scale variable, more specifically pertinent to the headcount of urban populations.
- Gross Domestic Product (GDP, constant 2010 US$) – a scale variable once again, but this time about the real output of the economy. In my approach, the GDP is not exactly a measure of the wealth produced, but rather an appraisal of the total productive activity of the humans living around. This is why I use constant prices. That shaves off the price-and-relative-wealth component, and leaves GDP as a metric pertinent to how much tradable surplus humans create in a given place and time.
- Broad money (% of GDP) – this is essentially the reciprocal of the velocity of money, and it corresponds to another strand in my research. I discovered, and I keep studying, the fact that in the presence of quick technological change, human societies stuff themselves up with abnormally high amounts of cash (or cash equivalents, for that matter). It holds for entire countries as well as for individual businesses. You can find more on that in my article ‘Technological change as a monetary phenomenon’. I guess that when humans make more new social roles in cities, technologies change faster.
- Energy use (kg of oil equivalent per capita) – this is one of the fundamental variables I frequently work with. I included it in this particular piece of research just in case, in order to be able to connect with my research on the market of energy.
- Agricultural land (km2) – the surface of agricultural land available is a logical correlate of urban population. A given number of people in cities needs a given amount of food, which, in turn, can be provided by a given surface of agricultural land.
- Cereal yield (kg per hectare) – logically complementary to the surface of agricultural land. Yield per hectare in France is different from what an average hectare can contribute in Nigeria, and that is likely to be correlated with urbanization.
You can get the raw data I used UNDER THIS LINK. It covers Australia, Brazil, Canada, China, Colombia, France, Gabon, Germany, Ghana, India, Malaysia, Mexico, Mozambique, Namibia, New Zealand, Nigeria, Norway, Poland, Russian Federation, United Kingdom, and the United States. All that lot observed over the window in time stretching from 1961 all the way to 2015.
I make that data into a neural network, which means that I make h(tj) = x1(tj)*R*E[x1(tj-1)] + x2(tj)*R*E[x2(tj-1)] + … + xn(tj)*R*E[xn(tj-1)], as explained in my update titled ‘Representative for collective intelligence’, with x1, x2, …, x7 being the input variables described above, grouped in 21 social entities (countries), and spread over 2015 – 1961 = 54 years. After the curation of data for empty cells, I have m = 896 experimental rounds in the (alleged) collective intelligence whose presence I guess behind the numbers. I made that lot learn how to squeeze the partly randomized input, controlled for internal coherence, into the mould of the desired output, i.e. the coefficient xo = DU/DG. I ran the procedure of learning with 4 different methods of estimating the error of optimization. Firstly, I computed that error the way we do it in basic statistics, namely e1 = xo – h(tj). The mixed-up input is simply subtracted from the expected output. In the background, I assume that the local output xo is an expected value in statistical terms, i.e. the mean value of some hypothetical Gaussian distribution, local and specific to that concrete observation. With that approach to error, there is no neural activation as such. It is an autistic neural network, which does not discriminate input as for its strength. It just reacts.
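One experimental round of that aggregation can be sketched as follows. This is my reading of the formula above, under the assumption that R is a fresh random coefficient drawn per component; the numeric inputs are placeholder values, not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

def h(x_now, x_prev_mean):
    """Aggregate input h(tj) = sum_i xi(tj) * R * E[xi(tj-1)],
    with R a random coefficient in [0, 1) drawn per component
    (an assumption about how R enters the formula)."""
    R = rng.random(len(x_now))
    return float(np.sum(x_now * R * x_prev_mean))

# Placeholder standardized inputs for one observation (3 variables shown
# instead of the 7 used in the article, for brevity)
x_now = np.array([1.2, 0.8, 1.0])        # xi(tj)
x_prev_mean = np.array([1.0, 1.0, 1.0])  # E[xi(tj-1)]
xo = 0.9                                 # desired local output, e.g. DU/DG

# First error method: plain subtraction, no neural activation
e1 = xo - h(x_now, x_prev_mean)
```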
As I want my collective intelligence to be smarter than your average leech, I make three more estimations of errors, with the input h(tj) passing through a neural activation function. I start with the ReLU rectifier, AKA max[0, h(tj)], and, correspondingly, with e2 = xo – ReLU[h(tj)]. Then I warm up, and I use neural activation via the hyperbolic tangent tanh(h) = (e^2h – 1) / (e^2h + 1), and I compute e3 = xo – tanh[h(tj)]. The hyperbolic tangent is a transcendental function which squashes any input, however large, into the bounded interval (-1, 1), so its output is only loosely tied to the magnitude of the input. Neural activation with the hyperbolic tangent creates a projection of input into a separate, bounded space of states, like cultural transformation of cognitive input into symbols, ideologies and whatnot. Fourthly and finally, I use the sigmoid function (AKA logistic function) sig(h) = 1 / (1 + e^-h), which can be read as a smoothed likelihood that something happens, i.e. that input h(tj) has full power. The corresponding error is e4 = xo – sig[h(tj)].
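The four error estimates fit in a few lines of NumPy. A minimal sketch; the function names are mine, and `tanh` is written out from the exponential definition above (it matches `np.tanh`):

```python
import numpy as np

def relu(h):
    """ReLU rectifier: max[0, h]."""
    return np.maximum(0.0, h)

def tanh(h):
    """Hyperbolic tangent, (e^2h - 1) / (e^2h + 1); equivalent to np.tanh."""
    return (np.exp(2 * h) - 1) / (np.exp(2 * h) + 1)

def sigmoid(h):
    """Logistic function, 1 / (1 + e^-h)."""
    return 1.0 / (1.0 + np.exp(-h))

def errors(xo, h):
    """The four error estimates behind the four simulated realities."""
    return {
        "e1": xo - h,           # no activation: the 'autistic' network
        "e2": xo - relu(h),     # ReLU rectifier
        "e3": xo - tanh(h),     # hyperbolic tangent
        "e4": xo - sigmoid(h),  # sigmoid / logistic
    }
```

For example, `errors(1.0, 0.0)` gives e1 = e2 = e3 = 1.0 but e4 = 0.5, since the sigmoid of a zero input is already one half.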
From there, I go my normal way. I create 4 artificial realities out of my source dataset. Each of these realities assumes that humans strive to nail down the right social difference between cities and the countryside, as measured with the DU/DG coefficient. Each of these realities is generated with a different way of appraising how far we are from the desired DU/DG, i.e. with one of the four ways of computing the error: e1, e2, e3, and e4. The expected states of both the source empirical dataset and the sets representative for those 4 alternative realities are given by their respective vectors of mean values, i.e. mean DU/DG, mean population etc. Those vectors of means are provided in Table 1 below. The source dataset shows a mean DU/DG = 41,14, which means that cities in this dataset display, on average across countries, 41 times greater a density of population than the general density of population. Mean empirical population is 149,6 million people, with mean urban population being 67,34 million people. Yes, we have China and India in the lot, and they really pump those scale numbers up.
Table 1 – Vectors of mean values in the source empirical set and in the perceptrons simulating alternative realities, optimizing the coefficient DU/DG
| Variable | Source dataset | error = xo – h | error = xo – ReLU(h) | error = xo – tanh(h) | error = xo – sigmoid(h) |
|---|---|---|---|---|---|
| DU/DG | 41,14 | 36,38 | 4,91 | 61,56 | 324,29 |
| Population | 149 625 587,07 | 125 596 355,00 | (33 435 417,00) | 252 800 741,00 | 1 580 356 431,00 |
| GDP (constant 2010 US$) | 1 320 025 624 972,08 | 1 025 700 000 000,00 | (922 220 000 000,00) | 2 583 780 000 000,00 | 18 844 500 000 000,00 |
| Broad money (% of GDP) | 57,50 | 54,13 | 31,80 | 71,99 | 258,38 |
| Urban population absolute | 67 349 480,42 | 54 311 459,20 | (31 977 590,00) | 123 331 287,00 | 843 649 729,00 |
| Energy use (kg of oil equivalent per capita) | 2 918,69 | 2 769,76 | 1 784,11 | 3 558,16 | 11 786,15 |
| Agricultural land (km2) | 1 227 301,86 | 1 135 064,25 | 524 611,51 | 1 623 345,71 | 6 719 245,69 |
| Cereal yield (kg per hectare) | 3 153,31 | 3 010,54 | 2 065,68 | 3 766,31 | 11 653,77 |
One of the first things which jumps to the eye in Table 1 – at least to my eye – is that one of the alternative realities, namely the one based on the ReLU activation function, is an impossible reality. There are negative populations in this one, and this is not a livable state of things. I don’t know about you, my readers, but I would feel horrible knowing that I am a minus. People can’t be negative by default. By the way, for any positive h(tj) the ReLU is identical to h(tj) itself, so one could expect e2 to behave just like the basic difference e1 = xo – h(tj). Yet, whilst making an alternative reality with no neural transformation of quasi-randomized input, thus making it with e1 = xo – h(tj), creates something pretty close to the original empirics, the ReLU-based reality drifts into the impossible.
Another alternative reality which looks sort of sketchy is the one based on neural activation via the sigmoid function. This one transforms the initial mean expected values into their several-times-multiples. Looks like the sigmoid is equivalent, in this case, to powering the collective intelligence of societies studied with substantial doses of interesting chemicals. That particular reality is sort of a wild dream, like what it would be like to produce almost 4 times more cereal yield per hectare, having more than 4 times more agricultural land, and over 10 times more people in cities. The surface of available land being finite as it is, 4 times more agricultural land and 10 times more people in cities would mean cities tiny in terms of land surface, probably all in height, both under and above ground, with those cities being 324 times denser with humans than the general landscape. Sounds familiar, a bit like sci fi movies.
Four different ways of pitching input variables against the expected output of optimal DU/DG coefficient produce four very different alternative realities. Out of these four, one is impossible, one is hilarious, and we stay with two acceptable ones, namely that based on no proper neural activation at all, and the other one using the hyperbolic tangent for assessing the salience of things. Interestingly, errors estimated as e1 = xo – h(tj) are essentially correlated with the input variables, whilst those assessed as e3 = xo – tanh[h(tj)] are essentially uncorrelated. It means that in the former case one can more or less predict how satisfied the neural network will be with the local input, and that prediction can be reliably made a priori. In the latter case, with the hyperbolic tangent, there is no way to know in advance. In this case, neural activation is a true transformation of reality.
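The claim about correlation can be checked with a quick sketch. This is a toy illustration on synthetic data, not my actual dataset: I draw standardized inputs at random, aggregate them into a stand-in h, and compute the Pearson correlation of e1 and e3 with one input variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 896 experimental rounds, 7 standardized input variables
x = rng.normal(size=(896, 7))
h = x.sum(axis=1)   # simplistic stand-in for the aggregate input h(tj)
xo = 1.0            # toy expected output

e1 = xo - h             # error with no activation
e3 = xo - np.tanh(h)    # error after hyperbolic tangent activation

# Pearson correlation of each error stream with the first input variable
r1 = np.corrcoef(x[:, 0], e1)[0, 1]
r3 = np.corrcoef(x[:, 0], e3)[0, 1]
```

On real data, the point of the paragraph above is that |r1| stays substantial across inputs whilst the tanh-based errors decorrelate, because the saturation of tanh detaches the error from the raw magnitude of the input.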
Table 2 below provides the formal calculation of standardized Euclidean distance between all the 4 alternative realities and the real world of tears we live in. By standardized Euclidean I mean: E = {[(meanX – meanS)^2]^0,5} / meanX. The ‘/ meanX’ part means that I divide the basic Euclidean distance by the mean value which serves me as benchmark, i.e. the empirical one. That facilitates subsequent averaging of those variable-specific Euclidean distances into one metric of mathematical similarity between entire vectors of values.
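In code, the per-variable formula reduces to an absolute difference divided by the empirical mean. A minimal sketch, fed here with the first two rows of Table 1 (source means vs the e1-based perceptron):

```python
import numpy as np

def standardized_euclidean(mean_x, mean_s):
    """Per-variable standardized distance:
    E = [(mean_x - mean_s)^2]^0.5 / mean_x = |mean_x - mean_s| / mean_x."""
    mean_x = np.asarray(mean_x, dtype=float)
    mean_s = np.asarray(mean_s, dtype=float)
    return np.abs(mean_x - mean_s) / mean_x

# DU/DG and Population: source dataset vs the error = xo - h perceptron
mean_x = np.array([41.14, 149_625_587.07])
mean_s = np.array([36.38, 125_596_355.00])

dist = standardized_euclidean(mean_x, mean_s)
avg = dist.mean()   # averaging gives one similarity metric per reality
```

The two values land at roughly 0,1156 and 0,1606, matching the first column of Table 2.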
Table 2 – Vectors of standardized Euclidean distances between the source set X and the perceptrons simulating alternative realities, optimizing the coefficient DU/DG
| Variable | error = xo – h | error = xo – ReLU(h) | error = xo – tanh(h) | error = xo – sigmoid(h) |
|---|---|---|---|---|
| DU/DG | 0,115597874 | 0,88065496 | 0,496346621 | 6,882843342 |
| Population | 0,160595741 | 1,223460557 | 0,68955555 | 9,562073386 |
| GDP (constant 2010 US$) | 0,22296963 | 1,698637953 | 0,957371093 | 13,27585923 |
| Broad money (% of GDP) | 0,058672324 | 0,446981172 | 0,251923403 | 3,493424228 |
| Urban population absolute | 0,193587555 | 1,474800842 | 0,831213637 | 11,52644748 |
| Energy use (kg of oil equivalent per capita) | 0,051026181 | 0,388730845 | 0,219092892 | 3,038163202 |
| Agricultural land (km2) | 0,075154787 | 0,572548914 | 0,322694736 | 4,474810971 |
| Cereal yield (kg per hectare) | 0,045275 | 0,344916834 | 0,194398841 | 2,695730596 |
| Average | 0,115359886 | 0,87884151 | 0,495324596 | 6,868669054 |
Interestingly, whilst the alternative reality based on neural activation through the ReLU function creates impossibly negative populations, its overall Euclidean distance from the source dataset is not as big as one could expect. The impossibility is specific to just some variables.
Now, what does it all have to do with anything? How is that estimation of error representative for collective intelligence in human societies? Good question. I am doing my best to give some kind of answer to it. Quantitative socio-economic variables represent valuable collective outcomes, and thus are informative about alternative orientations in collective action. The process of learning how to nail those valuable outcomes down consumes said orientation in action. Assuming that figuring out the right proportion of demographic anomaly in cities, as measured with DU/DG, is a valuable collective outcome, four collective orientations thereon have been simulated. One goes a bit haywire (negative populations), and yet it shows a possible state of society which attempts to sort of smooth out the social difference between cities and the countryside, with DU/DG being ten times lower than reality. Another one goes fantasque, with huge numbers and a slightly sci-fi-ish shade. The remaining two look like realistic alternatives, one essentially predictable with e1 = xo – h(tj), and another one essentially unpredictable, with e3 = xo – tanh[h(tj)].
I want my method to serve as a predictive tool for sketching the possible scenarios of technological change, in particular as regards the emergence and absorption of radically new technologies. On the other hand, I want my method to be of help when it comes to identifying the possible Black Swans, i.e. the rather unlikely and yet profoundly disturbing states of nature. As I look at those 4 alternative realities my perceptron has just made up (it’s not me, it’s him! Well, it…), I can see two Black Swans. The one made with the sigmoid activation function shows a possible direction which, for example, African countries could follow, should they experience rapid demographic growth. This particular Black Swan is a hypothetical situation, when population grows like hell. This automatically puts enormous pressure on agriculture. More people need more food. More agriculture requires more space, and there is less left for cities. Still, more people around need more social roles, and we need to ramp up the production thereof in very densely packed urban populations, where the sheer density of human interaction makes our social brains just race for novelty. This particular Black Swan could actually be a historical reconstruction. It could be representative for the type of social change which we know as civilisational revival: passage from the nomad life to the sedentary one, like a dozen thousand years ago, or reconstruction of social tissue after the fall of the Western Roman Empire in Europe, that sort of stuff.
Another Black Swan is made with the ReLU activation function and simulates a society where cities lose their function as factories of new social roles. It is a society in downsizing. It is actually a historical reconstruction, too. This is what must have happened when the Western Roman Empire was collapsing, and before the European civilization bounced back.
Well, well, well, that s**t makes sense… Amazing.
[1] World Bank 1: https://data.worldbank.org/indicator/SP.URB.TOTL.IN.ZS
[2] World Bank 4: https://data.worldbank.org/indicator/SP.POP.TOTL
[3] World Bank 2: https://data.worldbank.org/indicator/AG.LND.TOTL.UR.K2
[4] World Bank 5: https://data.worldbank.org/indicator/EN.POP.DNST