The collective of individual humans being any good at being smart

I am working on two topics in parallel, which is sort of normal in my case. As I know myself, instead of asking “Isn’t two too much?”, I should rather say “Just two? Run out of ideas, obviously”. I keep working on a proof-of-concept article for the idea which I provisionally labelled “Energy Ponds” AKA “Project Aqueduct”, on the one hand. See my two latest updates, namely ‘I have proven myself wrong’ and ‘Plusieurs bouquins à la fois, comme d’habitude’, as regards the summary of what I have found out and written down so far. As in most research which I do, I have come to the conclusion that however wonderful the concept appears, the most important thing in my work is the method of checking the feasibility of that concept. I guess I should develop on the method more specifically.

On the other hand, I am returning to my research on collective intelligence. I have just been approached by a publisher, with a kind invitation to submit a proposal for a book on that topic. I am reviewing my research and the available literature. I am wondering what kind of central thread I should structure the entire book around. Two threads turn up in my mind, as a matter of fact. The first one is the assumption that whatever kind of story I am telling, I am actually telling the story of my own existence. I feel I need to go back to the roots of my interest in the phenomenon of collective intelligence, and those roots are in my meddling with artificial neural networks. At some point, I came to the conclusion that artificial neural networks can be good simulators of the way that human societies figure s**t out. I need to dig again into that idea.

My second thread is the theory of complex systems AKA the theory of complexity. The thing seems to have been macheting its way through the jungle of social sciences in recent years, and it looks interestingly similar to what I labelled as collective intelligence. I came by the theory of complexity in three books which I am reading now (just three?). The first one is a history book: ‘1177 B.C. The Year Civilisation Collapsed. Revised and Updated’, published by Eric H. Cline with Princeton University Press in 2021[1]. The second book is just a few light years away from the first one. It regards mindfulness. It is ‘Aware. The Science and Practice of Presence. The Groundbreaking Meditation Practice’, published by Daniel J. Siegel with TarcherPerigee in 2018[2]. The third book is already some sort of a classic; it is ‘The Black Swan. The impact of the highly improbable’ by Nassim Nicholas Taleb, published with Penguin in 2010.

I think it is Daniel J. Siegel who gives the best general take on the theory of complexity, and I allow myself to quote: ‘One of the fundamental emergent properties of complex systems in this reality of ours is called self-organization. That’s a term you might think someone in psychology or even business might have created—but it is a mathematical term. The form or shape of the unfolding of a complex system is determined by this emergent property of self-organization. This unfolding can be optimized, or it can be constrained. When it’s not optimizing, it moves toward chaos or toward rigidity. When it is optimizing, it moves toward harmony and is flexible, adaptive, coherent, energized, and stable’. (Siegel, Daniel J.. Aware (p. 9). Penguin Publishing Group. Kindle Edition).  

I am combining my scientific experience of using AI as a social simulator with the theory of complex systems. It means I need to UNDERSTAND, like really. I need to understand my own thinking, in the first place, and then I need to combine it with whatever I can understand from other people’s thinking. It started with a simple artificial neural network, which I used to write my article ‘Energy efficiency as manifestation of collective intelligence in human societies’ (Energy, 191, 116500, https://doi.org/10.1016/j.energy.2019.116500 ). I had a collection of quantitative variables, which I had previously meddled with using classical regression. As regression did not really bring many conclusive results, I had the idea of using an artificial neural network. Of course, today, neural networks are a whole technology and science. The one I used is the equivalent of a spear with a stone tip as compared to a battle drone. Therefore, the really important thing is the fundamental logic of neural networking as compared to regression, in analyzing quantitative data.

When I do regression, I come up with a function, like y = a1*x1 + a2*x2 + …+ b, I trace that function across the cloud of empirical data points I am working with, and I measure the average distance from those points to the line of my function. That average distance is the average (standard) error of estimation with that given function. I repeat the process as many times as necessary to find a function which both makes sense logically and yields the lowest standard error of estimation. The central thing is that I observe all my data at once, as if it was all happening at the same time and as if I was observing it from outside. Here is the thing: I observe it from outside, but when that empirical data was happening, i.e. when the social phenomena expressed in my quantitative variables were taking place, everybody (me included) was inside, not outside.
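The regression routine described above can be sketched in a few lines. This is a minimal illustration and not the procedure I used in the article: it is simplified to a single regressor (y = a*x + b), the data points are made up, and the function names fit_line and standard_error are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single regressor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept
    return a, b

def standard_error(xs, ys, a, b):
    """Root of the mean squared residual, with n - 2 degrees of freedom."""
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))

# Made-up data points: all of them are seen at once, "from outside".
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
se = standard_error(xs, ys, a, b)
print(f"y = {a:.2f}*x + {b:.2f}, standard error = {se:.3f}")
```

Note that the order of the observations does not matter here: shuffling xs and ys together yields the same line, which is exactly the "everything at the same time" perspective.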

How to express mathematically the fact of being inside the facts measured? One way is to take those empirical occurrences one by one, sort of Denmark in 2005, and then Denmark in 2006, and then Germany in 2005 etc. Being inside the events changes my perspective on what is the error of estimation, as compared to being outside. When I am outside, error means departure from the divine plan, i.e. from the regression function. When I am inside things that are happening, error happens as discrepancy between what I want and expect, on the one hand, and what I actually get, on the other hand. These are two different errors of estimation, measured as departures from two different functions. The regression function is the most accurate (or as accurate as you can get) mathematical explanation of the empirical data points. The function which we use when simulating the state of being inside the events is different: it is a function of adaptation.      

Intelligent adaptation means that we are after something: food, sex, power, a new Ferrari, social justice, 1 000 000 followers on Instagram…whatever. There is something we are after, some kind of outcome we try to optimize. When I have a collection of quantitative variables which describe a society, such as energy efficiency, headcount of population, inflation rates, incidence of Ferraris per 1 million people etc., I can make a weak assumption that any of these can express a desired outcome. Here, a digression is due. In science and philosophy, weak assumptions are assumptions which assume very little, and therefore they are bloody hard to discard. On the other hand, strong assumptions assume a lot, and that makes them pretty good targets for discarding criticism. In other words, in science and philosophy, weak assumptions are strong and strong assumptions are weak. Obvious, isn’t it? Anyway, I make that weak assumption that any phenomenon we observe and measure with a numerical scale can be a collectively desired outcome we pursue.

Another assumption I make, a weak one as well, is sort of hidden in the word ‘expresses’. Here, I relate to a whole line of philosophical and scientific heritage, going back to people like Plato, Kant, William James, Maurice Merleau-Ponty, or, quite recently, Michael Keane (1972[3]), as well as Berghout & Verbitskiy (2021[4]). Very nearly everyone who seriously thought (or keeps thinking, on the account of being still alive) about human cognition of reality agrees that we essentially don’t know s**t. We make cognitive constructs in our minds, so as to make at least a little bit of sense of the essentially chaotic reality outside our skin, and we call it empirical observation. Mind you, stuff inside our skin is not much less chaotic, but this is outside the scope of social sciences. As we focus on quantitative variables commonly used in social sciences, the notion of facts becomes really blurred. Have you ever shaken hands with energy efficiency, with Gross Domestic Product or with the mortality rate? Have you touched it? No? Neither have I. These are highly distilled cognitive structures which we use to denote something about the state of society.

Therefore, I assume that quantitative, socio-economic variables express something about the societies observed, and that something is probably important if we collectively keep record of it. If I have n empirical variables, each of them possibly represents collectively important outcomes. As these are distinct variables, I assume that, with all the imperfections and simplification of the corresponding phenomenology, each distinct variable possibly represents a distinct type of collectively important outcome. When I study a human society through the lens of many quantitative variables, I assume they are informative about a set of collectively important social outcomes in that society.

Whilst a regression function explains how many variables are connected when observed ex post and from outside, an adaptation function explains and expresses the way that a society addresses important collective outcomes in a series of trials and errors. Here come two fundamental differences between studying a society with a regression function, as opposed to using an adaptation function. Firstly, for any collection of variables, there is essentially one regression function of the type: y = a1*x1 + a2*x2 + …+ an*xn + b. On the other hand, with a collection of n quantitative variables at hand, there are at least as many functions of adaptation as there are variables. We can hypothesize that each individual variable x is the collective outcome to pursue and optimize, whilst the remaining n – 1 variables are instrumental to that purpose. One remark is important to make now: the variable informative about collective outcomes pursued, that specific x, can be and usually is instrumental to itself. We can make a desired Gross Domestic Product based on the Gross Domestic Product we have now. The same applies to inflation, energy efficiency, share of electric cars in the overall transportation system etc. Therefore, the entire set of n variables can be assumed instrumental to the optimization of one variable x from among them.
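The counting argument above (n variables, hence at least n candidate functions of adaptation, each using the full set of n variables as input) can be sketched as follows. The variable names and the helper adaptation_hypotheses are hypothetical, chosen only for illustration.

```python
def adaptation_hypotheses(variable_names):
    """Pair each variable, taken as the pursued outcome, with the full set of inputs.

    The outcome variable stays among the inputs, because it can be
    instrumental to itself.
    """
    return [
        {"outcome": name, "inputs": list(variable_names)}
        for name in variable_names
    ]

# Hypothetical variable names, for illustration only.
variables = ["gdp", "inflation", "energy_efficiency", "population"]
hypotheses = adaptation_hypotheses(variables)

for h in hypotheses:
    print(f"optimize {h['outcome']} using {', '.join(h['inputs'])}")
```

Each entry in hypotheses is one alternative story about what the society is after; the data is the same in all of them.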

Mathematically, it starts with assuming a functional input f(x1, x2, …, xn) which gets pitched against one specific outcome xi. Subtraction comes as the most logical representation of that pitching, and thus we have the mathematical expression ‘xi – f(x1, x2, …, xn)’, which informs about how close the society observed has come to the desired outcome xi. It is technically possible that people just nail it, and xi = f(x1, x2, …, xn), whence xi – f(x1, x2, …, xn) = 0. This is a perfect world, which, however, can be dangerously perfect. We know those societies of apparently perfectly happy people, who live in harmony with nature, even if that harmony means hosting most intestinal parasites of the local ecosystem. One day other people come, with big excavators, monetary systems, structured legal norms, and the bubble bursts, and it hurts.

Thus, on the whole, it might be better to hit xi ≠ f(x1, x2, …, xn), whence xi – f(x1, x2, …, xn) ≠ 0. It helps learning new stuff. The ‘≠ 0’ part means there is an error in adaptation. The functional input f(x1, x2, …, xn) hits above or below the desired xi. As we want to learn, that error in adaptation AKA e = xi – f(x1, x2, …, xn) ≠ 0 makes practical sense only when we utilize it in subsequent rounds of collective trial and error. Sequence means order, and a timeline. We have a sequence {t0, t1, t2, …, tm} of m + 1 moments in time. Local adaptation turns into ‘xi(t) – ft(x1, x2, …, xn)’, and error of adaptation becomes the time-specific e(t) = xi(t) – ft(x1, x2, …, xn) ≠ 0. The clever trick consists in taking e(t0) = xi(t0) – ft0(x1, x2, …, xn) ≠ 0 and combining it somehow with the next functional input ft1(x1, x2, …, xn). Mathematically, if we want to combine two values, we can add them up or multiply them. We keep in mind that division is a special case of multiplication, namely x * (1/z). When I add up two values, I assume they are essentially of the same kind and sort of independent from each other. When, on the other hand, I multiply them, they become entwined so that each of them reproduces the other one. Multiplication ‘x * z’ means that x gets reproduced z times and vice versa. When I have the error of adaptation e(t0) from the last experimental round and I want to combine it with the functional input of adaptation ft1(x1, x2, …, xn) in the next experimental round, that whole reproduction business looks like a strong assumption, with a lot of weak spots on it. I settle for the weak assumption then, and I assume that ft1(x1, x2, …, xn) becomes ft0(x1, x2, …, xn) + e(t0).

The expression ft0(x1, x2, …, xn) + e(t0) makes any functional sense only when and after we have e(t0) = xi(t0) – ft0(x1, x2, …, xn) ≠ 0. Consequently, the next error of adaptation, namely e(t1) = xi(t1) – ft1(x1, x2, …, xn) ≠ 0, can come into being only after its predecessor e(t0) has occurred. We have a chain of m + 1 states in the functional input of the society, i.e. {ft0(x1, x2, …, xn) => ft1(x1, x2, …, xn) => … => ftm(x1, x2, …, xn)}, associated with a chain of m + 1 desired outcomes {xi(t0) => xi(t1) => … => xi(tm)}, and with a chain of errors in adaptation {e(t0) => e(t1) => … => e(tm)}. That triad, i.e. the chain of functional inputs, the chain of desired outcomes, and the chain of errors in adaptation, makes for me the closest I can get now to the mathematical expression of the adaptation function. As errors get fed along the chain of states (as I see it, they are being fed forward, but in the algorithmic version, you can backpropagate them), those errors are some sort of dynamic memory in that society, the memory from learning to adapt.
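The three chains can be traced numerically with a very small sketch. It rests on assumptions which are not in the reasoning above: the functional input is collapsed into a single number per moment in time, the outcome values are made up, and the learning_rate parameter is a hypothetical addition (at 1.0 it reproduces the additive rule exactly as stated, i.e. the next functional input is the previous one plus the previous error).

```python
def adaptation_chain(desired_outcomes, initial_input, learning_rate=1.0):
    """Feed each error of adaptation forward into the next functional input.

    desired_outcomes: values xi(t0), xi(t1), ..., xi(tm)
    initial_input:    the functional input at t0, collapsed into one number
    learning_rate:    hypothetical scaling of the error; 1.0 gives the
                      plain additive rule "next input = input + error"
    """
    inputs, errors = [], []
    current_input = initial_input
    for outcome in desired_outcomes:
        error = outcome - current_input   # e(t) = xi(t) - f_t(...)
        inputs.append(current_input)
        errors.append(error)
        # The error from this round is carried into the next round.
        current_input = current_input + learning_rate * error
    return inputs, errors

# Made-up desired outcomes xi(t0) ... xi(t3).
outcomes = [10.0, 12.0, 11.0, 15.0]
inputs, errors = adaptation_chain(outcomes, initial_input=8.0)
print("functional inputs:", inputs)
print("errors:           ", errors)
```

With the rate at 1.0, each new functional input ends up equal to the previous desired outcome, so each error is simply the change in the outcome from one moment to the next. A rate below 1.0 would make the chain absorb errors more slowly, which is one way (among others) of bringing the sketch closer to what a neural network does.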

Here we can see the epistemological difference between studying a society from outside, and explaining its workings with a regression function, on the one hand, and studying those mechanisms from inside, by simulation with an adaptation function, on the other hand. The adaptation function is the closest I can get, in mathematical form, to what I understand by collective intelligence. As I have been working with that general construct, I progressively zoomed in on another concept, namely that of intelligent structure, which I define as a structure which learns by experimenting with many alternative versions of itself whilst staying structurally coherent, i.e. by maintaining basic coupling between particular components.

I feel like comparing my approach to intelligent structures and their collective intelligence with the concept of complex systems, as discussed in the literature I have just referred to. I returned, therefore, to the book entitled ‘1177 B.C. The Year Civilisation Collapsed. Revised and Updated’, by Eric H. Cline, Princeton University Press, 2021. The theory of complex systems is brought forth in that otherwise very interesting piece in order to help formulate an answer to the following question: “Why did the great empires of the Late Bronze Age, such as Egypt, the Hittites, or the Mycenaeans, all collapse at approximately the same time, around 1200 – 1150 B.C.?”. The basic assertion which Eric Cline develops on and questions is that the entire patchwork of those empires in the Mediterranean, the Levant and the Middle East was one big complex system, which collapsed on account of having overkilled it slightly in the complexity department.

I am trying to reconstruct the definition of systemic complexity such as Eric Cline uses it in his flow of logic. I start with the following quote: ‘Complexity science or theory is the study of a complex system or systems, with the goal of explaining the phenomena which emerge from a collection of interacting objects’. If we study a society as a complex system, we need to assume two things. There are many interacting objects in it, for one, and their mutual interaction leads to the emergence of some specific phenomena. Sounds cool. I move on, and a few pages later I find the following statement: ‘In one aspect of complexity theory, behavior of those objects is affected by their memories and “feedback” from what has happened in the past. They are able to adapt their strategies, partly on the basis of their knowledge of previous history’. Nice. We are getting closer. Entities inside a complex system accumulate memory, and they learn on that basis. This is sort of next door to the three sequential chains which I coined: states, desired outcomes, and errors in adaptation.

Further, I find an assertion that a complex social system is typically “alive”, which means that it evolves in a complicated, nontrivial way, whilst being open to influences from the environment. All that leads the complex system to generate phenomena which can be considered as surprising and extreme. Good. This is the moment to move to the next book: ‘The Black Swan. The impact of the highly improbable’ by Nassim Nicholas Taleb, Penguin, 2010. Here comes a lengthy quote, which I bring here for the sheer pleasure of savouring one more time Nassim Taleb’s delicious style: “[…] say you attribute the success of the nineteenth-century novelist Honoré de Balzac to his superior “realism,” “insights,” “sensitivity,” “treatment of characters,” “ability to keep the reader riveted,” and so on. These may be deemed “superior” qualities that lead to superior performance if, and only if, those who lack what we call talent also lack these qualities. But what if there are dozens of comparable literary masterpieces that happened to perish? And, following my logic, if there are indeed many perished manuscripts with similar attributes, then, I regret to say, your idol Balzac was just the beneficiary of disproportionate luck compared to his peers. Furthermore, you may be committing an injustice to others by favouring him. My point, I will repeat, is not that Balzac is untalented, but that he is less uniquely talented than we think. Just consider the thousands of writers now completely vanished from consciousness: their record does not enter into analyses. We do not see the tons of rejected manuscripts because these writers have never been published. The New Yorker alone rejects close to a hundred manuscripts a day, so imagine the number of geniuses that we will never hear about. In a country like France, where more people write books while, sadly, fewer people read them, respectable literary publishers accept one in ten thousand manuscripts they receive from first-time authors”.

Many people write books, few people read them, and that creates something like a flow of highly risky experiments. That coincides with something like a bottleneck of success, with possibly great positive outcomes (fame, money, posthumous fame, posthumous money for other people etc.), and a low probability of occurrence. A few salient phenomena are produced (the Balzacs), whilst the whole build-up of other writing efforts, by less successful novelists, remains in the backstage of history. That, in turn, somehow rhymes with my intuition that intelligent structures need to produce big outliers, at least from time to time. Those outliers can be viewed as big departures from the currently expected outcomes. They are big local errors. Big errors mean a lot of information to learn from. There is an even further-going, conceptual coincidence with the theory and practice of artificial neural networks. A network can be prone to overfitting, which means that it learns too fast, sort of by jumping prematurely to conclusions, before and without having done the required work through local errors in adaptation.

Seen from that angle, the function of adaptation I have come up with has a new shade. The sequential chain of errors appears as necessary for the intelligent structure to be any good. Good. Let’s jump to the third book I quoted with respect to the theory of complex systems: ‘Aware. The Science and Practice of Presence. The Groundbreaking Meditation Practice’, by Daniel J. Siegel, TarcherPerigee, 2018. I return to the idea of self-organisation in complex systems, and the choice between three different states: a) the optimal state of flexibility, adaptability, coherence, energy and stability, b) non-optimal rigidity, and c) non-optimal chaos.

That conceptual thread concurs interestingly with my draft paper: ‘Behavioral absorption of Black Swans: simulation with an artificial neural network’. I found out that with the chain of functional input states {ft0(x1, x2, …, xn) => ft1(x1, x2, …, xn) => … => ftm(x1, x2, …, xn)} being organized in rigorously the same way, different types of desired outcomes lead to different patterns of learning, very similar to the triad which Daniel Siegel refers to. When my neural network does its best to optimize outcomes such as Gross Domestic Product, it quickly comes to rigidity. It makes some errors in the beginning of the learning process, but then it quickly drives the local error asymptotically to zero and is like ‘We nailed it. There is no need to experiment further’. There are other outcomes, such as the terms of trade (the residual fork between the average price of exports and that of imports), or the average number of hours worked per person per year, which yield a curve of local error in the form of a graceful sinusoid, cyclically oscillating between different magnitudes of error. This is the energetic, dynamic balance. Finally, some macroeconomic outcomes, such as the index of consumer prices, can make the same neural network go nuts, and generate an ever-growing curve of local error, as if the poor thing couldn’t learn anything sensible from looking at the prices of apparel and refrigerators. The (most) puzzling thing in all that is that differences in pursued outcomes are the source of discrepancy in the patterns of learning, not the way of learning as such. Some outcomes, when pursued, keep the neural network I made in a state of healthy adaptability, whilst other outcomes make it overfit or go haywire.
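The three patterns of local error can be told apart, very crudely, by comparing the magnitude of error late in the learning process with that at its beginning. The sketch below is a hypothetical heuristic, not the method used in the draft paper: the function name classify_error_curve, the thresholds 0.1 and 2.0, and the three synthetic curves are all assumptions made for illustration.

```python
import math

def classify_error_curve(errors, low=0.1, high=2.0):
    """Label a curve of local errors by comparing its last third with its first third.

    The thresholds `low` and `high` are arbitrary, illustrative values.
    """
    third = len(errors) // 3
    if third == 0:
        raise ValueError("need at least three error values")
    early = sum(abs(e) for e in errors[:third]) / third
    late = sum(abs(e) for e in errors[-third:]) / third
    if early == 0:
        raise ValueError("early errors are all zero; the ratio is undefined")
    ratio = late / early
    if ratio < low:
        return "rigidity"         # error driven towards zero
    if ratio > high:
        return "chaos"            # error keeps growing
    return "dynamic balance"      # error stays in a similar range

# Synthetic error curves, standing in for three kinds of pursued outcome.
decaying = [math.exp(-0.5 * t) for t in range(30)]
oscillating = [math.sin(t) for t in range(30)]
growing = [0.1 * t + 0.1 for t in range(30)]

for name, curve in [("decaying", decaying), ("oscillating", oscillating), ("growing", growing)]:
    print(name, "->", classify_error_curve(curve))
```

The point of the sketch is only that the label depends on the curve of error, and the curve of error depends on the outcome pursued, whilst the classifying procedure stays the same.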

When I write about collective intelligence and complex systems, it can come as a sensible idea to read (and quote) books which have those concepts explicitly named. Here comes ‘The Knowledge Illusion. Why we never think alone’ by Steven Sloman and Philip Fernbach, Riverhead Books (an imprint of Penguin Random House LLC, Ebook ISBN: 9780399184345, Kindle Edition). In the introduction, titled ‘Ignorance and the Community of Knowledge’, Sloman and Fernbach write: “The human mind is not like a desktop computer, designed to hold reams of information. The mind is a flexible problem solver that evolved to extract only the most useful information to guide decisions in new situations. As a consequence, individuals store very little detailed information about the world in their heads. In that sense, people are like bees and society a beehive: Our intelligence resides not in individual brains but in the collective mind. To function, individuals rely not only on knowledge stored within our skulls but also on knowledge stored elsewhere: in our bodies, in the environment, and especially in other people. When you put it all together, human thought is incredibly impressive. But it is a product of a community, not of any individual alone”. This is a strong statement, which I somehow distance myself from. I think that collective human intelligence can be really workable only when individual humans are any good at being smart. Individuals need to have practical freedom of action, based on their capacity to figure s**t out in difficult situations, and the highly fluid ensemble of individual freedoms allows the society to make and experiment with many alternative versions of itself.

Another book is more of a textbook. It is ‘What Is a Complex System?’ by James Ladyman and Karoline Wiesner, published with Yale University Press (ISBN 978-0-300-25110-4, Kindle Edition). In the introduction (p.15), Ladyman and Wiesner claim: “One of the most fundamental ideas in complexity science is that the interactions of large numbers of entities may give rise to qualitatively new kinds of behaviour different from that displayed by small numbers of them, as Philip Anderson says in his hugely influential paper, ‘more is different’ (1972). When whole systems spontaneously display behaviour that their parts do not, this is called emergence”. In my world, those ‘entities’ are essentially the chained functional input states {ft0(x1, x2, …, xn) => ft1(x1, x2, …, xn) => … => ftm(x1, x2, …, xn)}. My entities are phenomenological: they are cognitive structures which, for want of a better word, we call ‘empirical variables’. If the neural networks I make and use for my research are any good at representing complex systems, emergence is the property of data in the first place. Interactions between those entities are expressed through the function of adaptation, mostly through the chain {e(t0) => e(t1) => … => e(tm)} of local errors, concurrent with the chain of functional input states.

I think I know what the central point and thread of my book on collective intelligence is, should I (finally) write that book for good. Artificial neural networks can be used as simulators of collective social behaviour and social change. Still, they do not need to be super-performant networks. My point is that with the right intellectual method, even the simplest neural networks, those possible to program into an Excel spreadsheet, can be reliable cognitive tools for social simulation.


[1] LCCN 2020024530 (print) | LCCN 2020024531 (ebook) | ISBN 9780691208015 (paperback) | ISBN 9780691208022 (ebook) ; Cline, Eric H.. 1177 B.C.: 6 (Turning Points in Ancient History, 1) . Princeton University Press. Kindle Edition.

[2] LCCN 2018016987 (print) | LCCN 2018027672 (ebook) | ISBN 9780143111788 | ISBN 9781101993040 (hardback) ; Siegel, Daniel J.. Aware (p. viii). Penguin Publishing Group. Kindle Edition.

[3] Keane, M. (1972). Strongly mixing measures. Inventiones mathematicae, 16(4), 309-324. DOI https://doi.org/10.1007/BF01425715

[4] Berghout, S., & Verbitskiy, E. (2021). On regularity of functions of Markov chains. Stochastic Processes and their Applications, Volume 134, April 2021, Pages 29-54, https://doi.org/10.1016/j.spa.2020.12.006
