My NORMAL and my WEIRD

I go deep into mathematical philosophy, i.e. into the kind of philosophy, where I spare some of my words and use mathematical notation instead, or, if you want, the kind of maths where I am sort of freestyling. I go back to my mild obsession about using artificial neural networks as representation of collective intelligence, and more specifically to one of my networks, namely that representing social roles in the presence of an exogenous disturbance.

I am currently encouraging my students to come up with ideas of new businesses, and I gently direct their attention towards the combined influence of a long-term trend – namely that of digital technologies growing in and into the global economy – with a medium-term one which consists in cloud computing temporarily outgrowing other fields of digital innovation, and finally a short-term, Black-Swan-type event, namely the COVID-19 pandemic.  I want to re-use, and generalize the interpretation of the neural network I presented in my update from May 25th, 2020, titled ‘The perfectly dumb, smart social structure’, in order to understand better the way that changes of various temporal span can overlap and combine.

I start with introducing a complex, probabilistic definition of, respectively, continuity and change. Stuff happens all the time and happening can be represented mathematically as the probability thereof. Let’s suppose that my life is really simple, and it acquires meaning through the happening of three events: A, B, and C. I sincerely hope there are no such lives, yet you never know, you know. In Poland, we have that saying that a true man needs to do three things in life: build a house, plant a tree and raise a son. See, just three important things over the whole lifetime of a man. I hope women have more diverse existential patterns. 

Over the 365 days of a year, event A happened 36 times, event B happened 12 times, and event C took place 120 times. As long as I keep the 365 days of the year as my basic timeline, those three events have displayed the following probabilities of happening: P(A) = 36/365 = 0.0986, P(B) = 12/365 = 0.0329, and P(C) = 120/365 = 0.3288. With these three probabilities computed, I want to make reasonable expectations as for the rest of my life, and define what is normal, and what is definitely weird, and therefore interesting. I define three alternative versions of the normal and the weird, using three alternative maths. Firstly, I use the uniform distribution, and the binomial one, to represent the most conservative approach, where I assume that normal equals constant, and anything else than constant is revolution. Therefore NORMAL = {P(A) = 0.0986 and P(B) = 0.0329 and P(C) = 0.3288} and WEIRD = {P(A) ≠ 0.0986 or P(B) ≠ 0.0329 or P(C) ≠ 0.3288}. Why do I use ‘and’ in the definition of NORMAL and ‘or’ in that of WEIRD? That’s sheer logic. My NORMAL is all those three events happening exactly at the probabilities computed for the base year. All 3 of them need to be happening precisely at those levels of incidence. All of them means A and B and C. Anything outside that state is WEIRD, therefore WEIRD happens when even one out of three goes haywire. It can be P(A) or P(B) or P(C), whatever.

Cool. You see? Mathematics don’t bite. Not yet, I mean. Let’s go a few steps further. In mathematical logic, conjunction ‘and’ is represented as multiplication, therefore with symbols ‘x’ or ‘*’. On the other hand, logical alternative ‘or’ is equivalent to arithmetical addition, i.e. to ‘+’. In other words, my NORMAL = {P(A) = 0.0986 * P(B) = 0.0329 * P(C) = 0.3288} and WEIRD = {P(A) ≠ 0.0986 + P(B) ≠ 0.0329 + P(C) ≠ 0.3288}. The NORMAL is just one value, namely 0.0986 * 0.0329 * 0.3288 = 0.0011, whilst the WEIRD is anything else.
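For the sake of illustration, here is a minimal sketch of that conservative NORMAL / WEIRD partition, using the event counts from the example above; the variable names are mine.

```python
# A minimal sketch of the conservative NORMAL / WEIRD partition described above.
# Event counts come from the base-year example in the text.
count_A, count_B, count_C, days = 36, 12, 120, 365

p_A = count_A / days   # ≈ 0.0986
p_B = count_B / days   # ≈ 0.0329
p_C = count_C / days   # ≈ 0.3288

# NORMAL: the conjunction 'A and B and C' at exactly these incidences,
# rendered arithmetically as a product of the three probabilities.
p_normal = p_A * p_B * p_C   # ≈ 0.0011
# WEIRD: anything else, i.e. the complement of NORMAL.
p_weird = 1.0 - p_normal     # ≈ 0.9989

print(f"P(A)={p_A:.4f}, P(B)={p_B:.4f}, P(C)={p_C:.4f}")
print(f"P(NORMAL)={p_normal:.4f}, P(WEIRD)={p_weird:.4f}")
```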

From one thing you probably didn’t like in elementary school, i.e. arithmetic, I will pass to another thing you just as probably had an aversion to, i.e. to geometry. I have those three things that matter in my life, A, B, and C, right? The probability of each of them happening can be represented as an axis in a 3-dimensional, finite space. That space is finite because probabilities are shy and never go above and beyond 100%. Each of my three dimensions maxes out at 100% and that’s it. My NORMAL is just one point in that manifold, and all other points are WEIRD. I computed my probabilities at four digits after the decimal point, and my NORMAL is just one point, which represents 0.0001 out of 1.00 on each axis. Therefore, on each axis I have 1 – 0.0001 = 0.9999 alternative probability of WEIRD stuff happening. I have three dimensions in my existence, and therefore the total volume of the WEIRD makes 0.9999 * 0.9999 * 0.9999 = 0.9999³ ≈ 0.9997.

Let’s check with arithmetic. My NORMAL = {P(A) = 0.0986 * P(B) = 0.0329 * P(C) = 0.3288} = 0.0011, right? This is the arithmetical probability of all those three probabilities happening together. If I have just two events in my universe, the NORMAL and the WEIRD, the probability of WEIRD is equal to the arithmetical difference between 1.00 and the probability of NORMAL, thus P(WEIRD) = 1.00 – 0.0011 = 0.9989. See? The geometrical volume of WEIRD is slightly greater than the arithmetical probability of WEIRD. Not much, I agree, yet the geometrical volume of all the things happening in my life outgrows the arithmetical probability of anything at all happening by 0.9997 – 0.9989 = 0.0008. Still, there are different interpretations. If I see the finite space of my existence as an arithmetical product of three dimensions, it means I see it as a cube, right? That, in turn, means that I allow my universe to have angles and corners. Yet, if I follow the intuition of Carl Friedrich Gauss and I perceive my existence as a sphere around me (see The kind of puzzle that Karl Friedrich was after), that sphere should have a diameter of 100% (whatever happens happens), and therefore a radius of 50%, and a total volume V = (4/3)*π*r³ = (4/3)*π*0.5³ = 0.5236. In plain, non-math human it means that after I smooth my existence out by cutting all the corners and angles, I stay with just a bit more than one half of what can possibly, arithmetically happen.
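A quick numerical check of the two comparisons made in this paragraph, i.e. the cube of per-axis WEIRD probabilities against the arithmetical complement, and the Gaussian sphere of radius 50%:

```python
import math

v_weird_cube = 0.9999 ** 3              # geometric volume of WEIRD in the cube, ≈ 0.9997
p_weird = 1.0 - 0.0011                  # arithmetical probability of WEIRD, = 0.9989
print(v_weird_cube - p_weird)           # ≈ 0.0008

v_sphere = (4.0 / 3.0) * math.pi * 0.5 ** 3   # sphere of diameter 100%, thus radius 0.5
print(v_sphere)                               # ≈ 0.5236
```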

WTF? Right you would be to ask. Let’s follow a bit in the footsteps of Carl Friedrich Gauss. That whole story of spherical existence might be sensible. It might be somehow practical to distinguish all the stuff that can possibly happen to me from the things which I can reasonably expect to happen. I mean, volcanoes do not erupt every day, right? Most planes land, right? There is a difference between outliers of the possible and the mainstream of existential occurrence. The normal distribution is a reasonably good manner of partitioning between those two realms, as it explicitly distinguishes between the expected and all the rest. The expected state of things is one standard deviation away from the average, both up and down, and that state of things can be mathematically apprehended as a mean-reverted value. I have already messed around with it (see, for example, ‘We really don’t see small change’).

When I contemplate my life as the normal distribution of what happens, I become a bit more lucid, as compared to when I had just that one, privileged state of things, described in the preceding paragraphs. When I go Gaussian, I distinguish between the stuff which is actually happening to me, on the one hand, and the expected average state of things. I humbly recognize that what I can reasonably expect is determined by the general ways of reality rather than my individual expectations, which I just as humbly convert into predictions. Moreover, I accept and acknowledge that s**t happens, as a rule, things change, and therefore what is happening to me is always some distance from what I can generally expect. Mathematically, that last realization is symbolized by standard deviation from what is generally expected.

All that taken into account, my NORMAL is an interval: (μ – σ) ≤  NORMAL ≤ (μ + σ), and my WEIRD is actually two WEIRDS, the WEIRD ≤ (μ – σ) and the WEIRD ≥  (μ + σ).

The thing about reality in normal distribution is that it is essentially an endless timeline. Stuff just keeps happening and we normalize our perception thereof by mean-reverting everything we experience. Still, life is finite, our endeavours and ambitions usually have a finite timeframe, and therefore we could do with something like the Poisson process, to distinguish the WEIRD from the NORMAL. Besides, it would still be nice to have a sharp distinction between the things I want to happen, and the things that I can reasonably expect to happen, and the Poisson process addresses this one, too.
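As a sketch of how the Poisson process could draw that line, here is one possible operationalization for event A alone; the convention of taking λ ± √λ as the NORMAL band is my own choice for illustration, not something stated above.

```python
import math

lam = 36                   # yearly count of event A, taken as the Poisson intensity λ
sigma = math.sqrt(lam)     # for a Poisson process, the variance equals the mean
normal_low = lam - sigma   # = 30
normal_high = lam + sigma  # = 42

print(f"NORMAL yearly count of A: roughly between {normal_low:.0f} and {normal_high:.0f}")
# Under this particular convention, fewer than 30 or more than 42 occurrences
# of A in a year would count as WEIRD.
```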

Now, what does all that stuff about probability have to do with anything? First of all, things that are happening right now can be seen as the manifestation of a probability of them happening. That’s the deep theory by Pierre Simon, marquis de Laplace, which he expressed in his ‘Philosophical Essay on Probabilities’: whatever is happening is manifesting an underlying probability of happening, which, in turn, is a structural aspect of reality. Thus, when I run my simplistic perceptron in order to predict the impact of Black-Swan-type disruptions on human behaviour (see ‘The perfectly dumb, smart social structure’), marquis de Laplace would say that I uncover an underlying structural proclivity, dormant in the collective intelligence of ours. Further down this rabbit hole, I can claim that however we, the human civilization, react to sudden stressors, that reaction is always a manifestation of some flexibility, hidden and available in our collective cultural DNA.

The baseline mechanism of collective learning in a social structure can be represented as a network of conscious agents moving from one complex state to another inside a state space organized as a Markov chain of states, i.e. each current state of the social structure is solely the outcome of transformation in the preceding state(s). This transformation is constrained by two sets of exogenous phenomena, namely a vector of desired social outcomes to achieve, and a vector of subjectively aleatory stressors acting like Black-Swan events (i.e. both impossible to predict accurately by any member of the society, and profoundly impactful).

The current state of the social structure is a matrix of behavioural phenomena (i.e. behavioural states) in the individual conscious agents comprised in that structure. A formal definition of a conscious agent is developed first, and its application to social research is briefly discussed afterwards. Consistently with Hoffman et al. 2015[1] and Fields et al. 2018[2], conscious existence in the world is a relation between three essential, measurable spaces: states of the world or W, conscious experiences thereof or X, and actions, designated as G. Each of these is a measurable space because it is a set of phenomena accompanied by all the possible transformations thereof. States of the world are a set, and this set can be recombined through its specific σ-algebra. The same holds for experiences and actions. Conscious existence (CE) consists in consciously experiencing states of the world and taking actions on the grounds of that experience, in a 7-tuple defined along the following dimensions:

  1. States of the world W
  2. Experiences X
  3. Actions G
  4. Perception P defined as a combination of experiences with states of the world, therefore as a Markovian kernel P: W*X → X
  5. Decisions D defined as a Markovian kernel transforming experiences into actions, or D: X*G → G
  6. Consequences A of actions, defined as a Markovian kernel that transforms actions into further states of the world, or A: G*W →W.
  7. Time t
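For concreteness, these seven components can be written down as a minimal data structure. The sketch below is my own encoding, and representing the Markovian kernels as plain Python callables returning discrete distributions is an assumption of mine, not something taken from the cited papers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

State = str               # a state of the world w ∈ W
Experience = str          # a conscious experience x ∈ X
Action = str              # an action g ∈ G
Dist = Dict[str, float]   # a discrete probability distribution over labels

@dataclass
class ConsciousAgent:
    """A minimal sketch of the 7-tuple (W, X, G, P, D, A, t) described above."""
    W: Sequence[State]                        # states of the world
    X: Sequence[Experience]                   # conscious experiences
    G: Sequence[Action]                       # actions
    P: Callable[[State, Experience], Dist]    # perception kernel  P: W*X -> X
    D: Callable[[Experience, Action], Dist]   # decision kernel    D: X*G -> G
    A: Callable[[Action, State], Dist]        # consequence kernel A: G*W -> W
    t: int = 0                                # time index
```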

Still consistently with Hoffman et al. 2015 (op. cit.) and Fields et al. 2018 (op. cit.), it is assumed that Conscious Agents (CA) are individuals autonomous enough to align idiosyncratically their perception P, decisions D, and actions G with a view to maximizing the experience of positive payoffs among the consequences A of their actions. This assumption, i.e. maximization of payoffs, rather than quest for truth in perception, is both strongly substantiated by the here-cited authors, and pivotal for the rest of the model and for the application of artificial neural networks discussed further. Conscious Agents perceive states of the world as a combination of rewards to strive for, threats to avoid, and neutral states, not requiring attention. The capacity to maximize payoffs is further designated as Conscious Agents’ fitness to environment, and is in itself a complex notion, entailing maximization of rewards strictly speaking, minimization of exposure to threats, and a passive attitude towards neutral states. Fitness in Conscious Agents is gauged, and thus passed on to consecutive generations, against a complex environment made of rewards and threats of different recurrence over time. The necessity to eat is an example of an extremely recurrent external stressor. Seasonal availability of food exemplifies a more variable stressor, whilst a pandemic, such as COVID-19, or a natural disaster, are incidental stressors of subjectively unpredictable recurrence, along the lines of Black-Swan events (Taleb 2007[3]; Taleb & Blyth 2011[4]).

Conscious Agents are imperfect in their maximization of payoffs, i.e. in the given population, a hierarchy of fitness emerges, and the fittest CA’s have the greatest likelihood to have offspring. Therefore, an evolutionary framework is added, by assuming generational change in the population of Conscious Agents. Some CA’s die out and some new CA’s come to the game. Generational change does not automatically imply biological death and birth. It is a broader phenomenological category, encompassing all such phenomena where the recombination of individual traits inside a given population of entities contributes to creating new generations thereof. Technologies recombine and produce offspring in the form of next-generation solutions (e.g. transistors and printed circuits eventually had offspring in the form of microchips). Business strategies recombine and thus create conditions for the emergence of new business strategies. Another angle of theoretical approach to the issue of recombination in the fittest CA’s is the classical concept of dominant strategy, and that of dynamic equilibrium by John Nash (Nash 1953[5]). When at least some players in a game develop dominant strategies, i.e. strategies that maximize payoffs, those strategies become benchmarks for other players.

The social structure, such as theoretically outlined above, learns by trial and error. It is worth stressing that individual Conscious Agents inside the structure can learn both by trial and error and by absorption of pre-formed knowledge. Yet the structure as a whole, in the long temporal horizon, forms its own knowledge by experimenting with itself, and pre-formed knowledge, communicated inside the structure, is the fruit of past experimentation. Collective learning occurs on the basis of a Markov-chain-based mechanism: the structure produces a range of versions of itself, each endowed with a slightly different distribution of behavioural patterns, expressed in the measurable space of actions G, as formalized in the preceding paragraphs. Following the same logic, those behavioural patterns loop with states of the world through consequences, perception, experience, and decisions.

The social structure experiments and learns by producing many variations of itself and testing their fitness against the aggregate vector of external stressors, which, in turn, allows social evolutionary tinkering (Jacob 1977[6]) through tacit coordination, such that the given society displays social change akin to an adaptive walk in rugged landscape (Kauffman & Levin 1987[7]; Kauffman 1993[8]). Each distinct state of the given society is a vector of observable properties, and each empirical instance of that vector is a 1-mutation-neighbour to at least one other instance. All the instances form a space of social entities. In the presence of external stressor, each such mutation (each entity) displays a given fitness to achieve the optimal state, regarding the stressor in question, and therefore the whole set of social entities yields a complex vector of fitness to cope with the stressor. The assumption of collective intelligence means that each social entity is able to observe itself as well as other entities, so as to produce social adaptation for achieving optimal fitness. Social change is an adaptive walk, i.e. a set of local experiments, observable to each other and able to learn from each other’s observed fitness. The resulting path of social change is by definition uneven, whence the expression ‘adaptive walk in rugged landscape’.

There is a strong argument that such adaptive walks occur at a pace proportional to the complexity of social entities involved. The greater the number of characteristics involved, the greater the number of epistatic interactions between them, and the more experiments it takes to have everything more or less aligned for coping with a stressor. Formally, with n significant epistatic traits in the social structure, i.e. with n input variables in the state space, the intelligent collective needs at least m ≥ n +1 rounds of learning in order to develop adaptation. A complete round of learning occurs when the intelligent collective achieves two instrumental outcomes, i.e. it measures its own performance against an expected state, and it feeds back, among individual conscious agents, information about the gap from expected state. For the purposes of the study that follows it is assumed that temporization matters, for a social structure, to the extent that it reflects its pace of collective learning, i.e. the number of distinct time periods t, in the 7-tuple of conscious existence, is the same as the number m of experimental rounds in the process of collective learning.    

Epistatic traits E of a social structure are observable as recurrent patterns in actions G of Conscious Agents. Given the formal structure of conscious existence such as provided earlier (i.e. a 7-tuple), it is further assumed that variance in actions G, thus in behavioural patterns, is a manifestation of underlying variance in experiences X, perception P, decisions D, and consequences A.

A set of Conscious Agents needs to meet one more condition in order to be an intelligent collective: internal coherence, and the capacity to modify it for the purpose of collective learning. As regards this specific aspect, the swarm theory is the main conceptual basis (see for example: Stradner et al. 2013[9]). Internal coherence of a collective is observable as the occurrence of three types of behavioural coupling between Conscious Agents: fixed, random, and correlated. Fixed coupling is a one-to-one relationship: when Conscious Agent A performs action Gi(A), Conscious Agent B always responds with action Gi(B). Fixed coupling is a formal expression of what is commonly labelled as strictly ritual behaviour. By opposition, random coupling occurs when Conscious Agent B can have any response to an action of Conscious Agent A, without any pattern. Across the spectrum that stretches between fixed coupling and random coupling, correlated coupling entails all such cases when Conscious Agent B chooses from a scalable range of behaviours when responding to an action performed by Conscious Agent A, and coincidence in the behaviour of Conscious Agents A and B explains a significant part of the combined variance in their respective behaviour.

It is worth noting that correlation in behavioural coupling, as defined in the preceding paragraph, is a behavioural interpretation of the Pearson coefficient of correlation, i.e. it is a statistically significant coincidence of local behavioural instances. Another angle is possible, when instead of correlation strictly speaking, we think rather about cointegration, thus about a functional connection between expected states (expected values, e.g. mean values in scalable behaviour) in Conscious Agents’ actions.
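As a sketch of that behavioural reading of the Pearson coefficient, the snippet below classifies the coupling between two Conscious Agents from their encoded action series; the data and the thresholds are invented, purely for illustration.

```python
import numpy as np

# Scalable behavioural responses of agents A and B over consecutive interactions
# (invented data, purely illustrative).
actions_A = np.array([1.0, 2.0, 3.0, 2.5, 4.0, 3.5, 5.0, 4.5])
actions_B = np.array([1.2, 2.1, 2.9, 2.7, 3.8, 3.6, 4.9, 4.4])

r = np.corrcoef(actions_A, actions_B)[0, 1]

# Arbitrary illustrative thresholds: near |r| = 1 looks like fixed coupling,
# near |r| = 0 looks like random coupling, anything in between is correlated coupling.
if abs(r) > 0.95:
    kind = "fixed (nearly ritual) coupling"
elif abs(r) < 0.1:
    kind = "random coupling"
else:
    kind = "correlated coupling"

print(f"Pearson r = {r:.3f} -> {kind}")
```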

A social structure dominated by fixed behavioural coupling doesn’t learn, as behavioural patterns in Conscious Agents are always the same. Should random coupling prevail, it is arguable whether we are dealing with a social structure at all. A reasonably adaptable social structure needs to be dominated by correlated behavioural coupling between conscious agents, and its collective learning can be enhanced by the capacity to switch between different strengths of correlation in behaviours.  

Definition: An Intelligent Collective (IC) is a set of z Conscious Agents, which, over a sequence of m distinct time periods, understood as experimental rounds of learning, whilst keeping significant correlation in behavioural coupling between Conscious Agents’ actions and thus staying structurally stable, produces m such different instances of itself that the last instance in the sequence displays a vector of n epistatic traits significantly different from that observable in the first instance, with the border condition m ≥ n + 1.  

When a set of z Conscious Agents behaves as an Intelligent Collective, it produces a set of n significant epistatic traits, and m ≥ n + 1 instances of itself, over a continuum of m time periods and w distinct and consecutive states of the world. Collective intelligence is observable as the correlation between variance in local states of the world W, on the one hand, and variance in epistatic traits of the social structure. The same remarks, as those made before, hold as regards the general concept of correlation and the possibility of combining it with cointegration.

It is worth noticing that the border condition m ≥ n + 1 has another interesting implication. If we want n epistatic traits to manifest themselves in a population of Conscious Agents, we need at least m ≥ n + 1 experimental rounds of learning in that population. The longer the consciously and culturally owned history of a social structure, the more complex a vector of epistatic traits this structure can develop to cope with external stressors.

Now, we enter the more epistemological realm, namely the question of observability. How are Intelligent Collectives observable, notably in their epistatic traits and in their evolutionary tinkering? From the point of view of a social scientist, observability of strictly individual behaviour in Conscious Agents, whether they be individual persons or other social entities (e.g. businesses), is a rare delicacy, usually accessible at the price of creating a tightly controlled experimental environment. It is usually problematic to generalize observations made in such a controlled setting, so as to make them applicable to the general population. Working with the concept of Intelligent Collective requires phenomenological bridging between the data we commonly have access to, and the process of collectively intelligent evolutionary social tinkering.

Here comes an important, and sometimes arguable assumption: that of normal distribution in the population of Conscious Agents. If any type of behaviour manifests as an epistatic trait, i.e. as important for the ability of the social structure to cope with external stressors, then it is most likely to be an important individual trait, i.e. it is likely to be significantly correlated with the hierarchical position of social entities inside the social structure. This, in turn, allows contending that behavioural patterns associated with epistatic traits are distributed normally in the population of Conscious Agents, and, as such, display expected values, representative thereof. With many epistatic traits at work in parallel, the population of Conscious Agents can be characterized by a vector (a matrix) of mean expected values in scalable and measurable behavioural patterns, which, in turn, are associated with the epistatic traits of the whole population.

This assumption fundamentally connects individual traits to those of the entire population. The set of epistatic traits in the population of Conscious Agents is assumed to be representative of the set of mean expected values in the corresponding epistatic traits at the individual level, i.e. in individual Conscious Agents in the population. There are 3 σ – algebras, and one additional structural space, which, together, allow mutual transformation between 3 measurable and structurally stable spaces, namely between: the set of behavioural patterns BP = {bp1, bp2, …, bpn}, the set PBP = {p(bp1), p(bp2), …, p(bpn)} of probabilities as regards the occurrence of those patterns, and the set μBP = {μ(bp1), μ(bp2), …, μ(bpn)} of mean expected values in scalable and observable manifestations of those behavioural patterns.

The 3 σ – algebras are designated as, respectively:

  • the σ – algebra SB, transforming BP into PBP, and it represents the behavioural state of the intelligent collective IC
  • the σ – algebra SE, which transforms BP into μBP and is informative about the expected state of the intelligent collective IC
  • the σ – algebra SD, allowing the transition from PBP to μBP and representing the internal distribution of behavioural patterns inside the intelligent collective IC 
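The three mappings just listed can be illustrated with a toy numerical sketch. The encoding below is entirely mine: behavioural patterns are recorded as intensities over periods, SB becomes a frequency of occurrence, SE a mean expected intensity, and the stand-in I use for SD is simply the mean intensity conditional on occurrence.

```python
import numpy as np

# Toy data: three behavioural patterns bp1..bp3 observed over 10 periods,
# recorded as intensities (0 = pattern absent in that period). Invented numbers.
BP = np.array([
    [2.0, 0.0, 1.5],
    [0.0, 1.0, 2.0],
    [3.0, 0.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 0.0, 2.5],
    [2.5, 1.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.5, 0.0, 0.0],
    [0.0, 2.0, 2.0],
    [2.0, 0.0, 1.5],
])

# SB: behavioural state, i.e. probability that each pattern occurs in a given period.
PBP = (BP > 0).mean(axis=0)

# SE: expected state, i.e. mean expected intensity of each pattern.
muBP = BP.mean(axis=0)

# SD: internal distribution, sketched here as mean intensity conditional on occurrence.
SD = muBP / PBP

print("PBP  =", PBP)
print("muBP =", muBP)
print("conditional means (a stand-in for SD) =", SD)
```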

The additional measurable space is V = {v1, v2, …, vn} of observable Euclidean distances between measurable aspects of epistatic traits, thus between probabilities PBP, or between μBP mean expected values. Three important remarks are to make as regards the measurable space V. Firstly, as the whole model serves to use artificial neural networks in an informed manner as a tool of virtual social experimentation, the ‘between’ part in this definition is to be understood flexibly. We can talk about Euclidean distances between probabilities, or distances between mean expected values, yet it is also possible to compute Euclidean distance between a probability and a mean expected value. The Euclidean distance per se does not have a fixed denominator, and, therefore, can exist between magnitudes expressed on different scales of measurement.

Secondly, for the sake of keeping the mathematical complexity of the problem at hand within reasonable limits, Euclidean distance is further understood as the mean Euclidean distance of the given epistatic trait from all the other k = n – 1 epistatic traits.
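In symbols, with xi standing for the measurable aspect (probability or mean expected value) of the i-th epistatic trait, that averaged distance can be written as follows (my notation, reconstructed from the verbal description above):

```latex
v_i \;=\; \frac{1}{n-1}\sum_{j \neq i} \sqrt{\left(x_i - x_j\right)^{2}}, \qquad i = 1, \dots, n
```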


It is also assumed that structural stability of the Intelligent Collective can be measured as, respectively, the mean and the variance in vi, both across the n epistatic traits and across the m ≥ n + 1 experimental rounds. Thirdly, the averaging of Euclidean distances could be, technically, considered as a σ – algebra, as we are within the conceptual construct of state space. Still, it is always the same operation, and it would always be the same σ – algebra, and, as such, it is logically redundant.

Definition: Collective Intelligence is a two-dimensional σ–algebra CI = {n, m}, which transforms the 7-dimensional state space CA = {W, X, G, P, D, A, t} of individual conscious existence (in Conscious Agents) into the 7-dimensional state space IC = {BP, PBP, μBP, SB, SE, SD, V} of the Intelligent Collective. The transformation occurs by wrapping experiences X, actions G, perception P, and decisions D into n epistatic traits of the Intelligent Collective, and by structuring states of the world W and consequences A, over the timeline t observable in individual conscious existence, into m ≥ n + 1 experimental instances of the Intelligent Collective IC, such that the last instance IC(m) = {BP(m), PBP(m), μBP(m), SB(m), SE(m), SD(m), v(m)} is significantly different from the first instance IC(1) = {BP(1), PBP(1), μBP(1), SB(1), SE(1), SD(1), v(1)}.

This definition of Collective Intelligence stays mathematically in the world of Markov chains. Each 7-dimensional state IC = {BP, PBP, μBP, SB, SE, SD, V} of the Intelligent Collective is a transformation of the previous state. Such as formulated above, Collective Intelligence can be referred to and pegged onto exogenous phenomena, yet, as such, it can be observed as a phenomenon sui generis.

I got carried away, again. I mean, intellectually. Happens all the time, actually. Time to cool down. If you really want, on top of reading this update, you can watch those videos of mine on the philosophy of science:

The video recorded around 2:30 p.m., August 22nd, 2020, regards the Philosophy of Science. It is both extra-curricular content for all those among my students who want to develop their scientific edge, and my own reflection on the general issue of collective intelligence, and the possibility of using artificial neural networks for the study thereof. I dive into three readings: ‘Civilisation and Capitalism’ by Fernand Braudel, ‘Philosophical Essay on Probabilities’ by Pierre Simon, marquis de Laplace, and finally ‘Truth and Method’ by Hans Georg Gadamer. I focus on fundamental distinctions between reality such as it is, on the one hand, and our perception and understanding thereof, on the other. The link is here: https://youtu.be/Wia0apAOdDQ.

In the second video, recorded on August 24th, 2020 (https://youtu.be/sCI66lARqAI), I am investigating the nature of truth, with three basic readings: ‘Philosophical Essay on Probabilities’ by Pierre Simon, marquis de Laplace, ‘Truth and Method’ by Hans Georg Gadamer, and an article entitled ‘Conscious agent networks: Formal analysis and application to cognition’, by Chris Fields, Donald D. Hoffman, Chetan Prakash, and Manish Singh. I briefly discuss the limitations we, humans, encounter when trying to discover truth about reality.

[1] Hoffman, D. D., Singh, M., & Prakash, C. (2015). The interface theory of perception. Psychonomic bulletin & review, 22(6), 1480-1506.

[2] Fields, C., Hoffman, D. D., Prakash, C., & Singh, M. (2018). Conscious agent networks: Formal analysis and application to cognition. Cognitive Systems Research, 47, 186-213. https://doi.org/10.1016/j.cogsys.2017.10.003

[3] Taleb, N. N. (2007). The black swan: The impact of the highly improbable (Vol. 2). Random House.

[4] Taleb, N. N., & Blyth, M. (2011). The black swan of Cairo: How suppressing volatility makes the world less predictable and more dangerous. Foreign Affairs, 33-39.

[5] Nash, J. (1953). Two-person cooperative games. Econometrica: Journal of the Econometric Society, 128-140.

[6] Jacob, F. (1977). Evolution and tinkering. Science, 196(4295), 1161-1166

[7] Kauffman, S., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1), 11-45.

[8] Kauffman, S. A. (1993). The origins of order: Self-organization and selection in evolution. Oxford University Press, USA

[9] Stradner, J., Thenius, R., Zahadat, P., Hamann, H., Crailsheim, K., & Schmickl, T. (2013). Algorithmic requirements for swarm intelligence in differently coupled collective systems. Chaos, Solitons & Fractals, 50, 100-114

Safely narrow down the apparent chaos

There is that thing about me: I like understanding. I represent my internal process of understanding as the interplay of three imaginary entities: the curious ape, the happy bulldog, and the austere monk. The curious ape is the part of me who instinctively reaches for anything new and interesting. The curious ape does basic gauging of that new thing: ‘can kill or hopefully not always?’, ‘edible or unfortunately not without risk?’ etc. When it does not always kill and can be eaten, the happy bulldog is released from its leash. It takes pleasure in rummaging around things, sniffing and digging in search of adjacent phenomena. Believe me, when my internal happy bulldog starts sniffing around and digging things out, they just pile up. Whenever I study a new topic, the folder I have assigned to it swells like a balloon, with articles, books, reports, websites etc. A moment comes when those piles of adjacent phenomena start needing some order, and this is when my internal austere monk steps into the game. His basic tool is Ockham’s razor, which cuts the obvious from the dubious, and thus, eventually, cuts bullshit off.

In my last update in French, namely in Le modèle d’un marché relativement conformiste, I returned to that business plan for the project EneFin, and the first thing my internal curious ape is gauging right now is the so-called absorption by the market. EneFin is supposed to be an innovative concept, and, as any innovation, it will need to somehow get into the market. It can do so only insofar as people in the market opt to shift from being just potential users to being actual ones. In other words, the success of any business depends on a sequence of decisions taken by people who are supposed to be customers.

People are supposed to make decisions regarding my new products or technologies. Decisions have their patterns. I wrote more about this particular issue in an update on this blog, entitled ‘And so I ventured myself into the realm of what people think they can do’, for example. Now, I am interested in the more marketing-oriented, aggregate outcome of those decisions. The commonly used theoretical tool here is the normal distribution (see for example Robertson): we assume that, as customers switch to purchasing that new thing, the population of users grows as a cumulative normal fraction (i.e. a fraction based on the normal distribution) of the general population.

As I said, I like understanding. What I want is to really understand the logic behind simulating aggregate outcomes of customers’ decisions with the help of the normal distribution. Right, then, let’s do some understanding. Below, I am introducing two graphical presentations of the normal distribution: the first is the ‘official’ one, the second, further below, is my own, uncombed and freshly woken up interpretation.

[Graph: The normal distribution]

[Graph: Normal distribution interpreted]

So, the logic behind the equation starts biblically: in the beginning, there is chaos. Everyone can do anything. Said chaos occurs in a space based on the constant e = 2,71828, known as the base of the natural logarithm and reputed to be really handy for studying dynamic processes. This space is e^x. Any customer can take any decision in a space made by ‘e’ elevated to the power ‘x’, or the power of the moment. Yes, ‘x’ is a moment, i.e. the moment when we observe the distribution of customers’ decisions.

Chaos gets narrowed down by referring to µ, or the arithmetical average of all the moments studied. This is the expression (x – µ)², or the local variance, observable in the moment x. In order to have an arithmetical average, and have it the same in all the moments ‘x’, we need to close the frame, i.e. to define the set of x’s. Essentially, we are saying to that initial chaos: ‘Look, chaos, it is time to pull yourself together a bit, and so we peg down the set of moments you contain, we draw an average of all those moments, and that average is sort of the point where 50% of you, chaos, is being taken and recognized, and we position every moment x regarding its distance from the average moment µ’.

Thus, the initial chaos ‘e power x’ gets dressed a little, into ‘e power (x – µ)²’. Still, a dressed chaos is still chaos. Now, there is that old intuition, progressively unfolded by Isaac Newton, Gottfried Wilhelm Leibniz and Abraham de Moivre at the turn of the 17th and 18th centuries, then grounded by Carl Friedrich Gauss, and Thomas Bayes: chaos is a metaphysical concept born out of insufficient understanding, ‘cause your average reality, babe, has patterns and structures in it.

The way that things structure themselves is most frequently sort of a mainstream fashion, that most events stick to, accompanied by fringe phenomena who want to be remembered as the rebels of their time (right, space-time). The mainstream fashion is observable as an expected value. The big thing about maths is being able to discover by yourself that when you add up all the moments in the apparent chaos, and then you divide the so-obtained sum by the number of moments added, you get a value, which we call arithmetical average, and which actually doesn’t exist in that set of moments, but it sets the mainstream fashion for all the moments in that apparent chaos. Moments tend to stick around the average, whose habitual nickname is ‘µ’.

Once you have the expected value, you can slice your apparent chaos in two, sort of respectively on the right, and on the left of the expected value that doesn’t actually exist. In each of the two slices you can repeat the same operation: add up everything, then divide by the number of items in that everything, and get something expected that doesn’t exist. That second average can have two, alternative properties as for structuring. On the one hand, it can set another mainstream, sort of next door to that first mainstream: moments on one side of the first average tend to cluster and pile up around that second average. Then it means that we have another expected value, and we should split our initial, apparent chaos into two separate chaoses, each with its expected value inside, and study each of them separately. On the other hand, that second average can be sort of insignificant in its power of clustering moments: it is just the average (expected) distance from the first average, and we call it standard deviation, habitually represented with the Greek sigma.

We have the expected distance (i.e. standard deviation) from the expected value in our apparent chaos, and it allows us to call our chaos in for further tidying up. We go and slice off some parts of that chaos, which seem not to be really relevant regarding our mainstream. Firstly, we do it by dividing our initial exponent, the local variance (x – µ)², by twice the general variance, or two times sigma power two (2σ²). We can be even meaner and add a minus sign in front of that divided local variance, and it means that instead of expanding our constant e = 2,71828 into a larger space, we are actually folding it into a smaller space. Thus, we get a space much smaller than the initial ‘e power (x – µ)²’.

Now, we progressively chip some bits out of that smaller, folded space. We divide it by the standard deviation. I know, technically we multiply it by one divided by the standard deviation, but if you are like older than twelve, you can easily understand the equivalence here. Next, we multiply the so-obtained quotient by that funny constant: one divided by the square root of two times π. This constant is 0,39894228 and, if my memory is correct, it was a big discovery on the part of Carl Friedrich Gauss: in any apparent chaos, you can safely narrow down the number of the realistically possible occurrences to like four tenths of that initial chaos.
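Putting those steps together, the expression assembled over the last few paragraphs is the density of the normal distribution:

```latex
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}
```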

After all that chipping we did to our initial, charmingly chaotic ‘e power x’ space, we get the normal space, or that contained under the curve of the normal distribution. This is what the whole theory of probability, and its rich pragmatic cousin, statistics, are about: narrowing down the range of uncertain, future occurrences to a space smaller than ‘anything can happen’. You can do it in many ways, i.e. we have many different statistical distributions. The normal one is like the top dog in that yard, but you can easily experiment with the steps described above and see by yourself what happens. You can kick that Gaussian constant 0,39894228 out of the equation, or you can make it stronger by taking away the square root and just keeping two times π in its denominator; you can divide the local variance (x – µ)² just by one time its cousin the general variance instead of twice, etc. I am persuaded that this is what Carl Friedrich Gauss did: he kept experimenting with equations until he came up with something practical.

And so am I, I mean I keep experimenting with equations so as to come up with something practical. I am applying all that elaborate philosophy of harnessed chaos to my EneFin thing and to predicting the number of my customers. As I am using the normal distribution as my basic, quantitative screwdriver, I start with assuming that however many customers I get, that number is always a fraction (percentage) of a total population. This is what statistical distributions are meant to yield: a probability, thus a fraction of reality, elegantly expressed as a percentage.

I take a planning horizon of three years, just as I do in the Business Planning Calculator, that analytical tool you can download from a subpage of https://discoversocialsciences.com. In order to make my curves smoother, I represent those three years as 36 months. This is my set of moments ‘x’, ranging from 1 to 36. The expected, average value that does not exist in that range of moments is the average time that a typical potential customer, out there, in the total population, needs to try and buy energy via EneFin. I have no clue, although I have an intuition. In the research on innovative activity in the realm of renewable energies, I have discovered something like a cycle. It is the time needed for the annual number of patent applications to double, with respect to a given technology (wind, photovoltaic etc.). See Time to come to the ad rem, for example, for more details. That cycle seems to be 7 years in Europe and in the United States, whilst it drops down to 3 years in China.

I stick to 7 years, as I am mostly interested, for the moment, in the European market. Seven years equals 7*12 = 84 months. I provisionally choose those 84 months as my average µ for using the normal distribution in my forecast. Now, the standard deviation. Once again, no clue, and an intuition. The intuition’s name is ‘coefficient of variability’, which I baptise ß for the moment. Variability is the coefficient that you get when you divide the standard deviation by the mean average value. Another proportion. The greater the ß, the more dispersed is my set of customers into different subsets: lifestyles, cities, neighbourhoods etc. Conversely, the smaller the ß, the more conformist is that population, with relatively more people sailing in the mainstream. I casually assume my variability to be found somewhere in 0,1 ≤ ß ≤ 2, with a step of 0,1. With µ = 84, that makes my Ω (another symbol I use for sigma, i.e. the standard deviation) fall into 0,1*84 ≤ Ω ≤ 2*84 <=> 8,4 ≤ Ω ≤ 168. At ß = 0,1 => Ω = 8,4 my customers are boringly similar to each other, whilst at ß = 2 => Ω = 168 they are like separate tribes.

In order to make my presentation simpler, I take three checkpoints in time, namely the end of each consecutive year out of the three. Denominated in months, that gives: the 12th month, the 24th month, and the 36th month. In Table 1, below, you can find the results: the percentage of the market I expect to absorb into EneFin, with the average time of behavioural change in my customers pegged at µ = 84, and at various degrees of disparity between individual behavioural changes.

Table 1 Simulation of absorption in the market, with the average time of behavioural change equal to µ = 84 months

Variability of the population (ß) | Standard deviation (Ω = ß*84) | Market absorbed by the 12th month | Market absorbed by the 24th month | Market absorbed by the 36th month
0,1 | 8,4 | 8,1944E-18 | 6,82798E-13 | 7,65322E-09
0,2 | 16,8 | 1,00458E-05 | 0,02% | 0,23%
0,3 | 25,2 | 0,18% | 0,86% | 2,93%
0,4 | 33,6 | 1,02% | 3,18% | 7,22%
0,5 | 42 | 2,09% | 5,49% | 10,56%
0,6 | 50,4 | 2,92% | 7,01% | 12,42%
0,7 | 58,8 | 3,42% | 7,80% | 13,18%
0,8 | 67,2 | 3,67% | 8,10% | 13,28%
0,9 | 75,6 | 3,74% | 8,09% | 13,02%
1 | 84 | 3,72% | 7,93% | 12,58%
1,1 | 92,4 | 3,64% | 7,67% | 12,05%
1,2 | 100,8 | 3,53% | 7,38% | 11,50%
1,3 | 109,2 | 3,41% | 7,07% | 10,95%
1,4 | 117,6 | 3,28% | 6,76% | 10,43%
1,5 | 126 | 3,14% | 6,46% | 9,93%
1,6 | 134,4 | 3,02% | 6,18% | 9,47%
1,7 | 142,8 | 2,89% | 5,91% | 9,03%
1,8 | 151,2 | 2,78% | 5,66% | 8,63%
1,9 | 159,6 | 2,67% | 5,42% | 8,26%
2 | 168 | 2,56% | 5,20% | 7,91%
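The figures in Table 1 appear consistent with summing the normal density over the integer months from 1 up to each checkpoint, with µ = 84 and Ω = ß*84. The sketch below reproduces them under that assumption, which is mine and not spelled out above; the very small values in the ß = 0,1 row simply print as 0.0000% in percentage formatting.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu = 84  # average time (in months) of behavioural change, as assumed in the text

for beta in [round(0.1 * k, 1) for k in range(1, 21)]:   # ß from 0,1 to 2,0
    sigma = beta * mu
    absorbed = {T: sum(normal_pdf(t, mu, sigma) for t in range(1, T + 1))
                for T in (12, 24, 36)}
    print(f"ß={beta:>3}, Ω={sigma:>5.1f}: "
          f"12m={absorbed[12]:.4%}, 24m={absorbed[24]:.4%}, 36m={absorbed[36]:.4%}")
```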

I think it is enough science for today. That sunlight will not enjoy itself. It needs me to enjoy it. I am consistently delivering good, almost new science to my readers, and love doing it, and I am working on crowdfunding this activity of mine. As we talk business plans, I remind you that you can download, from the library of my blog, the business plan I prepared for my semi-scientific project Befund (and you can access the French version as well). You can also get a free e-copy of my book ‘Capitalism and Political Power’. You can support my research by donating directly, any amount you consider appropriate, to my PayPal account. You can also consider going to my Patreon page and becoming my patron. If you decide so, I will be grateful for suggesting two things that Patreon suggests I ask you about. Firstly, what kind of reward would you expect in exchange for supporting me? Secondly, what kind of phases would you like to see in the development of my research, and of the corresponding educational tools?


Those a’s and b’s to put inside (a + b) when doing (a + b) power (p+q)

My editorial

I am finishing compiling notes for that article on the role of monetary systems in the transition towards renewable energies, at least I hope I am. This is a bit of a strange frame of mind, when I hope I am. Could I be hoping I am not? Interesting question. Anyway, one of the ways I make sure I understand what I am writing about is to take a classic, whom I previously kind of attached to this particular piece of science I am trying to make, and I kind of filter my own thoughts and findings through that particular classic’s thoughts and findings. This time, Thomas Bayes is my classic. Didn’t have much to do with renewable energies, you would say? Weeeell, he was a philosopher and a mathematician, but he lived (and died) in the 18th century, when Europe was being powered by wind and water, thus, as a matter of fact, he had much to do with renewable energies. At the end of the 18th century, in my homeland – Southern Poland, and back in the day it was Austrian Galicia – there was one watermill per 382 people, on average.

And so I am rereading the posthumous article, attributed to reverend Thomas Bayes, received by Mr John Canton, an editor of ‘Philosophical Transactions’ at the Royal Society. On the 23rd of December, 1763, John Canton read a letter, sent from Newington-Green, on the 10th of November, by Mr Richard Price. The letter was accompanied by an attachment, in the form of a dissertation on ‘the doctrine of chances’, allegedly found by Mr Price in the notes of a defunct friend, Thomas Bayes. The friend had been defunct for two years, at the time, which is quite intriguing in itself. Anyway, Mr Richard Price presented the dissertation as Thomas Bayes’ work, and this is how Bayesian statistics were born (Bayes, Price 1763[1]). Just as a reminder: in Thomas Bayes’ world, we are talking about having p successes and q failures in p + q trials, in the presence of one single success being probable at the rate ‘a’, and the probability of a single failure being ‘b’. The general way of thinking about it, in this specific universe, is that we take the sum of probabilities, like (a + b), and we give it some depth by elevating it to the power p + q. We create a space of probability through developing the Newtonian binomial (a + b)^(p+q).

At this point it is useful to dig a little bit into the logic of the Newtonian binomial. When I do (a + b)^(p+q), Isaac Newton tells me to kind of climb a ladder towards q, one step at a time, and so I am climbing that ladder of failure. First, I consider full success, so my p successes are exactly equal to my n trials, and my failure count is q = 0. In this most optimistic case, the number of different ways I can have that full score of successes is equal to the coefficient (p^q/q!) = (p^0/0!) = 1/1 = 1. I have just one way of being successful in every trial I take, whatever the number of trials, and whatever the probability of a single success. The probability attached to that one-million-dollar shot is (p^q/q!)*a^p. See that second factor, the a^p? The more successes I want, the lower the probability of having them all. A probability is a fraction smaller than 1. When I elevate it to any integer, it gets smaller. If the probability of a single success is like fifty-fifty, thus a = 0,5, and I want 5 successes on 5 trials, and I want no failures at all, I can expect those five bull’s eyes with a probability of (5^0/0!)*0,5^5 = 0,5^5 = 0,03125. Now, if I want 7 successes on 7 trials, zero failures, my seven-on-seven-shots-in-the-middle probability is equal to (7^0/0!)*0,5^7 = 0,5^7 = 0,0078125. See? All I wanted was two more points scored, seven on seven instead of five on five, and this arrogant Newtonian-Bayesian approach sliced my odds by four times.

Now, I admit I can tolerate one failure over n trials, and the rest has to be just pure success, and so my q = 1. I repeat the same procedure: (p^1/1!)*a^(p-1)*b^1. With the data I have just invented, 4 successes on 5 trials, with 0,5 odds of having a single success, so with a = b = 0,5, I have (4^1/1!) = 4 ways of having that precise compound score. Those 4 ways give me, at the bottom line, a compound probability of (4^1/1!)*0,5^4*0,5^1 = 4*0,5^4*0,5^1 = 0,125. Let’s repeat, just to make it sink in. Seven trials, two failures, five successes, one success being as probable as one failure, namely a = b = 0,5. How many ways of having 5 successes and 2 failures do I have over 7 trials? I have (5^2/2!) = 12,5 them ways. How can I possibly have 12,5 ways of doing something? This is precisely the corkscrewed mind of Thomas Bayes: I have between 12 and 13 ways of reaching that particular score. The ‘between’ has become a staple of the whole Bayesian theory.
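The arithmetic of the last two paragraphs fits into a few lines of code. Note that the ‘number of ways’ implemented here is the expression used in the text, p to the power q divided by q!, and not the textbook binomial coefficient C(p+q, q); the function names are mine.

```python
from math import factorial

def ways(p, q):
    """Number of 'ways' as used in the text: p to the power q, divided by q factorial."""
    return p ** q / factorial(q)

def compound_probability(p, q, a, b):
    """Probability of p successes and q failures, using the counting rule above."""
    return ways(p, q) * a ** p * b ** q

# Five trials, four successes, one failure, fifty-fifty odds of a single success:
print(ways(4, 1))                            # 4.0 ways
print(compound_probability(4, 1, 0.5, 0.5))  # 0.125

# Seven trials, five successes, two failures:
print(ways(5, 2))                            # 12.5 ways, i.e. 'between 12 and 13'
```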

Now, I return to my sheep, as the French say. My sheep are renewable (energies). Let’s say I have statistics telling me that in my home country, Poland, I have 12,52% of electricity being generated from renewable sources, A.D. 2014. If I think that generating a single kilowatt-hour the green way is a success, my probability of a single success is P(p=1) = a = 0,1252. The probability of a failure is P(q=1) = b = 1 – 0,1252 = 0,8748. How many kilowatt-hours do I generate? Maybe just enough for one person, which, once again on average, was 2495,843402 kg of oil equivalent, or 29026,65877 kilowatt hours per year per capita (I multiplied the oil equivalent by 11,63 to get the kilowatt hours). Here, Thomas Bayes reminds me gently: ‘Mr Wasniewski, I wrote about the probability of having just a few successes and a few failures over a few plus a few equals a few total number of trials. More than 29 thousand of those kilowatt-hours or whatever it is you want, it is really hard to qualify under ‘a few’. Reduce.’ Good, so I reduce into megawatt hours, and that gives me like n = 29.

Now, according to Thomas Bayes’ logic, I create a space of probabilities by doing (0,1252 + 0,8748)^29. The biggest mistake I could make at this point would be to assume that 0,1252 + 0,8748 = 1, which is true, of course, but most impractical for creating spaces of probability. The right way of thinking about it is that I have two distinct occurrences, one marked 0,1252, the other marked 0,8748, and I project those occurrences into a space made of 29 dimensions. In this interesting world, where you have between six and eight ways of being late or being tall, I have like patches of probability. Each of those patches reflects my preferences. You want to have 5 megawatt hours, out of those 29, generated from renewable sources, Mr Wasniewski? As you please, that will give you odds of (5^(29-5)/(29-5)!)*0,1252^5*0,8748^(29-5) = 1,19236E-13 of reaching this particular score. The problem, Mr Wasniewski, is that you have only 0,000000096 ways of reaching it, which is a bit impractical, as ways come. Could be impossible to do, as a matter of fact.

So, when I create my multiverse of probability the Thomas Bayes way, some patches of probability turn out to be just impracticable. If I have like only 0,000000096 ways of doing something, I have a locked box, with the key to the lock being locked inside the box. No point in bothering about it. When I settle for 10 megawatt hours successfully generated from renewable sources, against 19 megawatt hours coming from them fossil fuels, the situation changes. I have (10^(29-10)/(29-10)!) = 82,20635247, or rather between 82 and 83, although closer to 82, ways of achieving this particular result. The cumulative probability of 10 successes, which I can score in those 82,20635247 ways, is equal to (10^(29-10)/(29-10)!)*0,1252^10*0,8748^(29-10) = 0,0000013. Looks a bit like the probability of meeting an alien civilisation whilst standing on my head at 5 a.m. in Lisbon, but mind you, this is just one patch of probability, and I have more than 82 ways of hitting it. My (0,1252 + 0,8748)^29 multiverse contains 29! = 8,84176E+30 such patches of probability, some of them practicable, like 10 megawatt hours out of 29, others not quite, like 5 megawatt hours over 29. Although Thomas Bayes wanted to escape the de Moivre – Laplace world of great numbers, he didn’t truly manage to. As you can see, patches of probability on the sides of this multiverse, with very few successes or very few failures, seem to blink red, like the ‘Occupied’ sign on the door to restrooms. Only the more balanced ones, with successes and failures scoring close to fifty-fifty, yield more than one way of hitting them. Close to the mean, man, you’re safe and feasible, but as you go away from the mean, you can become less than one, kind of.

Thus, if I want to use the original Bayesian method in my thinking about the transition towards renewable energies, it is better to consider those balanced cases, which I can express in the form of just a few successes and a few failures. When tail events enter my scope of research, though, if I am really honest about it, I have to settle for the classical approach based on the mean and expected values, the de Moivre – Laplace way. I can change my optic to use the Bayesian method more efficiently, though. I consider 5 local projects, in 5 different towns, and I want to assess the odds of at least 3 of them succeeding. I create my multiverse of probabilities as (0,1252 + 0,8748)^(3+2), which has the advantage of containing just 5! = 120 distinct patches of probability. Kind of more affordable. Among those 120 patches of probability, my target, namely 3 successful local projects out of 5 initiated, amounts to (3^2/2!) = 4,5 ways of doing it (so between 4 and 5), and all those alternative ways yield a compound probability of (3^2/2!)*0,1252^3*0,8748^2 = 0,006758387. Definitely easier to wrap my mind around it.

I said, at the beginning of today’s update, that I am using Thomas Bayes’ theory as a filter for my findings, just to check my logic. Now, I see that the results of my quantitative tests, those presented in previous updates, should be transformed into simple probabilities, those a’s and b’s to put inside (a + b) when doing (a + b)^(p+q). My preferences as for successes and failures should be kept simple and realistic, preferably below 10.

[1] Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances, communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philosophical Transactions (1683-1775), 370-418.

A race across target states, or Bayes and Nakamoto together

My editorial

And so I continue prodding my idea of local, green energy systems with different theories of probability. The three inside me – my curious ape, my austere monk, and my happy bulldog – are having a conversation with two wise men: reverend Thomas Bayes, and Satoshi Nakamoto. If you need to keep track of my last updates, you can refer to ‘Time puts order in the happening’ as well as to ‘Thomas Bayes, Satoshi Nakamoto et bigos’. And so I am at the lemmas formulated by Thomas Bayes, and at the basic analytical model proposed by Nakamoto. Lemma #1 by Thomas Bayes says: ‘The probability that the point o will fall between any two points in the line AB is the ratio of the distance between the two points to the whole line AB’. Although Thomas Bayes provides a very abundant geometric proof of this statement, I think it is one of those things you just grasp intuitively. My chances of ever being at the coast of the Pacific Ocean are greater than those of ever visiting one tiny, coastal village in Hawaii, just because the total coastline of the Pacific is a much bigger expanse than one tiny Hawaiian village. The bigger my target zone in relation to the whole universe of probability, the greater my probability of hitting the target. Now, in lemma #2, we read pretty much the same, just with some details added: ‘The ball W having been thrown, and the line os drawn, the probability of the event M in a single trial is the ratio of Ao to AB’.

I think a little reminder is due in relation to those two Bayesian lemmas. As for the details of Bayes’s logic, you can refer to Bayes, Price 1763[1]; I am just re-sketching the landscape now. The whole universe of probability, in Thomas Bayes’s method, is a flat rectangle ABCD, with corners named clockwise, starting from A at the bottom right, as if that whole universe started around 4 o’clock. AB is kind of the width of anything that can happen. Although this universe is a rectangle, it is essentially unidimensional, and AB is that dimension. I throw two balls, W and O. I throw W first, and the point where it lands in the rectangle ABCD becomes a landmark. I draw a line through that point, perpendicular to AB, crossing AB at the point o, and CD at the point s. The line os becomes the Mississippi river of that rectangle: from now on, two sub-universes emerge. There is that sub-universe of M happening, or success, namely of the second ball, the O, landing between the lines os and AD (in the East). On the other hand, there are all those strange things that happen on the other side of the line os, and those things are generally non-M, and they are failures to happen. The probability of the second ball O hitting M, or landing between the lines os and AD, is equal to p, or p = P(M). The probability of the ball O landing west of the Mississippi, between the lines os and BC, is equal to q, and this is the probability of a single failure.

On the grounds of those two lemmas, Thomas Bayes states one of the most fundamental propositions of his whole theory, namely proposition #8: ‘If upon BA you erect a figure BghikmA, whose property is this, that (the base BA being divided into any two parts, as Ab and Bb and at the point of division b a perpendicular being erected and terminated by the figure in m; and y, x, r representing respectively the ratio of bm, Ab, and Bb to AB, and E being the coefficient of the term in which occurs a^p*b^q when the binomial [a + b]^(p + q) is expanded) y = E*x^p*r^q. I say that before the ball W is thrown, the probability the point o should fall between f and b, any two points named in the line AB, and that the event M should happen p times and fail q [times] in p + q = n trials, is the ratio of fghikmb, the part of the figure BghikmA intercepted between the perpendiculars fg, bm, raised upon the line AB, to CA the square upon AB’.
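In modern dress, and with made-up numbers, proposition #8 can be turned into a simple numerical integration: the curve y = E*x^p*r^q, with r = 1 − x, is integrated between the positions of f and b on AB (taken here as the unit segment), and the area under it is the probability Bayes is talking about. A minimal Python sketch, with hypothetical values of p, q and of the points f and b:

from math import comb

# A numeric reading of proposition #8, with hypothetical numbers: p successes, q failures,
# and the points f and b placed at x_f and x_b on the unit segment AB.
p, q = 3, 7
E = comb(p + q, p)                         # coefficient of a^p*b^q in (a + b)^(p+q)
x_f, x_b = 0.2, 0.4

steps = 100_000                            # midpoint rule for the area under y = E*x^p*(1-x)^q
dx = (x_b - x_f) / steps
area = sum(E * (x_f + (i + 0.5) * dx)**p * (1 - (x_f + (i + 0.5) * dx))**q * dx
           for i in range(steps))
print(area)                                # roughly 0.049: the ratio of fghikmb to the square upon AB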

Right, I think that with all those lines, points, sections, and whatnot, you could do with some graphics. Just click on this link to the original image of the Bayesian rectangle and you will see it as I tried to recreate it from the original. I think I did it kind of rectangle-perfectly. Still, according to my teachers of art, at school, my butterflies could very well be my elephants, so be clement in your judgment. Anyway, this is the Bayesian world, ingeniously reducing the number of dimensions. How? Well, in a rectangular universe ABCD, anything that can happen is basically described by the powers AB^BC or BC^AB. Still, if I assume that things happen just kind of on one edge, the AB, and this happening is projected upon the opposite edge CD, with the remaining two edges, namely BC and DA, just standing aside and watching, I can reduce a square problem to a linear one. I think this is the whole power of geometry in mathematical thinking. Whilst it would be foolish to expect rectangular universes in our everyday life, this kind of reduction helps in dealing with dimensions.

Now, you can see the essence of the original Bayesian approach: imagine a universe of occurrences, give it some depth by adding dimensions, then give it some simplicity by taking some dimensions away from it, and map your occurrences in the expanse of things that can happen thus created. Now, I jump to Satoshi Nakamoto and his universe. I will quote, to give an accurate account of the original logic: ‘The success event is the honest chain being extended by one block, increasing its lead by +1, and the failure event is the attacker’s chain being extended by one block, reducing the gap by -1. The probability of an attacker catching up from a given deficit is analogous to a Gambler’s Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially an infinite number of trials to try to reach breakeven. We can calculate the probability he ever reaches breakeven, or that an attacker ever catches up with the honest chain, as follows:

p = probability an honest node finds the next block

q = probability the attacker finds the next block

q_z = probability the attacker will ever catch up from z blocks behind

Now, I rephrase slightly Nakamoto’s original writing, as the online utilities I am using on my mutually mirroring blogs – https://discoversocialsciences.com and https://researchsocialsci.blogspot.com – are not really at home with displaying equations. And so, if p ≤ q, then q_z = 1. If, on the other hand, p > q, my q_z = (q/p)^z. As I mentioned in one of my previous posts, I use Satoshi Nakamoto’s original thinking a contrario, where my idea of local green energy systems is Nakamoto’s attacker, and tries to catch up on the actual socio-economic reality from z blocks behind. For the moment, and basically for want of a better idea, I assume that my blocks can be carved in time or in capital. I explain: catching up from z blocks behind might mean catching up in time, as in closing a temporal lag, or catching up across the expanse of the capital market. I take a local community, like a town, and I imagine its timeline over the 10 years to come. Each unit of time (day, week, month, year) is one block in the chain. Me, with my new idea, I am the attacker, and I am competing with other possible ideas for the development and/or conservation of that local community. Each idea, mine and the others, tries to catch up over those blocks of time. Nakamoto’s logic allows me to guess the right time frame, in the first place, and my relative chances in that competition. Is there any period of time over which I can reasonably expect my idea to take over the whole community, sort of q_z = 1? This value z can also be my time advantage over other projects. If yes, this will be my maximal planning horizon. If not, I just simulate my q_z with different extensions of time (different values of z), and I try to figure out how my odds change as z changes.
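Here is a minimal Python sketch of that logic, with hypothetical values of p and q; it also answers the inverse question I keep asking myself, namely the largest handicap z I can afford at a given, acceptable floor of chance.

from math import log, floor

# Nakamoto's catch-up probability, read a contrario: p is the probability that the
# established order adds the next block, q the probability that my project does,
# and q_z the probability that I ever catch up from z blocks behind.
def q_z(p, q, z):
    return 1.0 if p <= q else (q / p) ** z

p, q = 0.7, 0.3                            # hypothetical odds, per block of time or chunk of capital
for z in (1, 2, 5, 10):
    print(z, q_z(p, q, z))                 # my odds melt as the handicap z grows

# The largest z at which my chance of ever catching up stays at or above a chosen floor.
def max_handicap(p, q, floor_chance):
    if p <= q:
        return float('inf')                # with q >= p, I catch up with certainty at any z
    return floor(log(floor_chance) / log(q / p))

print(max_handicap(0.7, 0.3, 0.05))        # 3, with these made-up numbers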

If, instead of moving through time, I am moving across the capital market, my initial question changes: is there any amount of capital, like any number z of capital chunks, which makes my q_z = 1? If yes, what is it? If no, what schedule of fundraising should I adopt?

Mind you, this is a race: the greater my z, the lower my q_z. The more time I have to cover in order to have my project launched, the lower my chances of ever catching up. This is a notable difference between the Bayesian framework and that of Satoshi Nakamoto. The former says: your chances of success grow as the size of your target zone grows in relation to everything that can possibly happen. The more flexible you are, the greater your chances of success. In Nakamoto’s framework, on the other hand, the word of wisdom is different: the greater your handicap with respect to other projects, ideas, people and whatnot, in terms of time or resources to grab, the lower your chances of succeeding. The total wisdom coming from that is: if I want to design a business plan for those local, green energy systems, I have to imagine something flexible (a large zone of target states), and, at the same time, something endowed with a pretty comfortable pole position over my rivals. I guess that, at this point, you will say: good, you could have come to that right at the beginning. ‘Be flexible and gain some initial advantage’ is not really science. This is real life. Yes, but what I am trying to demonstrate is precisely the junction between the theory of probability and real life.

[1] Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions (1683–1775), 53, 370–418.

Time puts order in the happening

My editorial

I am building on what I have done so far. The process, I believe, is called ‘living’, in general, but I am approaching just a tiny bit of it, namely my latest developments on making a local community run 100% on green energy (see my latest updates ‘Conversations between the dead and the living (no candles)’ and ‘Quelque chose de rationnellement prévisible’). I am working with the logic of Bayesian statistics, and more specifically with the patient zero of this intellectual stream, reverend Thomas Bayes in person (Bayes, Price 1763[1]). I have those four conditions, which, taken together, define my success:

Q(RE) = S(RE) = D(E) << 100% of energy from local green sources

and

P(RE) ≤ PP(E) << price of renewable energy, within individual purchasing power

and

ROA ≥ ROA* << return on assets from local green installations greater than or equal to a benchmark value

and

W/M(T1) > W/M(T0) << a local virtual currency based on green energy progressively gains ground in the market
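Just to keep myself honest about what ‘taken together’ means, here is a minimal Python sketch of the whole test; the variable names are mine and purely hypothetical, simply mirroring the four conditions above.

# One success = all four conditions met at once. The arguments mirror the conditions above:
# D_E is the local demand for energy, S_RE the supply from renewable sources, P_RE the price
# of renewable energy, PP_E the purchasing power regarding energy, ROA and ROA_star the actual
# and benchmark returns on assets, and W_over_M_t0 / W_over_M_t1 the Wasun-to-money ratios.
def is_success(D_E, S_RE, P_RE, PP_E, ROA, ROA_star, W_over_M_t0, W_over_M_t1):
    cond_1 = S_RE == D_E                   # Q(RE) = S(RE) = D(E)
    cond_2 = P_RE <= PP_E                  # P(RE) <= PP(E)
    cond_3 = ROA >= ROA_star               # ROA >= ROA*
    cond_4 = W_over_M_t1 > W_over_M_t0     # W/M(T1) > W/M(T0)
    return cond_1 and cond_2 and cond_3 and cond_4

# Hypothetical numbers, just to show the logic in motion:
print(is_success(D_E=100, S_RE=100, P_RE=0.18, PP_E=0.20,
                 ROA=0.07, ROA_star=0.05, W_over_M_t0=0.02, W_over_M_t1=0.04))   # True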

Now, as I study the original writing by Thomas Bayes, and as I read his geometrical reasoning, I think I should stretch the universe of my success a little. Stretching universes allows a better perspective. Thomas Bayes defines the probability of p successes and q failures in p + q = n trials as E*a^p*b^q, where a and b are the simple probabilities of, respectively, a single success and a single failure, and E is the coefficient of a^p*b^q when you expand the binomial (a + b)^(p+q). That coefficient is equal to E = (p+q)!/(p!*q!), by the way. Thank you, Isaac Newton. Thank you, Blaise Pascal. Anyway, if I define my success as just one success, i.e. if I take p = 1, it makes little sense. That Bayesian expression tends to yield a probability of success equal to 100% in such cases, which, whilst comforting in some way, sounds just stupid. A universe made of one hypothetical success, and nothing but failures otherwise, seems a bit rigid for the Bayesian approach.

And so I am thinking about applying those four conditions to individuals, and not necessarily to whole communities. I mean, my success would be one person fulfilling all those conditions. Let’s have a look. Conditions 1 and 2: no problem. One person can do Q(RE) = S(RE) = D(E), or consume as much energy as they need, all of it green. One person can also easily do P(RE) ≤ PP(E), or pay for that green energy no more than their purchasing power allows. With condition 4, it becomes tricky. I mean, I can imagine that one single person uses more and more of the Wasun, i.e. that local cryptocurrency, and that this ‘more and more’ gets bigger and bigger compared to the plain credit in established currency that the same person is using. Still, individual people hold really disparate monetary balances: just compare yourself to Justin Bieber and you will see the gap. In monetary balances of significantly different sizes, structure can differ a lot, too. Thus, whilst I can imagine an individual person doing W/M(T1) > W/M(T0), that would take a lot of averaging. As for condition 3, or ROA ≥ ROA*, I think that it just wouldn’t work at the individual level. Of course, I could do all that sort of gymnastics like ‘what if the local energy system is a cooperative, what if every person in the local community has some shares in it, what if their return on those shares significantly impacted their overall return on assets etc.’ Honestly, I am not feeling the blues, in this case. I just don’t trust too many what-ifs at once. ROA is ROA; it is an accounting measure, and I like it solid and transparent, without creative accounting.

Thus, as I consider stretching my universe, some dimensions look more stretchable than others. Happens all the time, nothing to inform the government about, and yet instructive. The way I formulate my conditions of success impacts the way I can measure the odds of achieving it. Some conditions are more flexible than others, and those conditions lend themselves more readily to fancy mathematical thinking. The stiff ones, i.e. not very stretchable, are something economists don’t really like. They are called ‘real options’ or ‘discrete variables’ and they just look clumsy in a model. Anyway, I am certainly going to return to that stretching of my universe, subsequently, but now I want to take a dive into the Bayesian logic. In order to get anywhere, once immersed, I need to expand that binomial: (a + b)^(p+q). Raising anything to a power is like meddling with the number of dimensions the thing stretches along. Myself, for example, raised to the power 0.75, or ¾, means that first, I gave myself a three-dimensional extension, which I usually pleasantly experience, and then I tried to express this three-dimensional existence with a four-dimensional denominator, with time added to the game. As a result, after having elevated myself to the power 0.75, I end up with plenty of time I don’t know what to do with. Somehow familiar, but I don’t like it. Dimensions I don’t know what to do with look like pure waste to me. On the whole, I prefer elevating myself to integers. At least, I stay in control.

This, in turn, suggests a geometrical representation, which I indeed can find with Thomas Bayes. In Section II of this article, Thomas Bayes starts with writing the basic postulates: ‘Postulate 1. I suppose the square table or plane ABCD to be so levelled that if either of the balls O or W be thrown upon it, there shall be the same probability that it rests upon any one equal part of the plane or another, and that it must necessarily rest somewhere upon it. Postulate 2. I suppose that the ball W will be first thrown, and through the point where it rests a line ‘os’ shall be drawn parallel to AD, and meeting CD and AB in s and o; and that afterwards the ball O will be thrown p + q = n times, and that its resting between AD and os after a single throw be called the happening of the event M in a single trial’. OK, so that’s the original universe by reverend Bayes. Interesting. A universe is defined, with a finite number of dimensions. Anyway, as I am an economist, I will subsequently reduce any number of dimensions to just two, as reverend Bayes did. As my little example of elevating myself to power 0.75 showed, there is no point in having more dimensions than you can handle. Two is fine.

In that k-dimensional universe, two events happen, in a sequence. The first one is the peg event: it sets a reference point, and a reference tangent. That tangent divides the initial universe into two parts, sort of on the right of the Milky Way as opposed to all those buggers on the left of it. Then, the second event happens, and this one is me in action: I take n trials with p successes and q failures. Good. As I think about it quickly, it always gives me one extra dimension over the k dimensions of my universe. That extra dimension is order rather than size. In the original notation by Thomas Bayes, he has two dimensions in his square, and then time happens, and two events happen in that time. Time puts order in the happening of the two events. Hence, that extra dimension should be sort of discrete, with well-defined steps and no available states in between. I have two states of my k-dimensional universe: state 1, sort of, with just the peg event in it, and state 2, with my performance added inside. State 1 narrows down the scope of happening in state 2, and I want to know the odds of state 2 happening within that scope.

Now, I am thinking about ball identity. I mean, what could play the role of that first, intrepid ball W, which throws itself head first to set the first state of my universe? From the first condition, I take the individual demand for energy: D(E). The second condition yields individual purchasing power regarding energy PP(E), and the third one suggests the benchmark value regarding the return on assets, ROA*. I have a bit of a problem with the fourth condition, but after some simplification I think that I can take time, just as reverend Bayes did. My W ball will be the state of things at the moment T0, regarding the monetary system, or W/M(T0). Good, so my universe can get some order through four moves, in which I set four peg values, taken from the four conditions. The extra dimension in my universe is precisely the process of setting those benchmarks.

[1] Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions (1683–1775), 53, 370–418.

Conversations between the dead and the living (no candles)

My editorial for today

I have been away from blogging for two days. I have been finishing that article about technological change seen from an evolutionary perspective, and I hope I have finished, at least as a raw manuscript. If you are interested, you can download it from Research Gate or from my own website with Word Press. Now, as the paper is provisionally finished, I feel like having an intellectual stroll, possibly in the recent past. I am tempted to apply those evolutionary patterns of thinking to something I had been quite busy with a few months ago, namely to financial tools, including virtual currencies, as a means to develop new technologies. I had been particularly interested in the application of virtual currencies to the development of local power systems based on renewable energies, but in fact, I can apply the same frame of thinking to any technology, green energy or otherwise. Besides, as I was testing various empirical models to represent evolutionary change in technologies, monetary variables frequently poked their head through some hole, usually as correlates to residuals.

So, I return to money. For those of my readers who would like to refresh their memory or simply get the drift of that past writing of mine, you can refer, for example, to ‘Exactly the money we assume’ or to ‘Some insights into Ethereum whilst insulating against bullshit’, as well as to other posts I placed around that time. Now, I want to move on and meddle a bit with Bayesian statistics, and more exactly with the source method presented in the posthumous article by reverend Thomas Bayes (Bayes, Price 1763[1]), which, by the way, you can get from the JSTOR library via this link. I want both to wrap my mind around Thomas Bayes’s way of thinking, and to refresh my own thinking about monetary systems. I have that strange preference for organizing conversations between the dead and the living (no candles), so I feel like putting reverend Bayes in conversation with Satoshi Nakamoto, the semi-mythical founding father of the Bitcoin movement, whose article, which you can download via this link from my Word Press website, contains some mathematical analysis based on the Poisson probability.

My initial question, the one I had been wrestling with this Spring, was the following: how can a local community develop a local system of green energy, and a local virtual currency, and how can these two help the development or the transformation of said local community? Why do I bother reverend Thomas Bayes, posthumously, with this question? Well, because this is what he stated as the purpose of his article. In the general formulation of the problem, he wrote: ‘Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named’. The tricky part in this statement is the ‘unknown’ part. When we studied probabilities at high school (yes, some of us didn’t take a nap during those classes!), one of the first things we were taught to do was to define exactly the event that we want to assess the probability of happening. You remember? Red balls vs. black balls, in a closed box? Rings a bell? Well, Thomas Bayes stated a different problem: how to tackle the probability that something unknown happens? Kind of a red ball cross-bred with a black ball, with a hint of mésalliance with a white cube, in family records. In the last, concluding paragraph of his essay, Thomas Bayes wrote: ‘But what recommends the solution in this Essay is that it is complete in those cases where information is most wanted, and where Mr De Moivre’s solution of the inverse problem can give little or no direction, I mean, in all cases where either p or q are of no considerable magnitude. In other cases, or when both p and q are very considerable, it is not difficult to perceive the truth of what has been here demonstrated, or that there is reason to believe in general that the chances for the happening of an event are to the chances for its failure in the same ratio with that of p to q. But we shall be greatly deceived if we judge in this manner when either p or q are small. And though in such cases the Data are not sufficient to discover the exact probability of an event, yet it is very agreeable to be able to find the limits between which it is reasonable to think it must lie, and also to be able to determine the precise degree of assent which is due to any conclusions or assertions relating to them’.

Before I go further: in the original notation by Thomas Bayes, p and q are the respective numbers of successes and failures, and not probabilities. Especially if you are a native French speaker, you might have learnt, at school, p and q as probabilities, so be on your guard. You’d better always be on your guard, mind you. You never know where your feet can lead you. So, I am bothering late reverend Bayes because he was investigating the probability of scoring a relatively small number of successes in a relatively small number of trials. If you try to launch a new technology, locally, how many trials can you have? I mean, if your investors are patient, they can allow some trial and error, but in reasonable amounts. You also never know for sure what a reasonable amount of trial and error means for a given investor. You have the unknown event, see? Just as Thomas Bayes stated his problem. So I take my local community, I make a perfect plan, with a plan B possibly up our local sleeve, I take some risks, and then someone from the outside world wants to assess the odds that I succeed. The logic of Thomas Bayes can be a path to follow.

Satoshi Nakamoto, in that foundational article about the idea of Bitcoin, treated mostly issues of security. Still, he indirectly gives an interesting insight concerning the introduction of new inventions into an essentially hostile environment. When he simulates a cyberattack on a financial system, he uses the general framework of Poisson probability to assess the odds that an intruder from outside can take over a network of mutually interacting nodes. I am thinking about inverting his thinking, i.e. about treating the introduction of a new technology, especially in a local community, as an intrusion from outside. I could treat Nakamoto’s ‘honest nodes’ as the conservatives in the process, resisting novelty, and the blocks successfully attacked by the intruder would be the early adopters. Satoshi Nakamoto used the Poisson distribution to simulate that process, and here he meets reverend Bayes, I mean, metaphorically. The Poisson distribution is frequently called the ‘probability of rare events’, and uses the same general framework as the original Bayesian development: something takes place n times in total, in p cases that something is something we wish to happen (success), whilst in q cases it is utter s**t happening (failure), and we want to calculate the compound probability of having p successes and q failures in n trials. By the way, if you are interested in the original work by Simeon Denis Poisson, a creative Frenchman who, technically a mathematician, tried to be very nearly everything else, I am placing on my Word Press site two of his papers: the one published in 1827 and that of 1832 (presented for the first time in 1829).
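For the record, the calculation in the Nakamoto paper combines the Poisson distribution with the catch-up probability: the intruder’s potential progress, while the defenders accumulate z blocks, is taken as Poisson-distributed with expected value z*q/p, and each possible amount of that progress is weighted by the probability of still catching up from the remaining distance. Here is a minimal Python sketch of it, read the way I want to read it, i.e. with the ‘attacker’ being the new technology; the numbers are made up.

from math import exp, factorial

# The probability that the newcomer (my project, cast as Nakamoto's attacker) ever takes
# over, starting z blocks behind, with q the per-block probability that the early adopters
# win the next block and p = 1 - q the probability that the conservatives do.
def newcomer_takes_over(q, z):
    p = 1.0 - q
    if q >= p:
        return 1.0
    lam = z * q / p                        # expected progress of the newcomer over that stretch
    total = 1.0
    for k in range(z + 1):
        poisson = lam ** k * exp(-lam) / factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

for z in (1, 3, 5, 10):
    print(z, round(newcomer_takes_over(0.3, z), 4))   # shrinks quickly as the lag z grows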

And so I have that idea of developing a local power system, based on green energies, possibly backed with a local virtual currency, and I want to assess the odds of success. Both the Bayesian thinking and the Poissonian one are sensitive to how we define, respectively, success and failure, and to how much uncertainty we leave in this definition. In business, I can define my success in various metrics: size of the market covered with my sales, prices, capital accumulated, return on that capital etc. This is, precisely, the hurdle to jump when we pass from the practice of business to its theoretical appraisal: we need probabilities, and in order to have probabilities, we need some kind of event defined, at least foggily. What’s a success, here? Let’s try the following: what I want is a local community entirely powered with locally generated, renewable energies, in a socially and financially sustainable manner.

‘Entirely powered’ means 100%. This one is simple. Then, I am entering the dark forest of assumptions. Let’s say that ‘socially sustainable’ means that every member of the local community should have that energy accessible within their purchasing power. ‘Financially sustainable’ is trickier: investors can be a lot fussier than ordinary folks regarding what is a good deal and what isn’t. Still, I do not know, a priori, who those investors could possibly be, and so I take a metric which leaves a lot of room for further interpretation, namely the rate of return on assets. I prefer the return on assets (ROA) to the rate of return on equity (ROE), because for the latter I would have to make some assumptions regarding the capital structure of the whole thing, and I want as weak a set of assumptions as possible. I assume that said rate of return on assets should be greater than or equal to a benchmark value. By the way, weak assumptions in science are the exact opposite of weak assumptions in life. In life, weak assumptions mean I am probably wrong because I assumed too much. In science, weak assumptions are probably correct, because I assumed just a little, out of the whole expanse of what I could have assumed.

Right. Good. So what I have are the following variables: local demand for energy D(E), local energy supply from renewable sources S(RE), price of renewable energy P(RE), purchasing power regarding energy PP(E), and rate of return on assets (ROA). With these, I form my conditions. Condition #1: the local use of energy is a local equilibrium between the total demand for energy and the supply of energy from renewable sources: Q(RE) = S(RE) = D(E). Condition #2: the price of renewable energy is affordable, or: P(RE) ≤ PP(E). Condition #3: the rate of return on assets is greater than or equal to a benchmark value: ROA ≥ ROA*. That asterisk on the right side of the last condition is the usual symbol for something we consider a peg value. Right, I use the asterisk in other types of elaborate expressions, like s*** or f***. The asterisk is the hell of a useful symbol, as you can see.

Now, I add that idea of a local, virtual currency based on green energies. Back in the day, I used to call it ‘Wasun’, a play on the words ‘water’ and ‘sun’. You can look up ‘Smart grids and my personal variance’ or ‘Les moulins de Wasun’ (in French) in order to catch (again?) a bit of my drift. I want a local, virtual currency to be a significant part of the local monetary system. I define ‘significant part’ as an amount likely to alter the supply of credit, in established currency, in the local market. I use that old trick of the supply of credit being equal to the supply of money, and thus possible to symbolize with M. I assign the symbol ‘W’ to the local supply of the Wasun. I take two moments in time: the ‘before’, represented as T0, with T1 standing for the ‘after’. I make condition #4: W/M(T1) > W/M(T0).

Wrapping it up, any particular event falling into:

Q(RE) = S(RE) = D(E)

P(RE) ≤ PP(E)

ROA ≥ ROA*

W/M(T1) > W/M(T0)

… is a success. Anything outside those four conditions is a failure. Now, I can take three basic approaches in terms of probability. Thomas Bayes would assume a certain number n of trials, look for the probability of all four conditions being met in one single trial, and then would ask me how many trials (p) I want to have successful, out of n. Simeon Denis Poisson would rather have taken an interval of time, and then would have tried to assess the probability of having all four conditions met at least once in that interval of time. Satoshi Nakamoto would come up with yet another strategy. He would assume that my project is just one of many going on in parallel in that little universe, and that other projects try to achieve their own conditions of success, similar to mine or different, as I try to do my thing. The next step would be to define whose success would be my failure, and then I would have to compute the probability of my success in the presence of those competing projects. Bloody complicated. I like it. I’m in.
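Here is, in a minimal Python sketch and with made-up numbers, how the first two of those strategies would look, once I have a single-trial probability s of all four conditions being met at once:

from math import comb, exp

s = 0.02                                   # hypothetical probability of success in one single trial

# Thomas Bayes's way: exactly p successes and q = n - p failures in n trials.
def bayes_way(s, p, n):
    return comb(n, p) * s**p * (1 - s)**(n - p)

# Simeon Denis Poisson's way: at least one success over an interval of time in which
# the expected number of successes is lam.
def poisson_way(lam):
    return 1 - exp(-lam)

print(bayes_way(s, p=1, n=20))             # exactly one success in 20 trials, roughly 0.27
print(poisson_way(lam=20 * s))             # at least one success over a comparable interval, roughly 0.33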

[1] Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions (1683–1775), 53, 370–418.