# It works again

I intend to work on iterations. My general purpose with learning to program in Python is to create my own algorithms of artificial neural networks, in line with what I have already done in that respect using just Excel. Iteration is the essence of artificial intelligence, to the extent that the latter manifests as an intelligent structure producing many alternative versions of itself. Many means one at a time over many repetitions.

When I run my neural networks in Excel, they do a finite number of iterations. That would be a Definite Iteration in Python, i.e. a structure built on the ‘for’ statement. I am helping myself with the tutorial available at https://realpython.com/python-for-loop/ . Still, as programming is supposed to enlarge my Excel-forged intellectual horizons, I want to understand and practice the ‘while’ loop in Python, thus Indefinite Iteration (https://realpython.com/python-while-loop/ ).

Anyway, programming a loop is very different from looping over multiple rows of an Excel sheet. The latter simply makes a formula repeat over many rows, whilst the former requires defining the exact operation to iterate, the input domain which the iteration takes as data, and the output dataset to store the outcome of iteration.
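To make the three ingredients concrete (the operation, the input domain, the output dataset), here is a minimal sketch with made-up numbers. The `error / 2` step is only a hypothetical stand-in for whatever a learning step would do; it is not taken from my Excel networks.

```python
# Definite iteration: the number of passes is known in advance.
inputs = [2.0, 4.0, 8.0]       # input domain (made-up values)
largest = max(inputs)
outputs = []                   # output dataset, filled one pass at a time
for x in inputs:
    outputs.append(x / largest)    # the exact operation to iterate

# Indefinite iteration: repeat for as long as a condition holds.
error = 1.0
rounds = 0
while error > 0.01:
    error = error / 2          # hypothetical stand-in for a learning step
    rounds += 1

print(outputs)
print(rounds)
```

The ‘for’ loop stops when the input domain is exhausted, whilst the ‘while’ loop stops when the condition turns false, however many passes that takes.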

It is time, therefore, to describe exactly the iteration I want to program in Python. As a matter of fact, we are talking about a few different iterations. The first one is the standardisation of my source data. I can use two different ways of standardising it, depending on the neural activation function I use. The baseline method is to standardise each variable over its maximum, and this one fits every activation function I use. The standardised value of x, AKA s(x), is calculated as s(x) = x/max(x).

If I focus just on the hyperbolic tangent as activation function, I can use the first method, or I can standardise by mean-reversion, where s(x) = [x – avg(x)]/std(x). In a first step, I subtract from x the average expected value of x, which is the [x – avg(x)] expression, and then I divide the resulting difference by the standard deviation of x, or std(x).

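The essential difference between those two modes of standardisation is the range of standardised values. When denominated in units of max(x), standardised values of a non-negative variable range in 0 ≤ s(x) ≤ 1. When I standardise by mean-reversion, s(x) is centred on zero and measured in standard deviations of x: it takes both negative and positive values, most of them within a few units of zero, and it is not strictly bounded to the interval between -1 and 1.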
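A quick numerical check of the two formulas, as a sketch on a made-up sample (the values of `x` are an assumption, chosen so that one observation sits far from the mean). I use `ddof=1`, i.e. the sample standard deviation, because that is what Pandas computes by default.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 10.0])      # made-up, non-negative sample

s_max = x / x.max()                       # s(x) = x / max(x)
s_mr = (x - x.mean()) / x.std(ddof=1)     # s(x) = [x - avg(x)] / std(x)

print(s_max.min(), s_max.max())           # stays between 0 and 1
print(s_mr.min(), s_mr.max())             # centred on zero, not bounded by 1
print(s_mr.mean(), s_mr.std(ddof=1))      # mean close to 0, deviation close to 1
```

Values standardised over the maximum never exceed 1, whereas the mean-reverted value of the outlying observation goes beyond 1.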

The piece of programming I start this learning with consists in transforming my source Data Frame ‘df’ into its standardised version ‘s_df’ by dividing the values in each column of df by the maximum of that column. As I think of all that, it comes to my mind what I have recently learnt, namely that operations on NumPy arrays, in Python, are often much faster than the same operations on data frames built with Python Pandas. I check if I can make a Data Frame out of an imported CSV file, and then turn it into a NumPy array.

Let’s waltz. I start by opening JupyterLab at https://hub.gke2.mybinder.org/user/jupyterlab-jupyterlab-demo-nocqldur/lab and creating a notebook with Python 3 as its kernel. Then, I import the libraries which I expect to use one way or another: NumPy, Pandas, Matplotlib, OS, and Math. In other words, I do:

>> import numpy as np

>> import pandas as pd

>> import matplotlib.pyplot as plt

>> import math

>> import os

Then, I upload a CSV file and I import it into a Data Frame. It is a database I used in my research on cities and urbanization, its name is ‘DU_DG database.csv’, and, as it is transformed from an Excel file, I take care to specify that separators are semicolons.

The resulting Data Frame is structured as:

Index([‘Index’, ‘Country’, ‘Year’, ‘DU/DG’, ‘Population’,

‘GDP (constant 2010 US\$)’, ‘Broad money (% of GDP)’,

‘urban population absolute’,

‘Energy use (kg of oil equivalent per capita)’, ‘agricultural land km2’,

‘Cereal yield (kg per hectare)’],

dtype=’object’)

Import being successful (I just check with commands ‘DU_DG.shape’ and ‘DU_DG.head()’), I am trying to create a NumPy array. Of course, there is not much sense in translating names of countries and labels of years into a NumPy array. I try to select numerical columns ‘DU/DG’, ‘Population’, ‘GDP (constant 2010 US\$)’, ‘Broad money (% of GDP)’, ‘urban population absolute’, ‘Energy use (kg of oil equivalent per capita)’, ‘agricultural land km2’, and ‘Cereal yield (kg per hectare)’, by commanding:

>> DU_DGnumeric=np.array(DU_DG['DU/DG','Population','GDP (constant 2010 US\$)','Broad money (% of GDP)','urban population absolute','Energy use (kg of oil equivalent per capita)','agricultural land km2','Cereal yield (kg per hectare)'])

The answer I get from Python 3 is a gentle ‘f**k you!’, i.e. an elaborate error message.

KeyError                                  Traceback (most recent call last)

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)

2656             try:

-> 2657                 return self._engine.get_loc(key)

2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: (‘DU/DG’, ‘Population’, ‘GDP (constant 2010 US\$)’, ‘Broad money (% of GDP)’, ‘urban population absolute’, ‘Energy use (kg of oil equivalent per capita)’, ‘agricultural land km2’, ‘Cereal yield (kg per hectare)’)

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)

<ipython-input-18-e438a5ba1aa2> in <module>

—-> 1 DU_DGnumeric=np.array(DU_DG[‘DU/DG’,’Population’,’GDP (constant 2010 US\$)’,’Broad money (% of GDP)’,’urban population absolute’,’Energy use (kg of oil equivalent per capita)’,’agricultural land km2′,’Cereal yield (kg per hectare)’])

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)

2925             if self.columns.nlevels > 1:

2926                 return self._getitem_multilevel(key)

-> 2927             indexer = self.columns.get_loc(key)

2928             if is_integer(indexer):

2929                 indexer = [indexer]

/srv/conda/envs/notebook/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)

2657                 return self._engine.get_loc(key)

2658             except KeyError:

-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))

2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: (‘DU/DG’, ‘Population’, ‘GDP (constant 2010 US\$)’, ‘Broad money (% of GDP)’, ‘urban population absolute’, ‘Energy use (kg of oil equivalent per capita)’, ‘agricultural land km2’, ‘Cereal yield (kg per hectare)’)

Didn’t work, obviously. I try something else. I proceed in two steps. First, I create a second Data Frame out of the numerical columns of DU_DG. I go:

>> DU_DGNumCol=pd.DataFrame(DU_DG.columns['DU/DG','Population','GDP (constant 2010 US\$)','Broad money (% of GDP)','urban population absolute','Energy use (kg of oil equivalent per capita)','agricultural land km2','Cereal yield (kg per hectare)'])

Python seems to have accepted the command without reservations, and yet something strange happens. Informative commands about that second Data Frame, i.e. DU_DGNumCol, such as ‘DU_DGNumCol.head()’, ‘DU_DGNumCol.shape’ or ‘DU_DGNumCol.info’ don’t work, as if DU_DGNumCol had no structure at all.
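Both attempts above stumble on the same detail of Pandas indexing: several labels separated by commas inside one pair of square brackets are read as a single key (a tuple), which is exactly what the KeyError reports. Selecting several columns takes a list of labels, hence double brackets. Below is a minimal sketch on a tiny stand-in for DU_DG; the frame `demo` and its values are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for DU_DG; values are invented.
demo = pd.DataFrame({
    "Country": ["A", "B"],
    "DU/DG": [1.5, 2.5],
    "Population": [100.0, 200.0],
})

numeric_columns = ["DU/DG", "Population"]

# demo["DU/DG", "Population"] would look for ONE column labelled by the
# tuple ("DU/DG", "Population") and raise KeyError.
# A list of labels selects several columns at once:
demo_numeric = demo[numeric_columns]

# Conversion of the selection into a NumPy array.
demo_array = demo_numeric.to_numpy()
print(demo_array.shape)
```

With the real data, the list would hold the eight numerical column names, written exactly as they appear in ‘DU_DG.columns’.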

Cool. I investigate. I want to check how Python sees the data in my DU_DG data frame. I do ‘DU_DG.describe()’ first, and, to my surprise, I can see descriptive statistics just for columns ‘Index’ and ‘Year’. The legitimate WTF? question pushes me to type ‘DU_DG.info()’ and here is what I get:

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 896 entries, 0 to 895

Data columns (total 11 columns):

Index                                           896 non-null int64

Country                                         896 non-null object

Year                                            896 non-null int64

DU/DG                                           896 non-null object

Population                                      896 non-null object

GDP (constant 2010 US\$)                         896 non-null object

Broad money (% of GDP)                          896 non-null object

urban population absolute                       896 non-null object

Energy use (kg of oil equivalent per capita)    896 non-null object

agricultural land km2                           896 non-null object

Cereal yield (kg per hectare)                   896 non-null object

dtypes: int64(2), object(9)

memory usage: 77.1+ KB

I think I understand. My numerical data has been imported as object, and I want it to be float values. Once again, I have the same valuable lesson: before I do anything with my data, in Python, I need to check and curate it. It is strangely connected to my theory of collective intelligence. Our human perception accepts empirical experience for further processing, especially for collective processing at the level of culture, only if said experience has the right form. We tend to ignore phenomena which manifest in a form we are not used to processing cognitively.
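Here is a sketch of one way to turn such object columns into floats. I do not know what exactly made the DU_DG columns land as object; a decimal comma, or a text marker for missing values, are frequent culprits in files exported from Excel with semicolons, and that is the assumption behind the made-up sample below (including the ‘--’ marker).

```python
import io
import pandas as pd

# Made-up sample mimicking an Excel export: semicolons as separators,
# commas as decimal marks, '--' standing for a missing value.
raw = io.StringIO(
    "Country;Year;DU/DG;Population\n"
    "A;2000;1,5;1000\n"
    "B;2001;2,25;--\n"
)
frame = pd.read_csv(raw, sep=";")
print(frame.dtypes)    # 'DU/DG' and 'Population' arrive as object

for column in ["DU/DG", "Population"]:
    # Swap decimal commas for dots, then convert; whatever cannot be
    # read as a number becomes NaN instead of stopping the conversion.
    as_text = frame[column].astype(str).str.replace(",", ".", regex=False)
    frame[column] = pd.to_numeric(as_text, errors="coerce")

print(frame.dtypes)    # both columns are float64 now
```

When the decimal comma is the only problem, passing `decimal=","` to `pd.read_csv` does the same job at import time.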

Just out of sheer curiosity, I take another dataset and I repeat the whole sequence of import from CSV, and definition of data type. This time, I take a reduced version of Penn World Table 9.1. The full citation due in this case is: Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), “The Next Generation of the Penn World Table”, American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net/pwt. The ‘reduced’ part means that I took out of the database all the rows (i.e. country <> year observations) with at least one empty cell. I go:

>> PWT=pd.DataFrame(pd.read_csv('PWT 9_1 no empty cells.csv'))

…aaaaand it lands. Import successful. I test the properties of PWT data frame:

>> PWT.info()

yields:

<class ‘pandas.core.frame.DataFrame’>

RangeIndex: 3006 entries, 0 to 3005

Data columns (total 44 columns):

country      3006 non-null object

year         3006 non-null int64

rgdpe        3006 non-null float64

rgdpo        3006 non-null float64

pop          3006 non-null float64

emp          3006 non-null float64

emp / pop    3006 non-null float64

avh          3006 non-null float64

hc           3006 non-null float64

ccon         3006 non-null float64

cda          3006 non-null float64

cgdpe        3006 non-null float64

cgdpo        3006 non-null float64

cn           3006 non-null float64

ck           3006 non-null float64

ctfp         3006 non-null float64

cwtfp        3006 non-null float64

rgdpna       3006 non-null float64

rconna       3006 non-null float64

rdana        3006 non-null float64

rnna         3006 non-null float64

rkna         3006 non-null float64

rtfpna       3006 non-null float64

rwtfpna      3006 non-null float64

labsh        3006 non-null float64

irr          3006 non-null float64

delta        3006 non-null float64

xr           3006 non-null float64

pl_con       3006 non-null float64

pl_da        3006 non-null float64

pl_gdpo      3006 non-null float64

csh_c        3006 non-null float64

csh_i        3006 non-null float64

csh_g        3006 non-null float64

csh_x        3006 non-null float64

csh_m        3006 non-null float64

csh_r        3006 non-null float64

pl_c         3006 non-null float64

pl_i         3006 non-null float64

pl_g         3006 non-null float64

pl_x         3006 non-null float64

pl_m         3006 non-null float64

pl_n         3006 non-null float64

pl_k         3006 non-null float64

dtypes: float64(42), int64(1), object(1)

memory usage: 1.0+ MB

>> PWT.describe()

gives nice descriptive statistics. This dataset has been imported in the format I want. I do the same thing I attempted with the DU_DG dataset: I try to convert it into a NumPy array and to check the shape obtained. I do:

>> PWTNumeric=np.array(PWT)

>> PWTNumeric.shape

I get (3006,44), i.e. 3006 rows over 44 columns.
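One caveat, sketched on a made-up three-row stand-in for PWT: when a Data Frame mixes text and numbers, as PWT does with its ‘country’ column, the array built from the whole frame has the generic ‘object’ type, which gives away the speed advantage of NumPy. Keeping only the numerical columns yields a proper array of floats.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for PWT: one text column, two numerical ones.
pwt_demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "year": [2000, 2001, 2002],
    "avh": [1800.0, 1750.0, 1900.0],
})

mixed = np.array(pwt_demo)    # text and numbers together
numeric_only = pwt_demo.select_dtypes(include="number").to_numpy()

print(mixed.dtype)            # object
print(numeric_only.dtype)     # float64
print(numeric_only.shape)
```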

I try to wrap my mind around standardising values in PWT. I start gently. I slice one column out of PWT, namely the AVH variable, which stands for the average number of hours worked per person per year. I do:

>> AVH=pd.DataFrame(PWT['avh'])

>> stdAVH=pd.DataFrame(AVH/AVH.max())

Apparently, it worked. I check with ‘stdAVH.describe()’ and I get a nice distribution of values between 0 and 1.

I do the same thing with mean-reversion. I create the ‘mrAVH’ data frame according to the s(x) = [x – avg(x)]/std(x) drill. I do:

>> mrAVH=pd.DataFrame((AVH-AVH.mean())/AVH.std())

…and I get a nice distribution of mean reverted values.

Cool. Now, it is time to try and iterate the same standardisation over many columns in the same Data Frame. I have already rummaged a bit and apparently it is not going to be as simple as in Excel. It usually isn’t.
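As a sketch of where this could go, here are two ways of standardising many columns at once, on a made-up stand-in for PWT (names and values invented). The first is an explicit ‘for’ loop over columns; the second lets Pandas do the looping, since dividing a Data Frame by `numeric.max()` aligns each column with its own maximum.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for PWT.
demo = pd.DataFrame({
    "country": ["A", "B", "C"],
    "avh": [1800.0, 1750.0, 1900.0],
    "pop": [10.0, 20.0, 40.0],
})
numeric = demo.select_dtypes(include="number")   # drop the text column

# Version 1: explicit loop, one column per pass.
std_loop = pd.DataFrame(index=numeric.index)
for name in numeric.columns:
    std_loop[name] = numeric[name] / numeric[name].max()

# Version 2: the same operation applied to all columns at once.
std_vec = numeric / numeric.max()

# Mean-reversion over all columns, s(x) = [x - avg(x)] / std(x).
mr_vec = (numeric - numeric.mean()) / numeric.std()

print(std_vec)
print(mr_vec)
```

Both versions give the same standardised values; the loop is closer to the logic of iteration I want to practice, the second is shorter.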

That would be all in that update. A short summary is due. It works again. I mean, learning something and keeping a journal of how exactly I learn, that thing works. I feel that special vibe, like ‘What the hell, even if it sucks, it is interesting’. Besides the technical details of programming, I have already learnt two big things about data analysis in Python. Firstly, however comfortable it is to use libraries such as NumPy or Pandas, being really efficient requires the understanding of small details at the very basic level, e.g. conversion of data types, and, as a matter of fact, the practical workability of different data types, selection of values in a data set, by row and by column, iteration over rows and columns etc. Secondly, once again, data works well in Python when it has been properly curated prior to analysis. Learning quick algorithmic ways to curate that data, without having to do it manually in Excel, is certainly an asset, and a skill I need to acquire.

# We haven’t nailed down all our equations yet

As I keep digging into the topic of collective intelligence, and my research thereon with the use of artificial neural networks, I am making a list of key empirical findings that pave my way down this particular rabbit hole. I am reinterpreting them with the new understandings I have from translating my mathematical model of artificial neural network into an algorithm. I am learning to program in Python, which comes in sort of handy given I want to use AI. How could I have made and used artificial neural networks without programming, just using Excel? You see, that’s Laplace and his hypothesis that mathematics represent the structure of reality (https://discoversocialsciences.com/wp-content/uploads/2020/10/Laplace-A-Philosophical-Essay-on-Probabilities.pdf ).

An artificial neural network is a sequence of equations which interact, in a loop, with a domain of data. Just as any of us, humans, essentially. We just haven’t nailed down all of our own equations yet. What I can do and have done with Excel was to understand the structure of those equations and their order. This is a logical structure, and as long as I don’t give it any domain of data to feed on, it stays put.

When I feed data into that structure, it starts working. Now, with any set of empirical socio-economic variables I have worked with, so far, there are always one or two among them which behave differently from the others when used as output. Generally, my neural network works differently according to the output variable I make it optimize. Yes, it is the output variable, supposedly being the desired outcome to optimize, and not the input variables treated as instrumental in that view, which makes the greatest difference in the results produced by the network.

That seems counterintuitive, and yet this is like the most fundamental common denominator of everything I have found out so far: the way that a simple neural network simulates the collective intelligence of human societies seems to be conditioned most of all by the variables pre-set as the output of the adaptation process, not by the input ones. Is it a sensible conclusion regarding collective intelligence in real life, or is it just a property of the data? In other words, is it social science or data science? This is precisely one of the questions which I want to answer by learning programming.

If it is a pattern of collective human intelligence, that would mean we are driven by the orientations pursued much more than by the actual perception of reality. What we are after would be a more important differentiating factor of our actions than what we perceive and experience as reality. Strangely congruent with the Interface Theory of Perception (Hoffman et al. 2015[1], Fields et al. 2018[2]).

As it is some kind of habit in me, in the second part of this update I put the account of my learning how to program and do Data Science in Python. This time, I wanted to work with hard cases of CSV import, i.e. troublesome files. I want to practice data cleansing. I have downloaded the ‘World Economic Outlook October 2020’ database from the website https://www.imf.org/en/Publications/WEO/weo-database/2020/October/download-entire-database . Already when downloading, I could notice that the announced format is ‘TAB delimited’, not ‘Comma Separated’. It downloads as Excel.

To start with, I used the https://anyconv.com/tab-to-csv-converter/ website to do the conversion. In parallel, I tested two other ways:

1. opening in Excel, and then saving as CSV
2. opening with Excel, converting to *.TXT, importing into Wizard for MacOS (statistical package), and then exporting as CSV.

What I can see right off the bat are different sizes of the same data, technically saved in the same format. The AnyConv-generated CSV is 12.3 MB, the one converted through Excel is 9.6 MB, and the last one, filtered through Excel to TXT, then to Wizard and to CSV, makes 10.1 MB. Intriguing.

I open JupyterLab online, and I create a Python 3-based Notebook titled ‘Practice 27_11_2020_part2’.

I prepare the Notebook by importing Numpy, Pandas, Matplotlib and OS. I do:

>> import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import os

I upload the AnyConv version of the CSV. I make sure to have the name of the file right by doing:

>> os.listdir()

…and I do:

Result:

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3072: DtypeWarning: Columns (83,85,87,89,91,93,95,98,99,102,103,106,107,110,111,114,115,118,119,122,123,126,127,130,131,134,135,138,139,142,143,146,147,150,151,154,155,158) have mixed types. Specify dtype option on import or set low_memory=False.

interactivity=interactivity, compiler=compiler, result=result)

As I have been told, I add the “low_memory=False” option to the command, and I retype:

Result: the file is apparently imported successfully. I investigate the structure.

>> WEO1.describe()

Result: I know I have 8 rows (there should be many more, over 200), and 32 columns. Something is wrong.
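A note on what those 8 rows mean, with a sketch on made-up data. The ‘describe()’ command does not show the data itself: for numerical columns it returns a table of eight summary statistics (count, mean, std, min, three quartiles, max), one column per numerical variable, whatever the length of the file. The number of rows actually imported is given by ‘.shape’.

```python
import pandas as pd

# Made-up frame: 200 rows, one numerical column and one text column.
demo = pd.DataFrame({
    "value": range(1, 201),
    "label": ["x"] * 200,
})

summary = demo.describe()

print(summary.shape)          # eight statistics, one numerical column
print(list(summary.index))
print(demo.shape)             # the actual size of the data
```

Seen this way, 8 rows and 32 columns in the output of ‘describe()’ would mean that Pandas recognised 32 columns as numerical, not that the import kept only 8 observations.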

Result: Parser error

I retry, with parameter sep=‘;’ (usually works with Excel)

Result: import successful. Let’s check the shape of the data

>> WEO2.describe()

Result: Pandas can see just the last column. I make sure.

>> WEO2.columns

Result:

Index([‘WEO Country Code’, ‘ISO’, ‘WEO Subject Code’, ‘Country’,

‘Subject Descriptor’, ‘Subject Notes’, ‘Units’, ‘Scale’,

‘Country/Series-specific Notes’, ‘1980’, ‘1981’, ‘1982’, ‘1983’, ‘1984’,

‘1985’, ‘1986’, ‘1987’, ‘1988’, ‘1989’, ‘1990’, ‘1991’, ‘1992’, ‘1993’,

‘1994’, ‘1995’, ‘1996’, ‘1997’, ‘1998’, ‘1999’, ‘2000’, ‘2001’, ‘2002’,

‘2003’, ‘2004’, ‘2005’, ‘2006’, ‘2007’, ‘2008’, ‘2009’, ‘2010’, ‘2011’,

‘2012’, ‘2013’, ‘2014’, ‘2015’, ‘2016’, ‘2017’, ‘2018’, ‘2019’, ‘2020’,

‘2021’, ‘2022’, ‘2023’, ‘2024’, ‘2025’, ‘Estimates Start After’],

dtype=’object’)

I will try to import the same file with a different ‘sep’ parameter, this time as sep=‘\t’

Result: import apparently successful. I check the shape of my data.

>> WEO3.describe()

Result: apparently, this time, no column is distinguished.

When I type:

>> WEO3.columns

…I get

Index([‘WEO Country Code;ISO;WEO Subject Code;Country;Subject Descriptor;Subject Notes;Units;Scale;Country/Series-specific Notes;1980;1981;1982;1983;1984;1985;1986;1987;1988;1989;1990;1991;1992;1993;1994;1995;1996;1997;1998;1999;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017;2018;2019;2020;2021;2022;2023;2024;2025;Estimates Start After’], dtype=’object’)

Now, I test with the 3rd file, the one converted through Wizard.

Result: import successful.

I check the shape.

>> WEO4.describe()

Result: still just 8 rows. Something is wrong.

I do another experiment. I take the original *.XLS from imf.org, and I save it as regular Excel *.XLSX, and then I save this one as CSV.

Result: parser error

I will retry with two options as for the separator: sep=‘;’ and sep=‘\t’. Ledzeee…

Import successful. “WEO5.describe()” yields just one column.

yields successful import, yet all the data is just one long row, without separation into columns.

I check WEO5 and WEO6 with “*.index”, and “*.shape”.

“WEO5.index” yields “RangeIndex(start=0, stop=8777, step=1)”

“WEO6.index” yields “RangeIndex(start=0, stop=8777, step=1)”

“WEO5.shape” gives “(8777, 56)”

“WEO6.shape” gives “(8777, 1)”

Depending on the separator given as parameter in the “pd.read_csv” command, I get 56 columns or just 1 column, yet the “*.describe()” command cannot make sense of them.

I try the “*.describe” command, without parentheses, which prints out the content of the Data Frame itself rather than the summary statistics given by “*.describe()”.

I can see that structures are clearly different.

I try another trick, namely to assume separator ‘;’ and TAB delimiter.

Result: WEO7.shape yields 8777 rows in just one column.

The provisional moral of the fairy tale is that ‘Data cleansing’ means very largely making sense of the exact shape and syntax of CSV files. Depending on the parametrisation of separators and delimiters, different Data Frames are obtained.
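That moral can be sketched in a few lines. Before importing, I can look at the header line of the file and count the candidate separators in it. The helper `guess_separator` below is my own hypothetical illustration, not a Pandas function, and the sample text is made up to resemble the WEO header.

```python
import io
import pandas as pd

def guess_separator(header_line, candidates=(";", ",", "\t")):
    """Return the candidate character that occurs most often in the header."""
    counts = {char: header_line.count(char) for char in candidates}
    return max(counts, key=counts.get)

# Made-up sample in the spirit of the WEO file.
sample_text = (
    "Country;Units;1980;1981\n"
    "A;Percent;1.0;2.0\n"
    "B;Percent;3.0;4.0\n"
)

separator = guess_separator(sample_text.splitlines()[0])

right = pd.read_csv(io.StringIO(sample_text), sep=separator)
wrong = pd.read_csv(io.StringIO(sample_text), sep="\t")

print(repr(separator))
print(right.shape)    # one column per variable
print(wrong.shape)    # everything squeezed into a single column
```

The same file gives a properly structured Data Frame with the right separator and one long column with the wrong one, just as with WEO5 and WEO6 above.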

[1] Hoffman, D. D., Singh, M., & Prakash, C. (2015). The interface theory of perception. Psychonomic bulletin & review, 22(6), 1480-1506.

[2] Fields, C., Hoffman, D. D., Prakash, C., & Singh, M. (2018). Conscious agent networks: Formal analysis and application to cognition. Cognitive Systems Research, 47, 186-213. https://doi.org/10.1016/j.cogsys.2017.10.003

# I re-run my executable script

I am thinking (again) about the phenomenon of collective intelligence, this time in terms of behavioural reinforcement that we give to each other, and the role that cities and intelligent digital clouds can play in delivering such reinforcement. As it is usually the case with science, there is a basic question to ask: ‘What’s the point of all the fuss with that nice theory of yours, Mr Wasniewski? Any good for anything?’.

Good question. My tentative answer is that studying human societies as collectively intelligent structures is a phenomenology, which allows some major methodological developments, which, I think, are missing from other methodologies in social sciences. First of all, it allows a completely clean slate at the starting point of research, as regards ethics and moral orientations, whilst it almost inevitably leads to defining ethical values through empirical research. This was my first big ‘Oh, f**k!’ with that method: I realized that ethical values can be reliably studied as objectively pursued outcomes at the collective level, and that study can be robustly backed with maths and empirics.

I have that thing with my science, and, as a matter of fact, with other people’s science too: I am an empiricist. I like prodding my assumptions and making them lose some fat, so that they become lighter. I like having as much of a clean slate at the starting point of my research as possible. I believe that one single assumption, namely that human social structures are collectively intelligent structures, almost automatically transforms all the other assumptions into hypotheses to investigate. Still, I need to go, very carefully, through that one single Mother Of All Assumptions, i.e. about us, humans as a society, being a collectively intelligent structure, in order to nail down, and possibly kick out, any logical shortcut.

Intelligent structures learn by producing many alternative versions of themselves and testing those versions for fitness in coping with a vector of constraints. There are three claims hidden in this single claim: learning, production of different versions, and testing for fitness. Do human social structures learn, like at all? Well, we have that thing called culture, and culture changes. There is observable change in lifestyles, aesthetic tastes, fashions, institutions and technologies. This is learning. Cool. One down, two still standing.

Do human social structures produce many different versions of themselves? Here, we enter the subtleties of distinction between different versions of a structure, on the one hand, and different structures, on the other hand. A structure remains the same, and just makes different versions of itself, as long as it stays structurally coherent. When it loses structural coherence, it turns into a different structure. How can I know that a structure keeps its s**t together, i.e. it stays internally coherent? That’s a tough question, and I know by experience that in the presence of tough questions, it is essential to keep it simple. One of the simplest facts about any structure is that it is made of parts. As long as all the initial parts are still there, I can assume they hold together somehow. In other words, as long as whatever I observe about social reality can be represented as the same complex set, with the same components inside, I can assume this is one and the same structure just making copies of itself. Still, this question remains a tough one, especially as any intelligent structure should be smart enough to morph into another intelligent structure when the time is right.

The time is right when the old structure is no longer able to cope with the vector of constraints, and so I arrive at the third component question: how can I know there is adaptation to constraints? How can I know there are constraints for assessing fitness? In a very broad sense, I can see constraints when I see error, and correction thereof, in someone’s behaviour. In other words, when I can see someone sort of making two steps forward and one step back, correcting their course etc., this is a sign of adaptation to constraints. Unconstrained change is linear or exponential, whilst constrained change always shows signs of bumping against some kind of wall. Here comes a caveat as regards using artificial neural networks as simulators of collective human intelligence: they are only any good when they have constraints, and, consequently, when they make errors. An artificial neural network is no good at simulating unconstrained change. When I explore the possibility of simulating collective human intelligence with artificial neural networks, it has marks of a pleonasm. I can use AI as simulator only when the simulation involves constrained adaptation.

F**k! I have gone philosophical in those paragraphs. I can feel a part of my mind gently disconnecting from real life, and this is time to do something in order to stay close to said real life. Here is a topic which I can treat as teaching material for my students, and, at the same time, make those general concepts bounce around a bit inside my head, just to see what happens. I make the following claim: ‘Markets are manifestations of collective intelligence in human societies’. In science, this is a working hypothesis. It is called ‘working’ because it is not proven yet, and thus it has to earn its own living, so to say. This is why it has to work.

I pass in review the same bullet points: learning, for one, production of many alternative versions in a structure as opposed to creating new structures, for two, and the presence of constraints as the third component. Do markets manifest collective learning? Ledzzeee… Markets display fashions and trends. Markets adapt to lifestyles, and vice versa. Markets are very largely connected to technological change and facilitate the occurrence thereof. Yes, they learn.

How can I say whether a market stays the same structure and just experiments with many alternative versions thereof, or, conversely, whether it turns into another structure? It is time to go back to the fundamental concepts of microeconomics, and assess (once more), what makes a market structure. A market structure is the mechanism of setting transactional prices. When I don’t know s**t about said mechanism, I just observe prices and I can see two alternative pictures. Picture one is that of very similar prices, sort of clustered in the same, narrow interval. This is a market with equilibrium price, which translates into a local market equilibrium. Picture two shows noticeably disparate prices in what I initially perceived as the same category of goods. There is no equilibrium price in that case, and speaking more broadly, there is no local equilibrium in that market.

Markets with local equilibriums are assumed to be perfectly competitive or very close thereto. They are supposed to serve for transacting in goods so similar that customers perceive them as identical, and technologies used for producing those goods don’t differ sufficiently to create any kind of competitive advantage (homogeneity of supply), for one. Markets with local equilibriums require the customers to be so similar to each other in their tastes and purchasing patterns that, on the whole, they can be assumed identical (homogeneity of demand), for two. Customers are supposed to be perfectly informed about all the deals available in the market (perfect information). Oh, yes, the last one: no barriers to entry or exit. A perfectly competitive market is supposed to offer virtually no minimum investment required for suppliers to enter the game, and no sunk costs in the case of exit.

Here is that thing: many markets present the alignment of prices typical for a state of local equilibrium, and yet their institutional characteristics – such as technologies, the diversity of goods offered, capital requirements and whatnot – do not match the textbook description of a perfectly competitive market. In other words, many markets form local equilibriums, thus they display equilibrium prices, without having the required institutional characteristics for that, at least in theory. In still other words, they manifest the alignment of prices typical for one type of market structure, whilst all the other characteristics are typical for another type of market structure.

Therefore, the completely justified ‘What the hell…?’ question arises. What is a market structure, at the end of the day? What is a structure, in general?

I go down another avenue now. Some time ago, I signalled on my blog that I am learning programming in Python, or, as I should rather say, I make one more attempt at nailing it down. Programming teaches me a lot about the basic logic of what I do, including that whole theory of collective intelligence. Anyway, I started to keep a programming log, and here below, I paste the current entry, from November 27th, 2020.

2. plotting
3. saving and retrieving a Jupyter Notebook in JupyterLab

I am practicing with Penn World Tables 9.1. I take the version without empty cells, and I transform it into CSV.

I create a new notebook on JupyterLab. I name it ‘Practice November 27th 2020’.

• Path: demo/Practice November 27th 2020.ipynb

I upload the CSV version of Penn Tables 9.1 with no empty cells.

Path: demo/PWT 9_1 no empty cells.csv

I code libraries:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import os

I check my directory:

>> os.getcwd()

result: ‘/home/jovyan/demo’

>> os.listdir()

result:

[‘jupyterlab.md’,

‘TCGA_Data’,

‘Lorenz.ipynb’,

‘lorenz.py’,

‘notebooks’,

‘data’,

‘jupyterlab-slides.pdf’,

‘markdown_python.md’,

‘big.csv’,

‘Practice November 27th 2020.ipynb’,

‘.ipynb_checkpoints’,

‘Untitled.ipynb’,

‘PWT 9_1 no empty cells.csv’]

Result:

File “<ipython-input-5-32375ff59964>”, line 1

^

SyntaxError: invalid character in identifier

>> I rename the file on Jupyter, into ‘PWT 9w1 no empty cells.csv’.

>> os.listdir()

Result:

[‘jupyterlab.md’,

‘TCGA_Data’,

‘Lorenz.ipynb’,

‘lorenz.py’,

‘notebooks’,

‘data’,

‘jupyterlab-slides.pdf’,

‘markdown_python.md’,

‘big.csv’,

‘Practice November 27th 2020.ipynb’,

‘.ipynb_checkpoints’,

‘Untitled.ipynb’,

‘PWT 9w1 no empty cells.csv’]

Result: imported successfully
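The log does not record the import command itself. A minimal sketch of how such an import might look, assuming `pd.read_csv` is used on the renamed file; the helper `load_pwt` and the three-row stand-in CSV are hypothetical, there only so that the example runs anywhere:

```python
import os
import tempfile

import pandas as pd


def load_pwt(path):
    """Read a CSV file into a DataFrame, failing early if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"No such file: {path}")
    return pd.read_csv(path)


# Hypothetical stand-in for 'PWT 9w1 no empty cells.csv': three rows, three columns.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "PWT 9w1 no empty cells.csv")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("country,year,rnna\nPoland,2016,100.0\nPoland,2017,110.0\nFrance,2017,300.0\n")

PWT9w1 = load_pwt(demo_path)
```

The file name is written with plain ASCII quotes. Curly quotes pasted from a word processor are not valid string delimiters in Python, which is one possible, though unconfirmed, source of an ‘invalid character in identifier’ error.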

>> PWT9w1.describe()

Result: descriptive statistics

# I want to list columns (variables) in my file

>> PWT9w1.columns

Result:

Index(['country', 'year', 'rgdpe', 'rgdpo', 'pop', 'emp', 'emp / pop', 'avh',

'hc', 'ccon', 'cda', 'cgdpe', 'cgdpo', 'cn', 'ck', 'ctfp', 'cwtfp',

'rgdpna', 'rconna', 'rdana', 'rnna', 'rkna', 'rtfpna', 'rwtfpna',

'labsh', 'irr', 'delta', 'xr', 'pl_con', 'pl_da', 'pl_gdpo', 'csh_c',

'csh_i', 'csh_g', 'csh_x', 'csh_m', 'csh_r', 'pl_c', 'pl_i', 'pl_g',

'pl_x', 'pl_m', 'pl_n', 'pl_k'],

dtype='object')

>> PWT9w1.columns()

Result:

TypeError                                 Traceback (most recent call last)

<ipython-input-11-38dfd3da71de> in <module>

----> 1 PWT9w1.columns()

TypeError: ‘Index’ object is not callable
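The error comes from the fact that `columns` is an attribute of a DataFrame, not a method, so it takes no parentheses. A short sketch on a hypothetical one-row frame (the column names are borrowed from the log, the values are made up):

```python
import pandas as pd

# Hypothetical miniature frame; only the column names come from the log.
df = pd.DataFrame({"country": ["Poland"], "year": [2017], "rnna": [110.0]})

# `columns` is an attribute holding an Index object, so no parentheses.
column_index = df.columns

# Converting to a plain list is handy for printing or looping.
column_names = list(df.columns)

# The Index object is not a function, hence the TypeError when it is called.
index_is_callable = callable(df.columns)
```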

# I try plotting

>> plt.plot(df.index, df['rnna'])

Result:

I get a long list of rows like: ‘<matplotlib.lines.Line2D at 0x7fc59d899c10>’, and a plot which is visibly not OK (looks like a fan).

# I want to separate one column from PWT9w1 as a separate series, and then plot it. Maybe it is going to work.

>> RNNA=pd.DataFrame(PWT9w1['rnna'])

Result: apparently successful.

# I try to plot RNNA

>> RNNA.plot()

Result:

<matplotlib.axes._subplots.AxesSubplot at 0x7fc55e7b9e10> + a basic graph. Good.

# I try to extract a few single series from PWT9w1 and to plot them. Let’s go for AVH, PL_I and CWTFP.

>> AVH=pd.DataFrame(PWT9w1['avh'])

>> PL_I=pd.DataFrame(PWT9w1['pl_i'])

>> CWTFP=pd.DataFrame(PWT9w1['cwtfp'])

>> AVH.plot()

>> PL_I.plot()

>> CWTFP.plot()

Result:

It worked. I have basic plots.
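Penn Tables stack many countries in the same column, so a whole column plotted against the row number strings all countries together in one line. A sketch of one way to get a more readable series, assuming the `country` and `year` columns listed earlier; the helper `series_for_country` and the miniature data are hypothetical:

```python
import pandas as pd


def series_for_country(df, country, variable):
    """Return one variable for one country as a Series indexed by year."""
    subset = df[df["country"] == country]
    return subset.set_index("year")[variable].sort_index()


# Hypothetical miniature data in the same long format as the Penn Tables.
demo = pd.DataFrame({
    "country": ["Poland", "Poland", "France", "France"],
    "year": [2017, 2016, 2016, 2017],
    "rnna": [110.0, 100.0, 290.0, 300.0],
})

poland_rnna = series_for_country(demo, "Poland", "rnna")
# poland_rnna.plot() would then draw the series against years (needs matplotlib).
```

Whether this is the right cut depends on the question asked; a cross-section of all countries in one year would need the opposite filter.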

# It is 8:20 a.m. I go to make myself a coffee. I will quit JupyterLab for a moment. I saved today’s notebook on the server, and I will see how I can open it. Just in case, I make a PDF copy, and a Python copy on my disk.

I cannot save to PDF. An error occurs. I will have to sort it out. I made an *.ipynb copy on my disk.

demo/Practice November 27th 2020.ipynb

# It is 8:40 a.m. I am logging back into JupyterLab. I am trying to open today’s notebook from its path. It does not seem to work. I am uploading my *.ipynb copy. This worked. I know now: I upload the *.ipynb file from my own location and then just double-click on it. I needed to re-upload my CSV file ‘PWT 9w1 no empty cells.csv’.

# I check if my re-uploaded CSV file is fully accessible. I discover that I need to re-create the whole algorithm. In other words: when I upload on JupyterLab a *.ipynb script from my disk, I need to re-run all the operations. My first idea is to re-run each executable cell in the uploaded script. That worked. Question: how to automate it? Probably by making a Python script all in one piece, uploading my CSV data source first, and then running the whole script.
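One way to make the session re-runnable in one piece is to wrap its steps in a single function. This is only a sketch under stated assumptions: the function `run_all` is hypothetical, and an in-memory CSV stands in for the uploaded file so that the example is self-contained.

```python
import io

import pandas as pd


def run_all(csv_source, variables):
    """Re-create the whole session in one call: load, describe, extract series."""
    data = pd.read_csv(csv_source)
    summary = data.describe()
    # One single-column DataFrame per requested variable, as in the log.
    series = {name.upper(): data[[name]] for name in variables}
    return data, summary, series


# Hypothetical stand-in for the uploaded CSV file.
demo_csv = io.StringIO(
    "country,year,avh,pl_i\nPoland,2016,2000.0,0.5\nPoland,2017,1990.0,0.6\n"
)
data, summary, series = run_all(demo_csv, ["avh", "pl_i"])
```

With the real file, `csv_source` would simply be its path. JupyterLab also offers a ‘Run All Cells’ command in its Run menu, and `jupyter nbconvert --to notebook --execute` runs a notebook from the command line, which may be enough without any rewriting.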

# I like being a mad scientist

I like being a mad scientist. Am I a mad scientist? A tiny bit, yes, ‘cause I do research on things just because I feel like. Mind you, me being that mad scientist I like being happens to be practical. Those rabbit holes I dive into prove to have interesting outcomes in real life.

I feel like writing, and therefore thinking in an articulate way, about two things I do in parallel: science and investment. I have just realized these two realms of activity tend to merge and overlap in me. When I do science, I tend to think like an investor, or a gardener. I invest my personal energy in ideas which I think have potential for growth. On the other hand, I invest in the stock market with a strong dose of curiosity. Those companies, and the investment positions I can open therein, are like animals which I observe: I try to figure out how not to get killed by them, or by the predators that hunt them, and I try to domesticate those beasts.

The scientific thing I am working on is the application of artificial intelligence to studying collective intelligence in human societies. The thing I am working on sort of at the crest between science and investment is fundraising for scientific projects (my new job at the university).

The project aims at defining theoretical and empirical fundamentals for using intelligent digital clouds, i.e. large datasets combined with artificial neural networks, in the field of remote digital diagnostics and remote digital care, in medical sciences and medical engineering. That general purpose translates into science strictly speaking, and into the prospective development of medical technologies.

There is observable growth in the percentage of population using various forms of digital remote diagnostics and healthcare. Yet, that growth is very uneven across different social groups, which suggests an early, pre-popular stage of development in those technologies (Mahajan et al. 2020[i]). Other research confirms that supposition, judging by the very disparate results obtained with those technologies, in terms of diagnostic and therapeutic effectiveness (Cheng et al. 2020[ii]; Wong et al. 2020[iii]). There are known solutions where an intelligent digital cloud allows transforming the patient’s place of stay (home, apartment) into the local substitute of a hospital bed, which opens interesting possibilities as regards medical care for patients with significantly reduced mobility, e.g. geriatric patients (Ben Hassen et al. 2020[iv]). Already around 2015, creative applications of medical imagery appeared, where the camera of a person’s smartphone served for early detection of skin cancer (Bliznuks et al. 2017[v]). The connection between distance diagnostics and the acquisition and processing of images comes as one of the most interesting and challenging innovations to make in the here-discussed field of technology (Marwan et al. 2018[vi]). The experience of the COVID-19 pandemic has already shown the potential of digital intelligent clouds in assisting national healthcare systems, especially in optimising and providing flexibility to the use of resources, both material and human (Alashhab et al. 2020[vii]). Yet, the same pandemic experience has shown the depth of social disparities as regards actual access to digital technologies supported by intelligent clouds (Whitelaw et al. 2020[viii]). Intelligent digital clouds enter into learning-generative interactions with the professionals of healthcare. There is observable behavioural modification, for example, in students of healthcare who train with such technologies from the very beginning of their education (Brown Wilson et al. 2020[ix]). That phenomenon of behavioural change requires rethinking from scratch, with the development of each individual technology, the ethical and legal issues relative to interactions between users, on the one hand, and system operators, on the other hand (Gooding 2019[x]).

Against that general background, the present project focuses on studying the phenomenon of tacit coordination among the users of digital technologies in remote medical diagnostics and remote medical care. Tacit coordination is essential as regards the well-founded application of intelligent digital cloud to support and enhance these technologies. Intelligent digital clouds are intelligent structures, i.e. they learn by producing many alternative versions of themselves and testing those versions for fitness in coping with a vector of external constraints. It is important to explore the extent and way that populations of users behave similarly, i.e. as collectively intelligent structures. The deep theoretical meaning of that exploration is the extent to which the intelligent structure of a digital cloud really maps and represents the collectively intelligent structure of the users’ population.

The scientific method used in the project explores the main working hypothesis that populations of actual and/or prospective patients, in their own health-related behaviour, and in their relations with the healthcare systems, are collectively intelligent structures, with tacit coordination. In practical terms, that hypothesis means that any intelligent digital cloud in the domain of remote medical care should assume collectively intelligent, thus more than just individual, behavioural change on the part of users. Collectively intelligent behavioural change in a population, marked by tacit coordination, is a long-term, evolutionary process of adaptive walk in rugged landscape (Kauffman & Levin 1987[xi]; Nahum et al. 2015[xii]). Therefore, it is something deeper and more durable than fashions and styles. It is the deep, underlying mechanism of social change accompanying the use of digital intelligent clouds in medical engineering.

The scientific method used in this project aims at exploring and checking the above-stated working hypothesis by creating a large and differentiated dataset of health-related data, and processing that dataset in an intelligent digital cloud, in two distinct phases. The first phase consists in processing a first sample of data with a relatively simple, artificial neural network, in order to discover its underlying orientations and its mechanisms of collective learning. The second phase allows an intelligent digital cloud to respond adaptively to users’ behaviour, i.e. to produce intelligent interaction with them. The first phase serves to understand the process of adaptation observable in the second phase. Both phases are explained in more detail below.

The tests of, respectively, orientation and mode of learning, in the first phase of empirical research aim at defining the vector of collectively pursued social outcomes in the population studied. The initially collected empirical dataset is transformed, with the use of an artificial neural network, into as many representations as there are variables in the set, with each representation being oriented on a different variable as its output (with the remaining ones considered as instrumental input). Each such transformation of the initial set can be tested for its mathematical similarity therewith (e.g. for Euclidean distance between the vectors of expected mean values). Transformations displaying relatively the greatest similarity to the source dataset are assumed to be the most representative for the collectively intelligent structure in the population studied, and, consequently, their output variables can be assumed to represent collectively pursued social outcomes in that collective intelligence (see, for example: Wasniewski 2020[xiii]). Modes of learning in that dataset can be discovered by creating a shadow vector of probabilities (representing, for example, a finite set of social roles endorsed with given probabilities by members of the population), and a shadow process that introduces random disturbance, akin to the theory of Black Swans (Taleb 2007[xiv]; Taleb & Blyth 2011[xv]). The so-created shadow structure is subsequently transformed with an artificial neural network into as many alternative versions as there are variables in the source empirical dataset, each version taking a different variable from the set as its pre-set output. Three different modes of learning can be observed, and assigned to particular variables: a) cyclical adjustment without a clear end-state, b) finite optimisation with a defined end-state, and c) structural disintegration with growing amplitude of oscillation around central states.
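The similarity test mentioned above, Euclidean distance between vectors of mean values, can be sketched in a few lines. The function names, the two-variable toy dataset and the two ‘transformations’ are all hypothetical; in the method described, the transformed sets would come out of the neural network, not from simple arithmetic.

```python
import numpy as np


def distance_of_means(source, transformed):
    """Euclidean distance between the vectors of column means of two datasets."""
    return float(np.linalg.norm(source.mean(axis=0) - transformed.mean(axis=0)))


def rank_by_similarity(source, transformations):
    """Sort output-variable names from the most to the least similar transformation."""
    distances = {
        name: distance_of_means(source, data)
        for name, data in transformations.items()
    }
    return sorted(distances, key=distances.get), distances


# Hypothetical data: 4 observations of 2 standardised variables.
source = np.array([[0.2, 0.4], [0.4, 0.6], [0.6, 0.8], [0.8, 1.0]])
transformations = {
    "variable_A": source + 0.01,  # stays close to the source
    "variable_B": source * 0.5,   # drifts further away
}
ranking, distances = rank_by_similarity(source, transformations)
```

Under this reading, the variable at the head of `ranking` would be the candidate for a collectively pursued outcome.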

The above-summarised first phase of research involves the use of two basic digital tools, i.e. an online functionality to collect empirical data from and about patients, and an artificial neural network to process it. There comes an important aspect of that first phase in research, i.e. the actual collectability and capacity to process the corresponding data. It can be assumed that comprehensive medical care involves the collection of both strictly health-related data (e.g. blood pressure, blood sugar etc.), and peripheral data of various kinds (environmental, behavioural). The complexity of data collected in that phase can be additionally enhanced by including imagery such as pictures taken with smartphones (e.g. skin, facial symmetry etc.). In that respect, the first phase of research aims at testing the actual possibility and reliability of collecting various types of data. Phenomena such as outliers or fake data can be detected then.

Once the first phase is finished and expressed in the form of theoretical conclusions, the second phase of research is triggered. An intelligent digital cloud is created, with the capacity of intelligent adaptation to users’ behaviour. A very basic example of such adaptation is behavioural reinforcement. The cloud can generate simple messages of praise for health-functional behaviour (positive reinforcements), or, conversely, warning messages in the case of health-dysfunctional behaviour (negative reinforcements). More elaborate forms of intelligent adaptation are possible to implement, e.g. a Twitter-like reinforcement to create trending information, or a Tik-Tok-like reinforcement to stay in the loop of communication in the cloud. This phase aims specifically at defining the actually workable scope and strength of possible behavioural reinforcements which a digital functionality in the domain of healthcare could use vis-à-vis its end users. Legal and ethical implications thereof are studied as one of the theoretical outcomes of that second phase.

I feel like generalizing my last few updates a bit, and developing the general hypothesis of collectively intelligent, human social structures. In order to consider any social structure as a manifestation of collective intelligence, I need to place intelligence in a specific empirical context. I need an otherwise exogenous environment, which the social structure has to adapt to. Empirical study of collective intelligence, such as I have been doing it, and, as a matter of fact, the only one I know how to do, consists in studying adaptive effort in human social structures.

[i] Shiwani Mahajan, Yuan Lu, Erica S. Spatz, Khurram Nasir, Harlan M. Krumholz, Trends and Predictors of Use of Digital Health Technology in the United States, The American Journal of Medicine, 2020, ISSN 0002-9343, https://doi.org/10.1016/j.amjmed.2020.06.033 (http://www.sciencedirect.com/science/article/pii/S0002934320306173  )

[ii] Lei Cheng, Mingxia Duan, Xiaorong Mao, Youhong Ge, Yanqing Wang, Haiying Huang, The effect of digital health technologies on managing symptoms across pediatric cancer continuum: A systematic review, International Journal of Nursing Sciences, 2020, ISSN 2352-0132, https://doi.org/10.1016/j.ijnss.2020.10.002 , (http://www.sciencedirect.com/science/article/pii/S2352013220301630 )

[iii] Charlene A. Wong, Farrah Madanay, Elizabeth M. Ozer, Sion K. Harris, Megan Moore, Samuel O. Master, Megan Moreno, Elissa R. Weitzman, Digital Health Technology to Enhance Adolescent and Young Adult Clinical Preventive Services: Affordances and Challenges, Journal of Adolescent Health, Volume 67, Issue 2, Supplement, 2020, Pages S24-S33, ISSN 1054-139X, https://doi.org/10.1016/j.jadohealth.2019.10.018 , (http://www.sciencedirect.com/science/article/pii/S1054139X19308675 )

[iv] Hassen, H. B., Ayari, N., & Hamdi, B. (2020). A home hospitalization system based on the Internet of things, Fog computing and cloud computing. Informatics in Medicine Unlocked, 100368, https://doi.org/10.1016/j.imu.2020.100368

[v] Bliznuks, D., Bolocko, K., Sisojevs, A., & Ayub, K. (2017). Towards the Scalable Cloud Platform for Non-Invasive Skin Cancer Diagnostics. Procedia Computer Science, 104, 468-476

[vi] Marwan, M., Kartit, A., & Ouahmane, H. (2018). Security enhancement in healthcare cloud using machine learning. Procedia Computer Science, 127, 388-397.

[vii] Alashhab, Z. R., Anbar, M., Singh, M. M., Leau, Y. B., Al-Sai, Z. A., & Alhayja’a, S. A. (2020). Impact of Coronavirus Pandemic Crisis on Technologies and Cloud Computing Applications. Journal of Electronic Science and Technology, 100059. https://doi.org/10.1016/j.jnlest.2020.100059

[viii] Whitelaw, S., Mamas, M. A., Topol, E., & Van Spall, H. G. (2020). Applications of digital technology in COVID-19 pandemic planning and response. The Lancet Digital Health. https://doi.org/10.1016/S2589-7500(20)30142-4

[ix] Christine Brown Wilson, Christine Slade, Wai Yee Amy Wong, Ann Peacock, Health care students experience of using digital technology in patient care: A scoping review of the literature, Nurse Education Today, Volume 95, 2020, 104580, ISSN 0260-6917, https://doi.org/10.1016/j.nedt.2020.104580 ,(http://www.sciencedirect.com/science/article/pii/S0260691720314301 )

[x] Piers Gooding, Mapping the rise of digital mental health technologies: Emerging issues for law and society, International Journal of Law and Psychiatry, Volume 67, 2019, 101498, ISSN 0160-2527, https://doi.org/10.1016/j.ijlp.2019.101498 , (http://www.sciencedirect.com/science/article/pii/S0160252719300950 )

[xi] Kauffman, S., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of theoretical Biology, 128(1), 11-45

[xii] Nahum, J. R., Godfrey-Smith, P., Harding, B. N., Marcus, J. H., Carlson-Stevermer, J., & Kerr, B. (2015). A tortoise–hare pattern seen in adapting structured and unstructured populations suggests a rugged fitness landscape in bacteria. Proceedings of the National Academy of Sciences, 112(24), 7530-7535, www.pnas.org/cgi/doi/10.1073/pnas.1410631112

[xiii] Wasniewski, K. (2020). Energy efficiency as manifestation of collective intelligence in human societies. Energy, 191, 116500. https://doi.org/10.1016/j.energy.2019.116500

[xiv] Taleb, N. N. (2007). The black swan: The impact of the highly improbable (Vol. 2). Random house

[xv] Taleb, N. N., & Blyth, M. (2011). The black swan of Cairo: How suppressing volatility makes the world less predictable and more dangerous. Foreign Affairs, 33-39

I am changing the path of my writing, ‘cause real life knocks at my door, and it goes ‘Hey, scientist, you economist, right? Good, ‘cause there is some good stuff, I mean, ideas for business. That’s economics, right? Just sort of real stuff, OK?’. Sure. I can go with real things, but first, I explain. At my university, I have recently taken on the job of coordinating research projects and finding some financing for them. One of the first things I did, right after November 1st, was to send around a reminder that we had 12 days left to apply, with the Ministry of Science and Higher Education, for relatively small grants, in a call titled ‘Students make innovation’. Honestly, I was expecting to have 1 – 2 applications max, in response. Yet, life can make surprises. There are 7 innovative ideas in terms of feedback, and 5 of them look like good material for business concepts and for serious development. I am taking on giving them a first prod, in terms of business planning. Interestingly, those ideas are all related to medical technologies, thus something I have been both investing a lot in, during 2020, and thinking a lot about, as a possible path of substantial technological change.

I am progressively wrapping my mind around the ideas and projects formulated by those students, and, walking down the same intellectual avenue, I am making sense of making money on and around science. I am fully appreciating the value of real-life experience. I have been doing research and writing about technological change for years. Until recently, I had that strange sort of complex logical oxymoron in my mind, where I had the impression of both understanding technological change, and missing a fundamental aspect of it. Now, I think I start to understand that missing part: it is the microeconomic mechanism of innovation.

I have collected those 5 ideas from ambitious students at Faculty of Medicine, in my university:

>> Idea 1: An AI-based app, with a chatbot, which facilitates early diagnosis of cardio-vascular diseases

>> Idea 2: Similar thing, i.e. a mobile app, but oriented on early diagnosis and monitoring of urinary incontinence in women.

>> Idea 3: Technology for early diagnosis of Parkinson’s disease, through the observation of speech and motor disturbance.

>> Idea 4: Intelligent cloud to store, study and possibly find something smart about two types of data: basic health data (blood-work etc.), and environmental factors (pollution, climate etc.).

>> Idea 5: Something similar to Idea 4, i.e. an intelligent cloud with medical edge, but oriented on storing and studying data from large cohorts of patients infected with Sars-Cov-2.

As I look at those 5 ideas, a surprisingly simple and basic association of ideas comes to my mind: hierarchy of interest and the role of overarching technologies. It is something I have never thought seriously about: when we face many alternative ideas for new technologies, almost intuitively we hierarchize them. Some of them seem more interesting, some others less so. I am trying to dig out of my own mind the criteria I use, and here they are: I hierarchize by the expected lifecycle of technology, and by the breadth of the technological platform involved. In other words, I like big, solid, durable stuff. I am intuitively looking for innovations which offer a relatively long lifecycle in the corresponding technology, and where the technology involved is sort of two-level, with a broad base and many specific applicational developments built upon that base.

Why do I take this specific approach? One step further down into my mind, I discover the willingness to have some sort of broad base of business and scientific points of attachment when I start business planning. I want some kind of horizon to choose my exact target on. The common technological base among those 5 ideas is some kind of intelligent digital cloud, with artificial intelligence which learns from the data that flows in. The common scientific base is the collection of health-related data, including behavioural aspects (e.g. sleep, diet, exercise, stress management).

The financial context which I am operating in is complex. It is made of public financial grants for strictly speaking scientific research, other public financing for projects more oriented on research and development in consortiums made of universities and business entities, still a different stream of financing for business entities alone, and finally private capital to look for once the technology is ripe enough for being marketed.

I am operating from an academic position. Intuitively, I guess that the more valuable science academic people bring to their common table with businesspeople and government people, the better position those academics will have in any future joint ventures. Hence, we should max out on useful, functional science to back those ideas. I am trying to understand what that science should consist in. An intelligent digital cloud can yield mind-blowing findings. I know that for a fact from my own research. Yet, what I know too is that I need very fundamental science, something at the frontier of logic, philosophy, mathematics, and of the phenomenology pertinent to the scientific research at hand, in order to understand and use meaningfully whatever the intelligent digital cloud spits back out, after being fed with data. I have already gone once through that process of understanding, as I have been working on the application of artificial neural networks to the simulation of collective intelligence in human societies. I had to coin up a theory of intelligent structure, applicable to the problem at hand. I believe that any application of intelligent digital cloud requires assuming that whatever we investigate with that cloud is an intelligent structure, i.e. a structure which learns by producing many alternative versions of itself, and testing them for their fitness to optimize a given desired outcome.

With those medical ideas, I (we?) need to figure out what the intelligent structure in action is, how it can possibly produce many alternative versions of itself, and how those alternative thingies can be tested for fitness. What we have in a medically edged digital cloud is data about a population of people. The desired outcome we look for is health, quite simply. I said ‘simply’? No, it was a mistake. It is health, in all its complexity. Those apps our students want to develop are supposed to pull someone out of the crowd, someone with early symptoms which they do not identify as relevant. In a next step, some kind of dialogue is proposed to such a person, sort of let’s dig a bit more into those symptoms, let’s try something simple to treat them etc. The vector of health in that population is made, roughly speaking, of three sub-vectors: preventive health (e.g. exercise, sleep, stop eating crap food), effectiveness of early medical intervention (e.g. c’mon men, if you are 30 and can’t have an erection, you are bound to concoct some cardio-vascular s**t), and finally effectiveness of advanced medicine, applied when the former two haven’t worked.

I can see at least one salient, scientific hurdle to jump over: that outcome vector of health. In my own research, I found out that artificial neural networks can give empirical evidence as to what outcomes we are really after, as a collectively intelligent structure. That’s my first big idea as regards those digital medical solutions: we collect medical and behavioural data in the cloud, we assume that data represents experimental learning of a collectively intelligent social structure, and we make the cloud discover the phenomena (variables) which the structure actually optimizes.

My own experience with that method is that the societies which I studied optimize outcomes which look almost too simplistic in the fancy realm of social sciences, such as the average number of hours worked per person per year, the average amount of human capital per person, measured as years of education before entering the job market, or the price index in exports, thus the average price which countries sell their exports at. In general, the societies which I studied tend to optimize structural proportions, measurable as coefficients along the lines of ‘amount of thingy one divided by the amount of thingy two’.

Checkpoint for business. Suppose that our research team, at the Andrzej Frycz – Modrzewski Krakow University, comes up with robust empirical results of that type, i.e. when we take a million random humans and their broadly spoken health, and we assume they are collectively intelligent (I mean, beyond Facebook), then their collectively shared experimental learning of the stuff called ‘life’ makes them optimize health-related behavioural patterns A, B, and C. How can those findings be used in the form of marketable digital technologies? If I know the behavioural patterns someone tries to optimize, I can break those patterns down into small components and figure out a way to use them to influence behaviour. It is a common technique in marketing. If I know someone’s lifestyle, and the values that come with it, I can artfully include into that pattern the technology I am marketing. In this specific case, it could be done ethically and for a good purpose, for a change. In that context, my mind keeps returning to that barely marked trend of rising mortality in adult males in high-income countries, since 2016 (https://data.worldbank.org/indicator/SP.DYN.AMRT.MA). WTF? We’ll live, we’ll see.

The understanding of how collective human intelligence goes after health could be, therefore, the kind of scientific bacon our university could bring to the table when starting serious consortial projects with business partners, for the development of intelligent digital technologies in healthcare. Let’s move one step forward. As I have been using artificial neural networks in my research on what I call, and maybe overstate as, collective human intelligence, I have been running those experiments where I take a handful of behavioural patterns, I assign them probabilities of happening (sort of how many folks out of 10 000 will endorse those patterns), and I treat those probabilities as instrumental input in the optimization of pre-defined social outcomes. I was going to forget: I add random disturbance to that form of learning, along the lines of the Black Swan theory (Taleb 2007[1]; Taleb & Blyth 2011[2]).
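To make that kind of experiment more tangible, here is a deliberately simplified sketch, not the network actually used in the research. It assumes a single neuron with hyperbolic tangent activation, trained by a plain gradient step, and it models disturbance as Gaussian noise, which is much tamer than a real Black Swan. The function name, the probabilities, the target and all parameter values are hypothetical.

```python
import numpy as np


def simulate_learning(probabilities, target, rounds=300, learning_rate=0.1,
                      disturbance=0.05, seed=0):
    """Train one tanh neuron on randomly disturbed probabilities; return the errors."""
    rng = np.random.default_rng(seed)
    p = np.asarray(probabilities, dtype=float)
    weights = rng.uniform(0.0, 1.0, size=p.shape)
    errors = np.empty(rounds)
    for i in range(rounds):
        # Random disturbance of the input probabilities, clipped to stay in [0, 1].
        noisy = np.clip(p + rng.normal(0.0, disturbance, size=p.shape), 0.0, 1.0)
        output = np.tanh(weights @ noisy)
        error = target - output
        # Gradient step for a single tanh neuron with squared error.
        weights += learning_rate * error * (1.0 - output ** 2) * noisy
        errors[i] = error
    return errors


# Three hypothetical behavioural patterns and a hypothetical standardised outcome.
errors = simulate_learning([0.2, 0.5, 0.3], target=0.6)
```

The sequence `errors` is the raw material for telling apart different patterns of learning: what matters is how its amplitude evolves over the rounds.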

I nailed down three patterns of collective learning in the presence of randomly happening s**t: recurrent, optimizing, and panic mode. The recurrent pattern of collective learning, which I tentatively expect to be the most powerful, is essentially a cycle with recurrent amplitude of error. We face a challenge, we go astray, we run around like headless chickens for a while, and then we figure s**t out, we progressively settle for solutions, and then the cycle repeats. It is like everlasting learning, without any clear endgame. The optimizing pattern is something I observed when making my collective intelligence optimize something like the headcount of population, or the GDP. There is a clear phase of ‘WTF!’ (error in optimization goes haywire), which, passing through a somewhat milder ‘WTH?’, ends up in a calm phase of ‘what works?’, with very little residual error.

The panic mode is different from the other two. There is no visible learning in the strict sense of the term, i.e. no visible narrowing down of error in what the network estimates as its desired outcome. On the contrary, that type of network consistently goes into the headless chicken mode, and it is becoming more and more headless with each consecutive hundred of experimental rounds, so to say. It happens when I make my network go after some very specific socio-economic outcomes, like price index in capital goods (i.e. fixed assets) or Total Factor Productivity.
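The three patterns differ in how the amplitude of error evolves, so a crude way to tell them apart is to compare early and late error. The sketch below is an illustrative assumption, not the procedure used in the research: the function `learning_mode`, its two thresholds and the synthetic trajectories are all hypothetical.

```python
import numpy as np


def learning_mode(errors, shrink=0.5, grow=1.5):
    """Label an error trajectory by comparing its early and late amplitude."""
    e = np.abs(np.asarray(errors, dtype=float))
    third = len(e) // 3
    early, late = e[:third].mean(), e[-third:].mean()
    if late < shrink * early:
        return "optimizing"  # error narrows down to a small residual
    if late > grow * early:
        return "panic"       # error keeps growing round after round
    return "recurrent"       # amplitude of error stays roughly the same


# Synthetic, hypothetical error trajectories over 300 experimental rounds.
t = np.arange(300)
recurrent = np.sin(t / 10.0)             # steady cycle
optimizing = np.exp(-t / 30.0)           # decaying error
panic = 0.01 * t * np.sin(t / 10.0)      # oscillation with growing amplitude

labels = [learning_mode(x) for x in (recurrent, optimizing, panic)]
```

Real trajectories are noisier than these three curves, so the thresholds would need tuning against actual experimental output.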

Checkpoint for business, once again. That particular thing, about Black Swans randomly disturbing people in their endorsing of behavioural patterns, what business value does it have in a digital cloud? I suppose there are fields of applied medical sciences, for example epidemiology, or the management of healthcare systems, where it pays to know in advance which aspects of our health-related behaviour are the most prone to deep destabilization in the presence of exogenous stressors (e.g. epidemic, or the president of our country trending on Tik Tok). It could also pay off to know which collectively pursued outcomes act as stabilizers. If another pandemic breaks out, for example, which social activities and social roles should keep going, at all costs, on the one hand, and which ones can be safely shut down, as they will go haywire anyway?

[1] Taleb, N. N. (2007). The black swan: The impact of the highly improbable (Vol. 2). Random house.

[2] Taleb, N. N., & Blyth, M. (2011). The black swan of Cairo: How suppressing volatility makes the world less predictable and more dangerous. Foreign Affairs, 33-39.

# Lost in the topic. It sucks. Exactly what I needed.

I keep working on the collective intelligence of humans – which, inevitably, involves working on my intelligent cooperation with other people – in the context of the COVID-19 pandemic. I am focusing on one particular survival strategy which we, Europeans, developed over centuries (I can’t speak for them differently continental folks): the habit of hanging out in relatively closed social circles of knowingly healthy people.

The social logic is quite simple. If I can observe someone for many weeks and months in a row, I sort of have an eye for them. After some time, I know whom that person hangs out with, I can tell when they look healthy, and, conversely, when they look like s**t, hence suspicious. If I concentrate my social contacts in a circle made of such people, then, even in the absence of specific testing for pathogens, I increase my own safety, and, as I do so, others increase their safety by hanging out with me. Of course, epidemic risk is still there. Pathogens are sneaky, and Sars-Cov-2 is next level in terms of sneakiness. Still, patient, consistent observation of my social contacts, and just as consistent making of a highly controlled network thereof, is a reasonable way to reduce that risk.

That pattern of closed social circles has abundant historical roots. Back in the day, even as recently as in the first half of the 20th century, European societies were very clearly divided in two distinct social orders: that of closed social circles which required introduction, prior to letting anyone in, on the one hand, and the rest of the society, much less compartmentalised. The incidence of infectious diseases, such as tuberculosis or typhoid, was much lower in the former of those social orders. As far as I know, many developing countries, plagued by high incidence of epidemic outbreaks, display such a social model even today.

As I think of it, the distinction between immediate social environment, and the distant one, common in social sciences, might have its roots in that pattern of living in closed social circles made of people whom we can observe on a regular basis. In textbooks of sociology, one can find the statement that the immediate social environment of a person usually comprises 20 to 25 people. That might be a historically induced threshold of mutual observability in a closed social circle.

I remember my impressions during a trip to China, when I was visiting the imperial palace in Beijing, and then several Buddhist temples. Each time, the guide was explaining a lot of architectural solutions in those structures as defences against evil spirits. I perceive Chinese people as normal, in the sense they don’t exactly run around amidst paranoid visions. Those evil spirits must have had a natural counterpart. What kind of evil spirit can you shield against by making people pass, before reaching your room, through many consecutive ante rooms, separated by high doorsteps and multi-layered, silk curtains? I guess it is about the kind of evil spirit we are dealing with now: respiratory infections.

I am focusing on the contemporary application of just those two types of anti-epidemic contrivances, namely that of living in close social circles, and that of staying in buildings structurally adapted to shielding against respiratory infections. Both are strongly related to socio-economic status. Being able to control the structure of your social circle requires social influence, which, in turn, and quite crudely, means having the luxury to wait for people who gladly comply with the rules in force inside the circle. I guess that in terms of frequency, our social relations are mostly work-related. The capacity to wait for the sufficiently safe social interactions, in a work environment, means either a job which I can do remotely, like home office, or a professional position of power, when I can truly choose whom I hang out with. If I want to live in an architectural structure with a lot of anterooms and curtains, to filter people and their pathogens, it means a lot of indoor space used just as a filter, not as habitat in the strict sense. Who pays for that extra space? At the end of the day, sadly enough, I do. The more money I have, the more of that filtering architectural space I can afford.

Generally, epidemic protection is costly, and, when used on a regular basis across society, that protection is likely to exacerbate the secondary outcomes of economic inequalities. By the way, as I think about it, the relative epidemic safety we have been experiencing in Europe, roughly since the 1950s, could be a major factor of another subjective, collective experience, namely that of economic equality. Recently, in many spots of the social space, voices have been rising and saying that equality is not equal enough. Strangely enough, since 2016, we have had a rise in mortality among adult males in high-income countries (https://data.worldbank.org/indicator/SP.DYN.AMRT.MA). Correlated? Maybe.

Anyway, I have an idea. Yes, another one. I have an idea to use science and technology as parents to a whole bunch of technological babies. Science is the father, as it is supposed to give packaged genetic information, and that information is the stream of scientific articles on the topic of epidemic safety. Yes, a scientific article can be equated to a spermatozoid. It is a relatively small parcel of important information. It should travel fast but usually it does not travel fast enough, as there are plenty of external censors who cite moral principles and therefore prevent it from spreading freely. The author thinks it is magnificent, and yet, in reality, it is just a building block of something much bigger: life.

Technology is the mother, and, as it is wisely stated in the Old Testament, you’d better know who your mother is. The specific maternal technology here is Artificial Intelligence. I imagine a motherly AI which absorbs the stream of scientific articles on COVID and related subjects, and, generation after generation, connects those findings to specific technologies for enhanced epidemic safety. It is an artificial neural network which creates and updates semantic maps of innovation. I am trying to give the general idea in the picture below.

An artificial neural network is a sequence of equations, at the end of the day, and that sequence is supposed to optimize a vector of inputs so as to match with an output. The output can be defined a priori, or the network can optimize this one too. All that optimization occurs as the network produces many alternative versions of itself and tests them for fitness. What could be those different versions in this case? I suppose each such version would consist in a logical alignment of the type ‘scientific findings <> assessment of risk <> technology to mitigate risk’.

Example: article describing the way that Sars-Cov-2 dupes the human immune system is associated with the risk generated once a person has been infected, and can be mitigated by proper stimulation of our immune system before the infection (vaccine), or by pharmaceuticals administered after the appearance of symptoms (treatment). Findings reported in the article can: a) advance completely new hypotheses b) corroborate existing hypotheses or c) contradict them. Hypotheses can have a strong or a weak counterpart in existing technologies.

The basic challenge I see for that neural network, hence a major criterion of fitness, is the capacity to process scientific discovery as it keeps streaming. It is a quantitative challenge. I will give you an example, with the scientific repository Science Direct (www.sciencedirect.com ), run by the Elsevier publishing group. I typed the ‘COVID’ keyword, and ran a search there. It turns out 28 680 peer-reviewed articles have been published this year, just in the journals that belong to the Elsevier group. It has been 28 680 articles over 313 days since the beginning of the year (I am writing those words on November 10th, 2020), which gives 91,63 articles per day.

On another scientific platform, namely that of the Wiley-Blackwell publishing group (https://onlinelibrary.wiley.com/), 14 677 articles and 47 books have been published on the same topic, i.e. The Virus, which makes 14 677/313 = 46,9 articles per day and a new book every 313/47 = 6,66 days.

Cool. This is only peer-reviewed stuff, sort of the House of Lords in science. We have preprints, too. At the bioRχiv platform (https://connect.biorxiv.org/relate/content/181 ), there have been 10 412 preprints of articles on COVID-19, which gives 10 412/313 = 33,3 articles per day.

Science Direct, Wiley-Blackwell, and bioRχiv taken together give 171,8 articles per day. Each article contains an abstract of no more than 150 words. The neural network I am thinking about should have those 150-word abstracts as its basic food. Here is the deal. I take like one month of articles, thus 30*171,8*150 = 773 100 words in abstracts. Among those words, there are two groups: common language and medical language. If I connect that set of 773 100 words to a digital dictionary, such as Thesaurus used in Microsoft Word, I can kick out the common words. I stay with medical terminology, and I want to connect it to another database of knowledge, namely that of technologies.
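Just to see the arithmetic and the filtering idea in one place, here is a minimal sketch in Python. The article counts and the 313 days come from the text above. The tiny `COMMON_WORDS` set and the `specialised_terms` function are hypothetical stand-ins of mine for a real digital dictionary, so take them as an illustration of the principle, not as a working linguistic filter.

```python
# Article counts quoted above, collected over 313 days of 2020.
DAYS = 313
articles = {
    "Science Direct": 28_680,
    "Wiley-Blackwell": 14_677,
    "bioRxiv": 10_412,
}

# Articles per day, per platform and in total.
per_day = {source: count / DAYS for source, count in articles.items()}
total_per_day = sum(per_day.values())

# One month of abstracts, assuming 150 words per abstract, as in the text.
WORDS_PER_ABSTRACT = 150
words_per_month = round(30 * round(total_per_day, 1) * WORDS_PER_ABSTRACT)

# Hypothetical, very small stand-in for a dictionary of common language.
COMMON_WORDS = {"the", "of", "virus", "binds"}


def specialised_terms(abstract, common_words=COMMON_WORDS):
    """Return the words of an abstract which are not in the common dictionary."""
    words = {word.strip(".,;:()").lower() for word in abstract.split()}
    words.discard("")  # punctuation-only tokens become empty strings
    return words - common_words


sample = "The spike protein of the virus binds the ACE2 receptor."
print(round(total_per_day, 1), words_per_month)
print(sorted(specialised_terms(sample)))
```

Set difference does the kicking out here: whatever stays after subtracting the common words is a candidate for specialised terminology.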

You know what? I need to take on something which I should have taken on already some time ago, but I was too lazy to do it. I need to learn programming, at least in one language suitable for building neural networks. Python is a good candidate. Back in the day, two years ago, I had a go at Python but, idiot of me, I quit quickly. Well, maybe I wasn’t as much of an idiot as I thought? Maybe having done, over the last two years, the walkabout of logical structures which I want to program has been a necessary prelude to learning how to program them? This is that weird thing about languages, programming or spoken. You never know exactly what you want to phrase out until you learn the lingo to phrase it out.

Now, I know that I need programming skills. However strongly I cling to Excel, it is too slow and too clumsy for really serious work with data. Good. Time to go. If I want to learn Python, I need an interpreter, i.e. a piece of software which allows me to write an algorithm, test it for coherence, and run it. In Python, that interpreter is commonly called ‘Shell’, and the mothership of Python, https://www.python.org/ , runs a shell at https://www.python.org/shell/ . There are others, mind you: https://www.programiz.com/python-programming/online-compiler/ , https://repl.it/languages/python3 , or https://www.onlinegdb.com/online_python_interpreter .

I am breaking down my research with neural networks into partial functions, which, as it turns out, sum up my theoretical assumptions as regards the connection between artificial intelligence and the collective intelligence of human societies. First things first, perception. I use two types of neural networks, one with real data taken from external databases and standardized over respective maxima for individual variables, another one with probabilities assigned to arbitrarily defined phenomena. The first lesson I need to take – or rather retake – in Python is about the structures of data this language uses.
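A minimal sketch of those two kinds of perception could look as follows. The function names `standardise_over_max` and `random_probabilities` are mine, the numbers are made up for illustration, and I assume strictly positive maxima, which holds for variables such as population or GDP.

```python
import random


def standardise_over_max(values):
    """Standardise a variable over its maximum: s(x) = x / max(x)."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot standardise over a maximum equal to zero")
    return [value / peak for value in values]


def random_probabilities(n, seed=None):
    """Assign a random probability in [0, 1) to each of n arbitrary phenomena."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Made-up observations of one variable.
standardised = standardise_over_max([2.0, 4.0, 8.0])
print(standardised)

# Made-up vector of probabilities for five arbitrarily defined phenomena.
probabilities = random_probabilities(5, seed=42)
print(probabilities)
```

In both cases the network receives numbers between 0 and 1, which is what makes the two types of input comparable.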

The simplest data structure in Python is a list, i.e. a sequence of items, separated with commas, and placed inside square brackets, e.g. my_list = [1, 2, 3]. My intuitive association with lists is that of categorization. In the logical structures I use, a list specifies phenomenological categories: variables, aggregates (e.g. countries), periods of time etc. In this sense, I mostly use fixed, pre-determined lists. Either I make the list of categories by myself, or I take an existing database and I want to extract headers from it, as category labels. Here comes another data structure in Python: a tuple. A tuple is an ordered collection of items which is immutable, i.e. it cannot be changed once created, and it can be unpacked or indexed. As I understand, and I hope I understand it correctly, any kind of external raw data which my algorithm is supposed to read without modifying it fits well into a tuple.

Somewhere between a tuple (collection of whatever) and a list (collection of categories), Python distinguishes sets, i.e. unordered collections with no duplicate elements. When I transform a tuple or a list into a set, Python kicks out redundant components.
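Here is a short sketch of those three structures side by side. All the labels and values are made up for illustration.

```python
# A list: ordered and mutable, here used for category labels.
variables = ["population", "gdp", "hours_worked"]
variables.append("energy_use")

# A tuple: ordered and immutable, here one made-up raw observation.
observation = ("Country A", 2020, 0.75)
country, year, value = observation  # unpacking
first_item = observation[0]         # indexing

# Trying to change a tuple raises TypeError.
try:
    observation[0] = "Country B"
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# A set: unordered, with duplicates kicked out.
labels = ["gdp", "population", "gdp", "gdp"]
unique_labels = set(labels)

print(variables)
print(country, year, value, first_item, tuple_is_immutable)
print(sorted(unique_labels))
```

A set does not keep any order, which is why I sort it before printing.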

Wrapping it partially up, I can build two types of perception in Python. Firstly, I can try and extract data from a pre-existing database, grouping it into categories, and then making the algorithm read observations inside each category. For now, the fastest way I found to create and use databases in Python is the sqlite3 module (https://www.tutorialspoint.com/sqlite/sqlite_python.htm ). I need to work on it.
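As a first exercise with sqlite3, which comes with the standard library of Python, I can sketch that first type of perception: a small database, categories extracted from it, and observations read inside each category. The table name `observations`, its columns, and all the values are hypothetical, made up just for this illustration.

```python
import sqlite3

# A throwaway database living in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations "
    "(country TEXT, variable TEXT, year INTEGER, value REAL)"
)

# Made-up rows: (country, variable, year, value).
rows = [
    ("A", "hours_worked", 2018, 1800.0),
    ("A", "hours_worked", 2019, 1790.0),
    ("B", "hours_worked", 2018, 1650.0),
    ("A", "energy_use", 2018, 2500.0),
    ("B", "energy_use", 2018, 3100.0),
]
conn.executemany("INSERT INTO observations VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Step 1: extract the categories, i.e. the distinct variables.
cursor = conn.execute(
    "SELECT DISTINCT variable FROM observations ORDER BY variable"
)
categories = [variable for (variable,) in cursor]

# Step 2: read the observations inside each category.
by_category = {}
for variable in categories:
    cursor = conn.execute(
        "SELECT value FROM observations WHERE variable = ? "
        "ORDER BY country, year",
        (variable,),
    )
    by_category[variable] = [value for (value,) in cursor]

conn.close()
print(categories)
print(by_category)
```

The question marks are placeholders which sqlite3 fills with the values I pass, so I do not have to glue SQL strings together by hand.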

I can see something like a path of learning. I mean, I feel lost in the topic. I feel it sucks. I love it. Exactly the kind of intellectual challenge I needed.

# When a best-friend’s-brother-in-law’s-cousin has a specific technology to market

I am connecting two strands of my work with artificial neural networks as a tool for simulating collective intelligence. One of them consists in studying orientations and values in human societies by testing different socio-economic variables as outcomes of a neural network and checking which of them makes that network the most similar to the original dataset. The second strand consists in taking any variable as the desired output of the network, setting an initially random vector of local probabilities as input, adding a random disturbance factor, and seeing how the network is learning in those conditions.
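To give a feel for that second strand, here is a deliberately simplified sketch. It is not the network I actually run: it is a single neuron with hyperbolic tangent as activation, and the name `run_experiment`, the target value, the learning rate and the probability of disturbance are all assumptions of mine, picked for illustration. The disturbance consists in randomly re-drawing one input every now and then.

```python
import math
import random


def run_experiment(target=0.6, n_inputs=4, rounds=300,
                   learning_rate=0.1, disturbance_prob=0.05, seed=1):
    """Train one tanh neuron towards a target and return the error per round."""
    rng = random.Random(seed)
    # Initially random vector of local probabilities as input.
    inputs = [rng.random() for _ in range(n_inputs)]
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    errors = []
    for _ in range(rounds):
        # Random disturbance: occasionally one input is re-drawn at random.
        if rng.random() < disturbance_prob:
            inputs[rng.randrange(n_inputs)] = rng.random()
        output = math.tanh(sum(w * x for w, x in zip(weights, inputs)))
        error = target - output
        errors.append(error)
        # Gradient step; the derivative of tanh is 1 - tanh^2.
        gradient = error * (1 - output ** 2)
        weights = [w + learning_rate * gradient * x
                   for w, x in zip(weights, inputs)]
    return errors


disturbed = run_experiment()
calm = run_experiment(disturbance_prob=0.0)
print(abs(calm[0]), abs(calm[-1]))
print(max(abs(e) for e in disturbed))
```

Plotting `disturbed` against `calm` is the simplest way to see what the random disturbance does to the residual error.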

So far, I have three recurrent observations from my experiments with those two types of neural networks. Firstly, in any collection of real, empirical, socio-economic variables, there are 1 – 2 of them which, when pegged as the desired outcome of the neural network, produce a clone of actual empirical reality and that clone is remarkably closer to said reality than any other version of the same network, with other variables as its output. In other words, social reality represented with aggregate variables, such as average number of hours worked per person per year, or energy consumption per person per year, is an oriented reality. It is more like a crystal than like a snowball.

Secondly, in the presence of a randomly occurring disturbance, neural networks can learn in three essential ways, clearly distinct from each other. They can be nice and dutiful, and narrow down their residual error of estimation, down to a negligible level. Those networks just nail it down. The second pattern is that of cyclical learning. The network narrows down its residual error, and then, when I think all is said and done, whoosh!: the error starts swinging again, with a broadening amplitude, and then it decreases again, and the cycle repeats, over and over again. Finally, a neural network prodded with a random disturbance can go haywire. The chart of its residual error looks like the cardiac rhythm of a person who takes on an increasing effort: it swings in an ever-broadening amplitude. This is growing chaos. The funny thing, and the connection to my first finding (you know, that about orientations) is that the way a network learns depends on the real socio-economic variable I set as its desired outcome. My network nails it down, like a pro, when it is supposed to optimize something related to absolute size of a society: population, GDP, capital stock. Cyclical learning occurs when I make my network optimize something like a structural proportion: average number of hours worked per person per year, density of population per 1 km² etc. Just a few variables put my network in the panic mode, i.e. the one with increasing amplitude of error. Price index in capital goods is one, Total Factor Productivity is another one. Interestingly, price index in consumer goods doesn’t create much of a panic in my network.

There is a connection between those two big observations. The socio-economic variables which come out as the most likely orientations of human societies are those which seem to be optimized in that cyclical, sort of circular learning, neither with visible growth in precision, nor with visible panic mode. Our human societies seem to orient themselves on those structural proportions, which they learn and relearn over and over again.

The third big observation I made is that each kind of learning, i.e. whichever of the three signalled above, makes my neural network loosen its internal coherence. I measure that coherence with the local Euclidean distance between variables: $\sum_{j=1}^{k} \left[(x_i - x_j)^2\right]^{0.5} / k$. That distance tends to swing cyclically, as if the network needed to loosen its internal connections in order to absorb a parcel of chaos, and then it tightens back, when chaos is being transformed into order.
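That measure of coherence can be sketched in Python as below. I read the formula as follows: for a variable x_i, I take the square root of the squared difference to each of the k variables, and I average over k. The function names are mine, and averaging those local distances across all variables, in `mean_local_distance`, is my own assumption about how to get one number per experimental round.

```python
import math


def local_distance(values, i):
    """Average Euclidean distance between variable i and all k variables."""
    k = len(values)
    return sum(math.sqrt((values[i] - x_j) ** 2) for x_j in values) / k


def mean_local_distance(values):
    """Average of the local distances across all variables (an assumption)."""
    k = len(values)
    return sum(local_distance(values, i) for i in range(k)) / k


# Made-up standardised values of three variables in one experimental round.
snapshot = [0.2, 0.5, 0.8]
print(local_distance(snapshot, 0))
print(mean_local_distance(snapshot))
```

Computed round after round, that one number makes a time series in which the loosening and tightening of the network should be visible.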

I am connecting those essential outcomes of me meddling with artificial neural networks to the research interests I developed earlier this year: the research on cities and their role in our civilisation. One more time, I am bringing that strange thought which came to my mind as I was cycling through the empty streets of my hometown, Krakow, Poland, in the first days of the epidemic lockdown, in March 2020: ‘This city looks dead without people in the streets. I have never seen it as dead as now, even in the times of communism, back in the 1970s. I just wonder, how many human footsteps a day this city needs in order to be truly alive?’. After I had that thought, I started digging and I found quite interesting facts about cities and urban space. Yet, another strand of thinking was growing in my head, the one about the impact of sudden, catastrophic events, such as epidemic outbreaks, on our civilisation. I kept thinking about Black Swans.

I have been reading some history, I have been rummaging in empirical data, I have been experimenting with neural networks, and I have progressively outlined an essential hypothesis, to dig even further into: our social structures absorb shocks, and we do it artfully. Collectively, we don’t just receive s**t from Mother Nature: we absorb it, i.e. we learn how to deal with it. As a matter of fact, we have an amazing capacity to absorb shocks and to create the impression, in the long run, that nothing bad really happened, and that we just keep progressing gloriously. If we think about all the most interesting s**t in our culture, it all comes from one place: shock, suffering, and the need to get over it.

In 2014, I visited an exhibition of Romanesque art (in Barcelona, in the local Museum of Catalonia). Please, do not confuse Romanesque with Ancient Roman. Romanesque art is the early medieval one, roughly until and through the 12th century (historians might disagree with me as regards this periodization, but c’mon guys, this is a blog, I can say crazy things here). Romanesque art covers everything that happened between the collapse of the Western Roman Empire and the first big outbreak of plague in Europe, sort of. And so I walk along the aisles, in that exhibition of Romanesque art, and I see replicas of frescoes, originally located in Romanesque churches across Europe. All of them sport Jesus Christ, and in all of them Jesus looks like an archetypical Scottish sailor: big, bulky, with a plump, smiling face, curly hair, short beard, and happy as f**k. On all those frescoes Jesus is happy. Can you imagine The Last Supper where Jesus dances on the table, visibly having the time of his life? Well, it is there, on the wall of a small church in Germany.

I will put it in perspective. If you look across the Christian iconography today, Jesus is, recurrently, that emaciated guy, essentially mangled by life, hanging sadly from his cross, and apostles are just the same way (no cross, however), and there is all that memento mori stuff sort of hanging around, in the air. Still, this comes from the times after the first big outbreak of plague in Europe. Earlier on, on the same European continent, for roughly 800 years between the fall of the Western Roman Empire and the first big epidemic hit, Jesus and all his iconography had been along the lines of Popeye The Sailor, completely different from what we intuitively associate Christianity with today.

It is worth keeping in mind that epidemic diseases have always been around. Traditions such as shaking hands to express trust and familiarity, or spitting in those hands before shaking them to close a business deal, all come from those times when any stranger, i.e. someone coming from further than 50 miles away, was, technically, an epidemic threat. For hundreds of years, we had sort of been accepting those pathogens at face value, as the necessary s**t which takes nothing off our joy of life, and then ‘Bang!’, 1347 comes, and we really see how hard an epidemic can hit when that pathogen really means business, and our culture changes deeply.

That’s the truly fundamental question which I want to dig into and discuss: can I at all, and, if so, how can I mathematically model the way our civilisation learns, as a collectively intelligent structure, through and from the experience of COVID-19 pandemic?

Collectively intelligent structures, such as I see them, learn by producing many alternative versions of themselves – each of those versions being like a one-mutation neighbour to others – and then testing each such version as for its fitness to optimize a vector of desired outcomes. I wonder how it can happen now, in this specific situation we are in, i.e. the pandemic? How can a society produce alternative versions of itself? We test various versions of epidemic restrictions. We test various ways of organizing healthcare. We probably, semi-consciously test various patterns of daily social interactions, on top of official regulations on social mobility. How many such mutations can we observe? What is our desired outcome?
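The general mechanism of one-mutation neighbours tested for fitness can be sketched in a few lines of Python. This is a toy, not a model of any society: a version is just a list of numbers, fitness is the negative squared distance from a vector of desired outcomes, and the names `fitness`, `one_mutation_neighbours` and `learn`, as well as all the parameters, are assumptions of mine.

```python
import random


def fitness(version, desired):
    """Higher is better: negative squared distance from desired outcomes."""
    return -sum((v - d) ** 2 for v, d in zip(version, desired))


def one_mutation_neighbours(version, step, rng, count):
    """Produce alternative versions, each differing in exactly one component."""
    neighbours = []
    for _ in range(count):
        candidate = list(version)
        position = rng.randrange(len(candidate))
        candidate[position] += rng.choice((-step, step))
        neighbours.append(candidate)
    return neighbours


def learn(start, desired, generations=200, step=0.05, count=8, seed=0):
    """Keep the fittest one-mutation neighbour whenever it beats the current version."""
    rng = random.Random(seed)
    current = list(start)
    for _ in range(generations):
        candidates = one_mutation_neighbours(current, step, rng, count)
        best = max(candidates, key=lambda c: fitness(c, desired))
        if fitness(best, desired) > fitness(current, desired):
            current = best
    return current


start = [0.0, 0.0, 0.0]
desired = [0.5, 0.2, 0.8]
learned = learn(start, desired)
print(learned, fitness(learned, desired))
```

Each generation keeps the current version unless some neighbour is strictly fitter, so fitness never goes down. Real societies, I guess, do not have that luxury.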

I start from the end. My experiments with neural networks applied as simulators of collective human intelligence suggest that we optimize, most of all, structural proportions of our socio-economic system. The average number of hours worked per person per year, and the amount of human capital accumulated in an average person, in terms of schooling years, come to the fore, by far. Energy consumption per person per year is another important metric.

Why labour? Because labour, at the end of the day, is social interaction combined with expenditure of energy, which, in turn, we have from our food base. Optimizing the amount of work per person, together with the amount of education we need in order to perform that work, is a complex adaptive mechanism, where social structures arrange themselves so that their members find some kind of balance with the grub they can grab from their environment. Stands to reason.

Now, one more thing as for the transformative impact of COVID-19 on our civilization. I am participating in a call for R&D tenders, with the Polish government, more specifically with the National Centre for Research and Development (https://www.ncbr.gov.pl/en/ ). They have announced a special edition of the so-called Fast Track call, titled ‘Fast Track – Coronaviruses’. First of all, please pay attention to the plural form of coronaviruses. Second of all, that specific track of R&D goes as broadly as calling for architectural designs supposed to protect against contagion. Yes, if that call is not a total fake (which happens sometimes, when a best-friend’s-brother-in-law’s-cousin has a specific technology to market, for taxpayers’ money), the Polish government has data indicating that pandemic is going to be the new normal.