We haven’t nailed down all our equations yet

As I keep digging into the topic of collective intelligence, and my research thereon with the use of artificial neural networks, I am making a list of key empirical findings that pave my way down this particular rabbit hole. I am reinterpreting them with the new understandings I have from translating my mathematical model of artificial neural network into an algorithm. I am learning to program in Python, which comes sort of handy given I want to use AI. How could I have made and used artificial neural networks without programming, just using Excel? You see, that’s Laplace and his hypothesis that mathematics represent the structure of reality (https://discoversocialsciences.com/wp-content/uploads/2020/10/Laplace-A-Philosophical-Essay-on-Probabilities.pdf ).

An artificial neural network is a sequence of equations which interact, in a loop, with a domain of data. Just as any of us, humans, essentially. We just haven’t nailed down all of our own equations yet. What I can do and have done with Excel was to understand the structure of those equations and their order. This is a logical structure, and as long as I don’t give it any domain of data to feed on, is stays put.

When I feed data into that structure, it starts working. Now, with any set of empirical socio-economic variables I have worked with, so far, there is always 1 – 2 among them which are different from others as output. Generally, my neural network works differently according to the output variable I make it optimize. Yes, it is the output variable, supposedly being the desired outcome to optimize, and not the input variables treated as instrumental in that view, which makes the greatest difference in the results produced by the network.

That seems counterintuitive, and yet this is like the most fundamental common denominator of everything I have found out so far: the way that a simple neural network simulates the collective intelligence of human societies seems to be conditioned most of all by the variables pre-set as the output of the adaptation process, not by the input ones. Is it a sensible conclusion regarding collective intelligence in real life, or is it just a property of the data? In other words, is it social science or data science? This is precisely one of the questions which I want to answer by learning programming.

If it is a pattern of collective human intelligence, that would mean we are driven by the orientations pursued much more than by the actual perception of reality. What we are after would be more important a differentiating factor of your actions than what we perceive and experience as reality. Strangely congruent with the Interface Theory of Perception (Hoffman et al. 2015[1], Fields et al. 2018[2]). 

As it is some kind of habit in me, in the second part of this update I put the account of my learning how to program and to Data Science in Python. This time, I wanted to work with hard cases of CSV import, like trouble files. I want to practice data cleansing. I have downloaded the ‘World Economic Outlook October 2020’ database from the website https://www.imf.org/en/Publications/WEO/weo-database/2020/October/download-entire-database . Already when downloading, I could notice that the announced format is ‘TAB delimited’, not ‘Comma Separated’. It downloads as Excel.

To start with, I used the https://anyconv.com/tab-to-csv-converter/ website to do the conversion. In parallel, I tested two other ways:

  1. opening in Excel, and then saving as CSV
  2. opening with Excel, converting to *.TXT, importing into Wizard for MacOS (statistical package), and then exporting as CSV.

What I can see like right off the bat are different sizes in the same data, technically saved in the same format. The AnyConv-generated CSV is 12,3 MB, the one converted through Excel is 9,6 MB, and the last one, filtered through Excel to TXT, then to Wizard and to CSV makes 10,1 MB. Intriguing.

I open JupyterLab online, and I create a Python 3-based Notebook titled ‘Practice 27_11_2020_part2’.

I prepare the Notebook by importing Numpy, Pandas, Matplotlib and OS. I do:

>> import numpy as np

      import pandas as pd

      import matplotlib.pyplot as plt

      import os

I upload the AnyConv version of the CSV. I make sure to have the name of the file right by doing:

>> os.listdir()


…and I do:

>> WEO1=pd.DataFrame(pd.read_csv(‘AnyConv__WEOOct2020all.csv’))

Result:

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3072: DtypeWarning: Columns (83,85,87,89,91,93,95,98,99,102,103,106,107,110,111,114,115,118,119,122,123,126,127,130,131,134,135,138,139,142,143,146,147,150,151,154,155,158) have mixed types. Specify dtype option on import or set low_memory=False.

  interactivity=interactivity, compiler=compiler, result=result)

As I have been told, I add the “low_memory=False” option to the command, and I retype:

>> WEO1=pd.DataFrame(pd.read_csv(‘AnyConv__WEOOct2020all.csv’, low_memory=False))

Result: the file is apparently imported successfully. I investigate the structure.

>> WEO1.describe()

Result: I know I have 8 rows (there should be much more, over 200), and 32 columns. Something is wrong.

I upload the Excel-converted CSV.

>> WEO2=pd.DataFrame(pd.read_csv(‘WEOOct2020all_Excel.csv’))

Result: Parser error

I retry, with parameter sep=‘;’ (usually works with Excel)

>> WEO2=pd.DataFrame(pd.read_csv(‘WEOOct2020all_Excel.csv’,sep=’;’))

Result: import successful. Let’s check the shape of the data

>> WEO2.describe()

Result: Pandas can see just the last column. I make sure.

>> WEO2.columns

Result:

Index([‘WEO Country Code’, ‘ISO’, ‘WEO Subject Code’, ‘Country’,

       ‘Subject Descriptor’, ‘Subject Notes’, ‘Units’, ‘Scale’,

       ‘Country/Series-specific Notes’, ‘1980’, ‘1981’, ‘1982’, ‘1983’, ‘1984’,

       ‘1985’, ‘1986’, ‘1987’, ‘1988’, ‘1989’, ‘1990’, ‘1991’, ‘1992’, ‘1993’,

       ‘1994’, ‘1995’, ‘1996’, ‘1997’, ‘1998’, ‘1999’, ‘2000’, ‘2001’, ‘2002’,

       ‘2003’, ‘2004’, ‘2005’, ‘2006’, ‘2007’, ‘2008’, ‘2009’, ‘2010’, ‘2011’,

       ‘2012’, ‘2013’, ‘2014’, ‘2015’, ‘2016’, ‘2017’, ‘2018’, ‘2019’, ‘2020’,

       ‘2021’, ‘2022’, ‘2023’, ‘2024’, ‘2025’, ‘Estimates Start After’],

      dtype=’object’)

I will try to import the same file with a different ‘sep’ parameter, this time as sep=‘\t’

>> WEO3=pd.DataFrame(pd.read_csv(‘WEOOct2020all_Excel.csv’,sep=’\t’))

Result: import apparently successful. I check the shape of my data.

>> WEO3.describe()

Result: apparently, this time, no column is distinguished.

When I type:

>> WEO3.columns

…I get

Index([‘WEO Country Code;ISO;WEO Subject Code;Country;Subject Descriptor;Subject Notes;Units;Scale;Country/Series-specific Notes;1980;1981;1982;1983;1984;1985;1986;1987;1988;1989;1990;1991;1992;1993;1994;1995;1996;1997;1998;1999;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012;2013;2014;2015;2016;2017;2018;2019;2020;2021;2022;2023;2024;2025;Estimates Start After’], dtype=’object’)

Now, I test with the 3rd file, the one converted through Wizard.

>> WEO4=pd.DataFrame(pd.read_csv(‘WEOOct2020all_Wizard.csv’))

Result: import successful.

I check the shape.

>> WEO4.describe()

Result: still just 8 rows. Something is wrong.

I do another experiment. I take the original*.XLS from imf.org, and I save it as regular Excel *.XLSX, and then I save this one as CSV.

>> WEO5=pd.DataFrame(pd.read_csv(‘WEOOct2020all_XLSX.csv’))

Result: parser error

I will retry with two options as for the separator: sep=‘;’ and sep=‘\t’. Ledzeee…

>> WEO5=pd.DataFrame(pd.read_csv(‘WEOOct2020all_XLSX.csv’,sep=’;’))

Import successful. “WEO5.describe()” yields just one column.

>> WEO6=pd.DataFrame(pd.read_csv(‘WEOOct2020all_XLSX.csv’,sep=’\t’))

yields successful import, yet all the data is just one long row, without separation into columns.

I check WEO5 and WEO6 with “*.index”, and “*.shape”. 

“WEO5.index” yields “RangeIndex(start=0, stop=8777, step=1)”

“WEO6.index” yields “RangeIndex(start=0, stop=8777, step=1)

“WEO5.shape” gives “(8777, 56)”

“WEO6.shape” gives “(8777, 1)”

Depending on the separator given as parameter in the “pd.read_csv” command, I get 56 columns or just 1 column, yet the “*.describe()” command cannot make sense of them.

I try the *.describe” command, thus more specific than the “*.describe()” one.

I can see that structures are clearly different.

I try another trick, namely to assume separator ‘;’ and TAB delimiter.

>> WEO7=pd.DataFrame(pd.read_csv(‘WEOOct2020all_XLSX.csv’,sep=’;’,delimiter=’\t’))

Result: WEO7.shape yields 8777 rows in just one column.

Maybe ‘header=0’? Same thing.

The provisional moral of the fairy tale is that ‘Data cleansing’ means very largely making sense of the exact shape and syntax of CSV files. Depending on the parametrisation of separators and delimiters, different Data Frames are obtained.


[1] Hoffman, D. D., Singh, M., & Prakash, C. (2015). The interface theory of perception. Psychonomic bulletin & review, 22(6), 1480-1506.

[2] Fields, C., Hoffman, D. D., Prakash, C., & Singh, M. (2018). Conscious agent networks: Formal analysis and application to cognition. Cognitive Systems Research, 47, 186-213. https://doi.org/10.1016/j.cogsys.2017.10.003

Leave a Reply