The point of doing manually what the loop is supposed to do

My editorial on YouTube

OK, here is the big picture. The highest demographic growth, in absolute numbers, takes place in Asia and Africa. The biggest migratory flows start from there as well, and aim at regions with much slower accrual of human mass: North America and Europe. Less human accrual, indeed, and yet much better conditions for each new Homo sapiens. In some places on the planet, a huge number of humans is born every year. That huge number means a huge number of genetic variations around the same genetic tune, namely that of Homo sapiens. Those genetic variations leave their homeland for a new and better one, where they bring their genes into a new social environment, which assures them much more safety, and higher odds of prolonging their genetic line.

What is the point of there being more specimens of any species? I mean, is there a logic to increasing the headcount of any population? When I say ‘any’, it ranges from bacteria to us, humans. After having meddled with the most basic algorithm of a neural network (see « Pardon my French, but the thing is really intelligent » and « Ce petit train-train des petits signaux locaux d’inquiétude », roughly: ‘That little routine of small, local signals of alarm’), I have some thoughts about what intelligence is. I think that intelligence is a class, i.e. a framework structure able to produce many local, alternative instances of itself.

Being intelligent consists, to start with, in creating alternative versions of oneself, purposefully imperfect so as to generate small local errors, whilst using those errors to create still different versions of oneself. The process is tricky. There is some sort of fundamental coherence required between the way of creating those alternative instances of oneself, and the way the resulting errors are being processed. In the absence of such coherence, the allegedly intelligent structure can fall into purposeful ignorance, or into panic.

Purposeful ignorance manifests as the incapacity to signal and process the local imperfections in alternative instances of the intelligent structure, although those imperfections actually stand out and wave at you. This is the ‘everything is just fine and there is no way it could be otherwise’ behavioural pattern. It happens, for example, when the function of processing local errors is too gross – or not sharp enough, if you want – to actually extract meaning from tiny, still observable local errors. The panic mode of an intelligent structure, on the other hand, is the situation when the error-processing function is too sharp for the actually observable errors. Them errors just knock it out of balance, like completely, and the function signals a general ‘Error’, or ‘I can’t stand this cognitive dissonance’.

So, what is the point of there being more specimens of any species? The point might be to generate as many specific instances of an intelligent structure – the specific DNA – as possible, so as to generate purposeful (and still largely unpredictable) errors, just to feed those errors into the future instantiations of that structure. In the process of breeding, some path of evolutionary coherence leads to errors that can be handled, and that path unfolds between a state of evolutionary ‘everything is OK, no need to change anything’ (case mosquito, unchanged for millions of years), and a state of evolutionary ‘what the f**k!?’ (case common fruit fly, which produces an insane amount of mutations in response to the slightest environmental stressor).

Essentially, all life could be a framework structure which, back in the day, made a piece of software in artificial intelligence – the genetic code – and ever since, that piece of software has been working on minimizing the MSE (mean square error) in predicting the next best version of life, and it has been working by breeding, in a tree-like method of generating variations, indefinitely many instances of the framework structure of life. Question: what happens when, one day, a perfect form of life emerges? Something like T-Rex – Megalodon – Angelina Jolie – Albert Einstein – Jeff Bezos – [put whatever or whoever you like in the rest of that string]? On the grounds of what I have already learnt about artificial intelligence, such a state of perfection would mean the end of experimentation, thus the end of multiplying instances of the intelligent structure, thus the end of births and deaths, thus the end of life.

Question: if the above is even remotely true, does that overarching structure of life understand how the software it made – the genetic code – works? Not necessarily. That very basic algorithm of a neural network, which I have experimented with a bit, produces local instances of the sigmoid function Ω = 1/(1 + e^(-x)) that come out equal to 1, even though Ω < 1 should always hold, since 1 + e^(-x) > 1 for any finite x. Still, the thing does it, sometimes. Why? How? Go figure. That thing accomplishes an apparently absurd task, and it does so just by being sufficiently flexible with its random coefficients. If Life In General is God, that God might not have a clue about how actual life works. God just needs to know how to write an algorithm for making actual life work. I would even say more: if God is any good at being one, he would write an algorithm smarter than himself, just to make things advance.

The hypothesis of life being one, big, intelligent structure gives an interesting insight into what the cost of experimentation is. Each instance of life, i.e. each specimen of each species, needs energy to sustain it. That energy takes many forms: light, warmth, food, Lexus (a form of matter), parties, Armani (another peculiar form of matter) etc. The more instances of life there are, the more energy they need to be there. Even if we take the Armani particle out of the equation, life is still bloody energy-consuming. The available amount of energy puts a limit on the number of experimental instances of framework, structural life that the platform (Earth) can handle.

Here comes another one about climate change. Climate change means warmer, let’s be honest. Warmer means more energy on the planet. Yes, temperature is our human measurement scale for the aggregate kinetic energy of vibrating particles. More energy is what we need to have more instances of framework life at the same time. Logically, incremental change in total energy on the planet translates into incremental change in the capacity of framework life to experiment with itself. Still, as framework life could be just the God who made that software for artificial intelligence (yes, I am still in the same metaphor), said framework life might not be quite aware of how bumpy the road could be towards the desired minimum in the Mean Square Error. If God is an IT engineer, it could very well be the case.

I had that conversation with my son, who is finishing his IT engineering studies. I told him: ‘See, I took that algorithm of a neural network, and I just wrote its iterations out into separate tables of values in Excel, just to see what it does, like iteration after iteration. Interesting, isn’t it? I bet you have done such a thing many times, eh?’. I still remember that heavy look in my son’s eyes. ‘Why the hell should I ever do that?’ he went. ‘There is a logical loop in that algorithm, you see? This loop is supposed to do the job, I mean to iterate until it comes up with something really useful. What is the point of doing manually what the loop is supposed to do for you? It is like hiring a gardener and then doing everything in the garden by yourself, just to see how it goes. It doesn’t make sense!’. ‘But it’s interesting to observe, isn’t it?’ I went, and then I realized I was talking to an alien form of intelligence, there.

Anyway, if God is a framework life who created some software to learn in itself, it might not be quite aware of the tiny little difficulties in the unfolding of the Big Plan. I mean acidification of oceans, hurricanes and stuff. The framework life could say: ‘Who cares? I want more learning in my algorithm, and it needs more energy to loop on itself, and so it makes those instances of me pump more carbon into the atmosphere, so as to have more energy to sustain more instances of me. Stands to reason, man. It is all working smoothly. I don’t understand what you are moaning about’.

Whatever that godly framework life says, I am still interested in studying particular instances of what happens. One of them is my business concept of EneFin. See « Which salesman am I? » for what I think is the last time I was fully verbal about it. Long story short, the idea consists in crowdfunding capital for small, local operators of power systems based on renewable energies, by selling shares in equity, or units of corporate debt, in bundles with tradable claims on either the present output of energy, or the future one. In simple terms: you buy from that supplier of energy tradable claims on, for example, 2 000 kWh, and you pay the regular market price; still, inside that price, the energy strictly spoken comes with a juicy discount. The rest of the actual amount of money you have paid buys you shares in your supplier’s equity.

The idea in that simplest form is largely based on two simple observations about the energy bills we pay. In most countries (at least in Europe), our energy bills are made of two components: the (slightly) variable value of the energy actually supplied, and a fixed part labelled sometimes as ‘maintenance of the grid’ or similar. Besides, small users (e.g. households) usually pay a much higher unitary price per kWh than large, institutional-scale buyers (factories, office buildings etc.). In my EneFin concept, a local supplier of renewable energy makes a deal with its local customers to sell them electricity at a fair, market price, with participations in equity on top of the electricity.

That would be a classical crowdfunding scheme, such as you can find with StartEngine, for example. I want to give it some additional, financial spin. Classical crowdfunding has a weakness: low liquidity. The participatory shares you buy via crowdfunding are usually non-tradable, and they create a quasi-cooperative bond between investors and investees. Where I come from, i.e. in Central Europe, we are quite familiar with cooperatives. At first sight, they look like a form of institutional heaven, compared to those big, ugly, capitalistic corporations. Still, after you have waved away that first mist, cooperatives turn out to be very exposed to embezzlement, and to abuse of managerial power. Besides, they are quite weak when competing for capital against corporate structures. I want to create a highly liquid transactional platform, with those investments being as tradable as possible, and to use financial liquidity both as a shield against managerial excesses, and as a competitive edge for those small ventures.

My idea is to assure liquidity via a FinTech solution similar to that used by Katipult Technology Corp., i.e. to create some kind of virtual currency (note: a virtual currency is not exactly the same as a cryptocurrency; cousins, but not twins, so to say). Units of that currency would correspond to those complex contracts « energy plus equity ». First, you create an account with EneFin, i.e. you buy a certain amount of the virtual currency used inside the EneFin platform. I call them ‘tokens’, to simplify. Next, you pick your complex contracts, in the basket of those offered by local providers of energy. You buy those contracts with the tokens you have already acquired. Now, you change your mind. You want to withdraw your capital from supplier A, and move it to supplier H, which you haven’t considered so far. You move your tokens from A to H, with a mobile app even. It means that the transactional platform – the EneFin one – buys from you the corresponding amount of equity of A, and tries to find for you some available equity in H. You can also move your tokens completely out of investment in those suppliers of energy. You can free your money, so to say. Just as simple: you move them out, even with a movement of your thumb on the screen. The EneFin platform buys from you the shares you have moved out of.
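Just to fix the mechanics in something executable, here is a minimal sketch of that token ledger, in Python. The class and method names are my own invention for illustration, not a description of any actual EneFin software.

class EneFinLedger:

    def __init__(self):
        self.holdings = {}      # supplier -> tokens invested by the user
        self.free_tokens = 0.0  # tokens not tied to any supplier

    def buy_tokens(self, amount):
        # opening the account: fiat money in, tokens out
        self.free_tokens += amount

    def invest(self, supplier, amount):
        # buying 'energy plus equity' contracts from a supplier
        if amount > self.free_tokens:
            raise ValueError('not enough free tokens')
        self.free_tokens -= amount
        self.holdings[supplier] = self.holdings.get(supplier, 0.0) + amount

    def move(self, from_supplier, to_supplier, amount):
        # the platform buys back equity in one supplier and finds some in another
        if self.holdings.get(from_supplier, 0.0) < amount:
            raise ValueError('not enough tokens invested with ' + from_supplier)
        self.holdings[from_supplier] -= amount
        self.holdings[to_supplier] = self.holdings.get(to_supplier, 0.0) + amount

    def withdraw(self, supplier, amount):
        # moving tokens out of investment; the platform buys the shares back
        if self.holdings.get(supplier, 0.0) < amount:
            raise ValueError('not enough tokens invested with ' + supplier)
        self.holdings[supplier] -= amount
        self.free_tokens += amount

ledger = EneFinLedger()
ledger.buy_tokens(1000.0)
ledger.invest('A', 800.0)
ledger.move('A', 'H', 300.0)      # the thumb movement on the screen, in code form
ledger.withdraw('H', 100.0)
print(ledger.holdings, ledger.free_tokens)   # {'A': 500.0, 'H': 200.0} 300.0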

You have yet another idea. Instead of investing your tokens in the equity of a provider of energy, you want to lend them. You move your tokens to the field ‘lending’, you study the interest rates offered on the transactional platform, and you close the deal. Now, the corresponding number of tokens represents securitized (thus tradable) corporate debt.

Question: why the hell bother with a virtual currency, possibly a cryptocurrency, instead of just using good old fiat money? At this point, I am reaching back to the very roots of the Bitcoin, the grandpa of all cryptocurrencies (or so they say). Question: what amount of money do you need to finance 20 transactions of equal unitary value P? Answer: it depends on how frequently you monetize them. Imagine that the EneFin app offers you an option like ‘Monetize vs. Don’t Monetize’. As long as – with each transaction you do on the platform – you stick to the ‘Don’t Monetize’ option, your transactions remain recorded inside the transactional platform, and so there is recorded movement in tokens, but there is no monetary outcome, i.e. your strictly spoken monetary balance, for example that in €, does not change. It is only when you hit the ‘Monetize’ button in the app that the current bottom line of your transactions inside the platform is converted into « official » money.
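Here is a toy illustration of that ‘Monetize vs. Don’t Monetize’ logic, with all numbers invented: 20 transactions of unitary value P, some incoming, some outgoing, and a ‘Monetize’ click every k-th transaction. Between clicks, transactions net out in tokens, and only the net balance ever touches official money.

import random

random.seed(7)
P, N = 100.0, 20
flows = [random.choice((+P, -P)) for _ in range(N)]   # in/out transactions

def fiat_turnover(flows, k):
    # official money moved when monetizing every k transactions;
    # between monetizations, the transactions net out in tokens
    total, balance = 0.0, 0.0
    for i, f in enumerate(flows, start=1):
        balance += f
        if i % k == 0:
            total += abs(balance)   # only the net balance is converted
            balance = 0.0
    return total + abs(balance)

for k in (1, 5, 20):
    print(f"monetize every {k:>2} transactions -> fiat turnover {fiat_turnover(flows, k):7.1f}")

# The turnover can only shrink (or stay equal) as monetization gets less
# frequent: that difference is the money demand the tokens spare.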

The virtual currency in the EneFin scheme would serve to allow a high level of liquidity (more transactions in a unit of time), without provoking an exactly corresponding demand for money. What is the connection with artificial intelligence? I want to study the possible absorption of such a scheme in the market of energy, and in the related financial markets, as a manifestation of collective intelligence. I imagine two intelligent framework structures: one incumbent (the existing markets), and one emerging (the EneFin platform). Both are intelligent structures to the extent that they technically can produce many alternative instances of themselves, and thus intelligently adapt to their environment by testing those instances and utilising the recorded local errors.

In terms of an algorithm of a neural network, that intelligent adaptation can manifest, for example, as an optimization of two coefficients: the share of energy traded via EneFin in the total energy supplied in the given market, and the capitalization of EneFin as a share of the total capitalization of the corresponding financial markets. Those two coefficients can be equated to weights in a classical MLP (Multilayer Perceptron) network, and the perceptron network could iterate around them. Of course, the issue can be approached from a classical methodological angle, as a general equilibrium to assess via « normal » econometric modelling. Still, what I want is precisely what I hinted at in « Pardon my French, but the thing is really intelligent » and « Ce petit train-train des petits signaux locaux d’inquiétude »: I want to study the very process of adaptation and modification in those intelligent framework structures. I want to know, for example, how much experimentation those structures need to form something really workable, i.e. an EneFin platform with serious business going on, and, at the same time, that business contributing to the development of renewable energies in the given region of the world. Do those framework structures have enough local resources – mostly capital – for sustaining the number of alternative instances needed for effective learning? What kind of factors can block learning, i.e. drive the framework structure either into a deliberate ignorance of local errors, or into panic?
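To make that idea a bit more tangible, here is a bare-bones sketch of reading the two coefficients as weights of a tiny perceptron. Everything in it is invented for illustration: the input series, the ‘true’ shares to recover, and the learning rate; the point is only the mechanics of two market shares sitting where a network keeps its weights.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
X = rng.random((30, 2))        # col 0: energy-market signal, col 1: financial-market signal
y = sigmoid(X @ np.array([[0.2], [0.6]]))   # assumed 'true' shares to recover

w = rng.random((2, 1))         # the two coefficients, randomly initialized
for _ in range(5000):
    out = sigmoid(X @ w)
    # same update rule as in the algorithm discussed in the next post
    w += 0.1 * X.T @ (2 * (y - out) * out * (1 - out))

print(w.ravel())               # drifts towards the assumed [0.2, 0.6]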

Here is an example of a more exact theoretical issue. In a typical economic model, things are connected. When I pull on the string ‘capital invested in fixed assets’, I can see a valve open, labelled ‘Lifecycle in incumbent technologies’, and some steam rushes out. When I push the ‘investment in new production capacity’ button, I can see something happening in the ‘Jobs and employment’ department. In other words, variables present in economic systems mutually constrain each other. Just some combinations work; others just don’t. Now, the thing I have already discovered about them Multilayer Perceptrons is that as soon as I add some constraint on the weights assigned to input data, for example when I swap ‘random’ for ‘erandom’, the scope of possible structural functions leading to effective learning dramatically shrinks, and the likelihood of my neural network falling into deliberate ignorance or into panic just swells like hell. What degree of constraint on those economic variables is tolerable in the economic system conceived as a neural network, thus as a framework intelligent structure?
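Below is a sketch of the kind of experiment I mean, with one loud assumption: I read ‘erandom’ as wrapping the random draw in an exponential, which is my guess at the notation rather than a definition taken from anywhere. The sketch just shows how a constraint on weight initialization can be tested against learning performance.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(init, steps=1500, seed=1):
    # a small two-layer network, trained on random data, returning final MSE
    rng = np.random.default_rng(seed)
    X = rng.random((8, 3))
    y = rng.random((8, 1))
    w1, w2 = init(rng, (3, 4)), init(rng, (4, 1))
    for _ in range(steps):
        h = sigmoid(X @ w1)
        out = sigmoid(h @ w2)
        d_out = 2 * (y - out) * out * (1 - out)
        w2 += 0.5 * h.T @ d_out
        w1 += 0.5 * X.T @ ((d_out @ w2.T) * h * (1 - h))
    return float(np.mean((y - out) ** 2))

plain = lambda rng, shape: rng.random(shape)
erandom = lambda rng, shape: np.exp(rng.random(shape)) / np.e   # constrained into [1/e, 1)

print("plain  :", train(plain))
print("erandom:", train(erandom))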

There are some general guidelines I can see for building a neural network that simulates those things. Creating local power systems, based on microgrids connected to one or more local sources of renewable energies, can be greatly enhanced with efficient financing schemes. The publicly disclosed financial results of companies operating in those segments – such as Tesla[1], Vivint Solar[2], First Solar[3], or 8point3 Energy Partners[4] – suggest that business models in that domain are only emerging, and are far from being battle-tested. There is still a road to pave towards well-rounded business practices as regards such local power systems, both economically profitable and socially sustainable.

The basic assumptions of a neural network in that field are essentially behavioural. Firstly, consumption of energy is greatly predictable at the level of individual users. The size of a market in energy changes as the number of users changes. The output of energy needed to satisfy those users’ needs, and the corresponding capacity to install, are largely predictable in the long run. Consumers of energy use a basket of energy-consuming technologies. The structure of this basket determines their overall consumption, and is determined, in turn, by long-run social behaviour. Changes over time in that behaviour can be represented as a social game, where consecutive moves consist in purchasing, or disposing of, a given technology. Thus, a game-like process of relatively slow social change generates a relatively predictable output of energy, and a demand thereof. Secondly, the behaviour of investors in any financial market, crowdfunding or other, is comparatively more volatile. Investment decisions are taken, and modified, at a much faster pace than decisions about the basket of technologies used in everyday life.

The financing of relatively small, local power systems, based on renewable energies and connected by local microgrids, implies an interplay of the two above-mentioned patterns, namely the relatively slower transformation in the technological base, and the quicker, more volatile modification of investors’ behaviour in financial markets.

I am consistently delivering good, almost new science to my readers, and love doing it, and I am working on crowdfunding this activity of mine. As we talk business plans, I remind you that you can download, from the library of my blog, the business plan I prepared for my semi-scientific project Befund (and you can access the French version as well). You can also get a free e-copy of my book ‘Capitalism and Political Power’. You can support my research by donating directly, any amount you consider appropriate, to my PayPal account. You can also consider going to my Patreon page and becoming my patron. If you decide so, I will be grateful for suggesting me two things that Patreon suggests me to suggest you. Firstly, what kind of reward would you expect in exchange for supporting me? Secondly, what kind of phases would you like to see in the development of my research, and of the corresponding educational tools?

[1] http://ir.tesla.com/ (last accessed December 18th, 2018)

[2] https://investors.vivintsolar.com/company/investors/investors-overview/default.aspx (last accessed December 18th, 2018)

[3] http://investor.firstsolar.com/ (last accessed December 18th, 2018)

[4] http://ir.8point3energypartners.com/ (last accessed December 18th, 2018)

Pardon my French, but the thing is really intelligent

My editorial on YouTube

And so I am meddling with neural networks. It had to come. It just had to. I started with me having many ideas to develop at once. Routine stuff with me. Then, the Editor-in-Chief of the ‘Energy Economics’ journal returned the manuscript of an article on the energy-efficiency of national economies, which I had submitted to them, with a general remark that I should work both on the clarity of my hypotheses, and on the scientific spin of my empirical research. In short: Mr Wasniewski, linear models tested with Ordinary Least Squares are a bit oldie, if you catch my drift. Bloody right, Mr Editor-in-Chief. Basically, I agree with your remarks. I need to move out of my cavern, towards the light of progress, and get acquainted with the latest fashion. The latest fashion we are wearing this season is artificial intelligence, machine learning, and neural networks.

It comes in handy, to the extent that I obsessively meddle with the issue of collective intelligence, and am dreaming about creating a model of human social structure acting as collective intelligence, sort of a beehive. Whilst the casting for a queen in that hive remains open, and is likely to stay this way for a while, I am digging into the very basics of neural networks. I am looking in the Python department, as I have already got a bit familiar with that environment. I found an article by James Loy, entitled “How to build your own Neural Network from scratch in Python”. The article looks a bit like sourcing from another one, available at the website of ProBytes Software, thus I use both to develop my understanding. I pasted the whole algorithm by James Loy into my Python Shell, made it run with an ‘enter’, and I am waiting for what it is going to produce. In the meantime, I am being verbal about my understanding.

The author declares he wants to do more or less the same thing as I do, namely to understand neural networks. He constructs a simple algorithm for a neural network. It starts with defining the neural network as a class, i.e. as a callable object that acts as a factory for new instances of itself. In the neural network defined as a class, that algorithm starts by calling the constructor function ‘__init__’, which constructs an instance ‘self’ of that class. It goes like ‘def __init__(self, x, y):’. In other words, the class ‘NeuralNetwork’ generates instances ‘self’ of itself, and each instance is essentially made of two variables: input x, and output y. The ‘x’ is declared as the input variable through the ‘self.input = x’ expression. Then, the output of the network is defined in two steps. Yes, the ‘y’ is generally the output, only in a neural network we want the network to predict a value of ‘y’, thus some kind of y^. What we have to do is to define ‘self.y = y’, feed the real x-s and the real y-s into the network, and expect the latter to turn out some y^-s.

Logically, we need to prepare a vessel for holding the y^-s. The vessel is defined as ‘self.output = np.zeros(y.shape)’. The ‘shape’ attribute gives a tuple – a table, for those mildly fond of maths – describing the dimensions of an array. What are the dimensions of ‘y’ in that ‘y.shape’? They come from the data we feed into the network: one row per observation, one column of output values. The weights, in turn, are defined right after the ‘self.input = x’ has been said: ‘self.weights1 = np.random.rand(self.input.shape[1],4)’ fires off, closely followed by ‘self.weights2 = np.random.rand(4,1)’. All in all, the entire class ‘NeuralNetwork’ is defined in the following form:

import numpy as np

class NeuralNetwork:

    def __init__(self, x, y):
        self.input    = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.y        = y
        self.output   = np.zeros(self.y.shape)

The output of each instance in that neural network is an array of the same shape as ‘y’: one column of predicted values, with one row per observation (the four columns belong to ‘weights1’, i.e. to a hidden layer of four neurons). Initially, it is filled with zeros, so as to make room for something more meaningful. The predicted y^-s are supposed to jump into those empty sockets, held ready by the zeros. The ‘random.rand’ expression, associated with ‘weights’, means that the network is supposed to assign randomly different levels of importance to different x-s fed into it.

Anyway, the next step is to instruct my snake (i.e. Python) what to do next with that class ‘NeuralNetwork’. It is supposed to do two things: feed data forward, i.e. make those neurons work on predicting the y^-s, and then check itself by an operation called backpropagation of errors. The latter consists in comparing the predicted y^-s with the real y-s, measuring the discrepancy as a loss of information, updating the initial random weights with conclusions from that measurement, and doing it all again, and again, and again, until the error runs down to very low values. The weights applied by the network in order to generate that lowest possible error are the best the network can do in terms of learning.

The feeding forward of predicted y^-s goes on in two steps, or in two layers of neurons, one hidden, and one final. They are defined as:

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

The ‘sigmoid’ part means the sigmoid function, AKA logistic function, expressed as y = 1/(1 + e^(-x)), where, at the end of the day, the y always falls somewhere between 0 and 1, and the ‘x’ is not really the empirical, real ‘x’, but the ‘x’ multiplied by a weight, ranging between 0 and 1 as well. The sigmoid function is good for testing the weights we apply to various types of input x-es. Whatever kind of data you take: populations measured in millions, or consumption of energy per capita, measured in kilograms of oil equivalent, the basic sigmoid function y = 1/(1 + e^(-x)) will always yield a value between 0 and 1. This function essentially normalizes any data.

Now, I want to take differentiated data, like population as headcount, energy consumption in them kilograms of whatever oil equals to, and the supply of money in standardized US dollars. Quite a mix of units and scales of measurement. I label those three as, respectively, x_a, x_b, and x_c. I assign them weights ranging between 0 and 1, so that the sum of weights never exceeds 1. In plain language it means that for every vector of observations made of x_a, x_b, and x_c, I take a pinchful of x_a, then a zest of x_b, and a spoon of x_c. I make them into x = w_a*x_a + w_b*x_b + w_c*x_c, I give it a minus sign and put it as an exponent for the Euler’s constant.

That yields y = 1/(1 + e^-(w_a*x_a + w_b*x_b + w_c*x_c)). Long, but meaningful to the extent that now, my y is always to be found somewhere between 0 and 1, and I can experiment with various weights for my various shades of x, and look at what it gives in terms of y.
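Here is that weighing in executable form; the three inputs and the weights are numbers I am making up on the spot. Incidentally, the first print hints at where those sigmoids stuck at exactly 1 may come from: with inputs that big, e^(-x) underflows to zero in floating-point arithmetic.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xa, xb, xc = 25.0e6, 5.3, 1.2e12    # headcount, kgoe per capita, USD (invented)
wa, wb, wc = 0.5, 0.3, 0.2          # weights, summing up to 1

raw = wa*xa + wb*xb + wc*xc
print(sigmoid(raw))                  # 1.0: the exponent saturates in floating point

scaled = wa*(xa/1e7) + wb*xb + wc*(xc/1e12)   # crude rescaling of the inputs
print(sigmoid(scaled))               # ~0.956, strictly between 0 and 1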

In the algorithm above, the ‘np.dot’ function conveys the idea of weighing our x-s. Applied to the input signals ‘x’ and their weights ‘w’, ‘np.dot’ yields the matrix product of the two, exactly in the x = w_a*x_a + w_b*x_b + w_c*x_c drift.

Thus, the first really smart layer of the network, the hidden one, takes the empirical x-s, weighs them with random weights, and makes a sigmoid of that. The next layer, the output one, takes the sigmoid-calculated values from the hidden layer, and applies the same operation to them.

One more remark about the sigmoid. You can put something else instead of 1 in the numerator. Then, the sigmoid will yield your data normalized over that something. If you have a process that tends towards a level of saturation, e.g. the number of psilocybin parties per month, you can put that level in the numerator. On top of that, you can add parameters to the denominator. In other words, you can replace the 1 + e^(-x) with b + e^(-k*x), where b and k can be whatever seems to make sense for you. With that specific spin, the sigmoid is good for simulating anything that tends towards saturation over time. Depending on the parameters in the denominator, the shape of the corresponding curve will change. Usually, ‘b’ works well when taken as a fraction of the numerator (the saturation level), and the ‘k’ seems to behave meaningfully when comprised between 0 and 1.
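A minimal sketch of that saturation-bound variant; S, b and k below are arbitrary picks of mine, just to watch the curve creep up to its ceiling.

import numpy as np

def general_sigmoid(x, S=12.0, b=1.0, k=0.35):
    # y = S / (b + e^(-k*x)), with S standing for the saturation level
    return S / (b + np.exp(-k * x))

months = np.arange(0, 25)
print(np.round(general_sigmoid(months), 2))   # tends towards S/b = 12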

I return to the algorithm. Now, as the network has generated a set of predicted y^-s, it is time to compare them to the actual y-s, and to evaluate how much there is yet to learn. We can use any measure of error; still, most frequently, them algorithms go after the simplest family, that of the Mean Square Error, MSE = [(y1 – y^1)² + (y2 – y^2)² + … + (yn – y^n)²] / n. Yes, the square root of the sum of those squares is the Euclidean distance between the set of actual y-s and that of predicted y^-s. Yes, the square root of the MSE itself behaves like the standard deviation of predicted y^-s from the actual distribution of empirical y-s.

In this precise algorithm, the author goes down another avenue: he takes the actual differences between observed y-s and predicted y^-s, and then multiplies them by the sigmoid derivative of the predicted y^-s. Then he takes the transpose of a uni-dimensional matrix of those (y – y^)*(y^)’ terms, with (y^)’ standing for the derivative. It goes like:

    def backprop(self):
        # application of the chain rule to find the derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

def sigmoid(x):
    return 1.0/(1 + np.exp(-x))

def sigmoid_derivative(x):
    # here 'x' is already a sigmoid value, so this is s*(1 - s)
    return x * (1.0 - x)

I am still trying to wrap my mind around the reasons for taking this specific approach to the backpropagation of errors. The derivative of a sigmoid y = 1/(1 + e^(-x)) is y’ = [1/(1 + e^(-x))]*{1 – [1/(1 + e^(-x))]} and, as any derivative, it measures the slope of change in y. When I do (y1 – y^1)*(y^1)’ + (y2 – y^2)*(y^2)’ + … + (yn – y^n)*(y^n)’, it is as if I were taking some kind of weighted average. That weighted average can be understood in two alternative ways. Either it is the standard deviation of y^ from y, weighted with the local slopes, or it is a general slope weighted with local deviations. Now, when I take the transpose of a matrix like {(y1 – y^1)*(y^1)’ ; (y2 – y^2)*(y^2)’ ; … ; (yn – y^n)*(y^n)’}, the transpose does not change those terms; it just flips the column into a row, so that the ‘.dot’ product with the layer’s inputs becomes possible, and that product sums those error terms over all observations. Then, I feed the ‘.dot’ product into the neural network with the ‘+=’ operator. The latter means that in the next round of calculations, the network starts from weights corrected by those terms. Hmmweeellyyeess, makes some sense. I don’t know what exact sense is that, but it has some mathematical charm.
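To see those (y – y^)*(y^)’ terms at work, here is a tiny numeric example, with values I am inventing: where the predicted sigmoid sits close to 1, the derivative mutes the correction, while a mid-range prediction gets the strongest push.

import numpy as np

y     = np.array([0.9, 0.6, 0.8])      # invented actual values
y_hat = np.array([0.99, 0.5, 0.8])     # invented predictions
slope = y_hat * (1.0 - y_hat)          # sigmoid derivative taken at y^
print((y - y_hat) * slope)             # approx [-0.000891  0.025  0. ]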

Now, I try to apply the same logic to the data I am working with in my research. Just to give you an idea, I show some data for just one country: Australia. Why Australia? Honestly, I don’t see why it shouldn’t be. Quite a respectable place. Anyway, here is that table. GDP per unit of energy consumed can be considered as the target output variable y, and the rest are those x-s.

Table 1 – Selected data regarding Australia

y – GDP per unit of energy use (constant 2011 PPP $ per kg of oil equivalent); X1 – Share of aggregate amortization in the GDP (%); X2 – Supply of broad money (% of GDP); X3 – Energy use (tons of oil equivalent per capita); X4 – Urban population (% of total population); X5 – GDP per capita (’000 USD)

Year | y | X1 | X2 | X3 | X4 | X5
1990 | 5,662020744 | 14,46 | 54,146 | 5,062 | 85,4 | 26,768
1991 | 5,719765048 | 14,806 | 53,369 | 4,928 | 85,4 | 26,496
1992 | 5,639817305 | 14,865 | 56,208 | 4,959 | 85,566 | 27,234
1993 | 5,597913126 | 15,277 | 56,61 | 5,148 | 85,748 | 28,082
1994 | 5,824685357 | 15,62 | 59,227 | 5,09 | 85,928 | 29,295
1995 | 5,929177604 | 15,895 | 60,519 | 5,129 | 86,106 | 30,489
1996 | 5,780817973 | 15,431 | 62,734 | 5,394 | 86,283 | 31,566
1997 | 5,860645225 | 15,259 | 63,981 | 5,47 | 86,504 | 32,709
1998 | 5,973528571 | 15,352 | 65,591 | 5,554 | 86,727 | 33,789
1999 | 6,139349354 | 15,086 | 69,539 | 5,61 | 86,947 | 35,139
2000 | 6,268129418 | 14,5 | 67,72 | 5,644 | 87,165 | 35,35
2001 | 6,531818805 | 14,041 | 70,382 | 5,447 | 87,378 | 36,297
2002 | 6,563073754 | 13,609 | 70,518 | 5,57 | 87,541 | 37,047
2003 | 6,677186947 | 13,398 | 74,818 | 5,569 | 87,695 | 38,302
2004 | 6,82834791 | 13,582 | 77,495 | 5,598 | 87,849 | 39,134
2005 | 6,99630318 | 13,737 | 78,556 | 5,564 | 88 | 39,914
2006 | 6,908872246 | 14,116 | 83,538 | 5,709 | 88,15 | 41,032
2007 | 6,932137612 | 14,025 | 90,679 | 5,868 | 88,298 | 42,022
2008 | 6,929395465 | 13,449 | 97,866 | 5,965 | 88,445 | 42,222
2009 | 7,039061961 | 13,698 | 94,542 | 5,863 | 88,59 | 41,616
2010 | 7,157467568 | 12,647 | 101,042 | 5,649 | 88,733 | 43,155
2011 | 7,291989544 | 12,489 | 100,349 | 5,638 | 88,875 | 43,716
2012 | 7,671605162 | 13,071 | 101,852 | 5,559 | 89,015 | 43,151
2013 | 7,891026044 | 13,455 | 106,347 | 5,586 | 89,153 | 43,238
2014 | 8,172929207 | 13,793 | 109,502 | 5,485 | 89,289 | 43,071

In his article, James Loy reports the cumulative error over 1500 iterations of training, with just four series of x-s, made of four observations. I do something else. I am interested in how the network works, step by step. I do step-by-step calculations with data from that table, following the algorithm I have just discussed. I do it in Excel, and I observe the way the network behaves. I can see that the hidden layer is really hidden, to the extent that it does not produce much in terms of meaningful information. What really spins is the output layer, thus, in fact, the connection between the hidden layer and the output. In the hidden layer, all the predicted sigmoid y^ are equal to 1, and their derivatives are automatically 0. Still, in the output layer, the second random distribution of weights overlaps with the first one from the hidden layer, and then, for some years, those output sigmoids demonstrate tiny differences from 1, and their derivatives become very small positive numbers. As a result, tiny, local (yi – y^i)*(y^i)’ expressions are being generated in the output layer, and they modify the initial weights in the next round of training.

I observe the cumulative error (loss) over the first four iterations. In the first one it is 0,003138796, the second round brings 0,000100228, the third round displays 0,0000143, and the fourth one 0,005997739. Looks like an initial reduction of the cumulative error, by one order of magnitude at each iteration, and then, in the fourth round, it jumps up to the highest cumulative error of the four. I extend the number of those hand-driven iterations from four to six, and I keep feeding the network with random weights, again and again. A pattern emerges. The cumulative error oscillates. Sometimes the network drives it down, sometimes it swings it up.
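For readers who would rather not drive this by hand in Excel, here is a self-contained sketch that re-runs the exercise in Python: the class assembled above, fed with the first five rows of Table 1. The column-wise rescaling, and the squeezing of y under 1, are my assumptions; with fresh random weights the exact loss values will differ from those quoted, but the up-and-down pattern is there to watch.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(s):
    return s * (1.0 - s)    # 's' is already a sigmoid value

class NeuralNetwork:
    def __init__(self, x, y):
        self.input    = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.y        = y
        self.output   = np.zeros(self.y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        d2 = 2 * (self.y - self.output) * sigmoid_derivative(self.output)
        d_weights2 = np.dot(self.layer1.T, d2)
        d_weights1 = np.dot(self.input.T,
                            np.dot(d2, self.weights2.T) * sigmoid_derivative(self.layer1))
        self.weights1 += d_weights1
        self.weights2 += d_weights2

X = np.array([[14.460, 54.146, 5.062, 85.400, 26.768],   # 1990, X1..X5
              [14.806, 53.369, 4.928, 85.400, 26.496],   # 1991
              [14.865, 56.208, 4.959, 85.566, 27.234],   # 1992
              [15.277, 56.610, 5.148, 85.748, 28.082],   # 1993
              [15.620, 59.227, 5.090, 85.928, 29.295]])  # 1994
X = X / X.max(axis=0)                                    # column-wise rescaling
y = np.array([[5.662], [5.720], [5.640], [5.598], [5.825]]) / 10.0   # squeezed under 1

np.random.seed(4)
nn = NeuralNetwork(X, y)
for i in range(1, 7):
    nn.feedforward()
    print(i, float(np.sum((y - nn.output) ** 2)))   # cumulative squared error
    nn.backprop()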

F**k! Pardon my French, but just six iterations of that algorithm show me that the thing is really intelligent. It generates an error, it drives it down to a lower value, and then, as if it was somehow dysfunctional to jump to conclusions that quickly, it generates a greater error in consecutive steps, as if it was considering more alternative options. I know that data scientists, should they read this, can slap their thighs at that elderly uncle (i.e. me), fascinated with how a neural network behaves. Still, for me, it is science. I take my data, I feed it into a machine that I see for the first time in my life, and I observe intelligent behaviour in something written on less than one page. It experiments with weights attributed to the stimuli I feed into it, and it evaluates its own error.

Now, I understand why that scientist from MIT, Lex Fridman, says that building artificial intelligence brings insights into how the human brain works.


Combinatorial meaning and the cactus

My editorial on YouTube

I am back into blogging, after over two months of pausing. This winter semester I am going, probably, for a record workload in terms of classes: 630 hours in total. October and November look like an immersion time, when I had to get into gear for that amount of teaching. I noticed one thing that I haven’t exactly been aware of so far, or maybe not as distinctly as I am now: when I teach, I love freestyling about the topic at hand. Whatever hand of nice slides I prepare for a given class, you can bet on me going off the beaten track and into the wilderness of intellectual quest, like by mid-class. I mean, I have nothing against PowerPoint, but at some point it becomes just so limiting… I remember that conference, one year ago, when the projector went dead during my panel (i.e. during the panel when I was supposed to present my research). I remember that mixed, and shared, feeling of relief and enjoyment in the people present in the room: ‘Good. Finally, no slides. We can like really talk science’.

See? Once again, I am going off track, and that in just one paragraph of writing. You can see what I mean when I refer to me going off track in class. Anyway, I discovered one more thing about myself: freestyling and sailing uncharted intellectual waters has a cost, and this is a very clear and tangible biological cost. After a full day of teaching this way I feel as if my brain was telling me: ‘Look, bro. I know you would like to write a little, but sorry: no way. Them synapses are just tired. You need to give me a break’.

There is a third thing I have discovered about myself: that intense experience of teaching makes me think a lot. I cannot exactly put all this in writing on the spot, for want of fresh neurotransmitters available, still all that thinking tends to crystallize over time and, with some patience, I can access it later. Later means now, as it seems. I feel that I have crystallized enough, and I can start to pull it out into the daylight. The « it » consists, mostly, in a continuous reflection on collective intelligence. How are we (possibly) smart together?

As I have been thinking about it, three events combined and triggered in me a string of more specific questions. I watched another podcast featuring Jordan Peterson, whom I am a big fan of, and who raised the topic of the neurobiological context of meaning. How does our brain make meaning, and how does it find meaning in sensory experience? On the other hand, I have just finished writing the manuscript of an article on the energy-efficiency of national economies, which I have submitted to the ‘Energy Economics’ journal, and which, almost inevitably, made me work with numbers and statistics. As I was doing that empirical research, I found out something surprising: the most meaningful econometric results came to the surface when I transformed my original data into local coefficients of an exponential progression that hypothetically started in 1989. Long story short, these coefficients are essentially growth rates, which behave in a peculiar way, due to their arithmetical structure: they decrease very quickly over time, whatever the underlying raw empirical observations do, as if they were representing weakening shock waves sent by an explosion in 1989.

Different transformations of the same data, in that research of mine, produced different statistical meanings. I am still working out a real understanding of what that exactly means, by the way. As I was putting it together with Jordan Peterson’s thoughts on meaning as a biological process, I asked myself: what is the exact meaning of the fact that we, as a scientific community, assign meaning to statistics? How is it connected with collective intelligence?

I think I need to start more or less where Jordan Peterson moves, and ask ‘What is meaning?’. No, not quite. The ontological type, I mean the ‘What?’ type of question, is a mean beast. Something like a hydra: you cut the head, namely you explain the thing, you think that Bob’s your uncle, and a new head pops up, like out of nowhere, and it bites you, where you know. The ‘How?’ question is a bit more amenable. This one is like one of those husky dogs. Yes, it is semi wild, and yes, it can bite you, but once you tame it, and teach it to pull that sleigh, it will just pull. So I ask ‘How is meaning?’. How does meaning occur?

There is a particular type of being smart together, which I have been specifically interested in, for like the last two months. It is the game-based way of being collectively intelligent. The theory of games is a well-established basis for studying human behaviour, including that of whole social structures. As I was thinking about it, there is a deep reason for that. Social interactions are, well, interactions. It means that I do something and you do something, and those two somethings are supposed to make sense together. They really do at one condition: my something needs to be somehow conditioned by how your something unfolds, and vice versa. When I do something, I come to a point when it becomes important for me to see your reaction to what I do, and only when I will have seen it, I will further develop on my action.

Hence, I can study collective action (and interaction) as a sequence of moves in a game. I make my move, and I stop moving, for a moment, in order to see your move. You make yours, and it triggers a new move in me, and so the story goes further on in time. We can experience it very vividly in negotiations. With any experience in having serious talks with other people, thus when we negotiate something, we know that it is pretty counter-efficient to keep pushing our point in an unbroken stream of speech. It is much more functional to pace our strategy into separate strings of argumentation, and between them, we wait for what the other person says. I have already given a first theoretical go at the thing in « Couldn’t they have predicted that? ».

This type of social interaction, when we pace our actions into game-like moves, is a way of being smart together. We can come up with new solutions, or with the understanding of new problems – or a new understanding of old problems, as a matter of fact – and we can do it starting from positions of imperfect agreement and imperfect coordination. We try to make (apparently) divergent points, or we pursue (apparently) divergent goals, and still, if we accept to wait for each other’s reaction, we can coordinate and/or agree about those divergences, so as to actually figure out, and do, some useful s**t together.

What connection with the results of my quantitative research? Let’s imagine that we play a social game, and each of us makes their move, and then they wait for the moves of other players. The state of the game at any given moment can be represented as the outcome of past moves. The state of reality is like a brick wall, made of bricks laid one by one, and the state of that brick wall is the outcome of the past laying of bricks. In the general theory of science, it is called hysteresis. There is a mathematical function reputed to represent that thing quite nicely: the exponential progression. On a timeline, I define equal intervals. To each period of time, I assign a value y(t) = e^(t*a), where ‘t’ is the ordinal of the time period, ‘e’ is a mathematical constant, the base of the natural logarithm, e ≈ 2,71828, and ‘a’ is what we call the exponential coefficient.
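A quick sketch of how those local coefficients are extracted: for a progression hypothetically starting in 1989, I solve a = ln(y)/t for each year. The y values below are invented; what matters is how quickly the coefficients decay, like those weakening shock waves.

import math

start = 1989
observed = {1991: 1.30, 1995: 1.80, 2000: 2.40, 2014: 4.10}   # invented y values

for year, y in sorted(observed.items()):
    t = year - start
    print(year, round(math.log(y) / t, 4))   # 0.1312, 0.098, 0.0796, 0.0564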

There is something else to that y = e^(t*a) story. If we think in terms of a broader picture, and assume that time is essentially what we imagine it is, the ‘t’ part can be replaced by any number we imagine. Then, the Euler’s formula steps in: e^(i*x) = cos x + i*sin x. If you paid attention in math classes, at high school, you might remember that sine and cosine, the two trigonometric functions, have a peculiar property. As they refer to angles, at the end of the day they refer to a full circle of 360°. It means they go in a circle, thus in a cycle, only they run shifted by a quarter of that circle with respect to each other: when the sine peaks, the cosine crosses zero, and the other way round. We can think about each occurrence we experience – the ‘x’ – as a nexus of two, mutually offset cycles, and they can be represented as, respectively, the sine, and the cosine of that occurrence ‘x’. When I grow in height (well, when I used to), my current height can be represented as the nexus of natural growth (sine), and natural depletion with age (cosine), that sort of things.
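A two-line sanity check of that formula, for the sceptics:

import cmath, math

x = 1.234                                  # an arbitrarily picked occurrence
lhs = cmath.exp(1j * x)                    # e^(i*x)
rhs = complex(math.cos(x), math.sin(x))    # cos x + i*sin x
print(lhs, rhs, abs(lhs - rhs) < 1e-12)    # the two sides coincide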

Now, let’s suppose that we, as a society, play two different games about energy. One game makes us more energy efficient, ‘cause we know we should (see Settlement by energy – can renewable energies sustain our civilisation?). The other game makes us max out on our intake of energy from the environment (see Technological Change as Intelligent, Energy-Maximizing Adaptation). At any given point in time, the incremental change in our energy efficiency is the local equilibrium between those two games. Thus, if I take the natural logarithm of our energy efficiency at a given point in space-time, thus the coefficient of GDP per kg of oil equivalent in energy consumed, that natural logarithm is the outcome of those two games, or, from a slightly different point of view, it descends from the number of consecutive moves made (the ordinal of time period we are currently in), and from a local coefficient – the equivalent of ‘i’ in the Euler’s formula – which represents the pace of building up the outcomes of past moves in the game.

I go back to that ‘meaning’ thing. The consecutive steps ‘t’ in an exponential progression y(t) = e^(t*a) correspond to successive rounds of moves in the games we play. There is a core structure to observe: the length of what I call ‘one move’, which means a sequence of actions that each person involved in the interaction carries out without pausing and waiting for the reaction observable in other people in the game. When I say ‘length’, it involves a unit of measurement, and here, I am quite open. It can be a length of time, or the number of distinct actions in my sequence. The length of one move in the game determines the pace of the game, and this, in turn, sets the timeframe for the whole game to produce useful results: solutions, understandings, coordinated action etc.

Now, where the hell is any place for ‘meaning’ in all that game stuff? My view is the following: in social games, we sequence our actions into consecutive moves, with some waiting-for-reaction time in between, because we ascribe meaning to those sub-sequences that we define as ‘one move’. The way we process meaning matters for the way we play social games.

I am a scientist (well, I hope), and for me, meaning occurs very largely as I read what other people have figured out. So I stroll down the discursive avenue named ‘neurobiology of meaning’, welcomingly lit by the lampposts of Science Direct. I am calling by an article by Lee M. Pierson and Monroe Trout, entitled ‘What is consciousness for?’[1]. The authors formulate a general hypothesis, unfortunately not supported (yet?) with direct empirical check, that consciousness had been occurring, back in the day, I mean like really back in the day, as cognitive support of volitional movement, and evolved, since then, into more elaborate applications. Volitional movement is non-automatic, i.e. decisions have to be made in order for the movement to have any point. It requires quick assemblage of data on the current situation, and consciousness, i.e. the awareness of many abstract categories at the same time, could be the solution.

According to that approach, meaning occurs as a process of classification in the neurologically stored data that we need to use virtually simultaneously in order to do something as fundamental as reaching for another can of beer. Classification of data means grouping into sets. You have a random collection of data from sensory experience, like a homogenous cloud of information. You know, the kind you experience after a particularly eventful party. Some stronger experiences stick out: the touch of cold water on your naked skin, someone’s phone number written on your forearm with a lipstick etc. A question emerges: should you call this number? It might be your new girlfriend (i.e. the girlfriend whom you don’t consciously remember as your new one, but whom you’d better remember as such if you don’t want your car splashed with acid), or it might be a drug dealer whom you’d better not call back. You need to group the remaining data into functional sets so as to take the right action.

So you group, and the challenge is to make the right grouping. You need to collect the not-quite-clear-in-their-meaning pieces of information (Whose lipstick had that phone number been written with? Can I associate a face with the lipstick? For sure, the right face?). One grouping of data can lead you to a happy life, another one can lead you into deep s**t. It could be handy to sort of quickly test many alternative groupings for their elementary coherence, i.e. hold all that data in front of you, for a moment, and contemplate flexibly many possible connections. Volitional movement is very much about that. You want to run? Good. It would be nice not to run into something that could hurt you, so it would be good to cover a set of sensory data, combining something present (what we see) with something we remember from the past (that thing on the 2 o’clock azimuth stings like hell), and sort of quickly turn and return all that information so as to steer clear of that cactus, as we run.

Thus, as I follow the path set by Pierson and Trout, meaning occurs as the grouping of data in functional categories, and it occurs when we need to do it quickly and sort of under pressure of getting into trouble. I am going onto the level of collective intelligence in human social structures. In those structures, meaning, i.e. the emergence of meaningful distinctions communicable between human beings and possible to formalize in language, would occur as said structures need to figure something out quickly and under uncertainty, and meaning would allow putting together the types of information that are normally compartmentalized and fragmented.

From that perspective, one meaningful move in a game encompasses small pieces of action which we intuitively guess we should immediately group together. Meaningful moves in social games are sequences of actions, which we feel like putting immediately back to back, without pausing and letting the other player do their thing. There is some sort of pressing immediacy in that grouping. We guess we just need to carry out those actions smoothly one after the other, in an unbroken sequence. Wedging an interval of waiting time in between those actions could put our whole strategy at peril, or we just think so.

When I apply this logic to energy efficiency, I think about business strategies regarding innovation in products and technologies. When we launch a new product, or implement a new technology, there is something like fixed patterns to follow. When you start beta testing a new mobile app, for example, you don’t stop in the middle of testing. You carry out the tests up to their planned schedule. When you start launching a new product (reminder: more products made on the same energy base mean greater energy efficiency), you keep launching until you reach some sort of conclusive outcome, like unequivocal success or failure. Social games we play around energy efficiency could very well be paced by this sort of business-strategy-based moves.

I pick up another article, that by Friedemann Pulvermüller (2013[2]). The main thing I see right from the beginning is that apparently, neurology is progressively dropping the idea of one, clearly localised area in our brain, in charge of semantics, i.e. of associating abstract signs with sensory data. What we are discovering is that semantics engage many areas in our brain into mutual connection. You can find developments on that issue in: Patterson et al. 2007[3], Bookheimer 2002[4], Price 2000[5], and Binder & Desai 2011[6]. As we use words, thus as we pronounce, hear, write or read them, that linguistic process directly engages (i.e. is directly correlated with the activation of) sensory and motor areas of our brain. That engagement follows multiple, yet recurrent patterns. In other words, instead of having one mechanism in charge of meaning, we are handling different ones.

After reviewing a large bundle of research, Pulvermüller proposes four different patterns: referential, combinatorial, emotional-affective, and abstract semantics. Each time, the semantic pattern consists in one particular area of the brain acting as a boss who wants to be debriefed about something from many sources, and starts pulling together many synaptic strings connected to many places in the brain. Five different pieces of cortex come recurrently as those boss-hubs, hungry for differentiated data, as we process words. They are: inferior frontal cortex (iFC, so far most commonly associated with the linguistic function), superior temporal cortex (sTC), inferior parietal cortex (iPC), inferior and middle temporal cortex (m/iTC), and finally the anterior temporal cortex (aTC). The inferior frontal cortex (iFC) seems to engage in the processing of words related to action (walk, do etc.). The superior temporal cortex (sTC) looks like seriously involved when words related to sounds are being used. The inferior parietal cortex (iPC) activates as words connect to space, and spatio-temporal constructs. The inferior and middle temporal cortex (m/iTC) lights up when we process words connected to animals, tools, persons, colours, shapes, and emotions. That activation is category specific, i.e. inside m/iTC, different Christmas trees start blinking as different categories among those are being named and referred to semantically. The anterior temporal cortex (aTC), interestingly, has not been associated yet with any specific type of semantic connections, and still, when it is damaged, semantic processing in our brain is generally impaired.

All those areas of the brain have other functions, besides that semantic one, and generally speaking, the kind of meaning they process is correlated with the kind of other things they do. The interesting insight, at this point, is the polyvalence of cortical areas that we call ‘temporal’, thus involved in the perception of time. Physicists insist very strongly that time is largely a semantic construct of ours, i.e. time is what we think there is rather than what really is, out there. In physics, what exists is rather a sequential structure of reality (things happen in an order) than what we call time. That review of literature by Pulvermüller indirectly indicates that time is a piece of meaning that we attach to sounds, colours, emotions, animals and people. Sounds come as logical: they are sequences of acoustic waves. On the other hand, how is our perception of colours, or people, connected to our concept of time? This is a good one to ask, and a tough one to answer. What I would look for is recurrence. We identify persons as distinct ones as we interact with them recurrently. Autistic people frequently have that problem: when you put on a different jacket, they have a hard time accepting you are the same person. Identification of animals or emotions could follow the same logic.

The article discusses another interesting issue: the more abstract the meaning is, the more different regions of the brain it engages. The really abstract ones, like ‘beauty’ or ‘freedom’, are super Christmas-trees: they provoke involvement all over the place. When we do abstraction in our mind, for example when writing poetry (OK, just good poetry), we engage a substantial part of our brain. This is why we can be lost in our thoughts: those thoughts, when really abstract, are really energy-consuming, and they might require shutting down some other functions.

My personal understanding of the research reviewed by Pulvermüller is that at the neurological level, we process three essential types of meaning. One consists in finding our bearings in reality, thus in identifying things and people around, and in assigning emotions to them. It is something like a mapping function. Then, we need to do things, i.e. to take action, and that seems to be a different semantic function. Finally, we abstract, thus we connect distant parcels of data into something that has no direct counterpart neither in the mapped reality, nor in our actions.

I have an indirect insight, too. We have a neural wiring, right? We generate meaning with that wiring, right? Now, how does adaptation occur, in that scheme, over time? Do we just adapt the meaning we make to the neural hardware we have, or is there a reciprocal kick, I mean from meaning to wiring? So far, neurological research has demonstrated that physical alteration in specific regions of the brain impacts semantic functions. Can it work the other way round, i.e. can recurrent change in the semantics being processed alter the hardware we have between our ears? For example, as we process a lot of abstract concepts, like ‘taxes’ or ‘interest rate’, can our brains adapt from generation to generation, so as to minimize the gradient of energy expenditure as we shift between levels of abstraction? If we could, we would become more intelligent, i.e. able to handle larger and more differentiated sets of data in a shorter time.

How does all of this translate into collective intelligence? Firstly, there seem to be layers of such intelligence. We can be collectively smart sort of locally – and then we handle the more basic things, like group identity or networks of exchange – and then we can (possibly) become collectively smarter at a more combinatorial level, handling more abstract issues, like multilateral peace treaties or climate change. Moreover, the gradient of energy consumed between the collective understanding of simple, basic things, on the one hand, and the overarching abstract issues, on the other, is a good predictor of the capacity of the given society to survive and thrive.

Once again, I am trying to associate this research in neurophysiology with my game-theoretical approach to energy markets. First of all, I recall the three theories of games co-awarded the economic Nobel prize in 1994, namely those by John Nash, John (János) Harsanyi, and Reinhard Selten. I start with the latter. Reinhard Selten claimed, and seems to have proven, that social games have a memory, and the presence of such memory is needed in order for us to be able to learn collectively through social games. You know those situations of tough talks, when the other person (or you) keeps bringing forth the same argumentation over and over again? This is an example of a game without much memory, i.e. without much learning. In such a game we repeat the same move, like a fish banging its head against the glass wall of an aquarium. Playing without memory is possible in just some games, e.g. tennis, or poker, if the opponent is not too tough. In other games, like chess, repeating the same move is not really possible. Such games force learning upon us.

Active use of memory requires combinatorial meaning. We need to know what is meaningful in order to remember it as meaningful, and thus to consider it as valuable data for learning. The more combinatorial meaning there is inside a supposedly intelligent structure, such as our brain, the more energy-consuming that meaning is. Games played with memory and active learning could be more energy-consuming for our collective intelligence than games played without. Maybe that whole thing of electronics and digital technologies, so hungry for energy, is a way that we, collective human intelligence, put in place in order to learn more efficiently through our social games?


[1] Pierson, L. M., & Trout, M. (2017). What is consciousness for?. New Ideas in Psychology, 47, 62-71.

[2] Pulvermüller, F. (2013). How neurons make meaning: brain mechanisms for embodied and abstract-symbolic semantics. Trends in cognitive sciences, 17(9), 458-470.

[3] Patterson, K. et al. (2007) Where do you know what you know? The representation of semantic knowledge in the human brain. Nat. Rev. Neurosci. 8, 976–987

[4] Bookheimer, S. (2002) Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annu. Rev. Neurosci. 25, 151–188

[5] Price, C.J. (2000) The anatomy of language: contributions from functional neuroimaging. J. Anat. 197, 335–359

[6] Binder, J.R. and Desai, R.H. (2011) The neurobiology of semantic memory. Trends Cogn. Sci. 15, 527–536