The point of doing manually what the loop is supposed to do

My editorial on You Tube

OK, here is the big picture. The highest demographic growth, in absolute numbers, takes place in Asia and Africa. The biggest migratory flows start from there, as well, and aim at and into regions with much less of human mass in accrual: North America and Europe. Less human accrual, indeed, and yet much better conditions for each new homo sapiens. In some places on the planet, a huge amount of humans is born every year. That huge amount means a huge number of genetic variations around the same genetic tune, namely that of the homo sapiens. Those genetic variations leave their homeland, for a new and better homeland, where they bring their genes into a new social environment, which assures them much more safety, and higher odds of prolonging their genetic line.

What is the point of there being more specimens of any species? I mean, is there a logic to increasing the headcount of any population? When I say ‘any’, is ranges from bacteria to us, humans. After having meddled with the most basic algorithm of a neural network (see « Pardon my French, but the thing is really intelligent » and « Ce petit train-train des petits signaux locaux d’inquiétude »), I have some thoughts about what intelligence is. I think that intelligence is a class, i.e. it is a framework structure able to produce many local, alternative instances of itself.

Being intelligent consists, to start with, in creating alternative versions of itself, and creating them purposefully imperfect so as to generate small local errors, whilst using those errors to create still different versions of itself. The process is tricky. There is some sort of fundamental coherence required between the way of creating those alternative instances of oneself, and the way that resulting errors are being processed. Fault of such coherence, the allegedly intelligent structure can fall into purposeful ignorance, or into panic.

Purposeful ignorance manifests as the incapacity to signal and process the local imperfections in alternative instances of the intelligent structure, although those imperfections actually stand out and wave at you. This is the ‘everything is just fine and there is no way it could be another way’ behavioural pattern. It happens, for example, when the function of processing local errors is too gross – or not sharp enough, if you want – to actually extract meaning from tiny, still observable local errors. The panic mode of an intelligent structure, on the other hand, is that situation when the error-processing function is too sharp for the actually observable errors. Them errors just knock it out of balance, like completely, and the function signals general ‘Error’, or ‘I can’t stand this cognitive dissonance’.

So, what is the point of there being more specimens of any species? The point might be to generate as many specific instances of an intelligent structure – the specific DNA – as possible, so as to generate purposeful (and still largely unpredictable) errors, just to feed those errors into the future instantiations of that structure. In the process of breeding, some path of evolutionary coherence leads to errors that can be handled, and that path unfolds between a state of evolutionary ‘everything is OK, no need to change anything’ (case mosquito, unchanged for millions of years), and a state of evolutionary ‘what the f**k!?’ (case common fruit fly, which produces insane amount of mutations in response to the slightest environmental stressor).

Essentially, all life could be a framework structure, which, back in the day, made a piece of software in artificial intelligence – the genetic code – and ever since that piece of software has been working on minimizing the MSE (mean square error) in predicting the next best version of life, and it has been working by breeding, in a tree-like method of generating variations,  indefinitely many instances of the framework structure of life. Question: what happens when, one day, a perfect form of life emerges? Something like TRex – Megalodon – Angelina Jolie – Albert Einstein – Jeff Bezos – [put whatever or whoever you like in the rest of that string]? On the grounds of what I have already learnt about artificial intelligence, such a state of perfection would mean the end of experimentation, thus the end of multiplying instances of the intelligent structure, thus the end of births and deaths, thus the end of life.

Question: if the above is even remotely true, does that overarching structure of life understand how the software it made – the genetic code – works? Not necessarily. That very basic algorithm of neural network, which I have experimented with a bit, produces local instances of the sigmoid function Ω = 1/(1 + e-x) such that Ω < 1, and that 1 + e-x > 1, which is always true. Still, the thing does it just sometimes. Why? How? Go figure. That thing accomplishes an apparently absurd task, and it does so just by being sufficiently flexible with its random coefficients. If Life In General is God, that God might not have a clue about how the actual life works. God just needs to know how to write an algorithm for making actual life work. I would even say more: if God is any good at being one, he would write an algorithm smarter than himself, just to make things advance.

The hypothesis of life being one, big, intelligent structure gives an interesting insight into what the cost of experimentation is. Each instance of life, i.e. each specimen of each species needs energy to sustain it. That energy takes many forms: light, warmth, food, Lexus (a form of matter), parties, Armani (another peculiar form of matter) etc. The more instances of life are there, the more energy they need to be there. Even if we take the Armani particle out of the equation, life is still bloody energy-consuming. The available amount of energy puts a limit to the number of experimental instances of the framework, structural life that the platform (Earth) can handle.

Here comes another one about climate change. Climate change means warmer, let’s be honest. Warmer means more energy on the planet. Yes, temperature is our human measurement scale for the aggregate kinetic energy of vibrating particles. More energy is what we need to have more instances of framework life, in the same time. Logically, incremental change in total energy on the planet translates into incremental change in the capacity of framework life to experiment with itself. Still, as framework life could be just the God who made that software for artificial intelligence (yes, I am still in the same metaphor), said framework life could not be quite aware of how bumpy could the road be, towards the desired minimum in the Mean Square Error. If God is an IT engineer, it could very well be the case.

I had that conversation with my son, who is graduating his IT engineering studies. I told him ‘See, I took that algorithm of neural network, and I just wrote its iterations out into separate tables of values in Excel, just to see what it does, like iteration after iteration. Interesting, isn’t it? I bet you have done such thing many times, eh?’. I still remember that heavy look in my son’s eyes: ‘Why the hell should I ever do that?’ he went. ‘There is a logical loop in that algorithm, you see? This loop is supposed to do the job, I mean to iterate until it comes up with something really useful. What is the point of doing manually what the loop is supposed to do for you? It is like hiring a gardener and then doing everything in the garden by yourself, just to see how it goes. It doesn’t make sense!’. ‘But it’s interesting to observe, isn’t it?’ I went, and then I realized I am talking to an alien form of intelligence, there.

Anyway, if God is a framework life who created some software to learn in itself, it could not be quite aware of the tiny little difficulties in the unfolding of the Big Plan. I mean acidification of oceans, hurricanes and stuff. The framework life could say: ‘Who cares? I want more learning in my algorithm, and it needs more energy to loop on itself, and so it makes those instances of me, pumping more carbon into the atmosphere, so as to have more energy to sustain more instances of me. Stands to reason, man. It is all working smoothly. I don’t understand what you are moaning about’.

Whatever that godly framework life says, I am still interested in studying particular instances of what happens. One of them is my business concept of EneFin. See « Which salesman am I? » as what I think is the last case of me being like fully verbal about it. Long story short, the idea consists in crowdfunding capital for small, local operators of power systems based on renewable energies, by selling shares in equity, or units of corporate debt, in bundles with tradable claims on either the present output of energy, or the future one. In simple terms, you buy from that supplier of energy tradable claims on, for example, 2 000 kWh, and you pay the regular market price, still, in that price, you buy energy properly spoken with a juicy discount. The rest of the actual amount of money you have paid buys you shares in your supplier’s equity.

The idea in that simplest form is largely based on two simple observations about energy bills we pay. In most countries (at least in Europe), our energy bills are made of two components: the (slightly) variable value of the energy actually supplied, and a fixed part labelled sometimes as ‘maintenance of the grid’ or similar. Besides, small users (e.g. households) usually pay a much higher unitary price per kWh than large, institutional scale buyers (factories, office buildings etc.). In my EneFin concept, a local supplier of renewable energy makes a deal with its local customers to sell them electricity at a fair, market price, with participations in equity on the top of electricity.

That would be a classical crowdfunding scheme, such as you can find with, StartEngine, for example. I want to give it some additional, financial spin. Classical crowdfunding has a weakness: low liquidity. The participatory shares you buy via crowdfunding are usually non-tradable, and they create a quasi-cooperative bond between investors and investees. Where I come from, i.e. in Central Europe, we are quite familiar with cooperatives. At the first sight, they look like a form of institutional heaven, compared to those big, ugly, capitalistic corporations. Still, after you have waved out that first mist, cooperatives turn out to be very exposed to embezzlement, and to abuse of managerial power. Besides, they are quite weak when competing for capital against corporate structures. I want to create highly liquid a transactional platform, with those investments being as tradable as possible, and use financial liquidity as a both a shield against managerial excesses, and a competitive edge for those small ventures.

My idea is to assure liquidity via a FinTech solution similar to that used by Katipult Technology Corp., i.e. to create some kind of virtual currency (note: virtual currency is not absolutely the same as cryptocurrency; cousins, but not twins, so to say). Units of currency would correspond to those complex contracts « energy plus equity ». First, you create an account with EneFin, i.e. you buy a certain amount of the virtual currency used inside the EneFin platform. I call them ‘tokens’ to simplify. Next, you pick your complex contracts, in the basket of those offered by local providers of energy. You buy those contracts with the tokens you have already acquired. Now, you change your mind. You want to withdraw your capital from the supplier A, and move it to supplier H, you haven’t considered so far. You move your tokens from A to H, even with a mobile app. It means that the transactional platform – the EneFin one – buys from you the corresponding amount of equity of A and tries to find for you some available equity in H. You can also move your tokens completely out of investment in those suppliers of energy. You can free your money, so to say. Just as simple: you just move them out, even with a movement of your thumb on the screen. The EneFin platform buys from you the shares you have moved out of.

You have an even different idea. Instead of investing your tokens into the equity of a provider of energy, you want to lend them. You move your tokens to the field ‘lending’, you study the interest rates offered on the transactional platform, and you close the deal. Now, the corresponding number of tokens represents securitized (thus tradable) corporate debt.

Question: why the hell bothering about a virtual currency, possibly a cryptocurrency, instead of just using good old fiat money? At this point, I am reaching to the very roots of the Bitcoin, the grandpa of all cryptocurrencies (or so they say). Question: what amount of money you need to finance 20 transactions of equal unitary value P? Answer: it depends on how frequently you monetize them. Imagine that the EneFin app offers you an option like ‘Monetize vs. Don’t Monetize’. As long as – with each transaction you do on the platform – you stick to the ‘Don’t Monetize’ option, your transactions remain recorded inside the transactional platform, and so there is recorded movement in tokens, but there is no monetary outcome, i.e. your strictly spoken monetary balance, for example that in €, does not change. It is only when you hit the ‘Monetize’ button in the app that the current bottom line of your transactions inside the platform is being converted into « official » money.

The virtual currency in the EneFin scheme would serve to allow a high level of liquidity (more transactions in a unit of time), without provoking the exactly corresponding demand for money. What connection with artificial intelligence? I want to study the possible absorption of such a scheme in the market of energy, and in the related financial markets, as a manifestation of collective intelligence. I imagine two intelligent framework structures: one incumbent (the existing markets) and one emerging (the EneFin platform). Both are intelligent structures to the extent that they technically can produce many alternative instances of themselves, and thus intelligently adapt to their environment by testing those instances and utilising the recorded local errors.

In terms of an algorithm of neural network, that intelligent adaptation can be manifest, for example, as an optimization in two coefficients: the share of energy traded via EneFin in the total energy supplied in the given market, and the capitalization of EneFin as a share in the total capitalization of the corresponding financial markets. Those two coefficients can be equated to weights in a classical MLP (Multilayer Perceptron) network, and the perceptron network could work around them. Of course, the issue can be approached from a classical methodological angle, as a general equilibrium to assess via « normal » econometric modelling. Still, what I want is precisely what I hinted in « Pardon my French, but the thing is really intelligent » and « Ce petit train-train des petits signaux locaux d’inquiétude »: I want to study the very process of adaptation and modification in those intelligent framework structures. I want to know, for example, how much experimentation those structures need to form something really workable, i.e. an EneFin platform with serious business going on, and, in the same time, that business contributing to the development of renewable energies in the given region of the world. Do those framework structures have enough local resources – mostly capital – for sustaining the number of alternative instances needed for effective learning? What kind of factors can block learning, i.e. drive the framework structure either into deliberate an ignorance of local errors or into panic?

Here is an example of more exact a theoretical issue. In a typical economic model, things are connected. When I pull on the string ‘capital invested in fixed assets’, I can see a valve open, with ‘Lifecycle in incumbent technologies’, and some steam rushes out. When I push the ‘investment in new production capacity’ button, I can see something happening in the ‘Jobs and employment’ department. In other words, variables present in economic systems mutually constrain each other. Just some combinations work, others just don’t. Now, the thing I have already discovered about them Multilayer Perceptrons is that as soon as I add some constraint on the weights assigned to input data, for example when I swap ‘random’ for ‘erandom’, the scope of possible structural functions leading to effective learning dramatically shrinks, and the likelihood of my neural network falling into deliberate ignorance or into panic just swells like hell. What degree of constraint on those economic variables is tolerable in the economic system conceived as a neural network, thus as a framework intelligent structure?

There are some general guidelines I can see for building a neural network that simulates those things. Creating local power systems, based on microgrids connected to one or more local sources of renewable energies, can be greatly enhanced with efficient financing schemes. The publicly disclosed financial results of companies operating in those segments – such as Tesla[1], Vivint Solar[2], FirstSolar[3], or 8Point3 Energy Partners[4] – suggest that business models in that domain are only emerging, and are far from being battle-tested. There is still a way to pave towards well-rounded business practices as regards such local power systems, both profitable economically and sustainable socially.

The basic assumptions of a neural network in that field are essentially behavioural. Firstly, consumption of energy is greatly predictable at the level of individual users. The size of a market in energy changes, as the number of users change. The output of energy needed to satisfy those users’ needs, and the corresponding capacity to install, are largely predictable on the long run. Consumers of energy use a basket of energy-consuming technologies. The structure of this basket determines their overall consumption, and is determined, in turn, by long-run social behaviour. Changes over time in that behaviour can be represented as a social game, where consecutive moves consist in purchasing, or disposing of a given technology. Thus, a game-like process of relatively slow social change generates a relatively predictable output of energy, and a demand thereof. Secondly, the behaviour of investors in any financial market, crowdfunding or other, is comparatively more volatile. Investment decisions are being taken, and modified at a much faster pace than decisions about the basket of technologies used in everyday life.

The financing of relatively small, local power systems, based on renewable energies and connected by local microgrids, implies an interplay of the two above-mentioned patterns, namely the relatively slower transformation in the technological base, and the quicker, more volatile modification of investors’ behaviour in financial markets.

I am consistently delivering good, almost new science to my readers, and love doing it, and I am working on crowdfunding this activity of mine. As we talk business plans, I remind you that you can download, from the library of my blog, the business plan I prepared for my semi-scientific project Befund  (and you can access the French version as well). You can also get a free e-copy of my book ‘Capitalism and Political Power’ You can support my research by donating directly, any amount you consider appropriate, to my PayPal account. You can also consider going to my Patreon page and become my patron. If you decide so, I will be grateful for suggesting me two things that Patreon suggests me to suggest you. Firstly, what kind of reward would you expect in exchange of supporting me? Secondly, what kind of phases would you like to see in the development of my research, and of the corresponding educational tools?

[1] http://ir.tesla.com/ last access December, 18th, 2018

[2] https://investors.vivintsolar.com/company/investors/investors-overview/default.aspx last access December, 18th, 2018

[3] http://investor.firstsolar.com/ last access December, 18th, 2018

[4] http://ir.8point3energypartners.com/ last access December, 18th, 2018

Pardon my French, but the thing is really intelligent

My editorial on You Tube

And so I am meddling with neural networks. It had to come. It just had to. I started with me having many ideas to develop at once. Routine stuff with me. Then, the Editor-in-Chief of the ‘Energy Economics’ journal returned my manuscript of article on the energy-efficiency of national economies, which I had submitted with them, with a general remark that I should work both on the clarity of my hypotheses, and on the scientific spin of my empirical research. In short, Mr Wasniewski, linear models tested with Ordinary Least Squares is a bit oldie, if you catch my drift. Bloody right, Mr Editor-In-Chief. Basically, I agree with your remarks. I need to move out of my cavern, towards the light of progress, and get acquainted with the latest fashion. The latest fashion we are wearing this season is artificial intelligence, machine learning, and neural networks.

It comes handy, to the extent that I obsessively meddle with the issue of collective intelligence, and am dreaming about creating a model of human social structure acting as collective intelligence, sort of a beehive. Whilst the casting for a queen in that hive remains open, and is likely to stay this way for a while, I am digging into the very basics of neural networks. I am looking in the Python department, as I have already got a bit familiar with that environment. I found an article by James Loy, entitled “How to build your own Neural Network from scratch in Python”. The article looks a bit like sourcing from another one, available at the website of ProBytes Software, thus I use both to develop my understanding. I pasted the whole algorithm by James Loy into my Python Shell, made in run with an ‘enter’, and I am waiting for what it is going to produce. In the meantime, I am being verbal about my understanding.

The author declares he wants to do more or less the same thing that I, namely to understand neural networks. He constructs a simple algorithm for a neural network. It starts with defining the neural network as a class, i.e. as a callable object that acts as a factory for new instances of itself. In the neural network defined as a class, that algorithm starts with calling the constructor function ‘_init_’, which constructs an instance ‘self’ of that class. It goes like ‘def __init__(self, x, y):’. In other words, the class ‘Neural network’ generates instances ‘self’ of itself, and each instance is essentially made of two variables: input x, and output y. The ‘x’ is declared as input variable through the ‘self.input = x’ expression. Then, the output of the network is defined in two steps. Yes, the ‘y’ is generally the output, only in a neural network, we want the network to predict a value of ‘y’, thus some kind of y^. What we have to do is to define ‘self.y = y’, feed the real x-s and the real y-s into the network, and expect the latter to turn out some y^-s.

Logically, we need to prepare a vessel for holding the y^-s. The vessel is defined as ‘self.output = np.zeros(y.shape)’. The ‘shape’ function defines a tuple – a table, for those mildly fond of maths – with given dimensions. What are the dimensions of ‘y’ in that ‘y.shape’? They have been given earlier, as the weights of the network were being defined. It goes as follows. It starts, thus, right after the ‘self.input = x’ has been said, ‘self.weights1 = np.random.rand(self.input.shape[1],4)’ fires off, closely followed by ‘self.weights2 =  np.random.rand(4,1)’. All in all, the entire class of ‘Neural network’ is defined in the following form:

class NeuralNetwork:

    def __init__(self, x, y):

        self.input      = x

        self.weights1   = np.random.rand(self.input.shape[1],4)

        self.weights2   = np.random.rand(4,1)                

        self.y          = y

        self.output     = np.zeros(self.y.shape)                

The output of each instance in that neural network is a two-dimensional tuple (table) made of one row (I hope I got it correctly), and four columns. Initially, it is filled with zeros, so as to make room for something more meaningful. The predicted y^-s are supposed to jump into those empty sockets, held ready by the zeros. The ‘random.rand’ expression, associated with ‘weights’ means that the network is supposed to assign randomly different levels of importance to different x-s fed into it.

Anyway, the next step is to instruct my snake (i.e. Python) what to do next, with that class ‘Neural Network’. It is supposed to do two things: feed data forward, i.e. makes those neurons work on predicting the y^-s, and then check itself by an operation called backpropagation of errors. The latter consists in comparing the predicted y^-s with the real y-s, measuring the discrepancy as a loss of information, updating the initial random weights with conclusions from that measurement, and do it all again, and again, and again, until the error runs down to very low values. The weights applied by the network in order to generate that lowest possible error are the best the network can do in terms of learning.

The feeding forward of predicted y^-s goes on in two steps, or in two layers of neurons, one hidden, and one final. They are defined as:

def feedforward(self):

        self.layer1 = sigmoid(np.dot(self.input, self.weights1))

        self.output = sigmoid(np.dot(self.layer1, self.weights2))

The ‘sigmoid’ part means sigmoid function, AKA logistic function, expressed as y=1/(1+e-x), where, at the end of the day, the y always falls somewhere between 0 and 1, and the ‘x’ is not really the empirical, real ‘x’, but the ‘x’ multiplied by a weight, ranging between 0 and 1 as well. The sigmoid function is good for testing the weights we apply to various types of input x-es. Whatever kind of data you take: populations measured in millions, or consumption of energy per capita, measured in kilograms of oil equivalent, the basic sigmoid function y=1/(1+e-x), will always yield a value between 0 and 1. This function essentially normalizes any data.

Now, I want to take differentiated data, like population as headcount, energy consumption in them kilograms of whatever oil equals to, and the supply of money in standardized US dollars. Quite a mix of units and scales of measurement. I label those three as, respectively, xa, xb, and xc. I assign them weights ranging between 0 and 1, so as the sum of weights never exceeds 1. In plain language it means that for every vector of observations made of xa, xb, and xc I take a pinchful of  xa, then a zest of xb, and a spoon of xc. I make them into x = wa*xa + wb*xb + wc*xc, I give it a minus sign and put it as an exponent for the Euler’s constant.

That yields y=1/(1+e-( wa*xa + wb*xb + wc*xc)). Long, but meaningful to the extent that now, my y is always to find somewhere between 0 and 1, and I can experiment with various weights for my various shades of x, and look what it gives in terms of y.

In the algorithm above, the ‘np.dot’ function conveys the idea of weighing our x-s. With two dimensions, like the input signal ‘x’ and its weight ‘w’, the ‘np.dot’ function yields a multiplication of those two one-dimensional matrices, exactly in the x = wa*xa + wb*xb + wc*xc drift.

Thus, the first really smart layer of the network, the hidden one, takes the empirical x-s, weighs them with random weights, and makes a sigmoid of that. The next layer, the output one, takes the sigmoid-calculated values from the hidden layer, and applies the same operation to them.

One more remark about the sigmoid. You can put something else instead of 1, in the nominator. Then, the sigmoid will yield your data normalized over that something. If you have a process that tends towards a level of saturation, e.g. number of psilocybin parties per month, you can put that level in the nominator. On the top of that, you can add parameters to the denominator. In other words, you can replace the 1+e-x with ‘b + e-k*x’, where b and k can be whatever seems to make sense for you. With that specific spin, the sigmoid is good for simulating anything that tends towards saturation over time. Depending on the parameters in denominator, the shape of the corresponding curve will change. Usually, ‘b’ works well when taken as a fraction of the nominator (the saturation level), and the ‘k’ seems to be behaving meaningfully when comprised between 0 and 1.

I return to the algorithm. Now, as the network has generated a set of predicted y^-s, it is time to compare them to the actual y-s, and to evaluate how much is there to learn yet. We can use any measure of error, still, most frequently, them algorithms go after the simplest one, namely the Mean Square Error MSE = [(y1 – y^1)2 + (y2 – y^2)2 + … + (yn – y^n)2]0,5. Yes, it is Euclidean distance between the set of actual y-s and that of predicted y^-s. Yes, it is also the standard deviation of predicted y^-s from the actual distribution of empirical y-s.

In this precise algorithm, the author goes down another avenue: he takes the actual differences between observed y-s and predicted y^-s, and then multiplies it by the sigmoid derivative of predicted y^-s. Then he takes the transpose of a uni-dimensional matrix of those (y – y^)*(y^)’ with (y^)’ standing for derivative. It goes like:

    def backprop(self):

        # application of the chain rule to find derivative of the loss function with respect to weights2 and weights1

        d_weights2 = np.dot(self.layer1.T, (2*(self.y – self.output) * sigmoid_derivative(self.output)))

        d_weights1 = np.dot(self.input.T,  (np.dot(2*(self.y – self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1)))

        # update the weights with the derivative (slope) of the loss function

        self.weights1 += d_weights1

        self.weights2 += d_weights2

    def sigmoid(x):

    return 1.0/(1+ np.exp(-x))

    def sigmoid_derivative(x):

     return x * (1.0 – x)

I am still trying to wrap my mind around the reasons for taking this specific approach to the backpropagation of errors. The derivative of a sigmoid y=1/(1+e-x) is y’ =  [1/(1+e-x)]*{1 – [1/(1+e-x)]} and, as any derivative, it measures the slope of change in y. When I do (y1 – y^1)*(y^1)’ + (y2 – y^2)*(y^2)’ + … + (yn – y^n)*(y^n)’ it is as if I were taking some kind of weighted average. That weighted average can be understood in two alternative ways. Either it is standard deviation of y^ from y, weighted with the local slopes, or it is a general slope weighted with local deviations. Now I take the transpose of a matrix like {(y1 – y^1)*(y^1)’ ; (y2 – y^2)*(y^2)’ ; … (yn – y^n)*(y^n)’}, it is a bit as if I made a matrix of inverted terms, i.e. 1/[(yn – y^n)*(y^n)’]. Now, I make a ‘.dot’ product of those inverted terms, so I multiply them by each other. Then, I feed the ‘.dot’ product into the neural network with the ‘+=’ operator. The latter means that in the next round of calculations, the network can do whatever it wants with those terms. Hmmweeellyyeess, makes some sense. I don’t know what exact sense is that, but it has some mathematical charm.

Now, I try to apply the same logic to the data I am working with in my research. Just to give you an idea, I show some data for just one country: Australia. Why Australia? Honestly, I don’t see why it shouldn’t be. Quite a respectable place. Anyway, here is that table. GDP per unit of energy consumed can be considered as the target output variable y, and the rest are those x-s.

Table 1 – Selected data regarding Australia

Year GDP per unit of energy use (constant 2011 PPP $ per kg of oil equivalent) Share of aggregate amortization in the GDP Supply of broad money, % of GDP Energy use (tons of oil equivalent per capita) Urban population as % of total population GDP per capita, ‘000 USD
  y X1 X2 X3 X4 X5
1990 5,662020744 14,46 54,146 5,062 85,4 26,768
1991 5,719765048 14,806 53,369 4,928 85,4 26,496
1992 5,639817305 14,865 56,208 4,959 85,566 27,234
1993 5,597913126 15,277 56,61 5,148 85,748 28,082
1994 5,824685357 15,62 59,227 5,09 85,928 29,295
1995 5,929177604 15,895 60,519 5,129 86,106 30,489
1996 5,780817973 15,431 62,734 5,394 86,283 31,566
1997 5,860645225 15,259 63,981 5,47 86,504 32,709
1998 5,973528571 15,352 65,591 5,554 86,727 33,789
1999 6,139349354 15,086 69,539 5,61 86,947 35,139
2000 6,268129418 14,5 67,72 5,644 87,165 35,35
2001 6,531818805 14,041 70,382 5,447 87,378 36,297
2002 6,563073754 13,609 70,518 5,57 87,541 37,047
2003 6,677186947 13,398 74,818 5,569 87,695 38,302
2004 6,82834791 13,582 77,495 5,598 87,849 39,134
2005 6,99630318 13,737 78,556 5,564 88 39,914
2006 6,908872246 14,116 83,538 5,709 88,15 41,032
2007 6,932137612 14,025 90,679 5,868 88,298 42,022
2008 6,929395465 13,449 97,866 5,965 88,445 42,222
2009 7,039061961 13,698 94,542 5,863 88,59 41,616
2010 7,157467568 12,647 101,042 5,649 88,733 43,155
2011 7,291989544 12,489 100,349 5,638 88,875 43,716
2012 7,671605162 13,071 101,852 5,559 89,015 43,151
2013 7,891026044 13,455 106,347 5,586 89,153 43,238
2014 8,172929207 13,793 109,502 5,485 89,289 43,071

In his article, James Loy reports the cumulative error over 1500 iterations of training, with just four series of x-s, made of four observations. I do something else. I am interested in how the network works, step by step. I do step-by-step calculations with data from that table, following that algorithm I have just discussed. I do it in Excel, and I observe the way that the network behaves. I can see that the hidden layer is really hidden, to the extent that it does not produce much in terms of meaningful information. What really spins is the output layer, thus, in fact, the connection between the hidden layer and the output. In the hidden layer, all the predicted sigmoid y^ are equal to 1, and their derivatives are automatically 0. Still, in the output layer, when the second random distribution of weights overlaps with the first one from the hidden layer. Then, for some years, those output sigmoids demonstrate tiny differences from 1, and their derivatives become very small positive numbers. As a result, tiny, local (yi – y^i)*(y^i)’ expressions are being generated in the output layer, and they modify the initial weights in the next round of training.

I observe the cumulative error (loss) in the first four iterations. In the first one it is 0,003138796, the second round brings 0,000100228, the third round displays 0,0000143, and the fourth one 0,005997739. Looks like an initial reduction of cumulative error, by one order of magnitude at each iteration, and then, in the fourth round, it jumps up to the highest cumulative error of the four. I extend the number to those hand-driven iterations from four to six, and I keep feeding the network with random weights, again and again. A pattern emerges. The cumulative error oscillates. Sometimes the network drives it down, sometimes it swings it up.

F**k! Pardon my French, but just six iterations of that algorithm show me that the thing is really intelligent. It generates an error, it drives it down to a lower value, and then, as if it was somehow dysfunctional to jump to conclusions that quickly, it generates a greater error in consecutive steps, as if it was considering more alternative options. I know that data scientists, should they read this, can slap their thighs at that elderly uncle (i.e. me), fascinated with how a neural network behaves. Still, for me, it is science. I take my data, I feed it into a machine that I see for the first time in my life, and I observe intelligent behaviour in something written on less than one page. It experiments with weights attributed to the stimuli I feed into it, and it evaluates its own error.

Now, I understand why that scientist from MIT, Lex Fridman, says that building artificial intelligence brings insights into how the human brain works.

I am consistently delivering good, almost new science to my readers, and love doing it, and I am working on crowdfunding this activity of mine. As we talk business plans, I remind you that you can download, from the library of my blog, the business plan I prepared for my semi-scientific project Befund  (and you can access the French version as well). You can also get a free e-copy of my book ‘Capitalism and Political Power’ You can support my research by donating directly, any amount you consider appropriate, to my PayPal account. You can also consider going to my Patreon page and become my patron. If you decide so, I will be grateful for suggesting me two things that Patreon suggests me to suggest you. Firstly, what kind of reward would you expect in exchange of supporting me? Secondly, what kind of phases would you like to see in the development of my research, and of the corresponding educational tools?