My editorial on YouTube
I keep working on the application of neural networks as simulators of collective intelligence. The particular field of research I am diving into is the sector of energy, its shift towards renewable energies, and the financial scheme I invented some time ago, which I called EneFin. As for that last one, you can consult « The essential business concept seems to hold », in order to grasp the outline.
I continue developing the line of research I described in my last update in French: « De la misère, quoi ». There are observable differences in the prices of energy according to the size of the buyer. In many countries – practically in all the countries of Europe – there are two distinct price brackets. One, which I further designate as P_{B}, is reserved for contracts with big consumers of energy (factories, office buildings etc.), and it is clearly lower. The other, further called P_{A}, applies to small buyers, mainly households and really small businesses.
As an economist, I have an intuitive thought in the presence of price forks: that differential in prices is some kind of value. If it is value, why not give it some financial spin? I came up with the idea of the EneFin contract. People buy energy, in the amount Q, from a local supplier who sources it from renewables (water, wind etc.), and they pay the price P_{A}, thus generating a financial flow equal to Q*P_{A}. That flow buys two things: energy priced at P_{B}, and participatory titles in the capital of their supplier, for the differential Q*(P_{A} – P_{B}). I imagine some kind of crowdfunding platform, which could channel the amount of capital K = Q*(P_{A} – P_{B}).
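The arithmetic of that contract can be sketched in a few lines. This is a minimal numeric illustration of the cash-flow split; the quantities and prices below are placeholders of my choosing, not real market data.

```python
# A minimal sketch of the EneFin cash-flow split: the small buyer pays the
# small-buyer price P_A, gets energy billed at the big-consumer price P_B,
# and the differential becomes participatory capital K.

def enefin_split(Q, P_A, P_B):
    total_payment = Q * P_A      # what the small buyer actually pays
    energy_value = Q * P_B       # energy at the big-consumer price
    K = Q * (P_A - P_B)          # the differential, channelled into capital
    return total_payment, energy_value, K

# illustrative placeholder values
payment, energy, K = enefin_split(Q=1000.0, P_A=0.75, P_B=0.25)
# payment = 750.0, energy = 250.0, K = 500.0
```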
That K remains in some sort of fluid relationship to I, or capital invested in the productive capacity of energy suppliers. Fluid relationship means that each of those capital balances can date other capital balances, no hard feelings held. As we talk (OK, I talk) about prices of energy and capital invested in capacity, it is worth referring to LCOE, or Levelized Cost Of Electricity. The LCOE is essentially the lifetime cost of producing one unit of energy, and a no-go-below limit for energy prices.
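For reference, the textbook definition of LCOE is discounted lifetime costs divided by discounted lifetime energy output. The sketch below uses that standard definition, not any computation of the author's; the year-by-year figures in the example are made up.

```python
# Standard LCOE formula: discounted lifetime costs over discounted lifetime
# energy output, year by year over the life of the installation.

def lcoe(capex, opex, fuel, energy, r):
    """capex/opex/fuel/energy are per-year lists; r is the discount rate."""
    years = range(len(energy))
    costs = sum((capex[t] + opex[t] + fuel[t]) / (1 + r) ** t for t in years)
    output = sum(energy[t] / (1 + r) ** t for t in years)
    return costs / output

# toy example: 100 of capex in year 0, 10 of opex in year 1, 50 units of
# energy produced in year 1, no discounting
cost_per_unit = lcoe([100, 0], [0, 10], [0, 0], [0, 50], r=0.0)
# with r = 0 this is simply 110 / 50 = 2.2
```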
I want to simulate the possible process of introducing that general financial concept, namely K = Q*(P_{A} – P_{B}), into the market of energy, in order to promote the development of diversified networks, made of local suppliers in renewable energy.
Here comes my slightly obsessive methodological idea: use artificial intelligence in order to simulate the process. In classical economic method, I make a model, I take empirical data, I regress some of it on another some of it, and I come up with coefficients of regression, and they tell me how the thing should work if we were living in a perfect world. Artificial intelligence opens a different perspective. I can assume that my model is a logical structure, which keeps experimenting with itself, and we don’t the hell know where exactly that experimentation leads. I want to use neural networks in order to represent the exact way that social structures can possibly experiment with that K = Q*(P_{A} – P_{B}) thing. Instead of optimizing, I want to see the way that possible optimization can occur.
I have that simple neural network, which I already referred to in « The point of doing manually what the loop is supposed to do » and which is basically quite dumb, as it does not do any abstraction. Still, it nicely experiments with logical structures. I am sketching its logical structure in the picture below. I distinguish four layers of neurons: input, hidden 1, hidden 2, and output. When I say ‘layers’, it is a bit of grand language. For the moment, I am working with one single neuron in each layer. It is more of a synaptic chain.
Anyway, the input neuron feeds data into the chain. In the first round of experimentation, it feeds the source data in. In consecutive rounds of learning through experimentation, that first neuron assesses and feeds back local errors, measured as discrepancies between the output of the output neuron, and the expected values of output variables. The input neuron is like the first step in a chain of perception, in a nervous system: it receives and notices the raw external information.
The hidden layers – or the hidden neurons in the chain – modify the input data. The first hidden neuron generates quasi-random weights, which the second hidden neuron attributes to the input variables. Just as in a nervous system, the input stimuli are assessed for their relative importance. In the original algorithm of the perceptron, which I used to design this network, those two functions, i.e. generating the random weights and attributing them to input variables, were fused in one equation. Still, my fundamental intent is to use neural networks to simulate collective intelligence, and I intuitively guess that those two functions are somehow distinct. Pondering the importance of things is one action, and using that ponderation for practical purposes is another. It is like scientists debating the way to run a policy, and the government having the actual thing done. These are two separate paths of action.
Whatever. What the second hidden neuron produces is a compound piece of information: the summation of input variables multiplied by random weights. The output neuron transforms this compound data through a neural function. I prepared two versions of this network, with two distinct neural functions: the sigmoid, and the hyperbolic tangent. As I found out, the way they work is very different, as are the results they produce. Once the output neuron generates the transformed data – the neural output – the input neuron measures the discrepancy between the original, expected values of output variables, and the values generated by the output neuron. The exact way of computing that discrepancy consists of two operations: calculating the local derivative of the neural function, and multiplying that derivative by the residual difference ‘original expected output value minus output value generated by the output neuron’. The discrepancy thus calculated is considered a local error, and is fed back into the input neuron as an addition to the value of each input variable.
Before I go into describing the application I made of that perceptron, as regards my idea for a financial scheme, I want to delve into the mechanism of learning triggered through repeated looping of that logical structure. The input neuron measures the arithmetical difference between the expected output and the output the network produced in the preceding round of experimentation, and that difference is multiplied by the local derivative of said output. Derivative functions, in their deepest, Newtonian sense, are magnitudes of change in something else, i.e. in their base function. In the Newtonian perspective, everything that happens can be seen either as change (derivative) in something else, or as an integral (an aggregate that changes its shape) of still something else. When I multiply the local deviation from expected values by the local derivative of the estimated value, I assume this deviation is as important as the local magnitude of change in its estimation. The faster things happen, the more important they are, so to say. My perceptron learns by assessing the magnitude of local changes it induces in its own estimations of reality.
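One round of that chain can be sketched in code. The ‘derivative times residual’ error rule is the one described above; the names, the placeholder values, and the exact weighting scheme (a plain pseudo-random draw) are my own assumptions about the implementation.

```python
import math
import random

# One round of the synaptic chain: quasi-random weighting, summation,
# neural transformation, then the 'derivative times residual' error rule,
# fed back as an addition to every input variable.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

def learning_round(inputs, expected, f, df, rng):
    weights = [rng.random() for _ in inputs]           # hidden neuron 1
    h = sum(w * x for w, x in zip(weights, inputs))    # hidden neuron 2
    output = f(h)                                      # output neuron
    local_error = df(h) * (expected - output)          # input neuron's feedback
    return [x + local_error for x in inputs], local_error

rng = random.Random(0)
new_inputs, err = learning_round([0.26, 0.48, 0.71], 0.95, sigmoid, d_sigmoid, rng)
```

Swapping `sigmoid`/`d_sigmoid` for `math.tanh`/`d_tanh` gives the hyper-tangential version of the same round.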
I took that general logical structure of the perceptron, and I applied it to my core problem, i.e. the possible adoption of the new financial scheme in the market of energy. Here comes sort of an originality in my approach. The basic way of using neural networks is to give them a substantial set of real data as learning material, make them learn on that data, and then make them optimize a hypothetical set of data. Here you have those 20 old cars, take them into pieces and try to put them back together, observe all the anomalies you have thus created, and then make me a new car on the grounds of that learning. I adopted a different approach. My focus is to study the process of learning in itself. I took just one set of actual input values, exogenous to my perceptron, something like an initial situation. I ran 5000 rounds of learning in the perceptron, on the basis of that initial set of values, and I observed how learning takes place.
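The experiment design, then, is one fixed initial tensor looped through 5000 rounds of error feedback. A self-contained sketch of that design, under my own assumptions (the seed, the scalar expected value, and the random weighting are illustrative, not the author’s exact code):

```python
import math
import random

# One instance of the experiment: a single initial input tensor, 5000 rounds
# of learning with the sigmoid, cumulative error tracked along the way.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(42)  # illustrative seed
inputs = [0.26, 0.48, 0.01, 0.01, 0.46, 0.99, 0.71, 0.46, 0.20, 0.37]
expected = 0.95
cumulative_error = 0.0

for _ in range(5000):
    weights = [random.random() for _ in inputs]
    h = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(h)
    # derivative of the sigmoid at h, times the residual
    local_error = output * (1.0 - output) * (expected - output)
    inputs = [x + local_error for x in inputs]
    cumulative_error += local_error
```

After the loop, every input variable carries the same cumulative error on top of its initial value, which is what makes the rescaling discussed further below possible.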
My initial set of data is made of two tensors: input T_{I} and output T_{O}.
The thing I am the most focused on is the relative abundance of energy supplied from renewable sources. I express the ‘abundance’ part mathematically as the coefficient of energy consumed per capita, or Q/N. The relative bend towards renewables, or towards the non-renewables is apprehended as the distinction between renewable energy Q_{R}/N consumed per capita, and the non-renewable one, the Q_{NR}/N, possibly consumed by some other capita. Hence, my output tensor is T_{O} = {Q_{R}/N; Q_{NR}/N}.
I hypothesise that T_{O} is being generated by input made of prices, costs, and capital outlays. I split my price fork P_{A} – P_{B} (the small-buyer price minus the big-consumer price) into renewables and non-renewables, namely into: P_{A;R}, P_{A;NR}, P_{B;R}, and P_{B;NR}. I mirror the distinction in prices with one in the cost of energy, and so I distinguish LCOE_{R} from LCOE_{NR}. I want to create a financial scheme that generates a crowdfunded stream of capital K, to finance new productive capacities, and I want it to finance renewable energies, so I call it K_{R}. Still, some other people, like my compatriots in Poland, might be so attached to fossils they might be willing to crowdfund new installations based on non-renewables. Thus, I need to take into account a K_{NR} in the game. When I say capital, and I say LCOE, I sort of feel compelled to say aggregate investment in productive capacity, in renewables, and in non-renewables, and I call it, respectively, I_{R} and I_{NR}. All in all, my input tensor spells T_{I} = {LCOE_{R}, LCOE_{NR}, K_{R}, K_{NR}, I_{R}, I_{NR}, P_{A;R}, P_{A;NR}, P_{B;R}, P_{B;NR}}.
The next step is scale and measurement. The neural functions I use in my perceptron like having their input standardized. Their tastes in standardization differ a little. The sigmoid likes it nicely spread between 0 and 1, whilst the hyperbolic tangent, the more reckless of the two, tolerates -1 ≤ x ≤ 1. I chose to standardize the input data between 0 and 1, so as to make it fit into both. My initial thought was to aim for an energy market with great abundance of renewable energy, and a relatively declining supply of non-renewables. I generally trust my intuition, only I like to leverage it with a bit of chaos, every now and then, and so I ran some pseudo-random strings of values and I chose an output tensor made of T_{O} = {Q_{R}/N = 0,95; Q_{NR}/N = 0,48}.
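The standardization over [0, 1] can be done with plain min-max scaling; the author does not specify the exact method, so the sketch below is one common way of doing it.

```python
# Min-max scaling onto the [0, 1] interval: the smallest value maps to 0,
# the largest to 1, everything else proportionally in between.

def standardize_01(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

standardize_01([2.0, 4.0, 6.0])
# → [0.0, 0.5, 1.0]
```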
That state of output is supposed to be somehow logically connected to the state of input. I imagined a market where the relative abundance in the consumption of, respectively, renewable energies and non-renewable ones is mostly driven by growing demand for the former, and declining demand for the latter. Thus, I imagined a relatively high small-user price for renewable energy, and a large fork between that P_{A;R} and the P_{B;R}. As for non-renewables, the fork in prices is more restrained (than in the market of renewables), and its top value is relatively lower. The non-renewable power installations are almost fed up with investment I_{NR}, whilst the renewables could still do with more capital I_{R} in productive assets. The LCOE_{NR} of non-renewables is relatively high, although not very: yes, you need to pay for the fuel itself, but you have economies of scale. As for the LCOE_{R} of renewables, it is pretty low, which actually reflects the present situation in the market.
The last part of my input tensor regards the crowdfunded capital K. I assumed two different, initial situations. Firstly, it is virtually no crowdfunding, thus a very low K. Secondly, some crowdfunding is already alive and kicking, and it is sort of slightly above the half of what people expect in the industry.
Once again, I applied those qualitative assumptions to a set of pseudo-random values between 0 and 1. Here comes the result, in the table below.
Table 1 – The initial values for learning in the perceptron
| Tensor | Variable | Market with virtually no crowdfunding | Market with significant crowdfunding |
|---|---|---|---|
| Input T_{I} | LCOE_{R} | 0,26 | 0,26 |
| | LCOE_{NR} | 0,48 | 0,48 |
| | K_{R} | 0,01 | 0,56 |
| | K_{NR} | 0,01 | 0,52 |
| | I_{R} | 0,46 | 0,46 |
| | I_{NR} | 0,99 | 0,99 |
| | P_{A;R} | 0,71 | 0,71 |
| | P_{A;NR} | 0,46 | 0,46 |
| | P_{B;R} | 0,20 | 0,20 |
| | P_{B;NR} | 0,37 | 0,37 |
| Output T_{O} | Q_{R}/N | 0,95 | 0,95 |
| | Q_{NR}/N | 0,48 | 0,48 |
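For readers who want to replay the experiment, Table 1 can be encoded as plain Python dictionaries (decimal commas rendered as dots; the key names are my own shorthand for the subscripted symbols).

```python
# Table 1, 'no crowdfunding' scenario, as Python dictionaries.
T_I = {
    "LCOE_R": 0.26, "LCOE_NR": 0.48,
    "K_R": 0.01, "K_NR": 0.01,
    "I_R": 0.46, "I_NR": 0.99,
    "P_A_R": 0.71, "P_A_NR": 0.46,
    "P_B_R": 0.20, "P_B_NR": 0.37,
}
T_O = {"Q_R_per_N": 0.95, "Q_NR_per_N": 0.48}

# the 'significant crowdfunding' scenario differs only in the K values
T_I_crowdfunded = {**T_I, "K_R": 0.56, "K_NR": 0.52}
```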
The way the perceptron works means that it generates and feeds back local errors in each round of experimentation. Logically, over the 5000 rounds of experimentation, each input variable gathers those local errors, like a snowball rolling downhill. I take the values of input variables from the last, i.e. the 5000^{th} round: they have the initial values, from the table above, and, on top of them, the cumulative error from the 5000 experiments. How to standardize them, so as to make them comparable with the initial ones? I observe: all those final values carry the same cumulative error, across the whole T_{I} input tensor. I choose a simple method of standardization. As the initial values were standardized over the interval between 0 and 1, I standardize the outcoming values over the interval 0 ≤ x ≤ (1 + cumulative error).
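In code, that rescaling is a single division; the sketch below is one way of reading the description above, assuming the learnt values are simply mapped from [0, 1 + cumulative error] back onto a comparable scale.

```python
# Rescale learnt values over [0, 1 + cumulative_error], so they are
# comparable with the initial [0, 1] standardization.

def rescale_after_learning(final_values, cumulative_error):
    return [v / (1.0 + cumulative_error) for v in final_values]

rescale_after_learning([1.5, 3.0], cumulative_error=2.0)
# → [0.5, 1.0]
```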
I observe the unfolding of cumulative error along the path of learning, made of 5000 steps. There is a peculiarity in each of the neural functions used: the sigmoid, and the hyperbolic tangent. The sigmoid learns in a slightly Hitchcockian way. Initially, local errors just rocket up. It is as if that sigmoid was initially yelling: ‘F******k! What a ride!’. Then, the value of errors drops very sharply, down to something akin to a vanishing tremor, and starts hovering lazily over some implicit asymptote. Hyperbolic tangent learns differently. It seems to do all it can to minimize local errors whenever it is possible. Obviously, it is not always possible. Every now and then, that hyperbolic tangent produces an explosively high value of local error, like a sudden earthquake, just to go back into forced calm right after. You can observe those two radically different ways of learning in the two graphs below.
Two ways of learning – the sigmoidal one and the hyper-tangential one – bring interestingly different results, and the results also differ depending on the initial assumptions about the crowdfunded capital K. Tables 2 – 5, further below, list the results I got. A bit of additional explanation will not hurt. For every version of learning, i.e. sigmoid vs hyperbolic tangent, and K = 0,01 vs K ≈ 0,5, I ran 5 instances of 5000 rounds of learning in my perceptron. This is the meaning of the word ‘Instance’ in those tables. One instance is like a tensor of learning: one happening of 5000 consecutive experiments. The values of output variables remain constant all the time: T_{O} = {Q_{R}/N = 0,95; Q_{NR}/N = 0,48}. The perceptron sweats in order to come up with some interesting combination of input variables, given this precise tensor of output.
Table 2 – Outcomes of learning with the sigmoid, no initial crowdfunding
*The learnt values of input variables after 5000 rounds of learning.*

| | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 |
|---|---|---|---|---|---|
| cumulative error | 2,11 | 2,11 | 2,09 | 2,12 | 2,16 |
| LCOE_{R} | 0,7617 | 0,7614 | 0,7678 | 0,7599 | 0,7515 |
| LCOE_{NR} | 0,8340 | 0,8337 | 0,8406 | 0,8321 | 0,8228 |
| K_{R} | 0,6820 | 0,6817 | 0,6875 | 0,6804 | 0,6729 |
| K_{NR} | 0,6820 | 0,6817 | 0,6875 | 0,6804 | 0,6729 |
| I_{R} | 0,8266 | 0,8262 | 0,8332 | 0,8246 | 0,8155 |
| I_{NR} | 0,9966 | 0,9962 | 1,0045 | 0,9943 | 0,9832 |
| P_{A;R} | 0,9062 | 0,9058 | 0,9134 | 0,9041 | 0,8940 |
| P_{A;NR} | 0,8266 | 0,8263 | 0,8332 | 0,8247 | 0,8155 |
| P_{B;R} | 0,7443 | 0,7440 | 0,7502 | 0,7425 | 0,7343 |
| P_{B;NR} | 0,7981 | 0,7977 | 0,8044 | 0,7962 | 0,7873 |
Table 3 – Outcomes of learning with the sigmoid, with substantial initial crowdfunding
*The learnt values of input variables after 5000 rounds of learning.*

| | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 |
|---|---|---|---|---|---|
| cumulative error | 1,98 | 2,01 | 2,07 | 2,03 | 1,96 |
| LCOE_{R} | 0,7511 | 0,7536 | 0,7579 | 0,7554 | 0,7494 |
| LCOE_{NR} | 0,8267 | 0,8284 | 0,8314 | 0,8296 | 0,8255 |
| K_{R} | 0,8514 | 0,8529 | 0,8555 | 0,8540 | 0,8504 |
| K_{NR} | 0,8380 | 0,8396 | 0,8424 | 0,8407 | 0,8369 |
| I_{R} | 0,8189 | 0,8207 | 0,8238 | 0,8220 | 0,8177 |
| I_{NR} | 0,9965 | 0,9965 | 0,9966 | 0,9965 | 0,9965 |
| P_{A;R} | 0,9020 | 0,9030 | 0,9047 | 0,9037 | 0,9014 |
| P_{A;NR} | 0,8189 | 0,8208 | 0,8239 | 0,8220 | 0,8177 |
| P_{B;R} | 0,7329 | 0,7356 | 0,7402 | 0,7375 | 0,7311 |
| P_{B;NR} | 0,7891 | 0,7913 | 0,7949 | 0,7927 | 0,7877 |
Table 4 – Outcomes of learning with the hyperbolic tangent, no initial crowdfunding
*The learnt values of input variables after 5000 rounds of learning.*

| | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 |
|---|---|---|---|---|---|
| cumulative error | 1,1 | 1,27 | 0,69 | 0,77 | 0,88 |
| LCOE_{R} | 0,6470 | 0,6735 | 0,5599 | 0,5805 | 0,6062 |
| LCOE_{NR} | 0,7541 | 0,7726 | 0,6934 | 0,7078 | 0,7257 |
| K_{R} | 0,5290 | 0,5644 | 0,4127 | 0,4403 | 0,4746 |
| K_{NR} | 0,5290 | 0,5644 | 0,4127 | 0,4403 | 0,4746 |
| I_{R} | 0,7431 | 0,7624 | 0,6797 | 0,6947 | 0,7134 |
| I_{NR} | 0,9950 | 0,9954 | 0,9938 | 0,9941 | 0,9944 |
| P_{A;R} | 0,8611 | 0,8715 | 0,8267 | 0,8349 | 0,8450 |
| P_{A;NR} | 0,7432 | 0,7625 | 0,6798 | 0,6948 | 0,7135 |
| P_{B;R} | 0,6212 | 0,6497 | 0,5277 | 0,5499 | 0,5774 |
| P_{B;NR} | 0,7009 | 0,7234 | 0,6271 | 0,6446 | 0,6663 |
Table 5 – Outcomes of learning with the hyperbolic tangent, substantial initial crowdfunding
*The learnt values of input variables after 5000 rounds of learning.*

| | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 |
|---|---|---|---|---|---|
| cumulative error | -0,33 | 0,2 | -0,06 | 0,98 | -0,25 |
| LCOE_{R} | -0,1089 | 0,3800 | 0,2100 | 0,6245 | 0,0110 |
| LCOE_{NR} | 0,2276 | 0,5681 | 0,4497 | 0,7384 | 0,3111 |
| K_{R} | 0,3381 | 0,6299 | 0,5284 | 0,7758 | 0,4096 |
| K_{NR} | 0,2780 | 0,5963 | 0,4856 | 0,7555 | 0,3560 |
| I_{R} | 0,1930 | 0,5488 | 0,4251 | 0,7267 | 0,2802 |
| I_{NR} | 0,9843 | 0,9912 | 0,9888 | 0,9947 | 0,9860 |
| P_{A;R} | 0,5635 | 0,7559 | 0,6890 | 0,8522 | 0,6107 |
| P_{A;NR} | 0,1933 | 0,5489 | 0,4252 | 0,7268 | 0,2804 |
| P_{B;R} | -0,1899 | 0,3347 | 0,1522 | 0,5971 | -0,0613 |
| P_{B;NR} | 0,0604 | 0,4747 | 0,3306 | 0,6818 | 0,1620 |
The cumulative error, the first numerical line in each table, is something like memory. It is a numerical expression of how much experience the perceptron has accumulated in the given instance of learning. Generally, the sigmoid neural function accumulates more memory, as compared to the hyper-tangential one. Interesting. The way of processing information affects the amount of experiential data stored in the process. If you use the links I gave earlier, you will see different logical structures in those two functions. The sigmoid generally smooths out anything it receives as input. It takes the incoming, compound data as the negative exponent of Euler’s constant e ≈ 2,72, and then puts the resulting value in the denominator under 1: s(x) = 1/(1 + e^{-x}). The sigmoid is like a bumper: it absorbs shocks. The hyperbolic tangent is different. It sort of exposes small discrepancies in input. In human terms, the hyper-tangential function is more vigilant than the sigmoid. As can be observed in this precise case, absorbing shocks leads to more accumulated experience than vigilantly reacting to observable change.
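That ‘vigilance’ contrast can be quantified with a quick check: the slope of each function at neutral input (x = 0) measures how strongly it reacts to small changes there. This is a known property of the two functions, not something specific to this perceptron.

```python
import math

# Slope at x = 0: the hyperbolic tangent reacts four times more strongly
# to small changes around neutral input than the sigmoid does.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

sigmoid_slope = sigmoid(0.0) * (1.0 - sigmoid(0.0))  # derivative of sigmoid at 0: 0.25
tanh_slope = 1.0 - math.tanh(0.0) ** 2               # derivative of tanh at 0: 1.0
```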
The difference in cumulative error, observable between the sigmoid-based perceptron and that based on the hyperbolic tangent, is particularly sharp in the case of a market with substantial initial crowdfunding K. In 3 instances out of 5, in that scenario, the hyper-tangential perceptron yields a negative cumulative error. It can be interpreted as the removal of some memory, implicitly contained in the initial values of input variables. When the initial K is assumed to be 0,01, the difference in accumulated memory, observable between the two neural functions, shrinks significantly. It looks as if K ≥ 0,5 were some kind of disturbance that the vigilant hyperbolic tangent attempts to eliminate. That impression of disturbance created by K ≥ 0,5 is further reinforced as I synthetically compare all four sets of outcomes, i.e. tables 2 – 5. The case of learning with the hyperbolic tangent, and with substantial initial crowdfunding, looks radically different from everything else. The discrepancy between alternative instances seems to be the greatest in this case, and the occasionally negative values in the input tensor suggest some kind of deep shake-off. Negative prices and/or negative costs mean that someone external is paying for the ride, probably the taxpayers, in the form of some fiscal stimulation.
I am consistently delivering good, almost new science to my readers, and I love doing it, and I am working on crowdfunding this activity of mine. As we talk business plans, I remind you that you can download, from the library of my blog, the business plan I prepared for my semi-scientific project Befund (and you can access the French version as well). You can also get a free e-copy of my book ‘Capitalism and Political Power’. You can support my research by donating directly, any amount you consider appropriate, to my PayPal account. You can also consider going to my Patreon page and becoming my patron. If you decide so, I will be grateful if you tell me two things that Patreon suggests I ask you about. Firstly, what kind of reward would you expect in exchange for supporting me? Secondly, what kind of phases would you like to see in the development of my research, and of the corresponding educational tools?