An odd vector in a comfortably Apple world

Work pays. Writing about my work helps me learn new things. I am building up, step by step, the logical structure of my book on collective intelligence. These last days, I realized the point of using an artificial neural network as a simulator of collective behaviour. There is a difference between studying the properties of a social structure, on the one hand, and simulating its collective behaviour, on the other hand. When I study the partial properties of something, I take samples and put them under a microscope. This is what most quantitative methods in social sciences do: they sample and zoom in. This is cool, don’t get me wrong. That method has made the body of science we have today, and, historically, it is a hell of a body of science. Yet, there is a difference between, for example, a study of clusters in a society, and a simulation of the way those clusters form. There is a difference between identifying auto-regressive cycles in the time series of a variable, and understanding how those cycles happen in real life, with respect to collective human behaviour (see ‘An overhead of individuals’). Autoregression, translated into human behaviour, means that what we actually accomplish today is somehow derived from, and functionally connected to, the outcomes of our actions some time ago. Please notice: not to the outcomes of the actions which immediately preceded the current one, but to the outcomes generated with a lag, in the past. Go figure how we, humans, can pick a specific, lagged occurrence from the past and make it a factor in what we do today. Intriguing, isn’t it?

The sample-and-put-under-the-microscope method is essentially based on classical statistics, thus on the assumption that the mean expected value of a variable, or of a vector, is the expected state of the corresponding phenomenon. Here we enter the tricky, and yet interesting, realm of questions such as ‘What do you mean by expected state? Expected by whom?’. First and most of all, we have no idea what other people expect. We can, at best, nail down our own expectations to the point of making them intelligible to ourselves and communicable to other people, and we do our best to understand what other people say they expect. Yet, all hope is not lost. Whilst we can hardly have any clue as to what other people expect, we can learn how they learn. The process of learning is much more objectively observable than expectations are.

Here comes the subtle and yet fundamental distinction between expectations and judgments. Both regard the same domain – the reality we live in – but they are different in nature. Judgment is functional. I form an opinion about reality because I need it to navigate through said reality. Judgment is an approximation of truth. Emotions play their role in my judgments, certainly, but they are easy to check. When my judgment is correct, i.e. when my emotions give it the functionally right shade, I make the right decisions and I am satisfied with the outcomes. When my judgment is too emotional, or emotional in the wrong way, I simply screw up, at the end of the day, and I am among the first people to know it.

On the other hand, when I expect something, it is much more imbued with emotions. I expect things which are highly satisfying, or, conversely, which raise my apprehension, ranging from disgust to fear. Expectations are so emotional that we even have a coping mechanism of ex-post rationalization. Something clearly unexpected happens to us and we reduce our cognitive dissonance by persuading ourselves, post factum, that ‘I expected it to happen, really. I just wasn’t sure’.

I think there is a fundamental difference between applying a given quantitative method to the way a society works, on the one hand, and attributing the logic of this method to the collective behaviour of people in that society, on the other hand. I will try to make my point more explicit by taking on one single business case: Tesla (https://ir.tesla.com/ ). Why Tesla? For two reasons. I have invested significant money of mine in their stock, for one, and when I have my money invested in something, I like updating my understanding of how the thing works. Tesla seems to me something like a unique phenomenon, an industry in itself. This is a perfect Black Swan, along the lines of Nassim Nicholas Taleb’s ‘The Black Swan. The Impact of the Highly Improbable’ (2010, Penguin Books, ISBN 9780812973815). Ten years ago, Elon Musk, the founder of Tesla, was considered just a harmless freak. Two years ago, many people could see him on the verge of tears, publicly explaining to shareholders why Tesla kept losing cash. Still, if I had invested in the stock of Tesla 4 years ago, today I would have roughly seven times the money. Tesla is an outlier which turned into a game changer. Today, they are one of the rare business entities who managed to increase their cash flow over the year 2020. No analyst would have predicted that. As a matter of fact, even I considered Tesla an extremely risky investment. Risky means burdened with a lot of uncertainty, combined with a lot of money engaged.

When a business thrives amidst a general crisis, just as Tesla has been thriving amidst the COVID-19 pandemic, I assume there is a special adaptive mechanism at work, and I want to understand that mechanism. My first, intuitive association of ideas goes to the classic book by Joseph Schumpeter, namely ‘Business Cycles’. Tesla is everything Schumpeter described (almost 100 years ago!) as the attributes of breakthrough innovation: a new type of plant, a new type of entrepreneur, processes rethought from first principles, a complete outlier as compared to the industrial sector of origin.

What does Tesla have to do with collective intelligence? Saying that Tesla’s success is a pleasant surprise to its shareholders, and that it is essentially sheer luck, would be too easy and simplistic. At the end of September 2020, Tesla had $45.7 billion in total assets. Interestingly, only 0.44% of that is the so-called ‘Goodwill’, which is the financial chemtrail left after a big company acquires smaller ones, and which has nothing to do with good intentions. To the best of my knowledge, those $45.7 billion in assets have been accumulated mostly through good, old-fashioned organic growth, i.e. the growth of the capital base in correlation with the growth of operations. That type of growth requires the concurrence of many factors: the development of a product market, paired with the development of a technological base, and all that associated with a stream of financial capital.

This is more than luck or accident. The organic growth of Tesla has concurred with a mounting tide of start-up businesses in the domain of electric vehicles. It coincides closely with a significant acceleration in the launching of electric vehicles by the big, established companies of the automotive sector, such as the VW Group, PSA or Renault. When a Black Swan deeply modifies an entire industry, it means that an outlier has provoked adaptive change in the social structure. With that kept in mind, an interesting question surfaces: why is there only one Tesla business, as for now? Why aren’t there more business entities like them? Why does this specific outlier remain an outlier? I know there are many start-ups in the industry of electric vehicles, but none of them even remotely approaches the kind and the size of business structure that Tesla displays. How does an apparently unique social phenomenon remain unique, whilst having proven to be a successful experiment?

I am intuitively comparing Tesla to its grandfather in uniqueness, namely Apple Inc. (https://investor.apple.com/investor-relations/default.aspx ). Apple used to be the outlier which Tesla is today, and, over time, it has become sort of banalized, business-wise. How can I say that? Let’s have a look at the financials of both companies. Since 2015, Tesla has been growing like hell, in terms of assets and revenues. However, they started to make real money just recently, in 2020, amidst the pandemic. Their cash flow is record-high for the nine months of 2020. Apple is the opposite case. If you look at their financials over the last few years, they seem to be shrinking assets-wise, and sort of floating at the same level in terms of revenues. Tesla makes cash mostly through tax write-offs, amortization, and stock-based compensation for their employees. Apple makes cash the good, old-fashioned way, by generating net income after tax. At the bottom line of all cash flows, Tesla accumulates cash on their balance sheet, whilst Apple apparently gives it away and seems to prefer accumulating marketable financial securities. Tesla seems to be wild and up to something; Apple, not really. Apple is respectable.

Uniqueness manifests itself in two attributes: distance from the mean expected state of the social system, on the one hand, and probability of happening, on the other hand. Outliers display low probability and a noticeable distance from the mean expected state of social stuff. The importance of outlier phenomena can be apprehended similarly to risk factors: probability times magnitude of change or difference. With their low probability, outliers are important by their magnitude.

Mathematically, I can express the emergence of any new phenomenon in two ways: as the activation of a dormant phenomenon, or as a recombination of the already active ones. I can go p(x; t0) = 0 and p(x; t1) > 0, where p(x; t) is the probability of the phenomenon x at time t. It used to be zero before, and it is greater than zero now, although not much greater; we are talking about outliers, noblesse oblige. That’s the activation of something dormant. I have a phenomenon nicely defined, as ‘x’, and I sort of know its contour; it just has not been happening recently at all. Suddenly, it starts happening. I remember having read a theory of innovation, many years ago, which stated that each blueprint of a technology sort of hides and conveys in itself a set of potential improvements and modifications, like a cloud of dormant changes attached to it and logically inferable from the blueprint itself.
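The activation of something dormant can be sketched in a few lines of Python. The probability series and the tolerance `eps` below are invented purely for illustration:

```python
# A minimal sketch, assuming we observe a time series of estimated
# probabilities p(x; t). "Dormant activation" means p(x) was ~0 over the
# whole past, and has just become (slightly) greater than zero.

def is_newly_activated(prob_series, eps=1e-6):
    """Return True if p(x) was ~0 in all past periods and is > 0 now."""
    past, current = prob_series[:-1], prob_series[-1]
    return all(p <= eps for p in past) and current > eps

history = [0.0, 0.0, 0.0, 0.0, 0.003]   # rare, but suddenly happening
print(is_newly_activated(history))       # True
```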

When, on the other hand, new things emerge as the result of recombination in something already existing, I can express it as p(x) = p(y)^a * p(z)^b. Phenomena ‘y’ and ‘z’ are the big fat incumbents of the neighbourhood. When they do something specific together (yes, they do what you think they do), phenomenon ‘x’ comes into existence, and its probability of happening, p(x), is a combination of powers ascribed to the respective probabilities. I can go fancier, and use one of them neural activation functions, such as the hyperbolic tangent. Many existing phenomena – sort of Z = {p(z1), p(z2), …, p(zn)} – combine in a more or less haphazard way (frequently, it is the best way of all ways), meaning that their vector of probabilities Z has a date with a structurally identical vector of random significances W = {s1, s2, …, sn}, 0 < si < 1. They date and they produce a weighted sum h = ∑ p(zi)*si, and that weighted sum gets sucked into the vortex of reality via tanh(h) = (e^(2h) – 1)/(e^(2h) + 1). Why via tanh? First of all, why not? Second of all, tanh is a structure in itself. At h = 1 it is essentially (e^2 – 1)/(e^2 + 1) = 6.389056099 / 8.389056099 = 0.761594156, and that structure has the courtesy of accepting h in, and producing something new.
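The date between Z and W can be sketched as follows. The probabilities in Z are made up for illustration; W is drawn at random, just as the text describes:

```python
import math
import random

# Sketch of the recombination idea: a vector Z of probabilities meets a
# random vector W of significances, producing a weighted sum h, and
# tanh(h) = (e^(2h) - 1)/(e^(2h) + 1) turns h into a value in (-1, 1).

random.seed(42)
Z = [0.2, 0.05, 0.7, 0.4]            # p(z_i): existing phenomena
W = [random.random() for _ in Z]     # s_i: random significances, 0 < s_i < 1

h = sum(p * s for p, s in zip(Z, W))                     # h = sum of p(z_i)*s_i
tanh_h = (math.exp(2 * h) - 1) / (math.exp(2 * h) + 1)   # explicit formula

print(-1 < tanh_h < 1)  # True: the output lives in the open interval (-1, 1)
```

The explicit fraction agrees with the library function `math.tanh(h)`; writing it out merely makes the structure of the formula visible.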

Progress in the development of artificial neural networks leads to the discovery of those peculiar structures – the activation functions – which have the capacity to suck in a number of individual phenomena with their respective probabilities, and produce something new, like a by-phenomenon. In the article available at https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/ I read about a newly discovered activation function called ‘Swish’. With that weighted sum h = ∑ p(zi)*si, Swish(h) = h/(1 + e^(–h)). We have a pre-existing structure, 1/(1 + e) = 0.268941421, which, combined with h in a complex way (as a direct factor of multiplication and as an exponent in the denominator), produces something surprisingly meaningful.
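A minimal sketch of the Swish function as given in the text, i.e. h multiplied by the logistic sigmoid of h:

```python
import math

# Swish activation: Swish(h) = h / (1 + e^(-h)), i.e. h times sigmoid(h).

def swish(h: float) -> float:
    return h / (1.0 + math.exp(-h))

# The pre-existing structure mentioned in the text: 1/(1 + e) = 0.268941421...
print(round(1.0 / (1.0 + math.e), 9))   # 0.268941421
print(swish(0.0))                        # 0.0
```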

Writing these words, I have suddenly noticed a meaning of activation functions which I have been leaving aside so far. Values produced by activation functions are aggregations of the input which we feed into the neural network. Yet, seen from a different angle, activation functions produce a single new probability, usually very close to 1, which can be understood as something new happening right now, almost for sure, and deriving its existence from many individual phenomena happening now as well. I need to wrap my mind around it. It is interesting.

Now, I study the mathematical take on the magnitude of the outlier ‘x’, which makes its impact on everything around, and makes it into a Black Swan. I guess x has some properties. I mean not real estate, just attributes. It has a vector of attributes R = {r1, r2, …, rm}, and, if I want to present ‘x’ as an outlier in mathematical terms, those attributes should be the same for all the comparable phenomena in the same domain. That R = {r1, r2, …, rm} is a manifold, in which every observable phenomenon is mapped into m coordinates. If I take any two phenomena, like z and y, each has its vector of attributes, i.e. R(z) and R(y). Each such pair can estimate their mutual closeness by going Euclidean[R(z), R(y)] = {∑ [ri(z) – ri(y)]^2}^0.5 / m. We remember that m is the number of attributes in that universe.
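That closeness measure can be sketched directly. The attribute values below are invented for illustration; only the formula comes from the text:

```python
import math

# Closeness between two phenomena: the Euclidean distance between their
# attribute vectors R(z) and R(y), divided by the number of attributes m.

def closeness(r_z, r_y):
    """Euclidean[R(z), R(y)] = (sum of squared differences)^0.5 / m."""
    m = len(r_z)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_z, r_y))) / m

R_z = [1.0, 2.0, 0.5]    # a comfortably ordinary phenomenon
R_y = [1.1, 1.8, 0.6]    # another ordinary one, close to z
R_x = [9.0, -4.0, 7.5]   # the odd vector, our candidate Black Swan

print(closeness(R_z, R_y) < closeness(R_z, R_x))  # True: x sits far away
```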

Phenomena used to sit at a comfortably predictable Euclidean distance from each other, and, all of a sudden, x pops out, and shows a bloody big Euclidean distance from any other phenomenon. Its vector R(x) is odd. This is how a Tesla turns up, as an odd vector in a comfortably Apple world.