Over the last few updates, I have been indulging in the mathematical logic of the Gaussian process, eating it with the spoon of mean-reversion. My experience so far with the logic of the Gaussian process comes from my personal strategy of investment in the stock market, and especially from those short, periodical episodes of reshuffling in my investment portfolio, when I am exposed to, and frequently yield to, the gambling-like temptation of short trade (see Acceptably dumb proof. The method of mean-reversion, Fast + slower = compound rhythm, the rhythm of life, and We really don’t see small change). Gambling-like is the key concept here. I engage in quick trade, and I feel that special flow, peculiar to gambling behaviour, and yet I want that flow to weave around a rational strategy, very much in the spirit of Abraham de Moivre’s ‘The doctrine of chances: or, A method of calculating the probabilities of events in play’, published in 1756. A bit of gambling, yes, but informed gambling.
I am trying to understand why a neural network based on mean-reversed prices as input consistently underestimates the real price, and why the whole method of mean-reversion fails with super-stable prices, such as those of cobalt or uranium (see We really don’t see small change).
I like understanding things. I like understanding the deep logic of the things I do and the methods I use. Here comes the object of my deep intellectual dive, the normal distribution. In the two pictures below, you can see the initial outline of the problem.
How does a function, namely that of normal distribution, assist my process of decision making? Of course, the first-order answer is simple: ‘it gives you numbers, bro, and when you see those numbers you essentially know what to do’. Good, great, but I want to understand HOW EXACTLY those numbers, thus the function I use, match with my thinking and my action.
Good. I have a function, i.e. that of normal distribution, and for some reason that function works. It works geometrically. The whole mathematical expression serves to create a fraction. If you look carefully at the equation, you will see that, with given mean value μ and standard deviation σ, the function peaks at 1 / (σ[(2π)^0.5]). As long as σ is not smaller than roughly 0.4, and with the standardised σ = 1 it is not, there is no way this function can go above 1. It is then always a fraction. A fraction can be seen from different angles. Firstly, it is a portion of something, like a / b, where a < b. There is a bigger something, the denominator of the fraction, σ[(2π)^0.5] = σ × 2.506628275 (elevation to power 0.5 replaces the sign of square root, which I cannot reproduce exactly from the keyboard, as a font). Secondly, as we talk about denominators, a fraction is a change in units of measurement. Instead of measuring reality in units of 1, we measure reality in units of whatever we put in the denominator of the fraction. Thirdly, a fraction is a proportion between two sides of a rectangle, namely the proportion between the shorter side and the longer side.
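To make the fraction tangible, here is a minimal sketch in Python of the textbook normal density. The function name normal_density is my own, and I assume the standardised case μ = 0, σ = 1 for the numbers printed. It shows the denominator σ(2π)^0.5 explicitly and checks that the highest point of the curve, multiplied by that denominator, gives exactly 1.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Value of the normal density at x, for mean mu and standard deviation sigma."""
    denominator = sigma * math.sqrt(2.0 * math.pi)  # sigma * (2*pi)^0.5
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / denominator

# Standardised case (an assumption for this illustration): mu = 0, sigma = 1.
denominator = 1.0 * math.sqrt(2.0 * math.pi)
peak = normal_density(0.0)        # highest point of the curve, reached at x = mu
one_sigma = normal_density(1.0)   # one standard deviation away from the mean

print(round(denominator, 9))  # 2.506628275
print(round(peak, 6))         # 0.398942
print(round(one_sigma, 6))    # 0.241971
print(peak * denominator)     # the peak is exactly one unit of the denominator
```

Read this way, every value of the function is a share of 1 / (σ(2π)^0.5), and that share shrinks as x moves away from μ.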
Good, so what this function of normal distribution represents is a portion cut out of a bigger something equal to σ[(2π)^0.5], and that something is my unit of measurement, and, at the same time, it is the longer side of a rectangle. The expression σ[(2π)^0.5] is something like one dimension of my world, whilst the whole equation of normal distribution, i.e. the value of that function, makes the other dimension. Is the Gaussian world a rectangular world? I need to know. I start talking to dead people. Usually helps. This time, my interlocutor is Karl Friedrich Gauss, in his General Investigations of Curved Surfaces, presented to the Royal Society, October 8th, 1827.
What many people ignore today is that what we call a Gaussian curve is the outcome of a mathematical problem, which, initially, had virtually nothing to do with probability. What Karl Friedrich Gauss (almost) solved was the problem of geodetic measurements, i.e. the distinction between the bird’s flight distance, and the actual length of the same distance on the rugged and uneven surface of the Earth. I know, when we go through mountains, it is sometimes uphill, sometimes downhill, and, on average, it is flat. Still, when you have to build a railroad through the same mountains, the actual length (spell: cost) of rails to put on the ground is much greater than what would be needed for building the same railroad in the plain. That’s the type of puzzle that Karl Friedrich was after.
Someone could say there is no puzzle. You want to know how long a rail you need to go over a mountain, you send surveyors and they measure it. Splendid. Yet, civil engineering involves some kind of interference with the landscape. I can come up with the idea of putting my railroad alongside, like, the half-height of the mountain (instead of going right over its top), or maybe we could sort of shave off the top, couldn’t we, civilised people that we are? Yes, those ideas are all valid, and I can have a lot of them. Sending surveyors each time I come up with a new concept can become terribly time- and money-consuming. What I could do with is a method of approximating each of those alternative distances on a curved surface, a method which finds a good compromise between exactitude and simplicity.
Gauss assumed that when we convert the observation of anything curved – rugged land, or the orbit of a planet – into linear equations, we lose information. The challenge is to lose as little an amount thereof as possible. And here the story starts. Below, you will find a short quote from Gauss: the first paragraph of the introduction.
‘Investigations, in which the directions of various straight lines in space are to be considered, attain a high degree of clearness and simplicity if we employ, as an auxiliary, a sphere of unit radius described about an arbitrary centre, and suppose the different points of the sphere to represent the directions of straight lines parallel to the radii ending at these points. As the position of every point in space is determined by three coordinates, that is to say, the distances of the point from three mutually perpendicular fixed planes, it is necessary to consider, first of all, the directions of the axes perpendicular to these planes. The points on the sphere, which represent these directions, we shall denote by (1), (2), (3). The distance of any one of these points from either of the other two will be a quadrant; and we shall suppose that the directions of the axes are those in which the corresponding coordinates increase.’
Before I go further, a disclaimer is due. What follows is my own development on Karl Friedrich Gauss’s ideas, not an exact summary of his thoughts. If you want to go to the source, go to the source, i.e. to Gauss’s original writings.
In this introductory paragraph, reality is a sphere. Question: what geometrical shape does my perception of reality have? Do I perceive reality as a flat surface, as a sphere (as is the case with Karl Friedrich Gauss), or maybe is it a cone, or a cube? How can I know what the geometrical shape of my perception is? Good. I feel my synapses firing a bit faster. There is nothing like an apparently absurd, mindf**king question to kick my brain into higher gear. If I want to know what shape of reality I am perceiving, it is essentially about distance.
I approach the thing scientifically, and I start by positing hypotheses. My perceived reality is just a point, i.e. everything could be happening together, without any perceived dimension to it. Sort of a super small and stationary life. I could stretch it into a segment, thus giving my existence at least one dimension to move along, and yet within some limits. If I allow the unknown and the unpredictable into my reality, I can perceive it in the form of a continuous, endless, straight line. Sometimes, my existence can be like a bundle of separate paths, each endowed with its own indefiniteness and its own expanse: this is reality made of a few straight lines in front of me, crossing or parallel to each other. Of course, I can stop messing around with discontinuities and I can generalise those few straight lines into a continuous plane. This could make me ambitious, and I could come to the conclusion that flat is boring. Then I bend the plane into a sphere, and, finally, things get really interesting and I assume that what I initially thought was a sphere is actually a space, i.e. a Russian doll made of a lot of spheres with different radii, packed one into the other.
I am pretty sure that anything else can be made out of those seven cases. If, for example, my perceived reality is a tetrahedron (i.e. any of the Egyptian pyramids after having taken flight, as any spaceship should, from time to time; just kidding), it is a reality made of semi-planes delimited by segments, thus the offspring of a really tumultuous relationship between a segment and a plane etc.
Let’s take any two points in my universe. Why two and not just one? ‘Cause it’s more fun, in the first place, and then, because of an old, almost forgotten technique called triangulation. I did it in the boy scout times, long before Internet and commercial use of Global Positioning System. You are in the middle of nowhere, and you have just a very faint idea of where exactly that nowhere is, and yet you have a map of it. On the map of nowhere, you find points which you are sort of spotting in the vicinity. That mountain at your 11 o’clock looks almost exactly like the mountain (i.e. the dense congregation of concentric contour lines) on the map. That radio tower at your 1 o’clock looks like the one marked on the map etc. Having just two points, i.e. the mountain and the radio tower, you can already find your position. You need a flat surface to put your map on, a compass (or elementary orientation by the position of the sun), a pencil and a ruler (or anything with a straight, smooth, hard edge). You position your map in conformity with the geographical directions, i.e. the top edge of the map should be parallel to the East-West axis (or, in other words, the top edge of the map should be facing North). You position the ruler on the map so that it marks an imaginary line from the mountain in the real landscape to the mountain on the map. You draw that straight line with the pencil. You do the same for the radio tower, i.e. you draw, on the map, a line connecting the real radio tower you can see to the radio tower on the map. Those lines cross on the map, and the crossing point is your most likely position.
Most likely is different from exact. By my own experience of having applied triangulation in real outdoors (back in the day, before Google Maps, and almost right after Gutenberg printed his first Bible), I know that triangulating with two points is sort of tricky. If my map is really precise (large scale, like military grade), and if it is my lucky day, two points yield a reliable positioning. Still, what used to happen more frequently were doubtful situations. Is the mountain I can see on the horizon the mountain I think it is on the map? Sometimes it is, sometimes not quite. The more points I triangulate my position on, the closer I come to my exact location. If I have like 5 points or more, triangulating on them can even compensate for slight inexactitude in the North-positioning of my map.
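The pencil-and-ruler procedure can be sketched in Python as a small least-squares problem. This is my own illustration under stated assumptions, not a surveying tool: the function names locate and bearing_to, the flat map coordinates (x pointing East, y pointing North) and the 3-degree error in positioning the map are all hypothetical. Each landmark contributes one straight line on the map, and the estimated position is the point closest, in the least-squares sense, to all those lines. Four landmarks are used here, placed symmetrically around the observer, only because that layout makes the compensation of the orientation error easy to check.

```python
import math

def locate(landmarks, bearings_deg):
    """Least-squares crossing point of the lines of sight drawn through landmarks.

    landmarks    : list of (x, y) map coordinates, x pointing East, y pointing North
    bearings_deg : compass bearings (clockwise from North) read towards each landmark
    """
    # Accumulate the 2x2 normal equations (A^T A) q = A^T b.
    s_xx = s_xy = s_yy = t_x = t_y = 0.0
    for (px, py), bearing in zip(landmarks, bearings_deg):
        theta = math.radians(bearing)
        # Vector perpendicular to the line of sight; the observer q satisfies n.q = n.p
        nx, ny = math.cos(theta), -math.sin(theta)
        rhs = nx * px + ny * py
        s_xx += nx * nx
        s_xy += nx * ny
        s_yy += ny * ny
        t_x += nx * rhs
        t_y += ny * rhs
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-12:
        raise ValueError("lines of sight are parallel: no crossing point")
    return ((s_yy * t_x - s_xy * t_y) / det, (s_xx * t_y - s_xy * t_x) / det)

def bearing_to(observer, landmark):
    """Exact compass bearing (degrees clockwise from North) from observer to landmark."""
    dx, dy = landmark[0] - observer[0], landmark[1] - observer[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

# Hypothetical, deliberately symmetric layout: landmarks to the N, E, S and W.
true_position = (0.0, 0.0)
landmarks = [(0.0, 10.0), (10.0, 0.0), (0.0, -10.0), (-10.0, 0.0)]
map_error_deg = 3.0  # assumed inexactitude in the North-positioning of the map
bearings = [bearing_to(true_position, p) + map_error_deg for p in landmarks]

fix_two = locate(landmarks[:2], bearings[:2])
fix_four = locate(landmarks, bearings)
error_two = math.dist(fix_two, true_position)
error_four = math.dist(fix_four, true_position)
print("two landmarks :", fix_two, "error", error_two)
print("four landmarks:", fix_four, "error", error_four)
```

In this idealised layout, the two-landmark fix lands roughly 0.74 map units away from the true position, whilst the four-landmark fix falls back on it, because the errors of opposite lines of sight cancel out. With real, unevenly scattered landmarks the compensation is only partial.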
The partial moral of the fairy tale is that representing my reality as a sphere around me comes with some advantages: I can find my place in that reality (the landscape) by using just an imperfect representation thereof (the map), and some thinking (the pencil, the ruler, and the compass). I perceive my reality as a sphere, and I assume, following the intuitions of William James, expressed in his ‘Essays in Radical Empiricism’ that “there is only one primal stuff or material in the world, a stuff of which everything is composed, and if we call that stuff ‘pure experience,’ then knowing can easily be explained as a particular sort of relation towards one another into which portions of pure experience may enter. The relation itself is a part of pure experience; one of its ‘terms’ becomes the subject or bearer of the knowledge, the knower,[…] the other becomes the object known.” (Excerpt From: William James. “Essays in Radical Empiricism”. Apple Books).
Good. I’m lost. I can have two alternative shapes of my perceptual world: it can be a flat rectangle, or a sphere, and I keep in mind that both shapes are essentially my representations, i.e. my relations with the primal stuff of what’s really going on. The rectangle serves me to measure the likelihood of something happening, and the unit of likelihood is σ[(2π)^0.5]. The sphere, on the other hand, has an interesting property: being in the centre of the sphere is radically different from being anywhere else. When I am in the centre, all points on the sphere are equidistant from me. Whatever happens is always at the same distance from my position: everything is equiprobable. On the other hand, when my current position is somewhere else than the centre of the sphere, points on the sphere are at different distances from me.
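A quick numerical check of that property, as a sketch in Python. The unit radius, the six sample points on the sphere and the off-centre position (0.5, 0, 0) are arbitrary choices of mine, made just for illustration.

```python
import math

centre = (0.0, 0.0, 0.0)
off_centre = (0.5, 0.0, 0.0)  # hypothetical position away from the centre

# Six points on a sphere of radius 1: where the sphere crosses the three axes.
sphere_points = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

from_centre = [math.dist(centre, p) for p in sphere_points]
from_off_centre = [math.dist(off_centre, p) for p in sphere_points]

print(from_centre)                                  # all equal to the radius
print(min(from_off_centre), max(from_off_centre))   # now the distances differ
```

Seen from the centre, every point is exactly one radius away. Seen from the off-centre position, the nearest point is half a radius away and the farthest one is one and a half.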
Now, things become a bit complicated geometrically, yet they remain logical. Imagine that your world is essentially spherical, and that you have two complementary, perceptual representations thereof, thus two types of maps, and they are both spherical as well. One of those maps locates you in its centre: it is a map of all the phenomena which you perceive as equidistant from you, thus equiprobable as for their possible occurrence. C’mon, you know, we all have that thing: anything can happen, and we don’t even bother which exact thing happens in the first place. This is a state of mind which can be a bit disquieting – it is essentially chaos acknowledged – yet, once you get the hang of it, it becomes interesting. The second spherical map locates you away from its centre, and automatically makes real phenomena different in their distance from you, i.e. in their likelihood of happening. That second map is more structured than the first one. Whilst the first is chaos, the second is order.
The next step is to assume that I can have many imperfectly overlapping chaoses in an otherwise ordered reality. I can squeeze, into an overarching, ordered representation of reality, many local, chaotic representations thereof. Then, I can just slice through the big and ordered representation of reality, following one of its secant planes. I can obtain something that I try to represent graphically in the picture below. Each point under the curve of normal distribution can correspond to the centre of a local sphere, with points on that sphere being equidistant from the centre. This is a local chaos. I can fit indefinitely many local chaoses of different size under the curve of normal distribution. The sphere in the middle, the one that touches the very belly of the Gaussian curve, roughly corresponds to what is called ‘standard normal distribution’, with mean μ = 0, and standard deviation σ =1. This is my central chaos, if you want, and it can have indefinitely many siblings, i.e. other local chaoses, located further towards the tails of the Gaussian curve.
An interesting proportion emerges between the sphere in the middle (my central chaos), and all the other spheres I can squeeze under the curve of normal distribution. That central chaos groups all the phenomena, which are one standard deviation away from me; remember: σ = 1. All the points on the curve correspond to indefinitely many intersections between indefinitely many smaller spheres (smaller local chaoses), and the likelihood of each of those intersections happening is always a fraction of σ[(2π)^0.5] = σ × 2.506628275. The normal curve, with its inherent proportions, represents the combination of all the possible local chaoses in my complex representation of reality.
Good, so when I use the logic of mean-reversion to study stock prices and to elaborate a strategy of investment, thus when I denominate the differences between those prices and their moving averages in units of standard deviation, it is as if I assumed that standard deviation makes σ = 1. In other words, I am in the sphere of central chaos, and I discriminate stock prices into three categories, depending on the mean-reversed price. Those in the interval -1 ≤ mean-reversed price ≤ 1 are in my central chaos, which is essentially the ‘hold stock’ chaos. Those, which bear a mean-reversed price < -1, are in the peripheral chaos of the ‘buy’ strategy. Conversely, those with mean-reversed price > 1 are in another peripheral chaos, that of ‘sell’ strategy.
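The three categories can be sketched in a few lines of Python. This is an illustration under my own assumptions rather than the exact spreadsheet I use: the function names, the 3-day window, the choice of the population standard deviation, the inclusion of the current price in its own window, and the price series itself are all hypothetical. The sketch also assumes that prices move inside every window, so that the moving standard deviation is never zero.

```python
import statistics

def mean_reversed(prices, window):
    """Mean-reversed price for each day that has a full window behind it:
    (price - moving average) / moving standard deviation."""
    result = []
    for i in range(window - 1, len(prices)):
        chunk = prices[i - window + 1 : i + 1]  # window ends on the current day
        mean = statistics.fmean(chunk)
        sigma = statistics.pstdev(chunk)        # assumed non-zero in this sketch
        result.append((prices[i] - mean) / sigma)
    return result

def signal(z):
    """Map a mean-reversed price onto the three zones described in the text."""
    if z < -1:
        return "buy"    # peripheral chaos below the central sphere
    if z > 1:
        return "sell"   # peripheral chaos above the central sphere
    return "hold"       # central chaos: -1 <= z <= 1

prices = [10, 11, 10, 12, 15, 9, 11]  # made-up closing prices
z_scores = mean_reversed(prices, window=3)
signals = [signal(z) for z in z_scores]
print([round(z, 3) for z in z_scores])
print(signals)  # ['hold', 'sell', 'sell', 'buy', 'hold']
```

A longer window makes the moving average and the moving standard deviation more sluggish, and so it changes which days fall outside the central interval.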
Now, I am trying to understand why a neural network based on mean-reversed prices as input consistently underestimates the real price, and why the whole method of mean-reversion fails with super-stable prices, such as those of cobalt or uranium (see We really don’t see small change). When prices are super-stable, thus when the moving standard deviation is σ = 0, mean-reversion, with its denomination in standard deviations, yields the ‘Division by zero!’ error, which is the mathematical equivalent of ‘WTF?’. When σ = 0, my central chaos (the central sphere under the curve) shrinks to a point, devoid of any radius. Interesting. Things that change below the level of my perception deprive me of my central sphere of chaos. I am left just with the possible outliers (peripheral chaoses) without a ruler to measure them.
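The same failure can be reproduced in Python, together with one possible way of handling it. This is a sketch under my own assumptions: the function name mean_reversed_or_none, the min_sigma threshold and the two short price series are hypothetical, and returning None for a flat window is just one convention among several.

```python
import statistics

def mean_reversed_or_none(window_prices, min_sigma=0.0):
    """Mean-reversed value of the last price in the window, or None when the
    window is too flat (sigma <= min_sigma) to serve as a ruler."""
    mean = statistics.fmean(window_prices)
    sigma = statistics.pstdev(window_prices)
    if sigma <= min_sigma:
        return None  # no central sphere: nothing to measure deviations with
    return (window_prices[-1] - mean) / sigma

flat = [30.0, 30.0, 30.0, 30.0, 30.0]    # super-stable price, sigma = 0
moving = [30.0, 30.0, 30.0, 30.0, 31.0]  # one small jump at the end

# The naive formula breaks on the flat window, just like the spreadsheet does.
try:
    naive = (flat[-1] - statistics.fmean(flat)) / statistics.pstdev(flat)
except ZeroDivisionError:
    naive = "Division by zero!"

flat_result = mean_reversed_or_none(flat)
moving_result = mean_reversed_or_none(moving)
print(naive)          # Division by zero!
print(flat_result)    # None
print(moving_result)  # about 2.0: the small jump looks like a strong outlier
```

The second series shows the flip side: after a long flat stretch, even a small change sits two standard deviations away from the moving average.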
As regards the estimated output of my neural network (I mean, not the one in my head, the one I programmed) being consistently below real prices, I understand it as a proclivity of said network to overestimate the relative importance of peripheral chaoses in the [x < -1] [buy] zone, and, on the other hand, to underestimate peripheral chaoses existing in the [x > 1] [sell] zone. My neural network is sort of myopic to peripheral chaoses located far above (or to the right of, if you prefer) the centre of my central chaos. If, as I deeply believe, the logic of mean-reversion represents an important cognitive structure in my mind, said mind tends to sort of leave one gate unguarded. In the case of price estimation, it is the gate of ‘sell’ opportunities, which, in turn, leads me to buy and hold whatever I invest in, rather than exchanging it back into money (which is the exact economic content of what we call ‘selling’).
Interesting. When I use the normal distribution to study stock prices, one tail of the distribution – the one with abnormally high values – is sort of neglected to the benefit of the other tail, that with low values. It looks like the normal distribution is not really normal, but biased.