DIY algorithms of our own

I return to that interesting interface of science and business, which I touched upon in my update before last, titled ‘Investment, national security, and psychiatry’. It means that I return to discussing two research projects I am getting involved in, one in the domain of national security, another one in psychiatry, both connected by the idea of using artificial neural networks as analytical tools. What I intend to do now is to review some literature, just to get the hang of the state of science these days.

On top of that, I have been asked by my colleagues to take over, at short notice, the leadership of a big, multi-thread research project in management science. The multitude of threads has emerged as a circumstantial by-product, partly of the disruption caused by the pandemic, and partly of excessive partition in the funding of research. As regards the funding of research, Polish universities have sort of two financial streams. One consists of big projects, usually team-based, financed by specialized agencies, such as the National Science Centre (https://www.ncn.gov.pl/?language=en ) or the National Centre for Research and Development (https://www.gov.pl/web/ncbr-en ). Another one is based on relatively small grants, applied for by and granted to individual scientists by their respective universities, which, in turn, receive bulk subventions from the Ministry of Education and Science. Personally, I think that last category, such as it is being allocated and used now, is a bit of a relic. It is some sort of pocket money for the most urgent and current expenses, relatively small in scale and importance, such as the costs of publishing books and articles, the costs of attending conferences etc. This is a financial paradox: we save and allocate money long in advance, and we have to make long-term plans for it, in order to cover essentially incidental expenses, which come at the very end of the scientific pipeline. It is a case of fundamental mismatch between the intrinsic properties of a cash flow, on the one hand, and the instruments used for managing that cash flow, on the other hand.

Good. This is the introduction to detailed thinking. Once I have those semantic niceties checked out, I cut into the flesh of thinking, and the first piece I intend to cut out is the state of science as regards Territorial Defence Forces and their role amidst the COVID-19 pandemic. I found an interesting article by Tiutiunyk et al. (2018[1]). It is interesting because it gives a detailed methodology for assessing operational readiness in any military unit, territorial defence or other. That corresponds nicely to Hypothesis #2 which I outlined for that project in national security, namely: ‘the actual role played by the TDF during the pandemic was determined by the TDF’s actual capacity of reaction, i.e. speed and diligence in the mobilisation of human and material resources’. That article by Tiutiunyk et al. (2018) allows me to go into detail as regards that claim.

Those details start unfolding from the assumption that operational readiness is there when the entity studied possesses the required quantity of efficient technical and human resources. The underlying mathematical concept is quite simple. In the given situation, an adequate response requires using m units of resources at k% of capacity during time te. The social entity studied can muster n units of the same resources at l% of capacity during the same time te. The most basic expression of operational readiness is, therefore, a coefficient OR = (n*l)/(m*k). I am trying to find out what specific resources are the key to that readiness. Tiutiunyk et al. (2018) offer a few interesting insights in that respect. They start by noticing the otherwise known fact that resources used in crisis situations are not exactly the same as those we use in the everyday course of life and business, and therefore we tend to hold them for longer than their effective lifecycle. We don’t amortize them properly because we don’t really control for their physical and moral depreciation. One of the core concepts in territorial defence is to counter that negative phenomenon, and to maintain, through comprehensive training and internal control, a required level of capacity.
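To make that coefficient tangible, here is a minimal sketch in Python. The class and function names are my own, hypothetical ones, and the numbers are made up for illustration; nothing here comes from Tiutiunyk et al. (2018) beyond the basic ratio OR = (n*l)/(m*k), with both sides assessed over the same time window te.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """A quantity of resources and the share of capacity at which they work."""
    units: float      # number of units (m or n in the text)
    capacity: float   # fraction of full capacity, e.g. 0.9 for 90% (k or l)

    def effective(self) -> float:
        """Units weighted by the capacity at which they actually operate."""
        return self.units * self.capacity


def operational_readiness(available: ResourcePool, required: ResourcePool) -> float:
    """OR = (n*l)/(m*k): effective resources available over those required."""
    if required.effective() <= 0:
        raise ValueError("required resources must be positive")
    return available.effective() / required.effective()


# Hypothetical numbers, for illustration only.
required = ResourcePool(units=100, capacity=0.9)    # m = 100, k = 90%
available = ResourcePool(units=80, capacity=0.75)   # n = 80, l = 75%
readiness = operational_readiness(available, required)
print(f"OR = {readiness:.3f}")  # prints "OR = 0.667"
```

A value below 1 would mean that the entity cannot fully cover the required response; applying the same ratio resource by resource is one possible way of ranking which resources drag readiness down.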

As I continue going through literature, I come by an interesting study by I. Bet-El (2020), titled: ‘COVID-19 and the future of security and defence’, published by the European Leadership Network (https://www.europeanleadershipnetwork.org/wp-content/uploads/2020/05/Covid-security-defence-1.pdf ). Bet-El introduces an important distinction between threats and risks, and, correspondingly, the distinction between security and defence: ‘A threat is a patent, clear danger, while risk is the probability of a latent danger becoming patent; evaluating that probability requires judgement. Within this framework, defence is to be seen as the defeat or deterrence of a patent threat, primarily by military, while security involves taking measures to prevent latent threats from becoming patent and if the measures fail, to do so in such a way that there is time and space to mount an effective defence’. This is deep. I do a lot of research in risk management, especially as I invest in the stock market. When we face a risk factor, our basic behavioural response is hedging or insurance. We hedge by diversifying our exposures to risk, and we insure by sharing the risk with other people. Healthcare systems are a good example of insurance. We have a flow of capital that fuels a manned infrastructure (hospitals, ambulances etc.), and that infrastructure allows each single sick human to share his or her risks with other people. Social distancing is the epidemic equivalent of hedging. When we cut completely, or significantly throttle, social interactions between households, each household becomes sort of separated from the epidemic risk in other households. When one node in a network is shielded from some of the risk occurring in other nodes, this is hedging.

The military is made for responding to threats rather than risks. Military action is a contingency plan, implemented when insurance and hedging have gone to hell. The pandemic has shown that we need more of such buffers, i.e. more social entities able to mobilise quickly to directly deter an actual threat. Territorial Defence Forces seem to fit the bill. Another piece of literature, from my own, Polish turf, by Gąsiorek & Marek (2020[2]), states straightforwardly that Territorial Defence Forces have proven to be a key actor during the COVID-19 pandemic precisely because they maintain a high degree of actual readiness in their crisis-oriented resources, as compared to other entities in the Polish public sector.

Good. I have a thread, from literature, for the project devoted to national security. The issue of operational readiness seems to be somehow in the centre, and it translates into the apparently fluid frontier between security and national defence. Speed of mobilisation of the available resources, as well as the actual reliability of those resources, once mobilized, look like the key to understanding the surprisingly significant role of Territorial Defence Forces during the COVID-19 pandemic. It looks like my initial hypothesis #2, claiming that the actual role played by the TDF during the pandemic was determined by the TDF’s actual capacity of reaction, i.e. speed and diligence in the mobilisation of human and material resources, is some sort of theoretical core to that whole body of research.

In our team, we plan, and have a provisional green light, to run interviews with the soldiers of Territorial Defence Forces. That basic notion of actually mobilizable resources can help narrow down the methodology to apply in those interviews, by asking specific questions pertinent to that issue. Which specific resources proved to be the most valuable in the actual intervention of TDF in the pandemic? Which resources, if any, proved to be 100% mobilizable on the spot? Which of those resources proved to be much harder to mobilise than had been initially assumed? Can we rate and rank all the human and technical resources of TDF as for their capacity to be mobilised?

Good. I gently close the door of that room in my head, filled with Territorial Defence Forces and the pandemic. I make sure I can open it whenever I want, and I open the door to that other room, where psychiatry dwells. The psychiatrists I am working with and I can study a sample of medical records as regards patients with psychosis. Verbal utterances of those patients are an important part of that material, and I make two hypotheses along that tangent:

>> Hypothesis #1: the probability of occurrence of specific grammatical structures A, B, C, in the general grammatical structure of a patient’s utterances, both written and spoken, is informative about the patient’s mental state, including the likelihood of psychosis and its specific form.

>> Hypothesis #2: the action of written self-reporting, e.g. via email, on the part of a psychotic patient, allows post-clinical treatment of psychosis, with results observable as transition from mental state A to mental state B.
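Hypothesis #1 can be made a bit more tangible with a short Python sketch. It assumes the patient’s utterance has already been tagged with grammatical categories (by hand or by a tagger), and it takes a ‘grammatical structure’ to mean a short sequence of consecutive tags. That definition, the function name and the example are my own assumptions, for illustration only.

```python
from collections import Counter


def structure_probabilities(tags, n=2):
    """Relative frequency of each sequence of n consecutive tags."""
    ngrams = list(zip(*(tags[i:] for i in range(n))))
    if not ngrams:
        return {}
    counts = Counter(ngrams)
    return {gram: count / len(ngrams) for gram, count in counts.items()}


# Hypothetical, hand-tagged utterance: "the dog chased the cat"
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
probs = structure_probabilities(tags)
for gram, p in sorted(probs.items()):
    print(gram, p)
```

In this toy case the structure (DET, NOUN) accounts for half of all tag pairs. Comparing such frequency profiles between patients, or for one patient over time, is one possible way of putting the hypothesis to a test.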

I start listening to what people smarter than me have to say on the matter. I start with Worthington et al. (2019[3]), and I learn there is a clinical category: clinical high risk for psychosis (CHR-P), thus a set of subtler (than psychotic) ‘changes in belief, perception, and thought that appear to represent attenuated forms of delusions, hallucinations, and formal thought disorder’. I like going backwards upstream, and I immediately ask myself whether that line of logic can be reversed. If there is clinical high risk for psychosis, the occurrence of those same symptoms in reverse order, from severe to light, could be a path of healing, couldn’t it?

Anyway, according to Worthington et al. (2019), some 25% of people with diagnosed CHR-P transition into full-scale psychosis. Once again, from the perspective of risk management, 25% of actual occurrence in a risk category is a lot. It means that CHR-P is pretty solid as risk assessments go. I further learn that CHR-P, when represented as a collection of variables (a vector for friends with a mathematical edge), entails an internal distinction into predictors and converters. Predictors are the earliest possible observables, something like a subtle smell of possible s**t, swirling here and there in the ambient air. Converters are pieces of information that bring progressive confirmation to predictors.

That paper by Worthington et al. (2019) is a review of literature in itself, and allows me to compare different approaches to CHR-P. The most solid ones, in terms of accurately predicting the onset of full-blown psychosis, always incorporate two components: assessment of the patient’s social role, and analysis of verbalized thought. Good. Looks promising. I think the initial hypotheses should be expanded into claims about socialization.

I continue with another paper, by Corcoran and Cecchi (2020[4]). Generally, patients with psychotic disorders display lower semantic coherence than usual. The flow of meaning in their speech is impeded: they can express less meaning in the same volume of words, as compared to a mentally healthy person. Reduced capacity to deliver meaning manifests as apparent tangentiality in verbal expression. Psychotic patients seem to wander in their speech. Reduced complexity of speech, i.e. a relatively low capacity to swing between different levels of abstraction, with a tendency to exaggerate concreteness, is another observable which informs about psychosis. Two big families of diagnostic methods follow that twofold path. Latent Semantic Analysis (LSA) seems to be the name of the game as regards the study of semantic coherence. Its fundamental assumption is that words convey meaning by connecting to other words, which further unfolds into assuming that semantic similarity, or dissimilarity, can be measured with a more or less complex coefficient of joint occurrence, as opposed to disjoint occurrence, inside big corpora of language.
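The idea of meaning as joint occurrence can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: it is not full Latent Semantic Analysis (which reduces a large term-document matrix with singular value decomposition), only raw co-occurrence counts plus cosine similarity; the corpus, the stop-word list and all function names are hypothetical, made up for the example.

```python
import math
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOP_WORDS = {"the", "in", "at", "and"}


def tokenize(sentence):
    """Lower-case, split on whitespace, drop stop words."""
    return [t for t in sentence.lower().split() if t not in STOP_WORDS]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_vector(sentence, vectors):
    """Sum the vectors of the known words in a sentence."""
    total = Counter()
    for token in tokenize(sentence):
        total.update(vectors.get(token, Counter()))
    return total


def coherence(sentences, vectors):
    """Mean cosine similarity between each sentence and the next one."""
    vecs = [sentence_vector(s, vectors) for s in sentences]
    pairs = list(zip(vecs, vecs[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical reference corpus with two unrelated topics.
corpus = [
    "the doctor treats the patient in the hospital",
    "the nurse helps the doctor in the hospital",
    "the patient thanks the nurse and the doctor",
    "the train leaves the station in the morning",
    "the driver stops the train at the station",
]
vectors = cooccurrence_vectors(corpus)
on_topic = ["the doctor treats the patient", "the nurse helps the patient"]
tangential = ["the doctor treats the patient", "the driver stops the train"]
print(coherence(on_topic, vectors), coherence(tangential, vectors))
```

In this toy corpus the two topics share no context words, so the tangential pair scores exactly zero, while the on-topic pair scores above zero. Real speech is obviously messier, and a serious attempt would need a large reference corpus and a proper dimensionality reduction.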

Corcoran and Cecchi (2020) name two main types of digital tools for Latent Semantic Analysis. One is Word2Vec (https://en.wikipedia.org/wiki/Word2vec), and I found a more technical and programmatic approach to it at: https://towardsdatascience.com/a-word2vec-implementation-using-numpy-and-python-d256cf0e5f28 . Another one is GloVe, to which I found three interesting references, at https://nlp.stanford.edu/projects/glove/ , https://github.com/maciejkula/glove-python , and at https://pypi.org/project/glove-py/ .

As regards semantic complexity, two types of analytical tools seem to run the show. One is the part-of-speech (POS) algorithm, where we tag words according to their grammatical function in the sentence: noun, verb, determiner etc. There are already existing digital platforms for implementing that approach, such as Natural Language Toolkit (http://www.nltk.org/ ). Another angle is that of speech graphs, where words are nodes in the network of discourse, and their connections (e.g. joint occurrence) to other words are edges in that network. Now, the intriguing thing about that last thread is that it seems to have been burgeoning in the late 1990s, and then it sort of faded away. Anyway, I found two references for an algorithmic approach to speech graphs, at https://github.com/guillermodoghel/speechgraph , and at https://www.researchgate.net/publication/224741196_A_general_algorithm_for_word_graph_matrix_decomposition .
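The speech-graph angle is simple enough to sketch with plain Python. The part-of-speech route needs a trained tagger, such as the one shipped with NLTK, so I leave it out here. In the sketch below, an edge is assumed to mean ‘this word was immediately followed by that word’, which is only one possible reading of joint occurrence; the function names, the chosen measures and the sample sentence are my own, hypothetical ones, not taken from the repositories linked above.

```python
from collections import Counter


def speech_graph(text):
    """Build a directed graph: distinct words are nodes, and each
    transition from one word to the next is an edge with a count."""
    tokens = text.lower().split()
    nodes = set(tokens)
    edges = Counter(zip(tokens, tokens[1:]))
    return nodes, edges


def graph_measures(nodes, edges):
    """A few simple descriptors of the graph of discourse."""
    n_nodes = len(nodes)
    n_edges = len(edges)  # distinct word-to-word transitions
    possible = n_nodes * n_nodes  # directed graph, self-loops allowed
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "repeated_edges": sum(1 for count in edges.values() if count > 1),
        "self_loops": sum(1 for (a, b) in edges if a == b),
        "density": n_edges / possible if possible else 0.0,
    }


# Hypothetical sample of speech.
sample = "i went home and i went to bed and i slept"
nodes, edges = speech_graph(sample)
measures = graph_measures(nodes, edges)
print(measures)
```

For this sample, the graph has 7 nodes and 8 distinct edges, 2 of which are repeated. The intuition to test would be that poorer, more repetitive speech yields smaller graphs with more repeated edges per word spoken.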

That quick review of literature, as regards natural language as predictor of psychosis, leads me to an interesting sidestep. Language is culture, right? Low coherence, and low complexity in natural language are informative about psychosis, right? Now, I put that argument upside down. What if we, homo (mostly) sapiens, have a natural proclivity to psychosis, with that overblown cortex of ours? What if we had figured out, at some point of our evolutionary path, that language is a collectively intelligent tool which, with its unique coherence and complexity required for efficient communication, keeps us in a state of acceptable sanity, until we go on Twitter, of course?

Returning to the intellectual discipline which I should demonstrate, as a respectable researcher, the above review of literature brings one piece of good news, as regards the project in psychiatry. Initially, in this specific team, we assumed that we necessarily need an external partner, most likely a digital business, with important digital resources in AI, in order to run research on natural language. Now, I realize that we can assume two scenarios: one with big, fat AI from that external partner, and another one, with DIY algorithms of our own. That gives some freedom of movement. Cool.


[1] Tiutiunyk, V. V., Ivanets, H. V., Tolkunov, I. A., & Stetsyuk, E. I. (2018). System approach for readiness assessment units of civil defense to actions at emergency situations. Науковий вісник Національного гірничого університету, (1), 99-105. https://doi.org/10.29202/nvngu/2018-1/7

[2] Gąsiorek, K., & Marek, A. (2020). Działania wojsk obrony terytorialnej podczas pandemii COVID–19 jako przykład wojskowego wsparcia władz cywilnych i społeczeństwa. Wiedza Obronna. https://doi.org/10.34752/vs7h-g945

[3] Worthington, M. A., Cao, H., & Cannon, T. D. (2019). Discovery and validation of prediction algorithms for psychosis in youths at clinical high risk. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. https://doi.org/10.1016/j.bpsc.2019.10.006

[4] Corcoran, C. M., & Cecchi, G. (2020). Using language processing and speech analysis for the identification of psychosis and other disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. https://doi.org/10.1016/j.bpsc.2020.06.004
