Predicting Alzheimer’s: Importance of Accuracy and Speed in Diagnosis

University of Toronto Machine Intelligence Team

Feb 8, 2020

“Current prediction models can predict AD as early as several decades before it is clinically diagnosed”

Alzheimer’s Disease (AD) is an irreversible neurodegenerative disease that destroys brain cells, causing thinking ability and memory to deteriorate. Alzheimer’s is not a normal part of aging and will directly affect a substantial proportion of today’s population.

The approach to predicting Alzheimer’s Disease several decades before it is clinically diagnosed relies on three base systems. Each base system processes audio recordings that are assessed to produce an MMSE (Mini-Mental State Examination) score, a measurement of cognitive impairment.

The base systems differ in their training and features, so each yields a different report and a different level of accuracy in predicting AD. According to the research, S3 (Base System 3), which is trained on DementiaBank, is the most accurate. DementiaBank is a shared database of multimedia interactions for the study of communication in dementia.

Jekaterina Novikova’s presentation explained Winterlight Labs’ approach to assessing AD. The main points of the talk were the three-step assessment, the three base systems, and how the process can be improved by replacing manual transcripts with automated ones. The approach builds on their research paper published in 2016: “Dementia and Alzheimer’s Disease Detection using Natural Language”.

Their three-step assessment (simplified) entails:

1. The patient describes a selected picture for 1–5 minutes
2. The recording is sent to the cloud analysis platform, which automatically creates a transcript
3. The platform generates a report that quantifies speech, language, and cognition, and predicts an MMSE score

A significant challenge is the discrepancy between manual transcripts and automated transcripts.

Producing a manual transcript is much slower than producing an automated one, and it is far more expensive. In exchange for that expense, a manual transcript is more accurate than an automated one. In the future, the company hopes to replace all manual transcripts by improving the accuracy of the automated transcripts.
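One common way to put a number on the gap between a manual and an automated transcript is word error rate (WER): the word-level edit distance between the two, divided by the length of the manual transcript. The recount of the talk does not say which metric Winterlight Labs uses, so the sketch below is only an illustration. The function name and both example sentences are made up for the purpose.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the manual transcript is treated as ground truth.
manual = "the boy is taking a cookie from the jar"
automated = "the boy is taken cookie from a jar"
wer = word_error_rate(manual, automated)
```

A WER of 0 means the automated transcript matches the manual one word for word. Higher values mean more misrecognized, dropped, or inserted words for the downstream systems to cope with.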

Winterlight Labs’ early prediction model was developed using celebrities as case studies, for example Gene Wilder and Paul Newman. Celebrities were used instead of ordinary people because of the amount of their speech data available on the internet. Gene Wilder was diagnosed with AD at the age of 80, and Paul Newman did not suffer from cognitive impairment. The two case studies showed contrasting results when three linguistic features were compared with respect to age:

1. Average noun phrase length
2. Number of clauses per sentence
3. Ratio of pronouns to nouns

The linguistic variables correlate differently with age for the subject with cognitive impairment (Gene Wilder) than for the subject without (Paul Newman).

These trends differentiate AD from non-AD subjects, which is instrumental in moving the diagnosis to an earlier year.
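As a rough illustration of the third feature in the list above, the ratio of pronouns to nouns can be computed from part-of-speech-tagged text. The sketch assumes the transcript has already been tagged with Penn Treebank tags by some tagger. The function name, the tag sets chosen, and the two hand-tagged sentences are illustrative assumptions, not Winterlight Labs’ implementation.

```python
from typing import Iterable, Optional, Tuple

# Penn Treebank tags counted as pronouns and nouns (an assumed choice).
PRONOUN_TAGS = {"PRP", "PRP$", "WP", "WP$"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def pronoun_noun_ratio(tagged: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Return pronouns / nouns for (token, tag) pairs, or None if no nouns."""
    pronouns = 0
    nouns = 0
    for _token, tag in tagged:
        if tag in PRONOUN_TAGS:
            pronouns += 1
        elif tag in NOUN_TAGS:
            nouns += 1
    if nouns == 0:
        return None  # ratio is undefined without any nouns
    return pronouns / nouns


# Hand-tagged, made-up sentences: one specific, one vague.
specific = [("The", "DT"), ("boy", "NN"), ("takes", "VBZ"),
            ("a", "DT"), ("cookie", "NN")]
vague = [("He", "PRP"), ("takes", "VBZ"), ("it", "PRP"),
         ("from", "IN"), ("the", "DT"), ("jar", "NN")]

specific_ratio = pronoun_noun_ratio(specific)  # 0 pronouns over 2 nouns
vague_ratio = pronoun_noun_ratio(vague)        # 2 pronouns over 1 noun
```

Tracking a value like this across recordings made at different ages gives the kind of feature-versus-age trend described above. The other two features need a syntactic parser and are not sketched here.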

The early prediction model consists of three base systems, which analyze the transcripts:

S1. Base system, uses raw features
S2. Base system with longitudinal normalization
S3. Base system with latent representation learnt while training on DementiaBank

According to the results collected by Winterlight Labs, S3 in particular was able to predict AD as early as several decades before it is clinically diagnosed. This requires very accurate transcripts that are produced quickly. Automatic speech recognition (ASR) plays a crucial role in this process.

As mentioned, inaccurate automated transcripts lead to inaccurate predictions, because ASR runs the risk of misidentifying words. This ties in to how ASR predicts the next word in a recording using a language model. Entropy computed under such a model reflects the lexical diversity of the recording. The language model enables ASR to produce automated transcripts, which are fed into the base systems, each of which then produces a report. The study’s findings show that the more diverse the text or recording is, the harder it is to predict the next word.

Below is a simple unigram entropy for AD:

Here, x ranges over all unique tokens or n-grams, freq is the number of occurrences in the text, and len is the total number of tokens or n-grams in the text. (c, w) ranges over all unique n-grams in the text, each composed of c (the context: all tokens but the last one) and w (the last token). Entropy is computed over tokens (unigrams), bigrams, and trigrams.
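The symbols defined above fit the standard Shannon form, in which each unique n-gram x contributes p(x) · log2(1 / p(x)) with p(x) = freq(x) / len. The sketch below computes that quantity. It also computes a conditional variant that uses the (c, w) split. Reading the (c, w) notation as a conditional entropy is an assumption, as are the base-2 logarithm and the function names.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of the token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(tokens: Sequence[str], n: int = 1) -> float:
    """Shannon entropy (bits) of the n-gram distribution, p(x) = freq(x) / len."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    total = len(grams)
    counts = Counter(grams)
    # p * log2(1 / p) is the same as -p * log2(p)
    return sum((f / total) * math.log2(total / f) for f in counts.values())


def conditional_entropy(tokens: Sequence[str], n: int = 2) -> float:
    """Entropy (bits) of the last token w given its context c (assumed reading)."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    total = len(grams)
    gram_counts = Counter(grams)
    context_counts = Counter(g[:-1] for g in grams)
    # Each (c, w) is weighted by freq(c, w) / len and scored by
    # log2(1 / p(w | c)), where p(w | c) = freq(c, w) / freq(c).
    return sum(
        (f / total) * math.log2(context_counts[g[:-1]] / f)
        for g, f in gram_counts.items()
    )


repetitive = "the the the the".split()
diverse = "a b c d".split()
low = entropy(repetitive)   # one unique token: nothing to be uncertain about
high = entropy(diverse)     # four equally likely tokens
```

The repetitive sample scores lower than the diverse one, which matches the finding that more diverse text makes the next word harder to predict.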

Improving the accuracy of ASR would be a large step toward faster and more widely usable detection of Alzheimer’s. The paper “Dementia and Alzheimer’s Disease Detection using Natural Language” covers their research on Alzheimer’s in full detail.

KEY WORDS

Alzheimer’s Disease (AD): An irreversible neurodegenerative disease that destroys brain cells, causing thinking ability and memory to deteriorate.

Automatic Speech Recognition (ASR): Method of converting an audio recording into a text transcript. In the assessment described here, the automated transcript is the input for the report that quantifies speech, language, and cognition and predicts an MMSE score.

DementiaBank: A shared database of multimedia interactions for the study of communication in dementia.

Entropy: A measure of the amount of disorder in a system. For text, it reflects how hard the next word is to predict.

Lexical Diversity: A measure of how many different words are used in a text. Lexical density, by contrast, measures the proportion of lexical items (i.e. nouns, verbs, adjectives and some adverbs) in the text.

Mini-Mental State Examination (MMSE): a 30-point questionnaire that is used extensively in clinical and research settings to measure cognitive impairment.

Unigrams, Bigrams, Trigrams: An n-gram is a contiguous sequence of n items from a sample of text or speech; n-grams are typically collected from a text or speech corpus. When the items are words, n-grams may also be called shingles. Using Latin numerical prefixes, an n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”.

Winterlight Labs: Toronto-based company that specializes in computational linguistics, cognitive neuroscience, and machine learning.

References:

1. Fraser, K. C., Meltzer, J. A., & Rudzicz, F. (2016). Linguistic Features Identify Alzheimer’s Disease in Narrative Speech. Journal of Alzheimer’s Disease 49.

2. Novikova, J. (2019, March). The Effect of ASR on Classification Performance in Alzheimer’s Detection.

This article is a recount of “Talk Series 4: Effect of ASR on Classification Performance in Alzheimer’s Detection by Jekaterina Novikova”, held on March 5, 2019 as a UTMIST event at the University of Toronto. For more of our events, please subscribe to our newsletter at https://utmist.github.io/
