Wednesday, September 28, 2022

Optimization of cognitive assessment in Parkinsonisms by applying artificial intelligence to a comprehensive screening test

This study is part of a larger prospective, observational, analytical, single-center cohort study designed to build a database of clinical, functional, motor, neuropsychological and neurophysiological variables in patients affected by PD and APS from all over Italy. The overall study aims to characterize the profile of these patients as fully as possible, in order to better define their treatment in the neurorehabilitative setting.

The protocol was conducted at the Department of Parkinson’s Disease and Movement Disorders Rehabilitation of the “Moriggia-Pelascini” Hospital (Gravedona ed Uniti, Italy) between January 2017 and December 2019. The study design and protocol were approved by the local Ethics Committee (“Comitato Etico Interaziendale delle Province di Lecco, Como e Sondrio”) and were in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki, 1964). The study was also registered on ClinicalTrials.gov (NCT04858893).


Six hundred sixty-one patients were consecutively enrolled by neurologists with experience in movement disorders.

Patients were included in the present study if they met one of the following criteria: (a) diagnosis of idiopathic PD according to the MDS clinical diagnostic criteria38; (b) diagnosis of PSP according to the MDS clinical diagnostic criteria39; (c) diagnosis of MSA according to the second diagnostic consensus statement40 and (d) diagnosis of VP according to Zijlmans et al.41.

Exclusion criteria were (a) any focal brain lesion detected with brain-imaging studies; (b) psychiatric disorders, psychosis (evaluated with Neuropsychiatric Inventory), and/or delirium; (c) previous diagnosis of dementia; (d) neurological diseases other than PD or APS; (e) other medical conditions negatively affecting the cognitive status; (f) disturbing resting and/or action tremor, corresponding to scores 2–4 in the specific items of MDS Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) III, such as to affect the psychometric evaluation; (g) disturbing dyskinesia, corresponding to scores 2–4 in the specific items of MDS-UPDRS IV, such as to affect the psychometric evaluation; (h) auditory and/or visual dysfunctions impairing the patient’s ability to perform cognitive tests.

Accordingly, 161 patients were excluded from the study: 76 had previous diagnosis of dementia, 13 presented sensorial deficits (2 with visual impairment, 11 with hearing impairment), 51 suffered from psychiatric disorders and 21 presented disturbing tremor and/or dyskinesia. This led to a final study population of 500 patients. According to the clinical diagnostic criteria38,39,40,41, patients were classified as follows: 400 with PD, 41 with VP, 31 with PSP, and 28 with MSA.

A complete explanation of the study protocol was provided, and written informed consent was obtained from all participants before their participation in the study.

Neuropsychological evaluation

The in-depth neuropsychological evaluation was administered by expert neuropsychologists, blinded to the patients’ diagnosis. All patients were tested in the morning, on two consecutive days, in a laboratory setting with constant artificial lighting and no auditory interference. PD patients were evaluated in the medication “on” state.

  1. First evaluation: CoMDA

    “CoMDA” stands for “Cognition in Movement Disorders Assessment” and combines the individual MMSE, MoCA and FAB measures into a single tool. More specifically, CoMDA consists of all the items of the three tests, without repeating items that appear in more than one of them (e.g., the 6 items evaluating orientation, which appear in both MMSE and MoCA). CoMDA is intended to maximize diagnostic power when screening patients with PD and APS. In this study, CoMDA was adopted to categorize these patients into three classes: NC, MCI, and IC.

    CoMDA scores result from a linear non-weighted combination (additive model) of the non-redundant MMSE, MoCA and FAB items (see Table 1).

    CoMDA yields four different scores in L1. The first three are “partial” scores, obtained by scoring and summing all the items of each single test (MMSE: 0–30, MoCA: 0–30, and FAB: 0–18) and adjusting (weighting) them on Italian population data, as in previous research42,43,44. The fourth is the “total” score (CoMDA score), obtained by summing the three “partial” scores. The CoMDA score ranges from 0 (worst performance) to 78 (best performance).

  2. Psychometric test battery

Furthermore, patients underwent an extensive battery of neuropsychological tests evaluating several cognitive domains (see Table 4), according to the indications provided by Goldman et al.15. Most of the normative studies conducted on the Italian population45,46,47,48,49,50 adopted a statistical procedure that provides regression-based norms and a system of scores on an ordinal scale, named Equivalent Scores (ES). ES range from class 0 (scores equal to or lower than the outer tolerance limit of 5%) to class 4 (scores higher than the median value of the whole sample); classes 1, 2, and 3 are obtained by dividing the area of the distribution between classes 0 and 4 into three equal parts. This method makes it possible to judge the scores obtained by the person under examination against those of normal subjects, taking into account the influence of age, education, and gender.

Table 4 Level 2 neuropsychological evaluation (see the text).

Hence, each patient’s overall performance was classified on the basis of this Level 2 evaluation (termed L2), by adapting the indications provided by Litvan et al.8 to the described system of scores. Three ordered classes were defined: 2 = NC (all ES > 0, or only one ES = 0); 1 = MCI (two ES = 0, whether in tests evaluating the same cognitive domain or in tests evaluating two different cognitive domains); 0 = IC (more than two ES = 0).
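The classification rule above can be written as a short function. This is only an illustrative sketch in Python: the function name `classify_l2` and the label mapping `L2_LABELS` are hypothetical, and the input is assumed to be the list of Equivalent Scores (one per test) that a patient obtained in the psychometric battery.

```python
from typing import Sequence

# Hypothetical label mapping for the three L2 classes described in the text.
L2_LABELS = {2: "NC", 1: "MCI", 0: "IC"}


def classify_l2(equivalent_scores: Sequence[int]) -> int:
    """Map a patient's Equivalent Scores (0-4, one per test) to an L2 class."""
    if any(es not in range(5) for es in equivalent_scores):
        raise ValueError("Equivalent Scores must be integers from 0 to 4")

    # Count the tests on which the patient fell in the lowest class (ES = 0).
    n_es_zero = sum(1 for es in equivalent_scores if es == 0)

    if n_es_zero <= 1:
        return 2  # NC: all ES > 0, or only one ES = 0
    if n_es_zero == 2:
        return 1  # MCI: two ES = 0 (same or different cognitive domains)
    return 0      # IC: more than two ES = 0


print(L2_LABELS[classify_l2([4, 3, 0, 2, 0])])  # two ES = 0
```

Because the rule treats two ES = 0 identically whether they fall in one domain or two, the sketch does not need to know which domain each test belongs to.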

Baseline statistics and machine learning

Descriptive statistics and method-comparison substudy

Basic descriptive statistics for continuous variables were reported as mean ± SD. Descriptive statistics for categorical variables were reported as N (percent frequency).

To assess whether the MMSE, MoCA, and FAB scores derived from CoMDA agree with the scores computed in the standard way, we set up a method-comparison substudy. A group of 20 patients underwent two assessment sessions, in random order: one session included the administration of MMSE, MoCA, and FAB; the other included the administration of CoMDA. The agreement between “standard MMSE, MoCA and FAB scores” and “CoMDA-derived MMSE, MoCA and FAB scores” was assessed by Bland–Altman analysis, computing the bias (systematic difference) and the 95% limits of agreement (the range within which 95% of the differences are expected to lie). The Pearson correlation coefficient was also computed.
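As a rough illustration of the agreement analysis, the sketch below computes the Bland–Altman bias, the 95% limits of agreement (taken here as bias ± 1.96 standard deviations of the differences, the usual convention) and the Pearson correlation coefficient. It is written in Python rather than the SAS package used in the study; the function names and the example scores are invented for illustration and are not study data.

```python
import math
import statistics
from typing import Sequence, Tuple


def bland_altman(standard: Sequence[float],
                 derived: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(standard) != len(derived):
        raise ValueError("Both methods must be measured on the same patients")
    diffs = [d - s for s, d in zip(standard, derived)]
    bias = statistics.mean(diffs)                 # systematic difference
    half_width = 1.96 * statistics.stdev(diffs)   # sample SD of the differences
    return bias, bias - half_width, bias + half_width


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example: standard MMSE vs. CoMDA-derived MMSE for five patients.
standard_mmse = [28, 25, 30, 22, 27]
comda_mmse = [27, 25, 29, 23, 27]
bias, lower, upper = bland_altman(standard_mmse, comda_mmse)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
print(f"r={pearson_r(standard_mmse, comda_mmse):.3f}")
```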

The time needed to administer CoMDA was also recorded and compared both with the time needed for each standard test and with the sum of the times required for the three single tests.

Inferential statistics

The nonparametric Kruskal–Wallis test and the Chi-square test were used for between-group comparisons of continuous and categorical variables, respectively. The nonparametric Mann–Whitney U-test was applied for single pairwise between-group comparisons. Post hoc multiple pairwise comparisons were carried out with Dunn–Sidak adjustment.
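For readers unfamiliar with the Sidak adjustment, a minimal sketch follows. It applies the standard Sidak formula, adjusted p = 1 − (1 − p)^m for m comparisons, to a single raw p-value. The function name is hypothetical, and the exact post hoc procedure implemented in SAS may differ in detail from this simplified version.

```python
import math


def sidak_adjust(p_value: float, n_comparisons: int) -> float:
    """Sidak-adjusted p-value for one of n_comparisons simultaneous tests."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return 1.0 - (1.0 - p_value) ** n_comparisons


# Four diagnostic groups (PD, VP, PSP, MSA) give C(4, 2) = 6 pairwise tests.
n_pairs = math.comb(4, 2)
print(n_pairs, sidak_adjust(0.01, n_pairs))
```

The adjusted value is then compared with the usual 0.05 threshold in place of the raw p-value.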

Predictive discrimination analysis

The area under the curve (AUC) of receiver operating characteristic (ROC) curves was computed to assess the ability of all the available cognitive screening tools to discriminate between two classes: NC versus MCI or IC, as assessed by the L2 classification (L2 = 2 vs. L2 = 1 + L2 = 0). A value of 0.5 indicates no predictive discrimination, while a value of 1 indicates perfect separation of patients with and without cognitive impairment. The AUCs for the CoMDA, MMSE, MoCA, and FAB tools were compared by the Hanley–McNeil test.
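The meaning of the AUC can be made concrete with a small sketch. It uses the rank-based (Mann–Whitney) definition: the AUC equals the probability that a randomly chosen NC patient has a higher screening score than a randomly chosen impaired (MCI or IC) patient, with ties counted as one half. The function name and the scores are invented for illustration; the study computed the AUC in SAS, and the Hanley–McNeil comparison is not reproduced here.

```python
from typing import Sequence


def roc_auc(scores_impaired: Sequence[float],
            scores_normal: Sequence[float]) -> float:
    """AUC as P(score of an NC patient > score of an impaired patient)."""
    if not scores_impaired or not scores_normal:
        raise ValueError("Both groups must contain at least one patient")
    concordant = 0.0
    for normal in scores_normal:
        for impaired in scores_impaired:
            if normal > impaired:
                concordant += 1.0   # pair ranked correctly
            elif normal == impaired:
                concordant += 0.5   # tie counts as half
    return concordant / (len(scores_normal) * len(scores_impaired))


# Invented total scores on a 0-78 scale (higher = better performance).
impaired = [41, 52, 55, 60]
normal = [58, 66, 70, 73]
print(roc_auc(impaired, normal))
```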

A p-value < 0.05 was considered statistically significant. All analyses were carried out using the SAS/STAT statistical package, release 9.4 (SAS Institute Inc., Cary, NC, U.S.A.).

Machine learning

The ML solutions were engineered to predict the L2 classification on the total sample of 500 patients. All ML models drew on a baseline pool of 7 candidate predictors: “CoMDA score”, “gender”, “age”, “disease”, “disease duration”, “years of education”, and “L1 score”. A large set of ML architectures (i.e., combinations of algorithms, parameters and hyperparameters) was tested concurrently to obtain the final model, that is, the algorithmic configuration that best predicts the L2 classification from the available data.

Models were validated via k-fold cross-validation performed on a training partition drawn from the 500 available samples. This procedure entails splitting the training data into k non-overlapping folds. Each of the k folds is used in turn as a held-back validation set, while all the other folds together are used as the training dataset. A separate test (holdout) set was used to obtain an unbiased measure of the generalization performance of the ML solution on unseen data. The train-test split was carried out with random shuffling to avoid potential selection bias, and was performed on the original data sample before any model-training operation.
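The validation scheme can be sketched with plain Python index manipulation. The holdout fraction (0.3), the number of folds (k = 10) and the random seed below are assumptions chosen for illustration, since the text does not report them, and the function names are hypothetical. In the study itself the splitting was handled by the PyCaret library.

```python
import random
from typing import Iterator, List, Tuple


def shuffled_holdout_split(n_samples: int, test_fraction: float,
                           seed: int) -> Tuple[List[int], List[int]]:
    """Randomly shuffle sample indices, then set aside a holdout test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]   # (train, test)


def k_fold(train_indices: List[int],
           k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for k non-overlapping folds."""
    folds = [train_indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


# Assumed values: 500 patients, 30% holdout, 10 folds.
train, test = shuffled_holdout_split(500, test_fraction=0.3, seed=42)
for fold_train, fold_val in k_fold(train, k=10):
    pass  # fit a candidate model on fold_train, score it on fold_val
print(len(train), len(test))
```

The holdout indices never enter the cross-validation loop, which is what keeps the final performance estimate independent of model selection.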

To identify the most important of the 7 available predictors, the Information Gain obtained in prediction was quantified. This value measures the “entropy reduction”, or “information relevance”, of each predictor in the reference dataset51,52.
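Information Gain in its textbook form is the entropy of the class labels minus the expected entropy remaining after splitting on a predictor. The sketch below implements that definition for a categorical predictor. The function names and the toy data are hypothetical, continuous predictors such as age would first need to be binned, and the estimator actually used in the study may differ.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy reduction in `labels` obtained by grouping on the feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional


# Toy data: a binned screening score against a binary outcome.
binned_score = ["low", "low", "high", "high"]
outcome = ["impaired", "impaired", "NC", "NC"]
print(information_gain(binned_score, outcome))
```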

Finally, cross-algorithm performances were assessed by widely adopted standard prediction metrics: accuracy, AUC, recall, precision, F1, kappa and MCC.
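Most of these metrics follow directly from a confusion matrix. The sketch below shows the standard formulas for the binary case only (AUC is omitted because it requires ranked scores, not counts). The study's L2 target has three classes, so this is a simplification, and the function name and the counts are invented for illustration.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion matrix (non-degenerate counts)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


# Invented counts for illustration only.
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name}: {value:.3f}")
```

Kappa and MCC are less sensitive to class imbalance than accuracy, which may matter here because the diagnostic groups differ widely in size.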

All machine-learning experiments were coded in Python 3.8 (Python Software Foundation, 9450 SW Gemini Dr., ECM# 90772, Beaverton, OR 97008, USA), using the PyCaret 2.3.3 library (PyCaret.org. PyCaret) and Jupyter Notebook (Python editor). The hyperparameters of the final ML algorithm were fine-tuned with the “Optuna” optimization framework53.


