Development of a classification system based on corneal biomechanical properties using artificial intelligence predicting keratoconus severity

Background To investigate machine-learning (ML) algorithms to differentiate corneal biomechanical properties between different topographical stages of keratoconus (KC) by dynamic Scheimpflug tonometry (CST, Corvis ST, Oculus, Wetzlar, Germany). In the following, ML models were used to predict the severity in a training and validation dataset. Methods Three hundred and eighteen keratoconic and one hundred sixteen healthy eyes were included in this monocentric and cross-sectional pilot study. Dynamic corneal response (DCR) and corneal thickness related (pachymetric) parameters from CST were chosen by appropriated selection techniques to develop a ML algorithm. The stage of KC was determined by the topographical keratoconus classification system (TKC, Pentacam, Oculus). Patients who were classified as TKC 1, TKC 2 and TKC 3 were assigned to subgroup mild, moderate, and advanced KC. If patients were classified as TKC 1–2, TKC 2–3 or TKC 3–4, they were assigned to subgroups according to the normative range of further corneal indices (index of surface variance, keratoconus index and minimum radius). Patients classified as TKC 4 were not included in this study due to the limited amount of cases. Linear discriminant analysis (LDA) and random forest (RF) algorithms were used to develop the classification models. Data were divided into training (70% of cases) and validation (30% of cases) datasets. Results LDA model predicted healthy, mild, moderate, and advanced KC eyes with a sensitivity (Sn)/specificity (Sp) of 82%/97%, 73%/81%, 62%/83% and 68%/95% from a validation dataset, respectively. For the RF model, a Sn/Sp of 91%/94%, 80%/90%, 63%/87%, 72%/95% could be reached for predicting healthy, mild, moderate, and advanced KC eyes, respectively. The overall accuracy of LDA and RF was 71% and 78%, respectively. The accuracy for KC detection including all subgroups of KC severity was 93% in both models. Conclusion The RF model showed good accuracy in predicting healthy eyes and various stages of KC. The accuracy was superior with respect to the LDA model. The clinical importance of the models is that the standalone dynamic Scheimpflug tonometry is able to predict the severity of KC without having the keratometric data. Trial registration NCT04251143 at Clinicaltrials.gov, registered at 12 March 2018 (Retrospectively registered). Supplementary Information The online version contains supplementary material available at 10.1186/s40662-021-00244-4.

Conclusion: The RF model showed good accuracy in predicting healthy eyes and various stages of KC. The accuracy was superior with respect to the LDA model. The clinical importance of the models is that the standalone dynamic Scheimpflug tonometry is able to predict the severity of KC without having the keratometric data. Trial registration: NCT04251143 at Clinicaltrials.gov, registered at 12 March 2018 (Retrospectively registered).
Keywords: Artificial intelligence, Corneal biomechanics, Corvis ST, Grading, Keratoconus, Machine learning Background Keratoconus (KC) is a bilateral ectatic disease of the cornea that is characterized by corneal steepening and thinning [1]. As a result, irregular astigmatism may lead to loss of vision. Former studies have reported low incidence and prevalence of keratoconus [2]. Recently, it was shown that KC does not occur as rarely as described [3]. Due to improving diagnosis, it is assumed that prevalence is higher and depends on geographic regions [4]. Placido-disk, Scheimpflug or optical coherence tomography (OCT) technology are useful tools to image corneal topography and tomography for screening ectasia. In the case of KC, biomechanical properties are altered to the effect that corneal tissue is biomechanically weakened [5]. Especially, focal weakening of elastic properties might be the initial trigger for stromal thinning and increasing steepening [6]. Thus, in vivo biomechanical assessment of the cornea became popular by releasing the non-contact tonometer labeled as ocular response analyzer (ORA, Reichert, Ophthalmic Instruments, Depew, NY, USA) in the field of refractive surgery, keratoconus and glaucoma [7]. ORA provides information regarding corneal viscoelastic properties that are described as corneal hysteresis and corneal resistance factor [7,8]. Furthermore, keratoconus match index (KMI, ORA) and probability (KMP, ORA) are derived from individual waveform characteristics of the measurement signal and are compared to a normative database [9,10]. Furthermore, investigations have shown that this database was not related to an objective keratoconus classification; besides, cases were classified by individuals of four different settings [11]. Therefore, no clear correlation to the topographic keratoconus classification (TKC, Pentacam, Oculus, Wetzlar, Germany) or anterior surface indices such as keratoconus index (KI, Pentacam), could be found throughout several investigations. Later, a Scheimpflug-based tonometer was introduced by Oculus that records the corneal deformation process induced by an air-puff using an ultra-high-speed camera (Corvis ST, Oculus, Wetzlar, Germany). The measurement outcome of the Corvis ST is described as dynamic corneal response (DCR) parameters. It has been shown that DCR parameters are highly repeatable in healthy [12] and KC eyes [13]. Additionally, Corvis ST can be used to assess alterations before and after corneal cross-linking (CXL) [14][15][16]. Corvis biomechanical index (CBI) and tomographic and biomechanical index (TBI) are indices that are able to differentiate between healthy and KC eyes as well as healthy and subclinical ectasia [17,18]. Subclinical eyes were defined as those with normal topography in one eye and manifest KC in the fellow eye with very asymmetric ectasia [17]. However, these indices were not designed to differentiate between various stages of KC. Previously, we showed that DCR parameters were different in several stages of KC [19]. The aim of this pilot study was to develop a corneal biomechanical based classification model, called Dresden keratoconus index (DKI), to predict the severity of KC in a standalone Corvis ST measurement without having keratometry data from the cornea.

Subjects
This monocentric pilot study was conducted at the Department of Ophthalmology, University Hospital Carl Gustav Carus, TU Dresden, Germany. The study protocol was approved by the ethics committee of the University Hospital Carl Gustav Carus, Dresden, TU Dresden, Germany following the tenets of the Declaration of Helsinki. Participants and KC patients were enrolled between January 2017 and March 2020 from the refractive and keratoconus clinic at the Department of Ophthalmology, University Hospital Carl Gustav Carus. All subjects had to confirm their approval by signing the informed consent. Furthermore, healthy subjects and keratoconus patients have received a complete ophthalmologic examination including slit lamp biomicroscopy of the anterior segment and fundus biomicroscopy as well as a survey of their medical history. Inclusion criteria for healthy participants were an age between 18 and 45 years, normal tomography, an intraocular pressure less than 21 mmHg and an ordinary optic nerve head. KC patients had to present clear signs of keratoconus in corneal maps (derived from Scheimpflug tomography) that was approved by an experienced clinician (FR) and optometrist (RH). The topographical keratoconus classification (TKC) had to be at least stage 1 (TKC 1). Of note, one follow-up examination was necessary to confirm topographical stability. Healthy participants and KC patients were requested to discontinue the wearing of contact lenses for 10 days. Exclusion criteria were previous corneal and ocular surgeries (e.g., corneal cross-linking), diabetes mellitus and severe cases of KC.

Measurement of dynamic corneal response parameters
The Corvis ST measures the corneal response to an induced, predefined air-puff using an ultra-high speed Scheimpflug camera [20,21]. DCR and corneal thickness related (pachymetric) parameters are derived from 2dimensional cross-section records of the cornea and describing the corneal behavior during different deformation phases. First, the air-puff reaches the cornea and pushes it to the 1st applanation. While the air-puff is active, the cornea is forced into a concave shape (described as highest concavity, HC). After that, the air pressure decreases and the cornea moves back through the 2nd applanation to its physiological state. Some of the DCR parameters indicate time and velocity to 1st and 2nd applanation as well as maximum deformation [18,[20][21][22]. Furthermore, corneal pachymetry (Pachy) and corneal thickness related parameters (ARTh, Ambrosio Rational Thickness horizontal [18] and Pachyslope [23]) are measured before the air-puff reaches the cornea. ARTh is calculated as the thinnest corneal thickness divided by pachymetric progression to periphery [18]. Contrarily, Pachyslope is calculated as the difference of mean corneal thickness at ±2.5 mm and corneal thickness at the apex [23,24]. The latest software release has included novel parameters like biomechanical corrected intraocular pressure (bIOP) [25,26]; maximum inverse concave radius (InverseR) [18]; integrated inverse radius (IntInverseR) [18]; ratio of central and peripheral deformation in a distance of 1 mm and 2 mm (DAR1/ DAR2) [18] and stiffness parameter at the 1st applanation (SPA1) [22]. Furthermore, the CBI is a combined index of several DCR parameters based on logistic regression analysis that distinguishes between healthy and KC eyes [18]. Instead, the TBI combines DCR and tomographic parameters using a random forest method [17].

Corneal tomography measurements and classification of keratoconus
Corneal tomography of healthy participants and KC patients were evaluated by Scheimpflug technology (Pentacam, Oculus, Wetzlar, Germany). Topographical data were derived from these measurements. The following parameters were used in this study: maximal keratometry (Kmax), thinnest corneal thickness (TCT), Belin/ Ambrósio total deviation value (BAD-D) and inferiorsuperior keratometric difference (I-S value). Pentacam provides two KC classification systems: the topographic keratoconus classification (TKC) [27] and the ABCD grading [28]. Both of them are related to the Amsler-Krumeich KC classification [27,28]. The ABCD grading offers an independent staging of anterior as well as posterior surface and TCT. However, our clinical experience has shown that it is difficult to find patients, which have the same stage in each category (e.g., A2B2C2). Therefore, we decided to use TKC as target classification for predicting KC severity by DCR and pachymetric parameters. TKC is based on topographic indices like index of surface variance (ISV), keratoconus index (KI) and minimum radius (Rmin) [27]. KC patients who were classified as TKC 1, TKC 2 and TKC 3, were assigned to subgroups "mild KC", "moderate KC" and "advanced KC", respectively. Patients classified as "TKC 1-2" and "TKC 2-3" were assigned to mild and moderate, according to the normative range of ISV, KI and Rmin (shown in Table 1). Patients classified as "TKC 4" were not included in this study due to the limited number of cases.

Statistical analysis and classification models
Statistical analysis and machine learning algorithms were performed using SPSS (version 25, IBM Statistics, Armonk, New York, USA) and R (R Foundation for Statistical Computing, Vienna, Austria; https://www.Rproject.org/). Incomplete data, insufficient quality of Corvis ST measurement or outliers of patients' datasets were removed. One eye per participant or patient was used. The dataset was randomly divided into a training (around 70% of cases) and a validation (around 30% of cases) dataset. To solve this classification problem, random forest (RF) and linear discriminant analysis (LDA) were selected due to their suitability for multiclass classification. Both RF [17,29] and LDA [30][31][32] were used in the past to solve classification problems in ophthalmology. The RF model is a machine learning algorithm that includes and combines a large number of decision trees to solve classification and regression issues [33,34]. A decision tree is built up based on nodes where one independent variable is chosen to cause a decision to find the final prediction [33]. In RF, the outcome of each decision tree is a vote and the most predicted decision determines the final prediction [33]. On the other hand, LDA is a classification algorithm that uses discriminant functions, a linear combination of selected parameters, to classify two or more groups [35]. The discriminant function describes the numeric properties of the subgroups where the mean of these results constitute a centroid [35]. The differences of these means of two or more groups represent the cut-off value [35].
In general, machine learning approaches are categorized as supervised, unsupervised and reinforcement learning [33,34]. Both RF and LDA are supervised machine learning algorithms. The aim of this application is that the machine learning algorithms are able to learn from a labeled dataset and to construct rules to predict unlabeled data, which is not in the dataset [34]. The learning process also includes an improvement in the accuracy if new data are added. The accuracy of the resulting model from the machine learning process depends on the amount and the quality of the data. There is also a risk of a biased training dataset that leads to a false prediction of unlabeled and independent data. DCR and pachymetric parameters were exported from the Corvis ST software (version 1.5r1902) including 40 variables. The CBI and TBI were excluded because they represent already established indices and were later used for comparative analysis. Whole eye movement were not considered in the analysis because it does not directly represent corneal biomechanical properties. Parameters were assessed in their multicollinearity to each other by calculating the variance inflation factor (VIF) from the regression analysis. In the following, the final selection of DCR and pachymetric parameters were done by recursive feature elimination (caret package, R) and stepwise Wilks-Lambda method (SPSS) for RF and LDA, respectively (Fig. 1). The performance of each algorithm was evaluated with the validation dataset by generating a confusion matrix. Additionally, accuracy of the algorithm determined the overall performance, whereas sensitivity (S n ) and specificity (S p ) were determined for each subgroup ("healthy", "mild KC", "moderate KC" and "ad- In cases where S n and S p were calculated for each subgroup (e.g., "mild KC"), true positives were all cases that were classified as mild KC. True negatives were all nonmild KC cases that were not classified as mild KC. False positives were all non-mild KC cases that were classified as mild KC, whereas false negatives were all mild KC cases that were not classified as mild KC.
The developed algorithms were compared with respect to their suitability of detecting KC in general using the CBI and TBI. Finally, receiver operating characteristics (ROC) curves were plotted and area under the curves (AUC) were determined. For multiple comparisons, oneway ANOVA with Bonferroni correction was used. A P value of less than 0.05 showed statistical significance. Sample size calculation was done using G Power (version

Demographics
In this study, 116 eyes of 116 healthy participants (controls) and 318 eyes of 318 keratoconus patients were analyzed. There were significantly more male than female subjects (P = 0.004). The demographic data of healthy and keratoconus subjects are summarized in Table 1. Topographic parameters (Km, Kmax, I-S value) were significantly higher in the KC group than in controls (P < 0.001), as well as between healthy and mild KC, between mild KC and moderate KC and between moderate KC and advanced KC (P < 0.05). ARC, PRC and TCT were significantly lower in KC compared to controls (P < 0.001), and the lower the stage of KC was (P < 0.001). The bIOP was significantly different between both cohorts (P < 0.001), however, no differences were found between subgroups of KC (P > 0.05). CBI and TBI showed significant differences between controls and KC (P < 0.05), whereas TBI was not different between mild and moderate, or between moderate and advanced KC (P > 0.05).

Dynamic corneal response parameters in healthy and KC subjects
The comparison of important DCR and pachymetric parameters is shown in Table 2. Except for the deflection amplitude at 1st applanation (A1DefA), all shown parameters were significantly different between controls and KC (P < 0.001), controls and mild KC (P < 0.01), mild and moderate KC (P < 0.01), as well as moderate and advanced KC (P < 0.001).

Classification of KC by DCR and pachymetric parameters
The complete dataset was randomly divided into a training and a validation dataset. There were no differences in age, bIOP, topographic and tomographic parameters between these datasets (P > 0.05, Table 3). Both models were tested with the validation dataset. Only these results were represented. In LDA, the final model contained the following parameters, ordered by their importance to the algorithm: ARTh, SPA1, IntInverseR, PachySlope, Radius and A1V. The prediction based on S n and S p for mild, moderate, and advanced KC versus healthy were 73%/81%, 62%/ 83%, 68%/95% versus 82%/97%, respectively (Fig. 2). The overall accuracy for classifying the severity of KC was 71% (Table 4).
Comparing the general KC detection by DKI and LDA models with established CBI and TBI, all severity subgroups were assigned to the KC group. Each of the biomechanically based indices (DKI, LDA and CBI) showed an accuracy of more than 90% ( Table 5). The prediction of KC by DKI and LDA model was as good as CBI. The TBI reached a S n and S p of 100%/99% in detecting KC.

Discussion
Biomechanical assessment using the Corvis ST is a useful tool to evaluate in vivo corneal biomechanics and is able to screen for KC and subclinical KC [17,18]. The CBI is a combined index, which is based on logistic regression analysis where the final beta is transformed into a logistic sigmoid function and a cut-off value of 0.5 discriminates healthy (CBI < 0.5) from KC (CBI > 0.5). Clinical studies have shown high sensitivity and specificity in detecting KC [18,19,36,37]. Contrarily, TBI combines tomographic and biomechanical data using random forest with leave-one-out cross validation, where a cut-off value of 0.29 provided an excellent accuracy in detecting KC and eyes with normal topography and tomography where the fellow eyes showed ectasia [17,37,38]. A cutoff value of 0.75 was found to detect clinical keratoconus with a S n and S p of 100%. However, the indices are not designed to predict the severity of KC. To the best of our knowledge, this is the first study that has used a random forest algorithm to predict the severity of KC based on DCR and pachymetric parameters derived from airpuff tonometry.
In our previous work, we showed with a smaller sample size, that DCR parameters were different in certain stages of KC [19]. However, differences were more pronounced between mild KC (TKC 1) and advanced KC (TKC 3) than between mild (TKC 1) and moderate (TKC 2) KC as well as between moderate and advanced     [19]. It was found that bIOP was significantly higher in healthy subjects compared to all KC patients. A higher IOP would imply a stiffer corneal behavior against the air-puff [22]. However, the difference was around 0.9 mmHg, which does not affect further analysis. In the current study, almost all DCR and pachymetric parameters were significantly different between controls and mild KC, as well as within KC groups. The results have shown that corneal thickness properties (Pachy, ARTh and Pachyslope) as well as the DCR parameters depend from the KC severity. In other words, the higher the stage of KC, the weaker is the corneal behavior against the air-puff. We investigated those DCR parameters that were important for the classification model. ARTh showed lower values, the higher the stage of KC, whereas Pachyslope showed higher values. Both parameters indicated thinner corneas and faster increase of corneal thickness to the periphery, the higher the stage of KC [39]. The more advanced the KC, the higher InverseR and IntInverseR were, suggesting a steeper corneal shape during the concave deformation phase. In higher stages of KC, the ratios of central to peripheral deformation at 1 mm (DAR1) and 2 mm (DAR2) were increased. These results suggest less resistance against the deformation, the higher the stage of KC is [18]. Furthermore, the stiffness parameter at the 1st applanation  [40].
For classification purposes, we decided to choose TKC as the target variable instead of ABCD grading, because of its higher complexity of anterior and posterior curvature, and corneal thickness evaluation. Furthermore, two different machine learning algorithms were used to predict the severity of KC using DCR and pachymetric parameters. The amount of DCR parameters were reduced while they were checked for multicollinearity (13 of 36 DCR parameters were removed). An improvement of each model (LDA and RF) was achieved by specific feature elimination methods. Finally, the LDA model contained six DCR parameters (ARTh, SPA1, IntInverseR, PachySlope, Radius and A1V) to predict the severity of KC, whereas the RF model contained 11 parameters (ARTh, InverseR, SPA1, PachySlope, Pachy, IntInverseR, DAR1, DAR2, Radius, A1DefA and A2DefAA).
In the first instance, LDA separated the four groups (controls, mild, moderate, and advanced KC) with an overall accuracy of 71%. The S n and S p were sufficient for controls and mild KC but inadequate for moderate and advanced KC. The RF model using default cut-off values showed excellent S n and S p for healthy controls and advanced KC. However, S n was inadequate for mild and moderate KC, where mild KC were predicted as healthy in 29% of cases. Therefore, the RF model was optimized by improving cut-off values that led to excellent S n and S p for healthy controls and mild KC but to the detriment of S n of advanced KC. Another reason for optimizing the cut-off values was the ability to better differentiate between healthy and mild KC due to clinical relevance. The optimized RF model is called Dresden Biomechanical Keratoconus Severity Index (DKI) and reached the highest overall accuracy of 78% compared to LDA and RF model with default cut-offs. Recently, Langenbucher et al. published a study with a similar aim where they utilized LDA and support vector machine (SVM) algorithms [41]. There were five subgroups (healthy and TKC 1-4), while the number of subjects were the same compared with this study. The overall accuracy was 65 and 64% for SVM and LDA, respectively [41]. These values were lower than in the present study.
Moreover, the DKI accomplished to be as good as the CBI in detecting keratoconus. In this study, surprisingly, S n of the CBI was lower than in previous studies [18,19,36,37], while a cut-off of 0.5 was used as published by Vinciguerra et al. [18]. The reason for this might be that the training process and the training dataset of the DKI was more suitable in separating mild KC from healthy compared with CBI. Nevertheless, DKI was comparable to the CBI in predicting KC in general. As mentioned previously, the TBI aims to predict subclinical ectasia by combining topographic and tomographic data with DCR parameters using a cut-off value of 0.29 [17]. In this study, a cut-off value of 0.79 was used for separating healthy from clinical keratoconus as described by Ambrosio et al. [17]. They found a S n and S p of 100% [17]. Our results reveal that the TBI showed only one misclassification in our study cohort, which resulted in excellent S n , S p , and accuracy.
A limitation of this pilot study is the small sample size of each group. Additionally, the single center design limits the accuracy concerning different races, devices, and users. The model might be improved by including more cases in the training database from different centers.

Conclusion
In this study, we developed a classification model that predicts the severity of KC with high accuracy but without compromising S n and S p in detecting KC when compared with the CBI. However, the most misclassifications occurred in moderate KC because of an overlap with mild and advanced KC. The DKI is mainly influenced by pachymetric parameters. However, DCR parameters describing properties of corneal deformation against the air-puff have a major impact on DKI as well. The clinical importance of the DKI is that a sole measurement of Scheimpflug-based tonometry is able to predict the severity of KC without any topographical and tomographical information. This could be interesting for clinical users that have a standalone Corvis ST without a Pentacam. Further studies should be conducted to determine the suitability of the DKI as a follow up parameter.