Estimation Of A Smooth And Convex Receiver Operator Characteristic Curve: A Comparison Of Parametric And Non-Parametric Methods
A M Zamzam
convexity, non-parametric, parametric, receiver operating characteristic curve, smoothing
A M Zamzam. Estimation Of A Smooth And Convex Receiver Operator Characteristic Curve: A Comparison Of Parametric And Non-Parametric Methods. The Internet Journal of Epidemiology. 2016 Volume 14 Number 1.
Introduction Sensitivity and specificity are two components that measure the performance of a diagnostic test. Receiver operating characteristic (ROC) curves display the trade-off between sensitivity and (1-specificity) across a series of cut-off points. This is an effective method for assessing the performance of a diagnostic test.
Methods The aim of this research is to provide a reliable method for estimating a smooth and convex ROC curve to help medical researchers use it effectively. Specified criteria such as likelihood ratio test, ease of use and computational time were used for evaluation. Using these criteria, parametric and non-parametric ROC curve estimation methods in two different computer software packages were analyzed. A simulated bi-cauchy distributed ROC curve was compared, using the two estimation methods, to a simulated binormal distributed ROC curve, where the latter stands as the known truth.
Results Compared to non-parametric methods, parametric methods failed to yield a smooth and convex ROC curve for small sample size. On the other hand, parametric methods showed superiority over non-parametric methods in estimating a smooth and convex ROC curve for large sample sizes.
Conclusion Parametric methods for ROC curve estimation is recommended over non-parametric methods for large sample size continuous biomarker data sets but it is conservative for small sample size.
Receiver operating characteristics (ROC) curve is used to analyze and evaluate the performance of diagnostic systems. The ROC curve is a graphical display of sensitivity on the y-axis against (1-specificity) on the x-axis. It is used to determine a cutoff value for that specific diagnostic test giving the optimal sensitivity and specificity, which is a point at which one can differentiate between two statuses (healthy and diseased). For this reason, ROC curves stand as an important method for evaluating the performance of diagnostic medical tests. The true positive fraction (TPF) and the false positive fraction (FPF) are other names for the two measures displayed on the ROC curve axis. Where TPF is the population proportion correctly classified as diseased and FPF is the proportion of healthy subjects incorrectly classified as diseased.
A convex ROC curve is one that has a monotonically decreasing slope, where for every pair of points the curve doesn’t lie below the line that connects these two points. It was proven that non-convexity in ROC curves represents an irrational decision process and is not accepted in medical research . As a result, hooking and data degeneracy should be avoided in order to display a reliable ROC curve. The term “hooking” refers to the upward concave region in the fitted curve. When “hooking” occurs it is unlikely to represent the actual behavior of human observers . On the other hand, a fitted ROC curve with horizontal and vertical line segments are said to be 'degenerate' .
The research done in this field proposed several methods for ROC curve estimation. Generally, we can classify them into parametric, non-parametric and semi-parametric ROC estimation methods. The non-parametric methods don’t require any distributional assumptions of the diagnostic test, while the parametric methods are used when the statistical distribution of the test values is known and has the advantage of producing a smooth ROC curve . Finally, the semi-parametric methods which assume using a non-parametric approach to estimate the distribution of test results in healthy population, but then assume a parametric approach for the distribution of test results in diseased population.
Previous literature focused on the area under the curve (AUC) and partial area under the curve (pAUC) as the main criteria determining an effective and precise ROC curve. Although AUC stands as reliable evaluation criteria, other aspects such as smoothing and convexity were not fully studied. Smoothness and convexity mainly contributes to the final shape of the ROC curve. Therefore, the estimation method with greater smoothness and convexity will yield a ROC curve with an improved performance than other methods.
The gap in the literature in identifying the recommended methods to estimate a smooth and convex ROC curve was the reason for the following evaluation of different ROC curves estimation methods.
In this study, the aim was to simulate a diagnostic test results with a continuous outcome and compare the smoothed ROC curve with a known truth using two methods. In an ideal situation, the ROC curve would pass through the point (0, 1). In this case, there would be an ideal performance for the diagnostic test in completely separating healthy and diseased subjects. A normal distribution for both healthy and diseased yielded an ROC curve with the optimum shape. This estimated binormal curve from this binormal distribution curve stood as the known truth.
In order to simulate a reliable binormal distribution, several attempts were made using R software (See Appendix A). In these attempts, all parameters for healthy and diseased distributions were fixed except for the mean of the diseased population . This mean was tested across several increments to simulate a binormal data set for a gold standard ROC curve. The two distributions in this binormal data set had almost overlapping density functions curves. In comparing parametric and non-parametric estimation methods, a study showed considerable evidence that the ratio of standard deviations of distribution for healthy to diseased populations in a simulated binormal model need to be higher than 1 . On the other hand, representing the realistic continuous biomarker data, a Cauchy distribution was used. The Cauchy distribution was chosen as it had an undefined mean and variance with a symmetric distribution matched median and inter quantile range (IQR) to a normal distribution. Moreover, in the ROC curve estimation analysis, a study estimated a cauchy distribution for the biomarker data density function . These characteristics helped to simulate a continuous realistic biomarker data which can be matched and compared to a known truth. Empirical curves for both binormal and bi-cauchy ROC curves were plotted using a large sample size of 10000. This large sample size aided in evaluating the true behavior of bi-cauchy curve without smoothing compared to binormal curve. Also, it will help in detecting any non-convexity expected in the shape of the bi-cauchy ROC curve.
Previous studies proved that sample size had a contribution in the performance of different ROC curve estimation methods. For example, a study proved that non-parametric approach was conservative with small sample sizes less than 20 . In order to test the performance of both parametric and non-parametric ROC curve estimation, two sample sizes representing two extreme scenarios of 20 and 10000 were simulated and analyzed for each software. The sample size of 10000 will represent a large sample size for a study with realistic biomarker data which might have enough power to detect changes. Moreover, a sample size of 20 was referred to as a small sample size where certain estimation methods might experience some difficulties.
2.1 Evaluation Criteria
2.1.1. Likelihood ratio test
In this study, the main focus was on the shape of the curve as it is an important feature for its overall performance. Therefore, the slope of the curve was chosen and calculated at specific threshold points. The slope from the tangent line at a point on a smoothed ROC curve represents the instantaneous change in the TPR per unit change in the FPR. As a result, the slope on the ROC curve corresponds to the likelihood ratio at that point, which can be estimated only for tests with a continuous outcome . Also, the slope of the ROC curve was related to the bias of the detector at a given operating threshold . For a test value, the Likelihood ratio (LR) will be the probability of that test result among the diseased subjects divided by the probability of the same test result among the healthy subjects . The equation was as follows: LR= Sensitivity/ (1-Specifictiy).
The simulation function was looped over 100 likelihood ratio results for each specificity given sensitivity. Accordingly, the means of the likelihood ratio at these three thresholds on the bi-cauchy ROC curve were calculated with the mean of the corresponding points’ likelihood ratio of the binormal ROC curve. The bias was then measured by the absolute difference between likelihood ratio values for both binormal and bi-cauchy curves at the specified threshold points. Due to the nature of cauchy distribution, failed loop runs was expected. As a result, proportion of discarded bi-cauchy simulated runs (per 100) was calculated.
2.1.2. Ease of use
Three different sub-criteria were chosen to encompass the use of the software and data handling, which were based on a previous study . Based on the author’s opinion, a Likert scale was used. Ranging from totally disagree to totally agree, the score was converted on a scale of one to five respectively. Accordingly, each answer on the ROC curve estimation method for each package used was assigned a maximum score of 5. The sum of all answers values gave the final score. The sub-criteria used are described briefly below.
Compatibility: Simulating data sets into the package and the ability to create a loop function. Also, using the ROC curve functions effectively.
Output: The program should show the capability of saving calculated graphs and drawing more than one curve in one graph. Also, output should be easily exported into an excel sheet for further analysis.
User Manual: This will assess the comprehensibility of the user manual and the online solutions for software problems provided by the manufacturer.
2.1.3. Computational time
As the simulation will run over 100 loops, it was important to present the estimation method with the least computation time. For each calculated output form the likelihood ratio test, the computational time was recorded (per 100 loops) and tabulated as a continuous outcome in minutes.
2.2 Choosing threshold points on the ROC curve
As described by Metz (1978), three possible operating points on the ROC curve were enough to analyse the overall performance of an operator . These points represented the strict, moderate and lax thresholds represented on the sensitivity axis by points 0.1, 0.5 and 0.9 respectively. Specificities given sensitivities at these points were calculated. As illustrated by Maxion and Roberts (2004) the likelihood ratio was calculated by finding the coordinates of the desired threshold points values on the ROC curve .
The R software with pROC package was used for the parametric method. For the non-parametric method, pcvsuite package in R software was chosen. These packages allowed the comparison of two ROC curves throughout the required evaluation criteria. The simulated binormal and bi-cauchy ROC curves were smoothed for each software. After which a comparison of the two ROC curves was done. The general features of the chosen packages are summarized in Table 1. It is important to note that bootstrapping for retrieving 95% confidence intervals for specificities at given sensitivities could not be done in the pcvsuite package. As a result, bootstrapping was removed from non-parametric method in order to apply a fair comparison regarding the computational time criteria. See Appendix B for complete R code used.
3.1 Simulated biomarker data
The Cauchy distribution for both healthy and diseased population were matched to the previous normal distribution using two parameters, which were the median and the inter quantile range (IQR). Figure 1 shows the simulated empirical binormal and bi-cauchy ROC curves with a sample size n=10000. The Cauchy distribution successfully mimicked the realistic continuous biomarker data behavior. This was also estimated in a previous study in the biomarker data analysis . The bi-cauchy curve which was matched to the binormal distribution showed lack of performance in its shape by not following the binormal curve.
Moreover, visually the performance of the bi-cauchy curve seemed to lack convexity in certain areas when compared to the binormal ROC curve.
3.2 Evaluation results
As a symmetric distribution with an undefined mean, some loop results from the bi-cauchy distribution had extreme values. These extreme values gave error messages when trying to estimate an ROC curve in both methods. As a result, the simulated data from some loops had to be discarded from the analysis. Table 2 illustrates the proportions of failed runs (per 100) and the computed confidence intervals for each using the modified Wald method.
Sample size n=20
Comparing the two estimation methods visually, the parametric method appeared to experience some difficulty in smoothing the bi-cauchy curve for small sample size. On the other hand, non-parametric smoothed curve showed the desired convexity in its shape (figures 2(a) and 2(b)). Moreover, absolute differences of the likelihood ratios at points 0.1 and 0.5 were extremely large for the parametric method compared to non-parametric method as shown in table 3. The two estimation methods likelihood ratio absolute differences came close together at the lax threshold on the ROC curve at the point of 0.9 sensitivity. The lax threshold, represented in the upper right part of the curve, is where the operator usually had higher performance and minimal bias.
Sample size n=10000
For large sample size, parametric method showed superiority over non-parametric method in the absolute differences of the likelihood ratios at points 0.1 and 0.5 (table 3). Although the superiority was not by a great margin, but the convexity visual comparison of both curves shown in figures 2(c) and 2(d) favors the parametric method.
Table 4 shows the results of the subjective evaluation for the two used R software packages. Package pROC showed a clear ease of use compared to Pcvsuite. This was clearly noted in the output plots of the ROC curves, the user manual comprehensibly and the online solutions provided for common enquiries. As pcvsuite package is no longer maintained, an older version of R software was used (version 2.15.3) which lacked some upgrades found in recent R software versions. Moreover, adjustments had to be made for pcvsuite R code to calculate specificities at given sensitivities (Appendix B). Finally, the 100 loops computational time for non-parametric estimation method by pROC package was almost two minutes faster than parametric method by pcvsuite.
Bi-cauchy distribution matched the characteristics of realistic biomarker data when compared to a known truth . This was illustrated in the concave regions generated when an empirical ROC curve was plotted. For small size continuous biomarker data set, non-parametric methods estimated a reliable smooth and convex ROC curve while parametric methods failed to estimate a smooth and convex one. On the other hand, parametric methods were recommended for large sample size data sets.
These results gave further understanding of different ROC curve estimation methods. As smoothness and convexity of the ROC curve were not on wide research focus. Future studies could benefit from these results, in choosing the desired ROC curve estimation method based on the overall shape and smoothness of the estimated curve rather than the AUC only.
The results showing superiority of one estimation method over another were contradicting across the searched literature. This research results testing parametric and non-parametric methods supports previous studies as well as contradicts others. It was assumed that adding the semi-parametric method to the comparison could eliminate this contradiction and give more rigid conclusions. Previous study proved that non-parametric approach was conservative with small sample sizes less than 20 . On the other hand, this research had a contradicted interpretation on sample size and smoothing which was conservative against parametric methods. It is important to note that, few studies discussed the difference in smoothing and convexity between ROC estimation methods and none of them have used the same evaluation criteria of this study.
Although the method for choosing the parameters for the binormal distribution representing the known truth was scientifically based, this might be susceptible to selection bias. The chosen means and standard deviations might have affected the results. Although, extreme caution was taken to overcome selection bias, the presence of such bias cannot be eliminated completely from simulation studies in general. The calculated failed runs proportions were considered minimal despite the slight increase of failed runs reported for estimation of ROC curve using parametric method with small sample size when compared to the other methods. For the ease of use criteria, the subjective evaluation should have been made by a group of researchers and the mean scores from each person calculated. This was not applied in this research thus increasing the chance of bias.
An older version of R software was used for pcvsuite package which lacked some updated features found in the newer versions. As a result, the ease of use criteria might be susceptible to bias due to the different versions of R software used. Moreover, the bootstrapping option calculating confidence intervals for specificities and given sensitivities was not implemented in pcvsuite. As a result, bootstrapping was not added in the function of pROC package. Removing the bootstrapping option from analysis didn’t affect the results, but could have added more precision if used. Also, calculating specificities at given sensitivities function was not found. As a result, a modification was done for the R code in order to calculate the required coordinates. This attempt had larger error margin as some sensitivities given were rounded up and not exact figure also due to the inability to bootstrap confidence intervals for the calculated specificities. For more information on code adjustment see Appendix B.
A complete analysis of the ROC curve estimation methods could not be done without comparing with semi-parametric estimation methods. The difficulty in simulating the binormal and bi-cauchy distribution was the main reason for the withdrawal of ROCkit which use the semi-parametric ROC curve. Moreover, smoothing was not an option found in computer software with semi-parametric methods. As a result, semi-parametric methods were discarded from the analysis. Although, it is thought that it will serve as an optimum method for smoothing ROC curve, additional programming codes need to be done to estimate a smoothed semi-parametric ROC curve.
As for further work, other symmetric distribution could be chosen to represent the realistic biomarker data such as t-distribution. On the other hand, other computer software such as STATA can be used. This would have allowed more variability in the ease of use criteria. As the pcvsuite package used in this study on R software is still updated and maintained by the developer only for STATA software. Therefore, it might have had more flexible options for parametric ROC estimation methods.
The positive and negative population were simulated from a normal distribution. The means and standard deviations can differ for both populations in the binormal model. This distribution is the most common model for an ideal diagnostic system with continuous biomarker data and is easy to implement . For the previous two reasons, this distribution was chosen for ROC curve estimation. Moreover, representing the known truth, the estimated ROC curve acted as the comparison benchmark or the gold standard. Figure 3 shows 9 trials for choosing the reasonable parameters for both healthy and diseased population. All means and standard deviations were fixed expect for the mean of diseased population. The final parameters chosen were mean=10, standard deviation=10 for healthy population and mean=-5, standard deviation=10 for diseased population.
The positive and negative population were simulated from a Cauchy distribution . The Cauchy distribution had a symmetric nature which is matched to the normal distribution. In this manner, it will mimic the realistic data distribution. Moreover, it will stand as a good indicator to test how smoothing of the ROC curve is sensitive to departure from normality. Since the mean and standard deviation were undefined in a Cauchy distribution, the median and interquartile range (IQR) were used to match both distributions together. Figure 4 shows density plots for both binormal and bi-cauchy distribution. The extreme values for the symmetric nature of the bi-cauchy distribution can be noticed.
The first intention was using bootstrap resampling to calculate the estimates of the coordinates. The higher the number of bootstrap the more precise the estimate but with more time to compute. Previous studies commonly used 1000 bootstrap replicate and this method often yielded a significant estimate . For the purpose of this study and to obtain a good estimate of the second significant digit, 2000 bootstrap replicates were expected to be done as recommended from a previous study . As discussed in the main article, bootstrapping was difficult to calculate in pcvsuite package and was removed from the simulation function.
Tables 5 and 6 shows the functions used in each R software package.
The following is the descriptive R code used to run the simulations, plot ROC curves, calculate likelihood ratios and apply on the AFP biomarker data set.
#choosing parameters for binormal distribution representing the known truth
for (i in 1:9)
for(mu in seq (from=-10, to=30, length=9))
plot(density(cases),main="Normal Healthy and Diseased PDFs")
#normal distribution for diseased sample size 1000
#calculating median, IQR, mean and SD for normal distribution of diseased population
#cauchy distribution for cases sample size 1000
#normal distribution for healthy population sample size 1000
#calculating median, IQR, mean and SD for normal distribution of healthy population
#cauchy distribution for healthy population sample size 100
#combining cases and controls
#density plot for binormal and bi-cauchy distribution
#starts the time for the 100 loop
for(i in 1:100)
#simulate normal distribution for cases or diseased population
#simulate cauchy distribution for cases or diseased population
#simulate normal distribution for controls or healthy population
#simulate cauchy distribution for controls or healthy population
#combining datasets into binormal and bicauchy
#simulating the truth for each marker
#binormal and bicauchy ROC curves
#calculating specificities at given sensitivities for each ROC curve
a=c(ci.coords(smooth(roc1), x=c(0.1, 0.5, 0.9), input = "sensitivity", ret="specificity", boot.n=2))
b=c(ci.coords(smooth(roc2), x=c(0.1, 0.5, 0.9), input = "sensitivity", ret="specificity", boot.n=2))
#calculating Likelihood ratios from given coordinates
#measuring the time for 100 loops to finish
#exporting the output file to Excel sheet
#parametric method using pcvsuite package
for(i in 1:100)
#estimating parametric ROC curve
#retrieving specificities at given sensitivities for each ROC curve. Code adjusted to retrieve
# specificities at given sensitivities
#calculating likelihood ratios from given coordinates
#measuring the time for 100 loops to finish
#non-parametric method for estimating a smooth ROC curve using biomarker data
#parametric method for estimating a smooth ROC curve using biomarker data