Decision curves are a useful tool to evaluate the population impact of adopting a risk prediction instrument into clinical practice. Given one or more instruments (risk models) that predict the probability of a binary outcome, this package calculates and plots decision curves, which display estimates of the standardized net benefit by the probability threshold used to categorize observations as ‘high risk.’ Curves can be estimated using data from an observational cohort, or from case-control studies when an estimate of the population outcome prevalence is available.
Confidence intervals calculated using the bootstrap can be displayed and a wrapper function to calculate cross-validated curves using k-fold cross-validation is also provided.
Key functions are:
decision_curve
: Estimate (standardized) net benefit curves with bootstrap confidence intervals.
plot_decision_curve
: Plot a decision curve or multiple curves.
plot_clinical_impact
and plot_roc_components
: Alternative plots for the output of decision_curve
. See help files or tutorial for more info.
cv_decision_curve
: Calculate k-fold cross-validated estimates of decision curves.
The easiest way to get the package is directly from CRAN:
install.packages("DecisionCurve")
You may also download the current version of the package here:
https://github.com/mdbrown/DecisionCurve/releases
navigate to the source package and use
install.packages("../DecisionCurve_1.1.tar.gz", repos = NULL, type = "source")
or install the package directly from github using devtools.
## install.packages("devtools")
library(devtools)
install_github("mdbrown/DecisionCurve")
Load the package and the provided simulated data set.
library(DecisionCurve)
#load simulated data
data(dcaData)
head(dcaData)
## Age Female Smokes Marker1 Marker2 Cancer
## 1 33 1 FALSE 0.2453110 1.02108482 0
## 2 29 1 FALSE 0.9429660 -0.25576212 0
## 3 28 1 FALSE 0.7735939 0.33184401 0
## 4 27 0 FALSE 0.4063595 -0.00568574 0
## 5 23 1 FALSE 0.5075153 0.20753273 0
## 6 35 1 FALSE 0.1856706 1.41251020 0
First we use the function decision_curve
to create a decision curve object for a logistic model to predict cancer status using age, gender and smoking status. We then plot it using plot_decision_curve
.
set.seed(123)
#first use DecisionCurve with the default settings (set bootstraps = 50 here to reduce computation time).
baseline.model <- decision_curve(Cancer~Age + Female + Smokes, #fitting a logistic model
data = dcaData,
study.design = "cohort",
bootstraps = 50)
#plot the curve
plot_decision_curve(baseline.model, curve.names = "baseline model")
Next, we create a decision curve with two markers added to the original baseline model. We then pass a list with both decision curves to plot_decision_curve
to plot both curves.
set.seed(123)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
bootstraps = 50)
#since we want to plot more than one curve, we pass a list of 'DecisionCurve' objects to the plot
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"))
#see all available options using
?decision_curve
?plot_decision_curve
We include two other functions plot_roc_components
and plot_clinical_impact
.
plot_roc_components
plots the components of the ROC curve-true positive rate and false positive rates-over a range of high risk thresholds.
If we were to use the specified model to classify 1,000 hypothetical subjects, plot_clinical_impact
plots the number classified as high risk and the number with the outcome classified as high risk for a given high risk threshold.
#plot the components of the roc curve--true positive rate and false positive rate
plot_roc_components(full.model, xlim = c(0, 0.4),
col = c("black", "red"))
#plot the clinical impact
plot_clinical_impact(full.model, xlim = c(0, .4),
col = c("black", "blue"))
We show several examples of how one might change the default settings.
baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData,
thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment.
bootstraps = 25)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .4, by = .001),# calculate thresholds from 0-0.4 at every 0.001 increment.
bootstraps = 25)
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
lty = c(1,2),
lwd = c(3,2, 2, 1), # the first two correspond to the decision curves, then 'all' and then 'none'
legend.position = "bottomright") #adjust the legend position
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
confidence.intervals = FALSE, #remove confidence intervals
cost.benefit.axis = FALSE, #remove cost benefit axis
legend.position = "none") #remove the legend
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
cost.benefits = c("1:1000", "1:4", "1:9", "2:3", "1:3"), #set specific cost benefits
legend.position = "bottomright")
baseline.model <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData,
thresholds = seq(0, .4, by = .01),
confidence.intervals = 0.9, #calculate 90% confidence intervals
bootstraps = 25)
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .40, by = .01),
confidence.intervals = 0.9, #calculate 90% confidence intervals
bootstraps = 25)
plot_decision_curve( list(baseline.model, full.model),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
ylim = c(-0.05, 0.15), #set ylim
lty = c(2,1),
standardize = FALSE, #plot Net benefit instead of standardized net benefit
legend.position = "topright")
If a risk model has already been specified, and so no model fitting is needed, the user can specify fitted.risks=TRUE
and provide them in the formula. No model fitting will be done and bootstrap confidence intervals will be conditional on the fitted model.
#helper function
expit <- function(xx) exp(xx)/ (1+exp(xx))
# Assume we have access to previously published models
# (or models built using a training set)
# that we can use to predict the risk of cancer.
# Basic model using demographic variables: Age, Female, Smokes.
dcaData$BasicModel <- with(dcaData, expit(-7.3 + 0.18*Age - 0.08*Female + 0.80*Smokes ) )
# Model using demographic + markers : Age, Female, Smokes, Marker1 and Marker2.
dcaData$FullModel <- with(dcaData, expit(-10.5 + 0.22*Age - 0.01*Female + 0.91*Smokes + 2.03*Marker1 - 1.56*Marker2))
full.model <- decision_curve(Cancer~FullModel,
data = dcaData,
fitted.risk = TRUE,
thresholds = seq(0, .4, by = .05),
bootstraps = 25)
plot_decision_curve(full.model, legend.position = "none")
DecisionCurve also outputs all calculations and point estimates when assigned to an object.
full.model <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .4, by = .05),
bootstraps = 25)
The best way to look at point estimates (w/ci’s) is to use summary
.
summary(full.model) #outputs standardized net benefit by default
##
## Standardized Net Benefit (95% Confidence Intervals):
## ------------------------------------------------------------------------------------------------
## risk cost:benefit percent All Cancer ~ Age + Female + Smokes None
## threshold ratio high risk + Marker1 + Marker2
## ----------- -------------- ------------ ---------------- -------------------------------- ------
## 0 0:1 100 1 1 0
## (100, 100) (1, 1) (1, 1)
##
## 0.05 1:19 45.6 0.614 0.73 0
## (29.2, 55.4) (0.426, 0.692) (0.624, 0.817)
##
## 0.1 1:9 31 0.185 0.639 0
## (20.6, 37.6) (-0.212, 0.35) (0.5, 0.753)
##
## 0.15 3:17 25.6 -0.294 0.604 0
## (16.2, 31) (-0.924, -0.032) (0.472, 0.727)
##
## 0.2 1:4 21 -0.833 0.521 0
## (13.2, 25.6) (-1.726, -0.462) (0.333, 0.625)
##
## 0.25 1:3 16.4 -1.444 0.389 0
## (9.6, 21) (-2.635, -0.95) (0.31, 0.552)
##
## 0.3 3:7 14.2 -2.143 0.302 0
## (7.2, 18.4) (-3.673, -1.507) (0.104, 0.498)
##
## 0.35 7:13 11.4 -2.949 0.258 0
## (6, 16) (-4.872, -2.15) (0.101, 0.44)
##
## 0.4 2:3 8.2 -3.889 0.294 0
## (4.2, 14) (-6.27, -2.9) (0.021, 0.41)
## ------------------------------------------------------------------------------------------------
you can also choose which measure to report, and how many decimal places to print. The options for the measure
argument are ‘TPR’, ‘FPR’, ‘sNB’, and ‘NB’.
#output true positive rate
summary(full.model, nround = 2, measure = "TPR")
##
## Sensitivity (TPR) (95% Confidence Intervals):
## --------------------------------------------------------------------------------------
## risk cost:benefit percent All Cancer ~ Age + Female + Smokes None
## threshold ratio high risk + Marker1 + Marker2
## ----------- -------------- ------------ ------ -------------------------------- ------
## 0 0:1 100 1 1 0
## (100, 100) (1, 1) (1, 1)
##
## 0.05 1:19 45.6 1 0.88 0
## (29.2, 55.4) (1, 1) (0.81, 0.97)
##
## 0.1 1:9 31 1 0.83 0
## (20.6, 37.6) (1, 1) (0.75, 0.92)
##
## 0.15 3:17 25.6 1 0.83 0
## (16.2, 31) (1, 1) (0.72, 0.92)
##
## 0.2 1:4 21 1 0.77 0
## (13.2, 25.6) (1, 1) (0.6, 0.84)
##
## 0.25 1:3 16.4 1 0.63 0
## (9.6, 21) (1, 1) (0.51, 0.78)
##
## 0.3 3:7 14.2 1 0.57 0
## (7.2, 18.4) (1, 1) (0.35, 0.7)
##
## 0.35 7:13 11.4 1 0.5 0
## (6, 16) (1, 1) (0.29, 0.66)
##
## 0.4 2:3 8.2 1 0.45 0
## (4.2, 14) (1, 1) (0.23, 0.58)
## --------------------------------------------------------------------------------------
You can also access the data directly. full.model$derived.data
holds the net benefit and it’s components.
head(full.model$derived.data)
## thresholds FPR TPR NB sNB rho prob.high.risk
## 1 0.00 1.0000000 1.0000000 0.12000000 1.0000000 0.12 1.000
## 2 0.05 0.3977273 0.8833333 0.08757895 0.7298246 0.12 0.456
## 3 0.10 0.2386364 0.8333333 0.07666667 0.6388889 0.12 0.310
## 4 0.15 0.1772727 0.8333333 0.07247059 0.6039216 0.12 0.256
## 5 0.20 0.1340909 0.7666667 0.06250000 0.5208333 0.12 0.210
## 6 0.25 0.1000000 0.6333333 0.04666667 0.3888889 0.12 0.164
## DP model FPR_lower
## 1 0.120 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 1.00000000
## 2 0.106 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.23799127
## 3 0.100 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.14628821
## 4 0.100 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.10262009
## 5 0.092 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.08296943
## 6 0.076 Cancer ~ Age + Female + Smokes + Marker1 + Marker2 0.05676856
## FPR_upper TPR_lower TPR_upper NB_lower NB_upper sNB_lower sNB_upper
## 1 1.0000000 1.0000000 1.0000000 0.08400000 0.14600000 1.0000000 1.0000000
## 2 0.4941452 0.8139535 0.9701493 0.05368421 0.11463158 0.6242350 0.8174342
## 3 0.3044496 0.7500000 0.9166667 0.04800000 0.10688889 0.5000000 0.7527387
## 4 0.2295082 0.7209302 0.9154930 0.04294118 0.10317647 0.4718137 0.7265949
## 5 0.1733021 0.6041667 0.8448276 0.03200000 0.08500000 0.3333333 0.6250000
## 6 0.1328671 0.5116279 0.7758621 0.02666667 0.06933333 0.3100775 0.5517241
## rho_lower rho_upper prob.high.risk_lower prob.high.risk_upper DP_lower
## 1 0.084 0.146 1.000 1.000 0.084
## 2 0.084 0.146 0.292 0.554 0.070
## 3 0.084 0.146 0.206 0.376 0.068
## 4 0.084 0.146 0.162 0.310 0.062
## 5 0.084 0.146 0.132 0.256 0.056
## 6 0.084 0.146 0.096 0.210 0.044
## DP_upper cost.benefit.ratio
## 1 0.146 0:1
## 2 0.132 1:19
## 3 0.130 1:9
## 4 0.130 3:17
## 5 0.118 1:4
## 6 0.102 1:3
If data is from a case-control study instead of an observational cohort, an estimate of the population level outcome prevalence is needed to calculate decision curves. Decision curves can be calculated by setting study.design = "case-control"
and setting the population.prevalence
. Once the decision_curve
is calculated, all other calls to the plot functions remain the same. Note that bootstrap sampling is done stratified by outcome status for these data.
#simulated case-control data with same variables as above
data(dcaData_cc)
#estimated from the population where the
#case-control sample comes from.
population.rho = 0.11
baseline.model_cc <- decision_curve(Cancer~Age + Female + Smokes,
data = dcaData_cc,
thresholds = seq(0, .4, by = .01),
bootstraps = 25,
study.design = "case-control",
population.prevalence = population.rho)
full.model_cc <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData_cc,
thresholds = seq(0, .4, by = .01),
bootstraps = 25,
study.design = "case-control",
population.prevalence = population.rho)
plot_decision_curve( list(baseline.model_cc, full.model_cc),
curve.names = c("Baseline model", "Full model"),
col = c("blue", "red"),
lty = c(1,2),
lwd = c(3,2, 2, 1),
legend.position = "bottomright")
We provide a wrapper to perform k-fold cross-validation to obtain bias corrected decision curves. Once cv_decision_curve
is called, all plot and summary functions work the same as shown above for decision_curve
output. Confidence interval calculation is not available at this time for cross-validated curves.
full.model_cv <- cv_decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
folds = 5,
thresholds = seq(0, .4, by = .01))
full.model_apparent <- decision_curve(Cancer~Age + Female + Smokes + Marker1 + Marker2,
data = dcaData,
thresholds = seq(0, .4, by = .01),
confidence.intervals = 'none')
plot_decision_curve( list(full.model_apparent, full.model_cv),
curve.names = c("Apparent curve", "Cross-validated curve"),
col = c("red", "blue"),
lty = c(2,1),
lwd = c(3,2, 2, 1),
legend.position = "bottomright")