Model structure and setup

We will consider an illness-death economic model in which patients can transition between three health states: healthy, sick, and death. The economic model will be analyzed using the R programming language. In this analysis, we will use seven R packages. The diagram package will be used to visualize the illness-death model. flexsurv and mstate are used to estimate a multi-state statistical model needed to parameterize the economic model. hesim is used to simulate the economic model conditional on the estimated parameters. BCEA is used to analyze the output (i.e., quality-adjusted life-years (QALYs) and costs) produced by hesim. Finally, ggplot2 and scales are used to visualize the results.

We begin by loading the required packages.

library("diagram")
library("flexsurv")
library("mstate")
library("hesim")
library("BCEA")
library("ggplot2")
library("scales")

We can visualize the illness-death model by creating a transition matrix that describes the 3 possible health states and 4 possible transitions. The matrix is a square-matrix where the (i,j) element is a positive integer if a transition from i to j is possible and NA otherwise.

tmat <- rbind(c(NA, 1, 2),
              c(3, NA, 4),
              c(NA, NA, NA))
print(tmat)

##      [,1] [,2] [,3]
## [1,]   NA    1    2
## [2,]    3   NA    4
## [3,]   NA   NA   NA

diagram::plotmat(t(tmat), name = c("Healthy", "Sick", "Death"),
                 pos = c(1, 2))

The full cost-effectiveness analysis is conducted in three steps:

Parameter estimation: Multi-state statistical models are used to characterize transitions between health states. In addition, quality-of-life and cost values are estimated for each health state.
Simulation: Once the model is parameterized, time spent in each health state is simulated, which is, in turn, used to compute QALYs and costs.
Decision analysis: Given the simulated output of the model, we summarize the probabilistic sensitivity analysis (PSA) and compute quantities of interest such as cost per QALY.

Before proceeding, we define the population, treatment strategies, and modeling approach. We will evaluate 2 competing treatment strategies and model health state transitions with a “clock-reset” multi-state statistical model, which implies that the hazard functions reset each time a patient enters a new health state. Since we are using a clock-reset model, we can only estimate health state occupancy using an individual patient simulation. 1,000 patients are simulated to ensure that our simulated means are stable. The patients, treatment strategies, and health states are specified with hesim_data() from the hesim package.

strategies <- data.frame(strategy_id = c(1, 2))
patients <- data.frame(patient_id = seq(1, 1000))
states <- data.frame(state_id = seq(1, 2),
                     state_name = c("Healthy", "Sick"))
hesim_dat <- hesim_data(strategies = strategies,
                        patients = patients,
                        states = states)

Parameter estimation

Three sets of parameters must be estimated to parameterize the economic model: the parameters of the multi-state model of the health state transitions, the utility values, and the cost values. Since we would like to estimate the impact of parameter uncertainty on the cost-effectiveness results with a probabilistic sensitivity analysis (PSA), we take a Bayesian (or quasi-Bayesian) approach in that we estimate the joint probability distribution of all model parameters.

Multi-state model

Before fitting the multi-state model, we inspect the observed transitions in the available data. The data shows that there are 532 and 544 transitions in which patients begin in state 1 (healthy) and state 2 (sick), respectively. From the healthy state, 274 patients transition to the healthy state, 104 die, and 154 remain in the healthy state at the end of follow-up; likewise, in the sick state, 314 patients recover and move back to the healthy state while 188 die. In total, 292 patients die.

mstate_data <- hesim::ctstm3_exdata$transitions
class(mstate_data) <- c("msdata", "data.frame")
attr(mstate_data, "trans") <- tmat
mstate::events(mstate_data)

## $Frequencies
##     to
## from   1   2   3 no event total entering
##    1   0 274 104      154            532
##    2 314   0 188       42            544
##    3   0   0   0      292            292
## 
## $Proportions
##     to
## from          1          2          3   no event
##    1 0.00000000 0.51503759 0.19548872 0.28947368
##    2 0.57720588 0.00000000 0.34558824 0.07720588
##    3 0.00000000 0.00000000 0.00000000 1.00000000

Now that the data has been examined we can fit parametric multi-state models to the data. Although parametric models are needed to extrapolate outcomes beyond available follow-up time with the economic model, it is a good idea to fit non-parametric models as well so that we can assess the impact of our modeling assumptions. flexsurv allows for a number of parametric survival models including exponential, Weibull, Gompertz, gamma, lognormal, log-logistic, generalized gamma, generalized F, and survival splines. For brevity we will use the flexible generalized gamma distribution.

# Non-parametric models
cox_fits <- survival::coxph(Surv(years, status) ~ strata(trans) + strategy_id, 
                            data = mstate_data) 

# Parametric models
n_trans <- max(tmat, na.rm = TRUE) # Number of transitions
ggamma_fits <- vector(length = n_trans, mode = "list") 
for (i in 1:length(ggamma_fits)){
  ggamma_fits[[i]] <- flexsurv::flexsurvreg(Surv(years, status) ~ factor(strategy_id), 
                                            data = mstate_data, 
                                            subset = (trans == i) , 
                                            dist = "gengamma") 
}

The non-parametric and parametric model fits can be compared using the cumulative hazard functions. Specifically, we plot the cumulative hazard functions for each of the four possible transitions: healthy to sick, healthy to death, sick to healthy, and sick to death.

# Non-parametric 
cox_cumhaz <- mstate::msfit(cox_fits, 
                            newdata = data.frame(strategy_id = 1, strata = 1:n_trans),
                            trans = tmat, variance = FALSE)
max_time <- max(cox_cumhaz$Haz$time) # Maximum follow-up time

# Parametric
ggamma_cumhaz <- flexsurv::msfit.flexsurvreg(ggamma_fits, 
                                             newdata = data.frame(strategy_id = 1),
                                             t = seq(1, max_time, by = .01),
                                             trans = tmat, variance = FALSE)

# Plot to compare
cumhaz_data <- rbind(data.frame(cox_cumhaz$Haz,
                                model = "Cox"),
                     data.frame(ggamma_cumhaz$Haz,
                                model = "Generalized gamma"))
cumhaz_data$trans <- factor(cumhaz_data$trans,
                            levels = seq(1, 4),
                            labels = c("Healthy -> Sick",
                                       "Healthy -> Death",
                                       "Sick -> Healthy",
                                       "Sick -> Death"))
ggplot2::ggplot(cumhaz_data, aes(x = time, y = Haz, col = model, linetype = model)) +
  geom_line() + facet_wrap(~trans) + 
  scale_color_discrete(name = "") +
  scale_linetype_discrete(guide = FALSE) +
  xlab("Years") + ylab("Cumulative hazard") + theme_minimal()

Costs and utility

Although cost and utility values could in principle be estimated directly from the data, we consider the common case in which they are based on estimates from the literature. For instance, suppose that there are two cost categories: medical and drug. Drug costs are $10,000 per for treatment strategy 1 and $12,500 per year for treatment strategy 2. Suppose we know drug costs with certainty so they are constant across samples in the PSA.

drugcost_tbl <- hesim::stateval_tbl(tbl = data.frame(strategy_id = strategies$strategy_id,
                                                     est = c(10000, 12500)),
                                    dist = "fixed",
                                    hesim_data = hesim_dat)

Unlike drug costs, medical costs vary across health states but not across treatment strategies. Since medical costs are typically right skewed, they often follow a gamma distribution. Here, we assume that medical costs are gamma distributed and that we have data on their mean and standard error. We use methods of moments to recover the underlying parameters of the gamma distribution, which in turn, used to sample mean medical costs from the gamma distribution.

medcost_tbl <- hesim::stateval_tbl(data.frame(state_id = states$state_id,
                                              mean = c(800, 1500),
                                              se = c(100, 150)),
                                   dist = "gamma",
                                   hesim_data = hesim_dat)

We take a similar approach for utility. Specifically, we assume that utility (which typically ranges from 0 to 1) follows a beta distribution. We use the method of moments to recover the parameters of the beta distribution from the mean and standard error, and sample utility values from the beta distribution.

utility_tbl <- hesim::stateval_tbl(data.frame(state_id = states$state_id,
                                              mean = c(0.90, 0.55),
                                              se = c(.1, .2)),
                                  dist = "beta",
                                  hesim_data = hesim_dat)

Simulation

Given parameter estimates for the health state transitions, costs, and utility, the economic model can be simulated. 1,000 Monte Carlo simulations are used for the PSA (i.e., random samples of the parameters from their probability distributions).

n_samples <- 1000

We can then setup the model for disease progression which simulates the multi-state trajectory. The object trans_data stores the input data used to simulate health state transitions across treatment strategies and patients with the fitted generalized gamma multi-state model. Parameters from the posterior distribution of the gamma distribution are simulated in a quasi-Bayesian manner by drawing from multi-variate normal distributions.

trans_data <- hesim::expand(hesim_dat, by = c("strategies", "patients"))
ggamma_fits <- hesim::flexsurvreg_list(ggamma_fits)
transmod <- hesim::create_IndivCtstmTrans(ggamma_fits, 
                                          input_data = trans_data, trans_mat = tmat,
                                          n = n_samples)

We can also setup models for costs and utility using the parameter estimates and probability distributions specified in the tables above. Each model is a “mean” model that simulates state values (e.g., utility or costs) by treatment strategy, patient, and health state.

# Medical cost model
medcostmod <- hesim::create_StateVals(medcost_tbl, n = n_samples)

# Drug cost model
drugcostmod <- hesim::create_StateVals(drugcost_tbl, n = n_samples)

# Utility model
utilitymod <- hesim::create_StateVals(utility_tbl, n = n_samples)

Once (statistical) models for the health state transitions, costs, and utility have been created, the economic model can be simulated. We use an individual-level continuous time state transition model (CTSTM) to simulate disease progression, often referred to as a semi-Markov multi-state model.

ictstm <- hesim::IndivCtstm$new(trans_model = transmod,
                                utility_model = utilitymod,
                                cost_models = list(drugs = drugcostmod,
                                                   medical = medcostmod))

We first simulate disease progression by simulating the multi-state trajectory across patients by treatment strategy. Since the simulation is written in C++, we are able to simulate 1,000 random samples of the parameters for the PSA, 1,000 patients, and 2 treatment strategies quite quickly. We summarize the results of the simulation by computing the simulated probabilities of state occupancy over time.

ictstm$sim_disease()
ictstm$sim_stateprobs(t = seq(0, 20))

# Plotting
stateprob_means <- ictstm$stateprobs_[, .(prob = mean(prob)), 
                                      by = c("strategy_id", "state_id", "t")]
ggplot2::ggplot(stateprob_means, 
                aes(x = t, y = prob, col = factor(strategy_id))) +
  geom_line() + facet_wrap(~factor(state_id)) + 
  scale_color_discrete(name = "") +
  xlab("Years") + ylab("Probability") + theme_minimal()

Given simulated disease progression, we can compute (discounted) costs and QALYs assuming a 3 percent discount rate. We “summarize” the results so that we have a posterior distribution of costs and QALYs stratified by treatment strategy.

ictstm$sim_qalys(dr = .03)
ictstm$sim_costs(dr = .03)
ce_output <- ictstm$summarize()
head(ce_output$costs)

head(ce_output$qalys)

Decision analysis

To analyze the results and perform a decision analysis (i.e., Bayesian cost-effectiveness analysis) with the BCEA package, we store our summaries of clinical benefit (e.g., QALYs) and costs in the matrices qalys_mat and costs_mat.

qalys_mat <- matrix(ce_output$qalys$qalys, nrow = n_samples,
                    byrow = TRUE)
costs_mat <- matrix(ce_output$costs[category == "total", costs], nrow = n_samples,
                    byrow = TRUE)
colnames(qalys_mat) <- colnames(costs_mat) <- c("Strategy 1", "Strategy 2")

Each row is a simulated parameter set from the PSA and each column is a unique treatment strategy.

head(qalys_mat)

##      Strategy 1 Strategy 2
## [1,]   6.235999   6.315122
## [2,]  10.267582  10.240878
## [3,]   7.164399   6.746049
## [4,]   7.265146   8.051308
## [5,]  12.202031  13.048700
## [6,]  10.452891  11.160129

head(costs_mat)

##      Strategy 1 Strategy 2
## [1,]   82047.21  103036.86
## [2,]  129605.59  159266.78
## [3,]   85803.31   97673.35
## [4,]   88195.38  117648.69
## [5,]  157484.32  206009.07
## [6,]  125952.47  162479.66

Given this output, we can use the function bcea to summarize the results of the economic model and compute suitable measures of “cost-effectiveness”.

bcea_out <- BCEA::bcea(e = qalys_mat, c = costs_mat, ref = 2, Kmax = 100000)

It is then straightforward to produce commonly used graphs. For instance, we can plot the cost-effectiveness plane, which compares intervention 2 with intervention 1.

ceplane.plot(bcea_out, graph = "ggplot2", wtp = 50000) +
  theme_minimal()

We can also plot the cost-effectiveness acceptability curve which measures the probability that intervention 2 is cost-effective relative to intervention 1 at various willingness to pay thresholds.

ceac.plot(bcea_out, graph = "ggplot2") + theme_minimal() +
  scale_x_continuous(labels = comma)

Finally, we can plot the expected value of perfect information (EVPI), which is a measure of the maximum amount that a decision maker would be willing to pay to reduce all uncertainty in all parameters. The dashed lines represent the break even point, or, the point where an optimal decision changes from intervention 1 to intervention 2.

evi.plot(bcea_out, graph = "ggplot2") + theme_minimal() +
   scale_x_continuous(labels = comma)

It is also possible to compute the value of reducing uncertainty in a subset of parameters (i.e., the expected value of partial perfect information (EVPPI)) with the function evppi. Furthermore, the R package EVSI can be used to compute the expected value of sample information (EVSI).