`hesim`

`hesim`

is designed to facilitate the development of high performance health-economic simulation models and the analysis of the output of said models. A decision model is built by specifying a model structure, which consists of three submodels that are each a function of parameters and input data: a set of statistical models for predicting or simulating disease progression, a model for utility values, and set of cost models for each potential cost category (e.g., costs of medical care, drug costs). The disease model is used to model the probability of being in given disease state and the utility and costs models are used to model the costs and utility associated with a health state.

Probabilistic sensitivity analysis (PSA) is supported by default, as all modeled outcomes are based on sampled values of the parameters from appropriate probability distributions or via bootstrapping. Furthermore, predictions and simulations are made by patient according to the patient level covariates contained in the input data. `hesim`

therefore also provides functions for decision analysis and quantifying the output of a PSA at the individual or subgroup level.

The statistical models comprising a model structure are characterized by a distribution over outcomes, \(y\), which depend on model parameters, \(\alpha_1, \ldots, \alpha_L\),

\[ \begin{aligned} P(y| \alpha_1(x_1), \ldots, \alpha_L(x_L)). \end{aligned} \] The \(l\)th model parameter can depend on a vector of coefficients, \(\beta_l\), and an input vector, \(x_l\), of covariates through linked transformed linear models \(g(\alpha_l(x_l)) = x_l^T \beta_l\). The outcomes, \(y\), can be predicted using the mean of the distribution, \(E(y|\cdot)\), or sampled with a random draw from the probability distribution, \(P(y|\cdot)\).

In `hesim`

, the sampled values of the coefficients for each parameter are generally stored in matrices (or a single matrix if the model only has one parameter), where the columns of each matrix are covariates and the rows are random samples. Parameters for statistical models are stored in objects prefixed by `params_`

(e.g., `params_surv`

for a survival model, `params_lm`

for a linear model). A `params_`

prefixed object can be created in one of two ways:

- With a function of the same name such as
`params_surv()`

or`params_lm()`

- By using the generic function
`form_params()`

to create objects from fitted statistical models

This flexibility is provided so that parameters can either be estimated using models fit with `R`

or from an external source. To illustrate, consider the shape and scale parameters of a Weibull survival model, which are stored in a `params_surv`

object (note that `hesim`

accepts Weibull distributions by the name “weibull” or “weibull.quiet”).

```
library("hesim")
library("flexsurv")
n.samples <- 3
# Create from fitted R model
fit.wei <- flexsurv::flexsurvreg(formula = Surv(endpoint1_time, endpoint1_status) ~ female + age,
data = part_surv4_simdata$survival, dist = "weibull")
params.wei <- form_params(fit.wei, n = n.samples)
print(params.wei)
```

```
## $coefs
## $coefs$shape
## shape
## [1,] 0.05861593
## [2,] 0.04914213
## [3,] 0.07080777
##
## $coefs$scale
## scale female age
## [1,] -0.05304737 0.03381455 -0.009452243
## [2,] 0.24290353 0.04026665 -0.014003036
## [3,] -0.04245734 0.01539563 -0.010515485
##
##
## $dist
## [1] "weibull.quiet"
##
## $n_samples
## [1] 3
##
## attr(,"class")
## [1] "params_surv"
```

```
# Create with 'params_surv()'
coefs <- flexsurv::normboot.flexsurvreg(fit.wei, B = n.samples, raw = TRUE)
params.wei2 <- params_surv(coefs = list(shape = coefs[, "shape", drop = FALSE],
scale = coefs[, c("scale", "female", "age")]),
dist = "weibull")
print(params.wei2)
```

```
## $coefs
## $coefs$shape
## shape
## [1,] 1.058807
## [2,] 1.031021
## [3,] 1.051617
##
## $coefs$scale
## scale female age
## [1,] 0.9912089 0.04617189 -0.01050913
## [2,] 1.0393153 -0.03508900 -0.01073399
## [3,] 1.0178921 0.02626816 -0.01029823
##
##
## $dist
## [1] "weibull"
##
## $n_samples
## [1] 3
##
## attr(,"class")
## [1] "params_surv"
```

The input vectors for each parameter, \(x_{l,hijk}\), are rows in multidimensional input matrices, \(X_l\), where each row denotes a unique observation. The input matrix for parameter \(l\) is indexed by health-related indices, \(h\), patients, \(i\), and treatment lines \(j\) for a treatment strategy \(k\). Example health-related indices include a health state, a transition between two health states, or a survival endpoint in a partitioned survival model. In some cases, the health-related index \(h\) can be suppressed and separate models can be fit for each health index. This is, for instance, the case in a partitioned survival model where separate models are fit for each survival endpoint. Likewise, models can be fit without multiple treatment lines as would, again, be the case in a partitioned survival analysis where sequential treatment would be incorporated by adding additional health states.

In `hesim`

, simulations are made by constructing an object of class `input_data`

with the function `input_data()`

, which consists of an input matrix \(X\) and the indices \(h\), \(i\), \(j\), and \(k\). We illustrate in the figure below with an example in which a statistical model with one parameter is used to make predictions across two treatment strategies, two patients, and two health states. (If a statistical model with more than one parameter were used, then a separate input matrix would need to be created for each parameter.)

`hesim`

provides helper functions for constructing objects of class `input_data`

. Specifically, an `input_data`

object can be created by following a three step process.

- Using
`hesim_data()`

to create a collection of data tables (i.e., a`data.frame`

or`data.table`

) of class`hesim_data`

for storing relevant simulation data (e.g., a treatment strategy table, a patient table, a health state table), which would contain any variables needed for prediction with a statistical model - Using
`expand_hesim_data()`

to combine some or all of the tables created by`hesim_data()`

into a single long dataset of class`expanded_hesim_data`

with dimensions required for predicting or simulating values with a particular submodel - Using the dataset in Step 2 to create an object of class
`input_data`

with`form_input_data()`

The figure below illustrated this process using the same example above with two treatment strategies, two patients, and two health states.

To illustrate with `R`

, consider a simple partitioned survival model where the only survival endpoint is overall survival, so that there is only one (non-death) health state. Furthermore, suppose that there are two treatment strategies and two patients. A `hesim_data`

object can be created as follows.

```
dt.strategies <- data.frame(strategy_id = c(1, 2))
dt.patients <- data.frame(patient_id = seq(1, 2),
age = c(45, 50),
female = c(0, 1))
dt.states <- data.frame(state_id = 1,
state_name = "Alive",
stringsAsFactors = FALSE)
hesim.dat <- hesim_data(strategies = dt.strategies,
patients = dt.patients,
states = dt.states)
print(hesim.dat)
```

```
## $strategies
## strategy_id
## 1 1
## 2 2
##
## $patients
## patient_id age female
## 1 1 45 0
## 2 2 50 1
##
## $states
## state_id state_name
## 1 1 Alive
##
## attr(,"class")
## [1] "hesim_data"
```

The `hesim_data`

object can subsequently be “expanded” over the dimensions of interest using `expand_hesim_data()`

. The new `expanded_hesim_data`

object must contain the covariates used in the model fit as well as id variables indexing the dimensions of the data. Given a model object or formula, we extract the input matrix from the `expanded_hesim_data`

object using the generic function `form_input_data()`

, which returns an `input_data`

object containing the input matrix as well as the relevant id variables.

As with parameter estimation, an `input_data`

object can be created in one of two ways:

- From a model fit using R
- Based on the covariates from a model fit outside of
`R`

In case 2 a `formula`

or `formula_list`

object can be used in lieu of a fitted model. To illustrate, suppose we would like to predict survival curves by treatment strategy and patient using our fitted Weibull model. We first expand our `hesim_data`

object using the strategies and patients tables so that we have a dataset where each row is a unique treatment strategy and patient. We then use `form_input_data()`

to create (i) a model matrix, `X`

, for each parameter (the shape parameter only includes an intercept while the scale parameter includes an age covariate and a dummy variable for gender) and (ii) `strategy_id`

and `patient_id`

variables indicating the strategy and patient represented by each row in the list of matrices. In addition, the number of unique strategies and patients being modeled are returned based on the id variables specified.

```
# Predict survival curves by treatment strategy and patient
## From a model fit with flexsurvreg
hesim.edata.surv <- expand_hesim_data(hesim.dat, by = c("strategies", "patients"))
input.dat.wei <- form_input_data(fit.wei, data = hesim.edata.surv)
print(input.dat.wei)
```

```
## $X
## $X$shape
## (Intercept)
## 1 1
## 2 1
## 3 1
## 4 1
## attr(,"assign")
## [1] 0
##
## $X$scale
## (Intercept) female age
## 1 1 0 45
## 2 1 1 50
## 3 1 0 45
## 4 1 1 50
## attr(,"assign")
## [1] 0 1 2
##
##
## $strategy_id
## [1] 1 1 2 2
##
## $n_strategies
## [1] 2
##
## $patient_id
## [1] 1 2 1 2
##
## $n_patients
## [1] 2
##
## attr(,"class")
## [1] "input_data"
```

```
## From a formula object
formula.list.wei <- formula_list(shape = formula(~1),
scale = formula(~ female + age))
input.dat.wei2 <- form_input_data(formula.list.wei, data = hesim.edata.surv)
```

We can create input data to make predictions by treatment strategy, patient, and health state (e.g., for our utility and cost models) in a similar fashion.

```
# Predict state values by treatment strategy, patient, and health state
hesim.edata.statevals <- expand_hesim_data(hesim.dat, by = c("strategies", "patients", "states"))
formula.costs <- formula(~ 1 + female)
print(form_input_data(formula.costs, data = hesim.edata.statevals))
```

```
## $X
## (Intercept) female
## 1 1 0
## 2 1 1
## 3 1 0
## 4 1 1
## attr(,"assign")
## [1] 0 1
##
## $strategy_id
## [1] 1 1 2 2
##
## $n_strategies
## [1] 2
##
## $patient_id
## [1] 1 2 1 2
##
## $n_patients
## [1] 2
##
## $state_id
## [1] 1 1 1 1
##
## $n_states
## [1] 1
##
## attr(,"class")
## [1] "input_data"
```

A decision model is created by combining the disease, utility, and cost models. Each submodel is, in turn, created by combining input data and parameter estimates. For example, in a partitioned survival model there are three submodels: a set of survival models, a utility model, and a set of cost models for different types of costs. The survival models are `R6`

objects of class `PartSurvCurves`

and used to predict the survival curves in a partitioned survival analysis. The cost and utility models are `R6`

objects of class `PartSurvStateValues`

and used to predict the values to assign to the health states.

Again, as with the model parameters and input data, there are two ways to instantiate each submodel:

- Based on parameters and input data using
`$new()`

- From fitted statistical models using generic functions prefixed by
`form_`

(e.g.,`form_PartSurvCurves()`

,`form_PartSurvStateVals()`

).

We illustrate by instantiating a `PartSurvCurves`

object in both ways.

```
# Instantiate from a 'partsurvfit' object
partsurvfit.wei <- partsurvfit(flexsurvreg_list(fit.wei), data = part_surv4_simdata$survival)
part.surv.curves <- form_PartSurvCurves(partsurvfit.wei, data = hesim.edata.surv,
n = n.samples, bootstrap = FALSE)
# Instantiate with $new()
params.wei.list <- params_surv_list(params.wei)
input.dat.wei.list <- form_input_data(formula_list(formula.list.wei),
data = hesim.edata.surv)
part.surv.curves2 <- PartSurvCurves$new(params = params_surv_list(params.wei),
data = input.dat.wei.list)
```

Submodels such as `PartSurvCurves`

have a number of member functions that can be used for prediction, simulation, or to evaluate model fit. For example, we can summarize survival curves from a `PartSurvCurves`

object with functions to predict hazards, cumulative hazards, survival, restricted mean survival time, and quantiles.

```
partsurvfit.wei <- partsurvfit(flexsurvreg_list(fit.wei), data = part_surv4_simdata$survival)
part.surv.curves <- form_PartSurvCurves(partsurvfit.wei, data = hesim.edata.surv,
n = n.samples, bootstrap = FALSE)
head(part.surv.curves$quantile(.5))
```

```
## curve sample strategy_id patient_id p quantile
## 1: 1 1 1 1 0.5 0.4364805
## 2: 1 1 1 2 0.5 0.4307200
## 3: 1 1 2 1 0.5 0.4364805
## 4: 1 1 2 2 0.5 0.4307200
## 5: 1 2 1 1 0.5 0.4259796
## 6: 1 2 1 2 0.5 0.4524522
```

`head(part.surv.curves$cumhazard(t = seq(0, 3)))`

```
## curve sample strategy_id patient_id t cumhazard
## 1: 1 1 1 1 0 0.000000
## 2: 1 1 1 1 1 1.632087
## 3: 1 1 1 1 2 3.339709
## 4: 1 1 1 1 3 5.077052
## 5: 1 1 1 2 0 0.000000
## 6: 1 1 1 2 1 1.654640
```

`head(part.surv.curves$rmst(t = 3))`

```
## curve sample strategy_id patient_id t rmst
## 1: 1 1 1 1 3 0.6106811
## 2: 1 1 1 2 3 0.6028601
## 3: 1 1 2 1 3 0.6106811
## 4: 1 1 2 2 3 0.6028601
## 5: 1 2 1 1 3 0.5774410
## 6: 1 2 1 2 3 0.6126176
```

We can instantiate a `PartSurvStateVals`

object in the same two ways, although here we only use the second method. A `$predict()`

function is available to predict mean state values by treatment strategy, patient, and health state.

```
medcost.dat <- subset(part_surv4_simdata$costs$medical, state_name == "state3")
costs.medical.fit <- stats::lm(costs ~ female, data = medcost.dat)
part.surv.medcosts <- form_PartSurvStateVals(costs.medical.fit,
data = hesim.edata.statevals, n = n.samples)
head(part.surv.medcosts$predict())
```

```
## state_id sample strategy_id patient_id value
## 1: 1 1 1 1 32100.88
## 2: 1 1 1 2 32823.56
## 3: 1 1 2 1 32100.88
## 4: 1 1 2 2 32823.56
## 5: 1 2 1 1 35107.92
## 6: 1 2 1 2 35602.00
```

By combining submodels, a decision model is able to simulate disease progression and cost and utility values associated with each disease state. Total discounted costs and QALYs are calculated by summing (or integrating) simulated costs and utility over time. In the partitioned survival example, disease progression is simulated by first generating survival curves and then calculating health states probabilities based on those curves. Cost and QALYs are calculated by numerically integrating the weighted probability of being in each state, where weights are a function of the discount factorand predicted state values.

```
times <- seq(0, 3, .01)
part.surv <- PartSurv$new(survival_models = part.surv.curves,
cost_models = list(medical = part.surv.medcosts))
part.surv$sim_survival(t = times)
part.surv$sim_stateprobs()
part.surv$sim_costs(dr = c(0, .03))
head(part.surv$costs_)
```

```
## state_id sample strategy_id patient_id dr type costs
## 1: 1 1 1 1 0 medical 19603.75
## 2: 1 1 1 2 0 medical 19788.38
## 3: 1 1 2 1 0 medical 19603.75
## 4: 1 1 2 2 0 medical 19788.38
## 5: 1 2 1 1 0 medical 20273.05
## 6: 1 2 1 2 0 medical 21810.69
```

Once costs and QALYS have been calculated, the PSA output can be summarized with measures commonly used for technology assessment including:

- cost-effectiveness planes
- cost-effectiveness acceptability curves (CEACs)
- the expected value of perfect information (EVPI)

Moreover, since patient level outcomes are modeled, output can be summarized by patient or subgroup. These “individualized cost-effectiveness analyses” are performed with the `icea()`

function. A detailed explanation is given here.

`hesim`

currently supports N-state partitioned survival analysis; however, the package is being actively developed and support for state transition modeling and individual patient simulation will be added. In addition, we plan to add support for additional statistical techniques such as generalized linear models.