Title: | Non-Homogeneous Markov and Hidden Markov Multistate Models |
---|---|
Description: | Fits non-homogeneous Markov multistate models and misclassification-type hidden Markov models in continuous time to intermittently observed data. Implements the methods in Titman (2011) <doi:10.1111/j.1541-0420.2010.01550.x>. Uses direct numerical solution of the Kolmogorov forward equations to calculate the transition probabilities. |
Authors: | Andrew Titman [aut, cre] |
Maintainer: | Andrew Titman <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.1 |
Built: | 2024-11-05 06:14:08 UTC |
Source: | https://github.com/cran/nhm |
Outputs the matrix of misclasification probabilities in a misclassification type hidden Markov multi-state model fitted using nhm
.
ematrix.nhm(object, covvalue=NULL)
ematrix.nhm(object, covvalue=NULL)
object |
Fitted model object produced using |
covvalue |
Optional vector of covariate vectors (should be given in the order specified in the |
The emat_nhm
function used to fit the model is called to obtain the values of the misclassification probabilities at the supplied times for the supplied covariate value.
Returns a list containing a matrix of misclassification probabilities and a matrix of corresponding standard errors computed using the delta method.
Andrew Titman [email protected]
nhm
, plot.nhm
, predict.nhm
, qmatrix.nhm
The observed states and associated observation times for 1000 patients simulated from a 4 state process non-homogeneous Markov model
data("example_data1")
data("example_data1")
A data frame with 3861 rows and 5 variables:
Observed state at the time of observation
Time at which the observation occurred
Patient identification number
Binary covariate
Continuous covariate
The observed states and associated observation times for 1000 patients simulated from a 4 state process non-homogeneous Markov model with misclassification to adjacent transient states.
data("example_data1")
data("example_data1")
A data frame with 3864 rows and 5 variables:
Observed state at the time of observation
Time at which the observation occurred
Patient identification number
Binary covariate
Continuous covariate
Outputs the vector of initial state probabilities in a misclassification type hidden Markov multi-state model fitted using nhm
.
initialprob.nhm(object, covvalue=NULL)
initialprob.nhm(object, covvalue=NULL)
object |
Fitted model object produced using |
covvalue |
Optional vector of covariate vectors (should be given in the order specified in the |
The initp_nhm
function used to fit the model is called to obtain the values of the initial state vector at the supplied times for the supplied covariate value.
Returns a list containing a vector of initial state probabilities and a corresponding vector of standard errors computed using the delta method.
Andrew Titman [email protected]
Sets up a model object in preparation for fitting a non-homogeneous Markov or misclassification type hidden Markov multi-state model.
model.nhm(formula, data, subject, covariates=NULL, type, trans, nonh=NULL, covm=NULL, centre_time=NULL, emat=NULL, ecovm=NULL, firstobs=NULL, initp=NULL, initp_value=NULL, initcovm=NULL, splinelist=NULL,degrees=NULL,censor=NULL, censor.states=NULL,death=FALSE,death.states=NULL,intens=NULL)
model.nhm(formula, data, subject, covariates=NULL, type, trans, nonh=NULL, covm=NULL, centre_time=NULL, emat=NULL, ecovm=NULL, firstobs=NULL, initp=NULL, initp_value=NULL, initcovm=NULL, splinelist=NULL,degrees=NULL,censor=NULL, censor.states=NULL,death=FALSE,death.states=NULL,intens=NULL)
formula |
A formula identifying the state and time variables within |
data |
data frame containing the observed states, observation times, subject identifiers and covariates. Should include initial observation/recruitment times. |
subject |
Name of the subject identifier variable within the |
covariates |
A character vector giving the variable names of the covariates to be used in the model |
type |
type of intensity model.
|
trans |
Square matrix of viable transitions with dimension equal to the number of states. Impossible transitions should be 0. Others should be labelled consecutively from 1. Labelling transitions with the same value assumes the parameter is shared. |
nonh |
Square matrix to indicate non-homogeneous transitions with dimension equal to the number of states. Impossible transitions or homogeneous transitions should be 0. Otherwise label consecutively from 1. Labelling the same value implies the same non-homogeneity. Not required if |
covm |
Either a named list of nstate x nstates indicating the covariate effects with respect to a particular covariate OR
an nstate x nstate x ncov array to indicate covariate effects, where ncov is the length of the supplied |
centre_time |
Value by which to centre time for Gompertz models. By default the model is of the form |
emat |
Square matrix of viable misclassification errors. Must be supplied if the model has misclassification. Impossible errors should be 0. Others should be labelled consecutively. Labelling the same implies a common parameter on the logit scale. |
ecovm |
Either a named list of nstate x nstates indicating the covariate effects with respect to a particular covariate OR
an nstate x nstate x ncov array to indicate indicate covariate effects on misclassification, where ncov is the length of the supplied |
firstobs |
For misclassification models: Form of the first observation for each subject in the data.
|
initp |
For misclassification models: Numerical vector of length nstate to define the model for the initial probabilities. The first entry should be zero. Should be numbered consecutively. If the same number is repeated implies a shared parameter. If absent then initial probabilities taken from |
initp_value |
For misclassification models where |
initcovm |
For misclassification models; Either a named list of vectors of length nstate, or an nstate x ncovs matrix to specify the covariate effects on misclassification probabilities. 0 implies no covariate effect. Otherwise label consecutively from 1. Labelling the same value implies a common covariate effect. |
splinelist |
For bspline models only: list (of length equal to the number of nonhomogeneous transitions) of knot point locations including the boundary knots. |
degrees |
For bspline models only: optional vector (of length equal to number of nonhomogeneous transitions) of degrees of splines. Defaults to 3 if not specified. |
censor |
Vector of censor state indicators in the data. Note that censored observations can only occur as the last observation for a subject. |
censor.states |
List of vectors of states in which subject occupy if censored by corresponding censor state indicator. Can be a vector if only one censor state marker is present. |
death |
Setting |
death.states |
Vector specifying which states have exact death times. Should only correspond to absorbing states. |
intens |
Optional supplied intensity function. See below for details. |
The function allows the model to be specified and creates the metadata needed to use nhm
to fit it. The function automatically generates a function intens
which defines the generator matrix of the model and its first derivatives as a function of time t
, covariates z
and the underlying parameters x
, provided the model is of Weibull, Gompertz or B-spline type.
Alternatively, type='bespoke'
can be chosen. In which case it is necessary for the user to supply a function intens
. This must have arguments t, z, x
and return a list consisting of a component q
which is the nstate x nstate generator matrix, and dq
which is the nstate x nstate x nparQ first derivatives of the generator matrix with respect to the parameters of the model, where nparQ is the number of parameters in the model for the intensities only (excludes parameters for the emat or initp). Since unrestricted maximization is used so the parameters must take values on -Inf, Inf
. Note that using a hard-coded version via type='bespoke'
can be substantially faster than the analogous automatically generated function, so for large models or datasets it may be advantageous to code directly.
For misclassification type models, the function also automatically creates functions emat_nhm
and initp_nhm
, to allow the misclassification probability matrix and the initial probability vectors and their derivatives to be calculated at given parameter and covariate values. In each case, a multinomial logistic regression is used for the covariate model. User specification of the misclassification probability function or initial probability vector is not currently possible.
Returns an object of class nhm_model
containing the necessary metadata needed to use nhm
to fit the model.
Andrew Titman [email protected]
Fit a continuous-time Markov or hidden Markov multi-state model by maximum likelihood. Observations of the process can be made at arbitrary times, or the exact times of transition between states can be known. Covariates can be fitted to the Markov chain transition intensities or to the hidden Markov observation process.
nhm(model_object, initial=NULL, gen_inits=FALSE, control, score_test=FALSE, fixedpar=NULL)
nhm(model_object, initial=NULL, gen_inits=FALSE, control, score_test=FALSE, fixedpar=NULL)
model_object |
Model object created using |
initial |
Vector of initial parameter values |
gen_inits |
If |
control |
Object of class |
score_test |
If |
fixedpar |
Numerical vector indicating which parameters are taken as fixed at the value specified by |
For more details about the methodology behind the nhm package, see Titman (2011) and the package vignette.
By default returns an object of class nhm
containing model output data such as the estimated parameters, maximized likelihood value, information matrix etc. The object can be used with print
, predict
, plot
and anova
.
If score.test=TRUE
then returns an object of class nhm_score
. See print.nhm_score
for more details.
Andrew Titman [email protected]
Titman AC. Flexible Nonhomogeneous Markov Models for Panel Observed Data. Biometrics, 2011. 67, 780-787.
model.nhm
, nhm.control
, plot.nhm
, predict.nhm
, print.nhm_score
### Example dataset ### For further examples, see the vignette trans <- rbind(c(0,1,0,0),c(0,0,2,0),c(0,0,0,3),rep(0,4)) nonh <- rbind(c(0,1,0,0),c(0,0,2,0),c(0,0,0,3),rep(0,4)) gomp_model <- model.nhm(state~time, data=example_data1, subject = id, type="gompertz",trans=trans,nonh=nonh) initial_val <- c(-0.65,-0.45,-0.55,0,0,0) gomp_fit <- nhm(gomp_model,initial=initial_val,control=nhm.control(obsinfo=FALSE)) gomp_fit plot(gomp_fit) plot(gomp_fit,what="intensities")
### Example dataset ### For further examples, see the vignette trans <- rbind(c(0,1,0,0),c(0,0,2,0),c(0,0,0,3),rep(0,4)) nonh <- rbind(c(0,1,0,0),c(0,0,2,0),c(0,0,0,3),rep(0,4)) gomp_model <- model.nhm(state~time, data=example_data1, subject = id, type="gompertz",trans=trans,nonh=nonh) initial_val <- c(-0.65,-0.45,-0.55,0,0,0) gomp_fit <- nhm(gomp_model,initial=initial_val,control=nhm.control(obsinfo=FALSE)) gomp_fit plot(gomp_fit) plot(gomp_fit,what="intensities")
This is used to set various logical or numeric parameters controlling a non-homogeneous Markov model fit. Usually to be used within a call to nhm
.
nhm.control(tmax=NULL, coarsen=FALSE, coarsen.vars=NULL, coarsen.lv=NULL, checks=FALSE,rtol=1e-6, atol=1e-6, fishscore=NULL, linesearch=FALSE, damped=FALSE, damppar=0,obsinfo=TRUE,splits=NULL,ncores=1,print.level=2, maxLikcontrol=NULL)
nhm.control(tmax=NULL, coarsen=FALSE, coarsen.vars=NULL, coarsen.lv=NULL, checks=FALSE,rtol=1e-6, atol=1e-6, fishscore=NULL, linesearch=FALSE, damped=FALSE, damppar=0,obsinfo=TRUE,splits=NULL,ncores=1,print.level=2, maxLikcontrol=NULL)
tmax |
Optional parameter to set the maximum time to which the Kolmogorov Forward equations should be integrated. Defaults to 1+max(time) if left unspecified. |
coarsen |
If |
coarsen.vars |
Vector of the index of covariates which require coarsening. Must be supplied if |
coarsen.lv |
Number of unique covariate values to which the covariates should be coarsened. |
checks |
If |
rtol |
Relative error tolerance to be passed to lsoda, default is 1e-6 |
atol |
Absolute error tolerance to be passed to lsoda, default is 1e-6 |
fishscore |
If |
linesearch |
If |
damped |
If |
damppar |
Numerical damping parameter to be applied if |
obsinfo |
If |
splits |
Optional vector of intermediate split times for solving the ODEs. Only needed if P(0,t) becomes singular for some t causing the optimization to stop. Should be a set of consecutive values less than tmax. |
ncores |
Number of cores to use. 1= no parallelization, 2 or more: Uses |
print.level |
For |
maxLikcontrol |
For |
tmax
, rtol
and atol
refer directly to parameters with the lsoda
function in deSolve
and relate to how the Kolmogorov Forward Equations are numerically solved.
coarsen
, coarsen.vars
and coarsen.lv
are useful in situations where it is computationally infeasible (or unattractive) to compute the exact solution for all covariate patterns. Implements an approximate solution in which the covariates are coarsened using K-means clustering (as proposed in Titman (2011)).
linesearch
, damped
, damppar
are specific to the Fisher scoring algorithm.
Setting obsinfo=TRUE
will tend to give more accurate standard error estimates and gives more opportunity to check for non-convergence of the maximum likelihood procedure.
The option splits
modifies the way in which the transition probabilities are computed. By default, nhm
solves a single system of differential equations starting from 0 to obtain and then uses inversion of the Chapman-Kolmogorov equation
to find
for a given
. In some cases
will be singular or effectively singular. If a split is specified at
then
nhm
will find for
by solving the system of equations
where
is the smallest interval start time greater than or equal to
within the data. If
nhm
fails due to the lack of split times, the error message will advise on the interval in which the split should be introduced. Note that the need for splits can also arise if the initial parameters specified are inappropriate. It may often be better to find more appropriate initial parameter estimates,for instance by fitting the analogous homogeneous model in msm
, rather than adding multiple split times.
ncores
allows parallel processing to be used, through the parallel package, to simultaneously solve the systems of differential equations for each covariate pattern. If ncores > 1
then ncores
defines the mc.cores
value in mclapply
. Note that the data needs to include multiple covariate patterns for this to successfully increase computation speed.
A list containing the values of each of the above constants
Andrew Titman [email protected]
Titman AC. Flexible Nonhomogeneous Markov Models for Panel Observed Data. Biometrics, 2011. 67, 780-787.
Produces plots of the transition probabilites or intensities from a non-homogeneous Markov or misclassification type hidden Markov multi-state model fitted using nhm
.
## S3 method for class 'nhm' plot(x, what="probabilities",time0=0, state0=1, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000, rtol=1e-6, atol=1e-6, main_arg=NULL, xlab="Time", ...)
## S3 method for class 'nhm' plot(x, what="probabilities",time0=0, state0=1, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000, rtol=1e-6, atol=1e-6, main_arg=NULL, xlab="Time", ...)
x |
Fitted model object produced using |
what |
Character string to indicate what should be plotted. Options are |
time0 |
Starting time from which to compute the transition probabilities or intensities. Defaults to 0. |
state0 |
Starting state from which to compute the transition probabilities. Defaults to 1. Not required for transition intensities |
times |
Optional vector of times at which to compute the transition probabilities or intensities. If omitted, the probabilities/intensities will be computed at a sequence of times of length 100 from |
covvalue |
Optional vector of covariate vectors (should be given in the order specified in the |
ci |
If |
sim |
If |
coverage |
Coverage level (should be a value between 0 and 1) for the confidence intervals. Defaults to 0.95. |
B |
Number of simulations to be performed to compute the simulation Delta method. |
rtol |
Relative tolerance parameter to be used by |
atol |
Absolute tolerance parameter to be used by |
main_arg |
Character string specifying beginning of title to be given to each of the plot panes generated. |
xlab |
Character string specifying x-axis label to be given to each plot. |
... |
Other items to be passed to the function. Currently not used. |
Computation is performed by calling predict.nhm
, for the transition probabilities, or qmatrix.nhm
for the intensities (see for more details).
Generates a multi-pane plot for each state. If values are required they can be obtained using predict.nhm
.
Andrew Titman [email protected]
Mandel M. Simulation-based confidence intervals for functions with complicated derivatives. 2013. The American Statistician, 67. 76-81.
Outputs the transition probabilites from a non-homogeneous Markov or misclassification type hidden Markov multi-state model fitted using nhm
.
## S3 method for class 'nhm' predict(object, time0=0, state0=1, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000, rtol=1e-6, atol=1e-6, ...)
## S3 method for class 'nhm' predict(object, time0=0, state0=1, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000, rtol=1e-6, atol=1e-6, ...)
object |
Fitted model object produced using |
time0 |
Starting time from which to compute the transition probabilities. Defaults to 0. |
state0 |
Starting state from which to compute the transition probabilities. Defaults to 1. |
times |
Optional vector of times at which to compute the transition probabilities. If omitted, the probabilities will be computed at a sequence of times from |
covvalue |
Optional vector of covariate vectors (should be given in the order specified in the |
ci |
If |
sim |
If |
coverage |
Coverage level (should be a value between 0 and 1) for the confidence intervals. Defaults to 0.95. |
B |
Number of simulations to be performed to compute the simulation Delta method. |
rtol |
Relative tolerance parameter to be used by |
atol |
Absolute tolerance parameter to be used by |
... |
Other items to be passed to the function. Currently not used. |
The same approach as in the main nhm
function of numerically solving the system of differential equations is used to compute transition probabilities based on the maximum likelihood estimates found in nhm
and assuming a specific vector of covariates.
If the simulation delta method approach is specified then the function will generate B
parameter vectors from the asymptotic distribution of the MLE and solve the system of equations for each of them, before finding pointwise percentile bootstrap confidence intervals from them.
Returns a list containing the vector of times at which the probabilities are computed, a matrix of probabilities for each state at each of the times. If confidence intervals are requested then the lower and upper limits are also provided.
If transition intensity (as opposed to probability) estimates are required then qmatrix.nhm
should be used.
Andrew Titman [email protected]
Mandel M. Simulation-based confidence intervals for functions with complicated derivatives. 2013. The American Statistician, 67. 76-81.
Print output from a score test based on parameters supplied to nhm
with score_test=TRUE
specified.
## S3 method for class 'nhm_score' print(x, which_comp = NULL, ...)
## S3 method for class 'nhm_score' print(x, which_comp = NULL, ...)
x |
An object of class |
which_comp |
Optional vector to specify which of the parameters are to be tested. If omitted, the function will assume all parameters governing non-homogeneity are to be tested. Must be supplied if |
... |
Other parameters to be supplied. Currently ignored. |
The function provides usable output from specifying score_test=TRUE
when using nhm
. It is most useful to provide a quick(er) test of whether there may be non-homogeneity in a specific model. Note that the model assumes the initial parameters correspond to the constrained maximum likelihood estimate (for instance a model with all the parameters relating to time homogeneity).
The method can be used to compute the local score tests of homogeneity proposed by de Stavola (1988) if type="gompertz"
is specified in nhm
.
If fisherscore=TRUE
in nhm
then the expected Fisher information is used. Otherwise, the empirical mean of the squared gradient terms (as used in the BHHH algorithm) is used to estimate the information.
Prints the results of a score test.
Andrew Titman [email protected]
de Stavola BL. Testing Departures from Time Homogeneity in Multistate Markov Processes. Journal of the Royal Statistical Society: Series C (Applied Statistics) 1988. 37. 242-250.
Outputs the transition intensities from a non-homogeneous Markov or misclassification type hidden Markov multi-state model fitted using nhm
.
qmatrix.nhm(object, time0=0, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000)
qmatrix.nhm(object, time0=0, times=NULL, covvalue=NULL, ci=TRUE, sim=FALSE, coverage=0.95, B=1000)
object |
Fitted model object produced using |
time0 |
Starting time from which to compute the transition intensities. Defaults to 0. |
times |
Optional vector of times at which to compute the transition intensities. If omitted, the intensities will be computed at a sequence of times from |
covvalue |
Optional vector of covariate vectors (should be given in the order specified in the |
ci |
If |
sim |
If |
coverage |
Coverage level (should be a value between 0 and 1) for the confidence intervals. Defaults to 0.95. |
B |
Number of simulations to be performed to compute the simulation Delta method. |
The intens
function used to fit the model is called to obtain the values of the transition intensities at the supplied times for the supplied covariate value.
If the simulation delta method approach is specified then the function will generate B
parameter vectors from the asymptotic distribution of the MLE and compute the intensities for each of them, before finding pointwise percentile bootstrap confidence intervals from them.
Returns a list containing the vector of times at which the intensities are computed, a matrix of probabilities for each state at each of the times. If confidence intervals are requested then the lower and upper limits are also provided.
If transition probability (as opposed to intensity) estimates are required then predict.nhm
should be used.
Andrew Titman [email protected]
Mandel M. Simulation-based confidence intervals for functions with complicated derivatives. 2013. The American Statistician, 67. 76-81.