Observation models for discrete data

Purpose

Count, categorical or event data models are defined as observation models in the DEFINITION: block of the [LONGITUDINAL] section of the Mlxtran file. These models cannot be specified through the Monolix user interface and must therefore be written directly in the Mlxtran model file.

Observation model for count data

Use of count data

Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} obtained by counting something, e.g., the number of seizures, hemorrhages or lesions in a given time period. In this context, data from individual i is the sequence y_i=(y_{ij},1\leq j \leq n_i) where y_{ij} is the number of events observed in the jth time interval I_{ij}.
Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, y_{ij} is either the number of trials or successes (or failures) for subject i at time t_{ij}. For any of these data types we will then model y_i=(y_{ij},1\leq j \leq n_i) as a sequence of random variables that take their values in {0, 1, 2, …}.  If we assume that they are independent, then the model is completely defined by the probability mass functions \mathbb{P}(y_{ij}=k) for k \geq 0 and 1 \leq j \leq n_i. Here, we will consider only parametric distributions for count data.
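
For instance, a Poisson model with individual intensity \lambda_{ij} corresponds to the probability mass function

\mathbb{P}(y_{ij}=k) \ \ = \ \ e^{-\lambda_{ij}}\, \frac{\lambda_{ij}^k}{k!}, \qquad k = 0, 1, 2, \ldots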

Observation model syntax

Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions P(y_{ij}=k). An observation variable for count data is defined using the type count. Its additional field is:

  • P(Y=k): Probability of a given count value k, for the observation named Y, where k is a natural number. A transformed probability can be provided instead of a direct one; the transformation can be log, logit, or probit. Within this definition, the bound variable k supersedes any predefined variable of the same name.

Example

In the proposed example, the Poisson distribution is used to define the distribution of y_j:

y_j \sim \textrm{Poisson}(\lambda_j)

where the Poisson intensity \lambda_j is a function of time, \lambda_j = a+b*t_j. This model is implemented as follows:

[LONGITUDINAL]
input = {a,b}

EQUATION:
lambda = a+b*t

DEFINITION:
y = {type=count, P(y=k)=exp(-lambda)*(lambda^k)/factorial(k)}
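
The same Poisson model can equivalently be written with a transformed probability. As a minimal sketch using the log transformation (the [LONGITUDINAL] and EQUATION: blocks are unchanged):

DEFINITION:
y = {type=count, log(P(y=k)) = -lambda + k*log(lambda) - log(factorial(k))}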

Observation model for categorical ordinal data

Use of categorical data

Assume now that the observed data takes its values in a fixed and finite set of nominal categories \{c_1, c_2,\ldots , c_K\}. Considering the observations (y_{ij},\, 1 \leq j \leq n_i) for any individual i as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions \mathbb{P}(y_{ij}=c_k | \psi_i) for k=1,\ldots, K and 1 \leq j \leq n_i. For a given (i,j), the sum of the K probabilities is 1, so in fact only K-1 of them need to be defined. In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each k, \mathbb{P}(y_{ij}=c_k | \psi_i) \in [0,1], and \sum_{k=1}^{K} \mathbb{P}(y_{ij}=c_k | \psi_i) =1.

Ordinal data further assume that the categories are ordered, i.e., there exists an order \prec such that

c_1 \prec c_2 \prec \ldots \prec c_K .

We can think, for instance, of levels of pain (low \prec moderate \prec severe) or scores on a discrete scale, e.g., from 1 to 10. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities \mathbb{P}(y_{ij} \preceq c_k | \psi_i) for k=1,\ldots ,K-1, or in the other direction: \mathbb{P}(y_{ij} \succeq c_k | \psi_i) for k=2,\ldots, K. Any model is possible as long as it defines a probability distribution, i.e., it satisfies

0 \leq \mathbb{P}(y_{ij} \preceq c_1 | \psi_i) \leq \mathbb{P}(y_{ij} \preceq c_2 | \psi_i)\leq \ldots \leq \mathbb{P}(y_{ij} \preceq c_K | \psi_i) =1 .
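
A classical way to satisfy these constraints is the cumulative-logit (proportional odds) parameterization, sketched here with an illustrative linear time effect:

\textrm{logit}\left(\mathbb{P}(y_{ij} \preceq c_k | \psi_i)\right) \ \ = \ \ \alpha_{ik} + \beta_i\, t_{ij}, \qquad k=1,\ldots ,K-1,

where \alpha_{i1} \leq \alpha_{i2} \leq \ldots \leq \alpha_{i,K-1} ensures that the cumulative probabilities are nondecreasing. The Mlxtran example below uses the same cumulative-logit form without a time effect.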

It is possible to introduce dependence between observations from the same individual by assuming that (y_{ij},\,j=1,2,\ldots,n_i) forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of y_{ij} is the value of the previous observation y_{i,j-1}, i.e., for all k=1,2,\ldots ,K,

\mathbb{P}(y_{ij} = c_k\,|\,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i) = \mathbb{P}(y_{ij} = c_k | y_{i,j-1},\psi_i).

Observation model syntax

Considering the observations as a sequence of conditionally independent random variables, the model is again completely defined by the probability mass functions
P(y_{ij}=c_k) for each category. For a given (i,j), the sum of the K probabilities is 1, so in fact only K-1 of them need to be defined. Ordinal data further assume that the categories are ordered: c_1 \leq c_2 \leq ... \leq c_K. The distribution of ordered categorical data can be defined in the block DEFINITION: of the section [LONGITUDINAL] using either the probability mass functions, the cumulative probabilities P(y_j \leq c_k) for k from 1 to K-1, or the cumulative logits \textrm{logit}(P(y_j \leq c_k)) for k from 1 to K-1. An observation variable for ordered categorical data is defined using the type categorical. Its additional fields are:

  • categories: List of the available ordered categories. They are represented by increasing successive integers.
  • P(Y=i): Probability of a given category integer i, for the observation named Y. A transformed probability can be provided instead of a direct one; the transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can also be provided for events where the category is a boundary rather than an exact match, denoted with a comparison operator (e.g., Y<=i); all boundaries must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted.

Example

In the proposed example, we use 4 categories and the model is implemented as follows:

[LONGITUDINAL]
input =  {th1, th2, th3}

DEFINITION: 
level = { type = categorical, categories = {0, 1, 2, 3},
logit(P(level<=0)) = th1
logit(P(level<=1)) = th1 + th2
logit(P(level<=2)) = th1 + th2 + th3}
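
For reference, writing \sigma(x)=1/(1+e^{-x}) for the inverse logit, the category probabilities implied by this model are obtained by differencing the cumulative probabilities:

P(level=0) = \sigma(th1), \quad P(level=1) = \sigma(th1+th2) - \sigma(th1),
P(level=2) = \sigma(th1+th2+th3) - \sigma(th1+th2), \quad P(level=3) = 1 - \sigma(th1+th2+th3).

Nonnegative values of th2 and th3 keep these cumulative probabilities ordered.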

Observation model for categorical data modeled as a discrete Markov chain

Use of categorical data modeled as a Markov chain

In the previous categorical model, the observations were considered as independent for individual i. It is however possible to introduce dependence between observations from the same individual by assuming that (y_{ij})_{j=1,.., n_i} forms a Markov chain.

Observation model syntax

An observation variable for ordered categorical data modeled as a discrete Markov chain is defined using the type categorical, along with the dependence definition Markov. Its additional fields are:

  • categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
  • P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability applies to the first observation. A transformed probability can be provided instead of a direct one; the transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can also be provided for events where the category is a boundary rather than an exact match, denoted with a comparison operator; all boundaries must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole; if they are not given, the initial law is uniform.
  • P(Y=j|Y_p=i): Probability of transition to a given category integer j from a previous category i, for the observation named Y. A transformed probability can be provided instead of a direct one; the transformation can be log, logit, or probit. The probabilities are grouped by transition law for each previous category i, each law giving the probabilities of reaching the various categories j. They can also be provided for events where the reached category j is a boundary rather than an exact match, denoted with a comparison operator; all boundaries must be of the same kind within a given law. When the value of a transition probability can be deduced from the others within its law, its definition can be omitted.

Example

An example where we define an observation model for this case is given below:

[LONGITUDINAL]
input = {a1, a2, a11, a12, a21, a22, a31, a32}

DEFINITION:
State = { type = categorical, categories = {1,2,3}, dependence = Markov
P(State_1=1) = a1
P(State_1=2) = a2
logit(P(State<=1|State_p=1)) = a11
logit(P(State<=2|State_p=1)) = a11+a12
logit(P(State<=1|State_p=2)) = a21
logit(P(State<=2|State_p=2)) = a21+a22
logit(P(State<=1|State_p=3)) = a31
logit(P(State<=2|State_p=3)) = a31+a32}
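
For reference, with \sigma(x)=1/(1+e^{-x}) the inverse logit, the transition probabilities from the previous state State_p=1 are recovered as

P(State=1|State_p=1) = \sigma(a11), \quad P(State=2|State_p=1) = \sigma(a11+a12) - \sigma(a11), \quad P(State=3|State_p=1) = 1 - \sigma(a11+a12),

and similarly for State_p=2 and State_p=3 with (a21, a22) and (a31, a32). The remaining initial probability is deduced as P(State_1=3) = 1 - a1 - a2.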

Observation model for categorical data modeled as a continuous Markov chain

Observation model syntax

An observation variable for ordered categorical data modeled as a continuous Markov chain is also defined using the type categorical, along with the dependence definition Markov. Here, however, transition rates are defined instead of transition probabilities. Its additional fields are:

  • categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
  • P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability applies to the first observation. A transformed probability can be provided instead of a direct one; the transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can also be provided for events where the category is a boundary rather than an exact match, denoted with a comparison operator; all boundaries must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole; if they are not given, the initial law is uniform.
  • transitionRate(i,j): Transition rate from a given departure category integer i to a category j. The rates are grouped by transition law for each departure category i. Within each transition law the rates must sum to zero, so one transition-rate definition per law can be omitted.

Example

An example where we define an observation model for this case is given below:

[LONGITUDINAL]
input={p1, q12, q21}

DEFINITION:
State = { type = categorical, categories = {1,2}, dependence = Markov
P(State_1=1) = p1
transitionRate(1,2) = q12
transitionRate(2,1) = q21}
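
This two-state model corresponds to the transition rate matrix

Q \ \ = \ \ \begin{pmatrix} -q_{12} & q_{12} \\ q_{21} & -q_{21} \end{pmatrix},

where each row sums to zero (the omitted diagonal rates are deduced automatically), with initial distribution (p_1,\, 1-p_1) for the first observation.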

Observation model for time-to-event data

Use of time-to-event data

Here, observations are the “times at which events occur”. An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, mechanical incidents, strikes). Several functions play key roles in time-to-event analysis: the survival, hazard and cumulative hazard functions. We are still working under a population approach here, so these functions, detailed below, are individual functions, i.e., each subject has its own. As we are using parametric models, this means that these functions depend on individual parameters (\psi_i).

  • The survival function S(t, \psi_i) gives the probability that the event happens to individual i after time t>t_{\text{start}}:

    S(t,\psi_i) \ \ = \ \ \mathbb{P}(T_i>t;\psi_i) .

  • The hazard function h(t,\psi_i) is defined for individual i as the instantaneous rate of the event at time t, given that the event has not already occurred:

    h(t,\psi_i) \ \ = \ \ \lim_{dt\to 0} \frac{S(t,\psi_i) - S(t + dt,\psi_i)}{ S(t,\psi_i) \, dt} .

    This is equivalent to

    h(t,\psi_i) \ \ = \ \ -\frac{d}{dt} \log{S(t,\psi_i)} .

  • Another useful quantity is the cumulative hazard function H(a,b;\psi_i), defined for individual i as

H(a,b;\psi_i) \ \ = \ \ \int_a^b h(t,\psi_i) \, dt .

Note that S(t,\psi_i) \ \ = \ \ e^{-H(t_{\text{start}},t;\psi_i)}. The hazard function h(t,\psi_i) therefore characterizes the problem, because knowing it is the same as knowing the survival function S(t,\psi_i). The probability distribution of survival data is thus completely defined by the hazard function.
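
For instance, a constant hazard h(t,\psi_i) = \lambda_i gives H(t_{\text{start}},t;\psi_i) = \lambda_i\,(t-t_{\text{start}}) and therefore the exponential survival model

S(t,\psi_i) \ \ = \ \ e^{-\lambda_i (t - t_{\text{start}})}, \qquad t>t_{\text{start}} .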

Observation model syntax

An observation variable for time-to-event or repeated time-to-event data is defined using the type event. Its additional fields are:

  • eventType: Type of the events. Event times can either be observed exactly or be censored per interval. The respective keywords are exact and intervalCensored. By default, an exact time is assumed.
  • maxEventNumber: Maximum number of events. It is useful for simulation only, and by default the number of simulated events is unlimited.
  • rightCensoringTime: Right censoring time of events. It is useful for simulation only, and by default it is the actual time of the last record.
  • intervalLength: Length of the censoring intervals. It is useful for simulation only, and by default it is one tenth of the global length.
  • hazard: Hazard function.

Example

An example where we define an observation model for this case is given below:

[LONGITUDINAL]
input={gamma, V, Cl}

EQUATION:
Cc = pkmodel(V,Cl)

DEFINITION:
Seizure = {type = event, eventType = intervalCensored, maxEventNumber = 1, 
rightCensoringTime = 120, intervalLength = 10, hazard = gamma*Cc}
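
For comparison, a minimal sketch of a single, exactly observed event with a constant hazard could be written as follows (the observation name Death and the parameter lambda are illustrative):

[LONGITUDINAL]
input = {lambda}

DEFINITION:
Death = {type = event, eventType = exact, maxEventNumber = 1, hazard = lambda}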