## Purpose

Count, categorical and event data models are defined as observation models in the DEFINITION: block of the [LONGITUDINAL] section of the Mlxtran file. These models cannot be specified in the Monolix user interface and must therefore be written directly in the Mlxtran model file.

- Observation model for count data
- Observation model for categorical ordinal data
- Observation model for categorical data modeled as a discrete Markov chain
- Observation model for categorical data modeled as a continuous Markov chain
- Observation model for time-to-event data

## Observation model for count data

*Use of count data*

Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. In this context, data from individual $i$ is the sequence $y_i = (y_{ij}, 1 \le j \le n_i)$, where $y_{ij}$ is the number of events observed in the $j$th time interval $I_{ij}$.

Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, $y_{ij}$ is either the number of trials or successes (or failures) for subject $i$ at time $t_{ij}$. For any of these data types we will then model $y_i = (y_{ij}, 1 \le j \le n_i)$ as a sequence of random variables that take their values in {0, 1, 2, …}. If we assume that they are independent, then the model is completely defined by the *probability mass functions* $P(y_{ij} = k)$ for $k \ge 0$ and $1 \le j \le n_i$. Here, we will consider only parametric distributions for count data.

*Observation model syntax*

Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions $P(y_{ij} = k)$. An observation variable for count data is defined using the type count. Its additional field is:

- P(Y=k): Probability of a given count value k, for the observation named Y. k is a nonnegative integer. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. Within this definition, the bound variable k takes precedence over any previously defined variable named k.

*Example*

In the proposed example, the Poisson distribution is used for defining the distribution of $y_{ij}$:

$$y_{ij} \sim \text{Poisson}(\lambda_{ij})$$

where the Poisson intensity $\lambda_{ij} = a + b\,t_{ij}$ is a function of time $t_{ij}$. This model is implemented as follows:

```
[LONGITUDINAL]
input = {a, b}

EQUATION:
lambda = a + b*t

DEFINITION:
y = {type = count, P(y=k) = exp(-lambda)*(lambda^k)/factorial(k)}
```
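To make the expression given for `P(y=k)` concrete, the following Python sketch evaluates the same Poisson probability mass function with a linear intensity. It is only an illustration outside Mlxtran: the function name `poisson_pmf` and the values chosen for `a`, `b` and `t` are assumptions made for this example, not part of the model above.

```python
import math

def poisson_pmf(k: int, a: float, b: float, t: float) -> float:
    """P(y = k) for a Poisson count with intensity lambda = a + b*t."""
    lam = a + b * t
    if lam <= 0:
        raise ValueError("the Poisson intensity must be positive")
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical parameter values, for illustration only.
a, b, t = 1.0, 0.5, 2.0

# Probabilities of the counts 0, 1, ..., 49 at time t.
probs = [poisson_pmf(k, a, b, t) for k in range(50)]

# The probabilities over all counts sum to one; the first 50 terms
# already capture almost all of the mass for this intensity.
total = sum(probs)
```

The check on `total` mirrors the requirement that the expression supplied for `P(y=k)` defines a proper probability distribution over the nonnegative integers.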

## Observation model for categorical ordinal data

*Use of categorical data*

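Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2, \dots, c_K\}$. Considering the observations $(y_{ij}, 1 \le j \le n_i)$ for any individual $i$ as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions $P(y_{ij} = c_k \mid \psi_i)$ for $k = 1, \dots, K$ and $1 \le j \le n_i$, where $\psi_i$ denotes the individual parameters. For a given $(i, j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined. In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $P(y_{ij} = c_k \mid \psi_i) \ge 0$, and $\sum_{k=1}^{K} P(y_{ij} = c_k \mid \psi_i) = 1$. Ordinal data further assume that the categories are ordered, i.e., there exists an order $\prec$ such that

$$c_1 \prec c_2 \prec \dots \prec c_K.$$

We can think, for instance, of levels of pain (low $\prec$ moderate $\prec$ severe) or scores on a discrete scale, e.g., from 1 to 10. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $P(y_{ij} \preceq c_k \mid \psi_i)$ for $k = 1, \dots, K-1$, or in the other direction: $P(y_{ij} \succeq c_k \mid \psi_i)$ for $k = 2, \dots, K$. Any model is possible as long as it defines a probability distribution, i.e., it satisfies

$$0 \le P(y_{ij} \preceq c_k \mid \psi_i) \le P(y_{ij} \preceq c_{k+1} \mid \psi_i) \le 1.$$

It is possible to introduce dependence between observations from the same individual by assuming that $(y_{ij}, j = 1, \dots, n_i)$ forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{ij}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k = 1, \dots, K$,

$$P(y_{ij} = c_k \mid y_{i,j-1}, y_{i,j-2}, \dots, \psi_i) = P(y_{ij} = c_k \mid y_{i,j-1}, \psi_i).$$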

*Observation model syntax*

Considering the observations as a sequence of conditionally independent random variables, the model is again completely defined by the probability mass functions $P(y_{ij} = c_k)$ for each category. For a given observation $j$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined. The distribution of ordered categorical data can be defined in the DEFINITION: block of the [LONGITUDINAL] section using either the probability mass functions or cumulative quantities. Ordinal data further assume that the categories are ordered: $c_1 < c_2 < \dots < c_K$. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $P(y_{ij} \le c_k)$ for $k$ from 1 to $K-1$ or the cumulative logits $\text{logit}(P(y_{ij} \le c_k))$ for $k$ from 1 to $K-1$. An observation variable for ordered categorical data is defined using the type categorical. Its additional fields are:

- categories: List of the available ordered categories. They are represented by increasing successive integers.
- P(Y=i): Probability of a given category integer i, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary, instead of an exact match. All boundaries must be of the same kind. Such an event is denoted by using a comparison operator. When the value of a probability can be deduced from others, its definition can be omitted.

*Example*

In the proposed example, we use 4 categories and the model is implemented as follows:

```
[LONGITUDINAL]
input = {th1, th2, th3}

DEFINITION:
level = {type = categorical, categories = {0, 1, 2, 3},
  logit(P(level<=0)) = th1
  logit(P(level<=1)) = th1 + th2
  logit(P(level<=2)) = th1 + th2 + th3
}
```
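The three cumulative logits in this example determine all four category probabilities, the last one being deduced from the others. The Python sketch below shows that computation outside Mlxtran. The helper names `inv_logit` and `category_probabilities` and the parameter values are assumptions made for illustration. Because the logit function is increasing, `th2` and `th3` need to be nonnegative for the cumulative probabilities to be nondecreasing.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(th1: float, th2: float, th3: float) -> list:
    """Return [P(level=0), P(level=1), P(level=2), P(level=3)]."""
    # Cumulative probabilities P(level<=0), P(level<=1), P(level<=2), P(level<=3).
    cumulative = [
        inv_logit(th1),
        inv_logit(th1 + th2),
        inv_logit(th1 + th2 + th3),
        1.0,  # the last cumulative probability is always 1
    ]
    probs = [cumulative[0]]
    for k in range(1, 4):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

# Hypothetical parameter values, for illustration only.
probs = category_probabilities(-1.0, 1.0, 1.5)
```

Each entry of `probs` is a difference of two consecutive cumulative probabilities, which is why only three of the four probabilities have to be written in the model.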

## Observation model for categorical data modeled as a discrete Markov chain

*Use of categorical data modeled as a Markov chain*

In the previous categorical model, the observations of individual $i$ were considered as independent. It is however possible to introduce dependence between observations from the same individual by assuming that $(y_{ij}, j = 1, \dots, n_i)$ forms a Markov chain.

*Observation model syntax*

An observation variable for ordered categorical data modeled as a discrete Markov chain is defined using the type categorical, along with the dependence definition Markov. Its additional fields are:

- categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
- P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability belongs to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary, instead of an exact match. All boundaries must be of the same kind. Such an event is denoted by using a comparison operator. When the value of a probability can be deduced from others, its definition can be spared. The initial probabilities are optional as a whole, and the default initial law is uniform.
- P(Y=j|Y_p=i): Probability of transition to a given category integer j from a previous category i, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are grouped by law of transition for each previous category i. Each law of transition provides the various transition probabilities of reaching j. They can be provided for events where the reached category is a boundary, instead of an exact match. All boundaries must be of the same kind for a given law. Such an event is denoted by using a comparison operator. When the value of a transition probability can be deduced from others within its law, its definition can be omitted.

*Example*

The following example defines an observation model for this case:

```
[LONGITUDINAL]
input = {a1, a2, a11, a12, a21, a22, a31, a32}

DEFINITION:
State = {type = categorical, categories = {1,2,3}, dependence = Markov
  P(State_1=1) = a1
  P(State_1=2) = a2
  logit(P(State<=1|State_p=1)) = a11
  logit(P(State<=2|State_p=1)) = a11+a12
  logit(P(State<=1|State_p=2)) = a21
  logit(P(State<=2|State_p=2)) = a21+a22
  logit(P(State<=1|State_p=3)) = a31
  logit(P(State<=2|State_p=3)) = a31+a32
}
```
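Each law of transition in this example is given by two cumulative logits, from which the three transition probabilities of one row of the transition matrix follow. The Python sketch below rebuilds that matrix and propagates the initial law by one step. It is an illustration outside Mlxtran: the helper names and all numerical values are assumptions, and the second argument of each row needs to be nonnegative so that the cumulative probabilities are nondecreasing.

```python
import math

def inv_logit(x: float) -> float:
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))

def transition_row(first: float, second: float) -> list:
    """Transition probabilities to states 1, 2, 3 from one previous state.

    logit(P(State<=1 | previous)) = first
    logit(P(State<=2 | previous)) = first + second
    """
    c1 = inv_logit(first)
    c2 = inv_logit(first + second)
    return [c1, c2 - c1, 1.0 - c2]

# Hypothetical values for (a11, a12), (a21, a22), (a31, a32).
P = [
    transition_row(0.5, 1.0),   # previous state 1
    transition_row(-0.5, 1.5),  # previous state 2
    transition_row(-2.0, 1.0),  # previous state 3
]

# Hypothetical initial law: a1, a2, and the deduced third probability.
a1, a2 = 0.6, 0.3
initial = [a1, a2, 1.0 - a1 - a2]

# Distribution of the second observation: initial law times transition matrix.
next_dist = [sum(initial[i] * P[i][j] for i in range(3)) for j in range(3)]
```

Every row of `P` sums to one, which is why the last transition probability of each law does not need to be written in the model.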

## Observation model for categorical data modeled as a continuous Markov chain

*Observation model syntax*

An observation variable for ordered categorical data modeled as a continuous Markov chain is also defined using the type categorical, along with the dependence definition Markov. But here transition rates are defined instead of transition probabilities. Its additional fields are:

- categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
- P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability belongs to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary, instead of an exact match. All boundaries must be of the same kind. Such an event is denoted by using a comparison operator. When the value of a probability can be deduced from others, its definition can be omitted. The initial probabilities are optional as a whole, and the default initial law is uniform.
- transitionRate(i,j): Transition rate departing from a given category integer i and arriving at a category j. The rates are grouped by law of transition for each departure category i. One transition rate per law of transition can be omitted, as the rates of a law must sum to zero.

*Example*

The following example defines an observation model for this case:

```
[LONGITUDINAL]
input = {p1, q12, q21}

DEFINITION:
State = {type = categorical, categories = {1,2}, dependence = Markov
  P(State_1=1) = p1
  transitionRate(1,2) = q12
  transitionRate(2,1) = q21
}
```
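For a two-state chain such as this one, the transition probabilities over a time step follow from the two rates through a standard closed-form result of continuous-time Markov chain theory. The Python sketch below applies that formula outside Mlxtran. It is not a description of how Monolix computes the likelihood internally, and the function name `two_state_transition_matrix` and the rate values are assumptions made for illustration.

```python
import math

def two_state_transition_matrix(q12: float, q21: float, dt: float) -> list:
    """Transition probabilities over a time step dt for a two-state chain.

    The generator is [[-q12, q12], [q21, -q21]], so each row sums to zero.
    """
    if q12 <= 0 or q21 <= 0:
        raise ValueError("transition rates must be positive")
    s = q12 + q21
    decay = math.exp(-s * dt)
    return [
        [q21 / s + q12 / s * decay, q12 / s * (1.0 - decay)],
        [q21 / s * (1.0 - decay), q12 / s + q21 / s * decay],
    ]

# Hypothetical rate values, for illustration only.
q12, q21 = 0.3, 0.1

P_short = two_state_transition_matrix(q12, q21, 1.0)
P_long = two_state_transition_matrix(q12, q21, 100.0)

# For long time steps both rows approach the stationary law
# (q21 / (q12 + q21), q12 / (q12 + q21)).
stationary = [q21 / (q12 + q21), q12 / (q12 + q21)]
```

Unlike the discrete case, the probability of changing state here depends on the time elapsed between two observations, which is why rates rather than probabilities are specified.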

## Observation model for time-to-event data

*Use of time-to-event data*

Here, observations are the “times at which events occur”. An event may be one-off (e.g., death, hardware failure) or repeated (e.g., epileptic seizures, mechanical incidents, strikes). Several functions play key roles in time-to-event analysis: the survival, hazard and cumulative hazard functions. Since we are still working under a population approach, these functions, detailed below, are individual functions, i.e., each subject has its own. As we are using parametric models, these functions depend on individual parameters $\psi_i$.

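- The *survival function* $S(t, \psi_i)$ gives the probability that the event happens to individual $i$ after time $t$:

  $$S(t, \psi_i) = P(T_i > t;\, \psi_i)$$

- The *hazard function* $h(t, \psi_i)$ is defined for individual $i$ as the instantaneous rate of the event at time $t$, given that the event has not already occurred:

  $$h(t, \psi_i) = \lim_{dt \to 0} \frac{S(t, \psi_i) - S(t + dt, \psi_i)}{S(t, \psi_i)\, dt}$$

  This is equivalent to

  $$h(t, \psi_i) = -\frac{d}{dt} \log S(t, \psi_i)$$

- Another useful quantity is the *cumulative hazard function* $H(a, b;\, \psi_i)$, defined for individual $i$ as

  $$H(a, b;\, \psi_i) = \int_a^b h(t, \psi_i)\, dt$$

Note that $S(t, \psi_i) = e^{-H(0, t;\, \psi_i)}$ when time is measured from the start of the observation period. The hazard function $h(t, \psi_i)$ therefore characterizes the problem, because knowing it is the same as knowing the survival function $S(t, \psi_i)$. The probability distribution of survival data is thus completely defined by the hazard function.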
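The link between the hazard and the survival function can be checked numerically: integrating the hazard and taking the exponential of minus the result gives the survival probability. The Python sketch below does this for a Weibull hazard, chosen here only as an assumed example because its survival function has a known closed form. The function names, the parameters `Te` and `p`, and their values are all hypothetical.

```python
import math

def weibull_hazard(t: float, Te: float, p: float) -> float:
    """Weibull hazard h(t) = (p/Te) * (t/Te)**(p-1), assumed for illustration."""
    return (p / Te) * (t / Te) ** (p - 1)

def survival_from_hazard(hazard, t_end: float, n_steps: int = 10000) -> float:
    """S(t_end) = exp(-H(0, t_end)), with H computed by the trapezoidal rule."""
    step = t_end / n_steps
    cumulative = 0.0
    for m in range(n_steps):
        left = hazard(m * step)
        right = hazard((m + 1) * step)
        cumulative += 0.5 * (left + right) * step
    return math.exp(-cumulative)

# Hypothetical parameter values, for illustration only.
Te, p, t_end = 10.0, 1.5, 5.0

numerical = survival_from_hazard(lambda t: weibull_hazard(t, Te, p), t_end)

# Closed-form Weibull survival function, used here as a reference.
closed_form = math.exp(-((t_end / Te) ** p))
```

The two values agree up to the integration error, which illustrates why specifying the hazard alone is enough to define the model.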

*Observation model syntax*

An observation variable for time-to-event or repeated time-to-event data is defined using the type event. Its additional fields are:

- eventType: Type of the events. The exact time of the events can be observed, or censored per interval. The respective keywords are exact and intervalCensored. By default, an exact time is assumed.
- maxEventNumber: Maximum number of events. It is useful for simulation only, and by default the number of simulated events is unlimited.
- rightCensoringTime: Right censoring time of events. It is useful for simulation only, and by default it is the actual time of the last record.
- intervalLength: Length of censoring intervals. It is useful for simulation only, and by default it is the tenth part of the global length.
- hazard: Hazard function.

*Example*

The following example defines an observation model for this case:

```
[LONGITUDINAL]
input = {gamma, V, Cl}

EQUATION:
Cc = pkmodel(V, Cl)

DEFINITION:
Seizure = {type = event,
           eventType = intervalCensored,
           maxEventNumber = 1,
           rightCensoringTime = 120,
           intervalLength = 10,
           hazard = gamma*Cc}
```
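With interval censoring, only the interval in which the event falls is known, so the probability attached to an interval is the difference between the survival function at its two ends. The Python sketch below computes these probabilities for a hazard proportional to the concentration, using the interval length of 10 and right censoring time of 120 from the example. Everything else is an assumption made for illustration: the concentration is taken to follow a single intravenous bolus in a one-compartment model, and the dose and parameter values are hypothetical.

```python
import math

def cumulative_hazard(t, gamma, V, Cl, dose):
    """H(0, t) for the hazard gamma*Cc(t).

    Assumes Cc(t) = (dose/V) * exp(-(Cl/V)*t), i.e. one bolus dose at time 0.
    """
    return gamma * dose / Cl * (1.0 - math.exp(-Cl / V * t))

def survival(t, gamma, V, Cl, dose):
    """S(t) = exp(-H(0, t))."""
    return math.exp(-cumulative_hazard(t, gamma, V, Cl, dose))

# Hypothetical parameter values and dose, for illustration only.
gamma, V, Cl, dose = 0.02, 10.0, 1.0, 100.0

# Values taken from the example: intervalLength = 10, rightCensoringTime = 120.
interval_length, right_censoring_time = 10.0, 120.0
n_intervals = int(right_censoring_time / interval_length)
edges = [m * interval_length for m in range(n_intervals + 1)]

# Probability that the single event falls in each censoring interval.
interval_probs = [
    survival(start, gamma, V, Cl, dose) - survival(end, gamma, V, Cl, dose)
    for start, end in zip(edges[:-1], edges[1:])
]

# Probability that no event occurs before the right censoring time.
prob_censored = survival(right_censoring_time, gamma, V, Cl, dose)
```

The interval probabilities and the probability of right censoring add up to one, since the event either falls in one of the intervals or has not occurred by the censoring time.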