
# Observation model for categorical data

### Observation model for categorical ordinal data

#### Use of categorical data

Assume now that the observed data takes its values in a fixed and finite set of nominal categories $\{c_1, c_2,\ldots , c_K\}$. Considering the observations $(y_{ij},\, 1 \leq j \leq n_i)$ for any individual i as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions $\mathbb{P}(y_{ij}=c_k | \psi_i)$ for $k=1,\ldots, K$ and $1 \leq j \leq n_i$. For a given (i,j), the sum of the K probabilities is 1, so in fact only (K-1) of them need to be defined. In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\mathbb{P}(y_{ij}=c_k | \psi_i) \in [0,1]$, and $\sum_{k=1}^{K} \mathbb{P}(y_{ij}=c_k | \psi_i) =1$. Ordinal data further assume that the categories are ordered, i.e., there exists an order $\prec$ such that

$c_1 \prec c_2 \prec \ldots \prec c_K .$

We can think, for instance, of levels of pain (low $\prec$ moderate $\prec$ severe) or scores on a discrete scale, e.g., from 1 to 10. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\mathbb{P}(y_{ij} \preceq c_k | \psi_i)$ for $k=1,\ldots ,K-1$, or in the other direction: $\mathbb{P}(y_{ij} \succeq c_k | \psi_i)$ for $k=2,\ldots, K$. Any model is possible as long as it defines a probability distribution, i.e., it satisfies

$0 \leq \mathbb{P}(y_{ij} \preceq c_1 | \psi_i) \leq \mathbb{P}(y_{ij} \preceq c_2 | \psi_i)\leq \ldots \leq \mathbb{P}(y_{ij} \preceq c_K | \psi_i) =1 .$
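The link between cumulative probabilities and category probabilities can be illustrated with a short Python sketch. The function name `pmf_from_cumulative` and the numerical values are hypothetical and only serve the illustration; the code checks the ordering constraint above and then recovers $\mathbb{P}(y_{ij}=c_k | \psi_i)$ by differencing.

```python
def pmf_from_cumulative(cum):
    """Turn P(y <= c_1), ..., P(y <= c_{K-1}) into the K category probabilities.

    `cum` holds the K-1 free cumulative probabilities; P(y <= c_K) = 1 is implied.
    """
    full = list(cum) + [1.0]
    # Ordering constraint: 0 <= P(y <= c_1) <= ... <= P(y <= c_K) = 1.
    previous = 0.0
    for value in full:
        if not previous <= value <= 1.0:
            raise ValueError("cumulative probabilities must be non-decreasing in [0, 1]")
        previous = value
    # P(y = c_k) = P(y <= c_k) - P(y <= c_{k-1}), with P(y <= c_0) = 0.
    return [hi - lo for lo, hi in zip([0.0] + full[:-1], full)]


# Hypothetical values for three ordered pain levels: low, moderate, severe.
pmf = pmf_from_cumulative([0.5, 0.8])
print(pmf)
```

Only K-1 values are passed in, which mirrors the remark that the last probability follows from the others.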

It is possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\,j=1,2,\ldots,n_i)$ forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{ij}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k=1,2,\ldots ,K$,

$\mathbb{P}(y_{ij} = c_k\,|\,y_{i,j-1}, y_{i,j-2}, y_{i,j-3},\ldots,\psi_i) = \mathbb{P}(y_{ij} = c_k | y_{i,j-1},\psi_i).$

#### Observation model syntax

Considering the observations as a sequence of conditionally independent random variables, the model is again completely defined by the probability mass functions $P(y_{ij}=c_k)$ for each category. For a given j, the sum of the K probabilities is 1, so in fact only K-1 of them need to be defined. Ordinal data further assume that the categories are ordered: $c_1 \leq c_2 \leq \ldots \leq c_K$. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $P(y_{ij} \leq c_k)$ for k from 1 to K-1 or the cumulative logits $\textrm{logit}(P(y_{ij} \leq c_k))$ for k from 1 to K-1. The distribution of ordered categorical data can be defined in the block DEFINITION: of the section [LONGITUDINAL] using any of the probability mass functions, the cumulative probabilities, or the cumulative logits. An observation variable for ordered categorical data is defined using the type categorical. Its additional fields are:

• categories: List of the available ordered categories. They are represented by increasing successive integers.
• P(Y=i): Probability of a given category integer i, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. Instead of an exact match, they can be provided for events where the category is a bound, written with a comparison operator (for instance P(Y<=i)). All bounds must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted.
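To make the three transformations concrete, here is a minimal Python sketch of how a transformed value maps back to a probability. The helper `inverse_transform` is hypothetical and is not part of the modeling language; it only spells out the standard inverse of log, logit, and probit.

```python
import math


def inverse_transform(value, transformation):
    """Map a transformed value back to a probability (illustrative helper)."""
    if transformation == "log":
        # log(p) = value  =>  p = exp(value); value must be <= 0 for p <= 1.
        return math.exp(value)
    if transformation == "logit":
        # logit(p) = log(p / (1 - p)) = value  =>  p = 1 / (1 + exp(-value)).
        return 1.0 / (1.0 + math.exp(-value))
    if transformation == "probit":
        # probit(p) = value  =>  p = standard normal CDF evaluated at value.
        return 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))
    raise ValueError("unknown transformation: " + transformation)


print(inverse_transform(0.0, "logit"))   # a logit of 0 corresponds to p = 0.5
print(inverse_transform(0.0, "probit"))  # a probit of 0 corresponds to p = 0.5
```

Note that logit and probit accept any real value, whereas a log-transformed probability only makes sense for non-positive values.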

#### Example

In the proposed example, we use 4 categories and the model is implemented as follows:

[LONGITUDINAL]
input =  {th1, th2, th3}

DEFINITION:
level = {type = categorical, categories = {0, 1, 2, 3},
logit(P(level <=0)) = th1
logit(P(level <=1)) = th1 + th2
logit(P(level <=2)) = th1 + th2 + th3}


Using that definition, the distributions associated with the parameters are

• Normal for th1. Otherwise (for instance with a lognormal distribution, which forces th1>0), it implies that logit(P(level <=0))>0 and thus that P(level <=0)>0.5.
• Lognormal for th2 and th3 to make sure that P(level <=1)>P(level <=0) and P(level <=2)>P(level <=1) respectively.
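The effect of these constraints can be checked numerically. The following Python sketch, with hypothetical parameter values, computes the four category probabilities implied by the cumulative logits of the example; it is an illustration of the formulas, not of how the software evaluates the model internally.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(th1, th2, th3):
    """Probabilities of level = 0, 1, 2, 3 under the cumulative-logit example."""
    if th2 <= 0 or th3 <= 0:
        # Positive increments keep the cumulative probabilities strictly increasing.
        raise ValueError("th2 and th3 must be positive")
    c0 = expit(th1)              # P(level <= 0)
    c1 = expit(th1 + th2)        # P(level <= 1)
    c2 = expit(th1 + th2 + th3)  # P(level <= 2)
    # P(level <= 3) = 1 is implied, so the last probability is deduced.
    return [c0, c1 - c0, c2 - c1, 1.0 - c2]


# Hypothetical parameter values, chosen only for illustration.
probs = category_probabilities(th1=-1.0, th2=1.0, th3=1.5)
print(probs)
```

Because th1 can be negative while th2 and th3 stay positive, P(level <=0) is free to lie below 0.5 and every category keeps a strictly positive probability.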

### Observation model for categorical data modeled as a discrete Markov chain

#### Use of categorical data modeled as a Markov chain

In the previous categorical model, the observations were considered as independent for individual i. It is however possible to introduce dependence between observations from the same individual by assuming that $(y_{ij})_{j=1,\ldots, n_i}$ forms a Markov chain.

#### Observation model syntax

An observation variable for ordered categorical data modeled as a discrete Markov chain is defined using the type categorical, along with the dependence definition Markov. Its additional fields are:

• categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
• P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability applies to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. Instead of an exact match, they can be provided for events where the category is a bound, written with a comparison operator. All bounds must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole, and the default initial law is uniform.
• P(Y=j|Y_p=i): Probability of transition to a given category integer j from a previous category i, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are grouped by law of transition for each previous category i. Each law of transition provides the various transition probabilities of reaching j. Instead of an exact match, they can be provided for events where the reached category j is a bound, written with a comparison operator. All bounds must be of the same kind for a given law. When the value of a transition probability can be deduced from the others within its law, its definition can be omitted.

#### Example

An example of an observation model for this case is given here:

[LONGITUDINAL]
input = {a1, a2, a11, a12, a21, a22, a31, a32}

DEFINITION:
State = {type = categorical, categories = {1,2,3}, dependence = Markov
P(State_1=1) = a1
P(State_1=2) = a2
logit(P(State <=1|State_p=1)) = a11
logit(P(State <=2|State_p=1)) = a11+a12
logit(P(State <=1|State_p=2)) = a21
logit(P(State <=2|State_p=2)) = a21+a22
logit(P(State <=1|State_p=3)) = a31
logit(P(State <=2|State_p=3)) = a31+a32}


Using that definition, the distributions associated with the parameters are

• Logitnormal for a1 and a2 to make sure that the initial probabilities are well defined.
• Normal for a11, a21, and a31, since the logit transformation already ensures that the probability is in [0, 1].
• Lognormal for a12, a22, and a32 to make sure that the cumulative probability is increasing.
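As an illustration, the transition matrix implied by the cumulative logits of this example can be computed as follows. This is a Python sketch with hypothetical parameter values; the helper `transition_row` is introduced here for illustration only.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def transition_row(a_first, a_increment):
    """Transition probabilities to states 1, 2, 3 from one previous state.

    logit(P(State <= 1 | State_p = i)) = a_first
    logit(P(State <= 2 | State_p = i)) = a_first + a_increment
    """
    c1 = expit(a_first)
    c2 = expit(a_first + a_increment)
    # P(State <= 3 | State_p = i) = 1, so the last probability is deduced.
    return [c1, c2 - c1, 1.0 - c2]


# Hypothetical values for (a11, a12), (a21, a22), (a31, a32).
params = {1: (1.0, 1.0), 2: (-0.5, 1.5), 3: (-2.0, 1.0)}
matrix = [transition_row(*params[i]) for i in (1, 2, 3)]
for row in matrix:
    print(row)
```

Each row is one law of transition: it sums to 1, and positive increments (a12, a22, a32) keep every entry strictly positive.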

### Observation model for categorical data modeled as a continuous Markov chain

#### Observation model syntax

An observation variable for ordered categorical data modeled as a continuous Markov chain is also defined using the type categorical, along with the dependence definition Markov. But here transition rates are defined instead of transition probabilities. Its additional fields are:

• categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.
• P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability applies to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. Instead of an exact match, they can be provided for events where the category is a bound, written with a comparison operator. All bounds must be of the same kind. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole, and the default initial law is uniform.
• transitionRate(i,j): Transition rate from a given category integer i to a category j. The rates are grouped by law of transition for each departure category i. One transition rate per law of transition can be omitted, as the rates of each law (one row of the transition rate matrix) must sum to zero.

#### Example

An example of an observation model for this case is given here:

[LONGITUDINAL]
input={p1, q12, q21}

DEFINITION:
State = {type = categorical, categories = {1,2}, dependence = Markov
P(State_1=1) = p1
transitionRate(1,2) = q12
transitionRate(2,1) = q21}
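To see how transition rates translate into transition probabilities between two observation times, the two-state case admits a closed form. The Python sketch below uses the standard result for a two-state continuous-time Markov chain with rate matrix rows (-q12, q12) and (q21, -q21); the function name and the numerical values are hypothetical, and the code is not meant to describe how the software computes the likelihood.

```python
import math


def two_state_transition_matrix(q12, q21, dt):
    """Transition probabilities over a time step dt for a two-state chain.

    Closed form of exp(Q * dt) with Q = [[-q12, q12], [q21, -q21]].
    """
    if q12 < 0 or q21 < 0 or dt < 0:
        raise ValueError("rates and time step must be non-negative")
    total = q12 + q21
    if total == 0:
        # No transitions are possible: the chain stays where it is.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * dt)
    pi1 = q21 / total  # long-run probability of state 1
    pi2 = q12 / total  # long-run probability of state 2
    return [
        [pi1 + pi2 * decay, pi2 * (1.0 - decay)],
        [pi1 * (1.0 - decay), pi2 + pi1 * decay],
    ]


# Hypothetical rates and time step, chosen only for illustration.
P = two_state_transition_matrix(q12=0.3, q21=0.1, dt=2.0)
for row in P:
    print(row)
```

Each row of the rate matrix sums to zero, which is why only the off-diagonal rates q12 and q21 appear in the model definition, and each row of the resulting probability matrix sums to one.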