Observation model for count data

Use of count data

Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. In this context, data from individual is the sequence \(y_i=(y_{ij},1\leq j \leq n_i)\) where \(y_{ij}\) is the number of events observed in the jth time interval \(I_{ij}\).
Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, \(y_{ij}\) is either the number of trials or successes (or failures) for subject i at time \(t_{ij}\). For any of these data types we will then model \(y_i=(y_{ij},1\leq j \leq n_i)\) as a sequence of random variables that take their values in {0, 1, 2, …}.  If we assume that they are independent, then the model is completely defined by the probability mass functions \(\mathbb{P}(y_{ij}=k)\) for \(k \geq 0\) and \(1 \leq j \leq n_i\). Here, we will consider only parametric distributions for count data.

Observation model syntax

Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions \(P(y_{ij}=k)\). An observation variable for count data, with name Y for instance, is defined using the following syntax:

DEFINITION:
Y = {type=count, P(Y=k) = ...}
  • type=count: indicates the data type
  • P(Y=k): probability of a given count value k, for the observation named Y. The observation name is free but must be the same at the beginning of the line and for the probability definition. k is a reserved keyword and represents a positive integer. k supersedes in this scope any variable k defined previously. The probability must be in [0,1].

A transformed probability can also be provided. The transformation can be log, logit, or probit. For instance with a log-transformation:

DEFINITION:
Y = {type=count, log(P(Y=k)) = ...}

As k is only recognized within the probability definition, it is not possible to define the probability using k in an EQUATION block above. However, it is possible to use if/else statements within the probability definition:

DEFINITION:
Y = {type=count, 
if k==0
Pk = ...
else
Pk = ...
end
P(Y=k) = Pk}

Common mathematical functions to define count distributions are factorial(a), factln(a) (logarithm of factorial) and gammaln(a) (logarithm of gamma function). They can be used with a any positive numerical value (not only integers). Note that factorials grow very rapidly and can be considered as “+infinity” in a computer, even when the probability is defined as a ratio of two factorials which stays with reasonable values on paper. It is thus convenient to works with logarithms of factorials, which grow much slower (see examples).

Examples

Example 1: Poisson distribution with time evolution

In this example, the Poisson distribution is used for defining the distribution of \(y_j\):

$$y_j \sim \textrm{Poisson}(\lambda_j) \iff P(Y=k)=\frac{\lambda_j^k e^{-\lambda_j}}{k!}$$

where the Poisson intensity \(\lambda_j\) is function of time \(\lambda_j = a+bt_j\). This model is implemented as follows

[LONGITUDINAL]
input = {a,b}

EQUATION:
lambda = a+b*t

DEFINITION:
y = {type=count, P(y=k) = exp(-lambda)*(lambda^k)/factorial(k)}

 

Example 2: Binomial distribution

We consider n Bernouilli trials, each having a probability of success p. The probability of having k successes is:

$$P(Y=k)=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$

To avoid that \(k!\) be so large that it will be considered as NaN by a computer, it is good practice to define the log of the probability to convert the ratios of large number into a sum of smaller numbers:

$$\log(P(Y=k))=\log(n!) – \log(k!) – \log((n-k)!) + k \log(p) + (n-k) \log(1-p)$$

The corresponding Mlxtran model is:

[LONGITUDINAL]
input = {n, p}

DEFINITION:
CountNumber = {type=count, log(P(CountNumber=k)) = gammaln(n+1) - factln(k) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p)}

OUTPUT:
output = Y

Example 3: Poisson distribution with zero inflation

Zero-inflations can be encoded using if/else statements:

[LONGITUDINAL]
input = {lambda, f}

DEFINITION:
CountNumber = {type=count, 
if k==0
Pk = exp(-lambda)*(1-f) + f
else
Pk = exp(k*log(lambda) - lambda - factln(k))*(1-f)
end
P(CountNumber=k) = Pk}

OUTPUT:
output = CountNumber

Library of count models

The MonolixSuite library of models includes many pre-written count data models: Count library.