## Use of count data

Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. In this context, data from individual *i *is the sequence \(y_i=(y_{ij},1\leq j \leq n_i)\) where \(y_{ij}\) is the number of events observed in the *j*th time interval \(I_{ij}\).

Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, \(y_{ij}\) is either the number of trials or successes (or failures) for subject *i* at time \(t_{ij}\). For any of these data types we will then model \(y_i=(y_{ij},1\leq j \leq n_i)\) as a sequence of random variables that take their values in {0, 1, 2, …}. If we assume that they are independent, then the model is completely defined by the *probability mass functions* \(\mathbb{P}(y_{ij}=k)\) for \(k \geq 0\) and \(1 \leq j \leq n_i\). Here, we will consider only parametric distributions for count data.

## Observation model syntax

Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions \(P(y_{ij}=k)\). An observation variable for count data, with name `Y`

for instance, is defined using the following syntax:

DEFINITION: Y = {type=count, P(Y=k) = ...}

- type=count: indicates the data type
- P(Y=k): probability of a given count value
`k`

, for the observation named`Y`

. The observation name is free but must be the same at the beginning of the line and for the probability definition.`k`

is a reserved keyword and represents a positive integer. k supersedes in this scope any variable k defined previously. The probability must be in [0,1].

A transformed probability can also be provided. The transformation can be log, logit, or probit. For instance with a log-transformation:

DEFINITION: Y = {type=count, log(P(Y=k)) = ...}

As `k`

is only recognized within the probability definition, it is not possible to define the probability using `k`

in an EQUATION block above. However, it is possible to use if/else statements within the probability definition:

DEFINITION: Y = {type=count, if k==0 Pk = ... else Pk = ... end P(Y=k) = Pk}

Common mathematical functions to define count distributions are `factorial(a)`

, `factln(a)`

(logarithm of factorial) and `gammaln(a)`

(logarithm of gamma function). They can be used with `a`

any positive numerical value (not only integers). Note that factorials grow very rapidly and can be considered as “+infinity” in a computer, even when the probability is defined as a ratio of two factorials which stays with reasonable values on paper. It is thus convenient to works with logarithms of factorials, which grow much slower (see examples).

## Examples

* Example 1:* Poisson distribution with time evolution

In this example, the Poisson distribution is used for defining the distribution of \(y_j\):

$$y_j \sim \textrm{Poisson}(\lambda_j) \iff P(Y=k)=\frac{\lambda_j^k e^{-\lambda_j}}{k!}$$

where the Poisson intensity \(\lambda_j\) is function of time \(\lambda_j = a+bt_j\). This model is implemented as follows

[LONGITUDINAL] input = {a,b} EQUATION: lambda = a+b*t DEFINITION: y = {type=count, P(y=k) = exp(-lambda)*(lambda^k)/factorial(k)}

* Example 2:* Binomial distribution

We consider `n`

Bernouilli trials, each having a probability of success `p`

. The probability of having k successes is:

$$P(Y=k)=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$

To avoid that \(k!\) be so large that it will be considered as NaN by a computer, it is good practice to define the log of the probability to convert the ratios of large number into a sum of smaller numbers:

$$\log(P(Y=k))=\log(n!) – \log(k!) – \log((n-k)!) + k \log(p) + (n-k) \log(1-p)$$

The corresponding Mlxtran model is:

[LONGITUDINAL] input = {n, p} DEFINITION: CountNumber = {type=count, log(P(CountNumber=k)) = gammaln(n+1) - factln(k) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p)} OUTPUT: output = Y

* Example 3:* Poisson distribution with zero inflation

Zero-inflations can be encoded using if/else statements:

[LONGITUDINAL] input = {lambda, f} DEFINITION: CountNumber = {type=count, if k==0 Pk = exp(-lambda)*(1-f) + f else Pk = exp(k*log(lambda) - lambda - factln(k))*(1-f) end P(CountNumber=k) = Pk} OUTPUT: output = CountNumber

## Library of count models

The MonolixSuite library of models includes many pre-written count data models: Count library.