Use of count data
Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. In this context, data from individual i is the sequence \(y_i=(y_{ij},1\leq j \leq n_i)\) where \(y_{ij}\) is the number of events observed in the jth time interval \(I_{ij}\).
Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, \(y_{ij}\) is either the number of trials or successes (or failures) for subject i at time \(t_{ij}\). For any of these data types we will then model \(y_i=(y_{ij},1\leq j \leq n_i)\) as a sequence of random variables that take their values in {0, 1, 2, …}. If we assume that they are independent, then the model is completely defined by the probability mass functions \(\mathbb{P}(y_{ij}=k)\) for \(k \geq 0\) and \(1 \leq j \leq n_i\). Here, we will consider only parametric distributions for count data.
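Under the independence assumption, the likelihood of an individual's sequence \(y_i\) is the product of the probability mass functions, so its logarithm is a sum of log-probabilities. The Python sketch below illustrates this outside of Monolix. The helper names, the Poisson choice and the data values are illustrative assumptions, not part of Mlxtran.

```python
import math
from typing import Callable, Sequence

def poisson_logpmf(k: int, lam: float) -> float:
    # log(lam^k e^{-lam} / k!) computed with lgamma, so k! is never formed explicitly
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def sequence_loglik(y: Sequence[int], logpmf: Callable[[int, int], float]) -> float:
    """Sum of log P(y_ij = k) over j; valid when the y_ij are (conditionally) independent.

    logpmf(j, k) returns log P(y_ij = k) for the j-th observation (0-based here)."""
    return sum(logpmf(j, k) for j, k in enumerate(y))

# Hypothetical counts and intensities for one individual (made-up numbers)
y_i = [0, 2, 1, 4]
lam_i = [0.5, 1.5, 1.0, 3.0]
ll = sequence_loglik(y_i, lambda j, k: poisson_logpmf(k, lam_i[j]))
print(ll)
```

Working on the log scale keeps the sum well behaved even for long sequences, where the product of many small probabilities would underflow.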
Observation model syntax
Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions \(P(y_{ij}=k)\). An observation variable for count data, with name Y for instance, is defined using the following syntax:
```
DEFINITION:
Y = {type=count, P(Y=k) = ...}
```
- type=count: indicates the data type
- P(Y=k): probability of a given count value k, for the observation named Y. The observation name is free but must be the same at the beginning of the line and in the probability definition. k is a reserved keyword and represents a non-negative integer (0, 1, 2, …); within this scope, k supersedes any variable k defined previously. The probability must be in [0,1].
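Because every P(Y=k) must lie in [0,1], and the values over k = 0, 1, 2, … should sum to 1, a quick numerical check of a candidate formula before writing it in Mlxtran can catch mistakes. The sketch below is a hypothetical Python helper, not a Monolix feature: `check_count_pmf`, the truncation point `k_max` and the tolerance `tol` are assumptions of this illustration.

```python
import math
from typing import Callable

def check_count_pmf(pmf: Callable[[int], float], k_max: int = 200, tol: float = 1e-8) -> float:
    """Hypothetical sanity check for a count probability P(Y=k), k = 0..k_max.

    Raises ValueError if any value lies outside [0, 1] (NaN included) and returns the
    truncated total mass, which should be close to 1 when k_max is large enough."""
    total = 0.0
    for k in range(k_max + 1):
        p = pmf(k)
        if not (0.0 <= p <= 1.0):
            raise ValueError(f"P(Y={k}) = {p} is outside [0, 1]")
        total += p
    if abs(total - 1.0) > tol:
        print(f"warning: probabilities sum to {total}, not 1 (increase k_max or fix the formula)")
    return total

# Illustration with a geometric distribution P(Y=k) = (1-q) q^k
q = 0.3
print(check_count_pmf(lambda k: (1 - q) * q**k))
```

Truncating at `k_max` is a simplification: for heavy-tailed distributions the cut-off must be raised until the missing mass is negligible.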
A transformed probability can also be provided. The transformation can be log, logit, or probit. For instance with a log-transformation:
```
DEFINITION:
Y = {type=count, log(P(Y=k)) = ...}
```
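For reference, each transformed form maps back to a probability through the corresponding inverse function: exp for log, the logistic function for logit, and the standard normal cumulative distribution function for probit. The Python sketch below only illustrates these inverse transformations; it is an assumption-free piece of mathematics, not a description of how Monolix evaluates them internally.

```python
import math

def inv_log(x: float) -> float:
    # log(P) = x  =>  P = exp(x)
    return math.exp(x)

def inv_logit(x: float) -> float:
    # logit(P) = log(P / (1 - P)) = x  =>  P = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def inv_probit(x: float) -> float:
    # probit(P) = Phi^{-1}(P) = x  =>  P = Phi(x), the standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = 0.2
print(inv_log(math.log(p)), inv_logit(math.log(p / (1 - p))))
```

A log-transformed probability must be at most 0, whereas logit and probit values can be any real number; both map onto (0,1).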
As k is only recognized within the probability definition, it is not possible to define the probability using k in an EQUATION block above. However, it is possible to use if/else statements within the probability definition:
```
DEFINITION:
Y = {type=count,
     if k==0
        Pk = ...
     else
        Pk = ...
     end
     P(Y=k) = Pk}
```
Common mathematical functions for defining count distributions are factorial(a), factln(a) (logarithm of the factorial) and gammaln(a) (logarithm of the gamma function). They can be used with a being any positive numerical value (not only integers). Note that factorials grow very rapidly and soon exceed the largest number a computer can represent, so they evaluate to “+infinity”, even when the probability is defined as a ratio of two factorials whose value on paper is reasonable. It is thus convenient to work with logarithms of factorials, which grow much more slowly (see examples).
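The overflow problem is easy to reproduce in double-precision arithmetic. The Python sketch below compares a naive ratio of factorials with the same quantity computed from log-factorials; `float_factorial` and `factln` are hypothetical Python stand-ins (the latter via `math.lgamma`, since log(a!) = log Γ(a+1)), not the Mlxtran functions themselves.

```python
import math

def float_factorial(m: int) -> float:
    # Naive factorial in double precision: overflows silently to +inf for m >= 171
    result = 1.0
    for i in range(2, m + 1):
        result *= i
    return result

def factln(a: float) -> float:
    # log(a!) = log Gamma(a + 1); valid for non-integer a >= 0 as well
    return math.lgamma(a + 1.0)

n, k = 200, 100
# inf / inf evaluates to nan: the ratio is lost even though its true value is finite
naive = float_factorial(n) / (float_factorial(k) * float_factorial(n - k))
# Same ratio on the log scale: a moderate number (about 136)
log_c = factln(n) - factln(k) - factln(n - k)
print(naive, log_c, math.exp(log_c))
```

Here the true ratio n!/(k!(n-k)!) is about 9×10^58, well within floating-point range, yet the naive route returns NaN because the intermediate factorials overflow.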
Examples
Example 1: Poisson distribution with time evolution
In this example, the Poisson distribution is used for defining the distribution of \(y_j\):
$$y_j \sim \textrm{Poisson}(\lambda_j) \iff P(Y=k)=\frac{\lambda_j^k e^{-\lambda_j}}{k!}$$
where the Poisson intensity \(\lambda_j\) is a function of time, \(\lambda_j = a+bt_j\) (a and b must be such that \(\lambda_j > 0\) at all observation times). This model is implemented as follows:
```
[LONGITUDINAL]
input = {a, b}

EQUATION:
lambda = a + b*t

DEFINITION:
y = {type=count, P(y=k) = exp(-lambda)*(lambda^k)/factorial(k)}
```
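To see what this model implies numerically, the Python sketch below evaluates \(P(y_j=k)\) with \(\lambda_j = a+bt_j\) at a few times, using the log form recommended above instead of factorial(k). The values of a and b and the time grid are made-up illustrations.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    # lam^k e^{-lam} / k!, evaluated as exp(k*log(lam) - lam - log(k!))
    if lam <= 0:
        raise ValueError("Poisson intensity must be positive")
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

a, b = 1.0, 0.5            # hypothetical parameter values
times = [0.0, 2.0, 4.0]    # hypothetical observation times t_j

for t in times:
    lam = a + b * t        # lambda_j = a + b*t_j
    probs = [poisson_pmf(k, lam) for k in range(5)]
    print(f"t={t}: lambda={lam}, P(y=0..4) = {[round(p, 3) for p in probs]}")
```

As t increases, the probability mass shifts toward larger counts, which is the time evolution the model encodes.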
Example 2: Binomial distribution
We consider n Bernoulli trials, each having a probability of success p. The probability of having k successes is:
$$P(Y=k)=\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}$$
To prevent \(k!\) from becoming so large that a computer treats it as infinite (turning the ratio into NaN), it is good practice to define the log of the probability, which converts the ratio of large numbers into a sum of smaller numbers:
$$\log(P(Y=k))=\log(n!) - \log(k!) - \log((n-k)!) + k \log(p) + (n-k) \log(1-p)$$
The corresponding Mlxtran model is:
```
[LONGITUDINAL]
input = {n, p}

DEFINITION:
CountNumber = {type=count,
  log(P(CountNumber=k)) = gammaln(n+1) - factln(k) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p)}

OUTPUT:
output = CountNumber
```
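The log-probability can be checked against the exact binomial formula outside Monolix. The Python sketch below mirrors the Mlxtran expression term by term, with `math.lgamma` standing in for gammaln and factln (gammaln(n+1) = log(n!)). It assumes 0 ≤ k ≤ n and 0 < p < 1, since the logarithms are undefined otherwise; the values of n and p are illustrative.

```python
import math

def binom_logpmf(k: int, n: int, p: float) -> float:
    # gammaln(n+1) - factln(k) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

n, p = 500, 0.3   # hypothetical values; 500! alone would overflow a double
probs = [math.exp(binom_logpmf(k, n, p)) for k in range(n + 1)]
print(sum(probs))
# Same probability for k = 150 computed two ways (exact integer coefficient vs log form)
print(probs[150], math.comb(n, 150) * p**150 * (1 - p)**350)
```

The two values agree up to rounding error, and the probabilities sum to 1 even though n! itself cannot be represented as a floating-point number.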
Example 3: Poisson distribution with zero inflation
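Zero inflation, where zeros are observed more often than a Poisson distribution predicts, can be encoded using if/else statements. Here, with probability f a count is an extra zero, and otherwise it follows a Poisson distribution with intensity \(\lambda\), so that \(P(Y=0)=f+(1-f)e^{-\lambda}\) and \(P(Y=k)=(1-f)\frac{\lambda^k e^{-\lambda}}{k!}\) for \(k\geq 1\):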
```
[LONGITUDINAL]
input = {lambda, f}

DEFINITION:
CountNumber = {type=count,
  if k==0
    Pk = exp(-lambda)*(1-f) + f
  else
    Pk = exp(k*log(lambda) - lambda - factln(k))*(1-f)
  end
  P(CountNumber=k) = Pk}

OUTPUT:
output = CountNumber
```
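As a quick numerical check of this definition, the Python sketch below reproduces the same if/else logic (`math.lgamma(k + 1)` plays the role of factln(k)) and confirms that the probabilities sum to 1 and that the mean equals \((1-f)\lambda\). The values of lambda and f are hypothetical, and the infinite sum is truncated at k = 59.

```python
import math

def zip_pmf(k: int, lam: float, f: float) -> float:
    # Mirrors the Mlxtran if/else: extra mass f on zero, Poisson(lam) weighted by (1 - f)
    if k == 0:
        return math.exp(-lam) * (1 - f) + f
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) * (1 - f)

lam, f = 2.0, 0.25   # hypothetical parameter values
probs = [zip_pmf(k, lam, f) for k in range(60)]
mean = sum(k * p for k, p in enumerate(probs))
print(probs[0], sum(probs), mean)
```

Setting f = 0 recovers the plain Poisson distribution, which is a convenient way to test the model before fitting the extra parameter.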
Library of count models
The MonolixSuite library of models includes many pre-written count data models: Count library.