Observation models for continuous data

Purpose

The observation model links the prediction f of the structural model to the observations. It is an error model that represents the noise and the uncertainty of the measurements. The observation model can only be defined in the Monolix interface, or in the Mlxtran model file when using Simulx.

Possible observation models

For continuous observations, the general form $u(y) = u(f) + g\,e$ is considered, where $e$ is a sequence of independent random variables normally distributed with mean 0 and variance 1, $g$ is the residual error term given by the residual error model, and $u$ is the transformation associated with the distribution of the observations. The residual errors can also be assumed to be correlated. The distributions and residual error models that can be selected in the Monolix user interface are listed below.

Residual error models

constant: constant error model $y = f + ae$
proportional: proportional error model + power $y = f + bf^ce$
combined1: combined error model + power $y = f + (a + b f^c)e$
combined2: combined error model + power $y = f + \sqrt{a^2 + (bf^c)^2}\,e$ (equivalent in distribution to $y = f + a e_1 + b f^c e_2$, where $e_1$ and $e_2$ are independent sequences of independent random variables normally distributed with mean 0 and variance 1)

The power parameter c is fixed to 1 by default; it can, however, be unfixed and estimated.
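
The four residual error models differ only in the term g that multiplies e. The sketch below writes each g as a plain Python function and simulates y = f + g e; the function names residual_term and simulate are hypothetical and only illustrate the formulas above (it assumes f > 0 so that f ** c stays real). It also checks, by Monte Carlo with hypothetical values a = 3, b = 0.4, f = 10 and c = 1, that a e1 + b f e2 has the same standard deviation as the combined2 term.

```python
import math
import random
import statistics

def residual_term(model: str, f: float, a: float = 0.0, b: float = 0.0, c: float = 1.0) -> float:
    """Return g, the factor multiplying e in y = f + g*e, for one residual error model.

    Assumes f > 0 so that f ** c stays real for non-integer c.
    """
    if model == "constant":
        return a
    if model == "proportional":
        return b * f ** c
    if model == "combined1":
        return a + b * f ** c
    if model == "combined2":
        return math.sqrt(a ** 2 + (b * f ** c) ** 2)
    raise ValueError(f"unknown error model: {model!r}")

def simulate(model: str, f: float, rng: random.Random, **params: float) -> float:
    """Draw one observation y = f + g*e with e ~ N(0, 1)."""
    return f + residual_term(model, f, **params) * rng.gauss(0.0, 1.0)

# Monte Carlo check of the combined2 equivalence: a*e1 + b*f*e2 has the
# same standard deviation as sqrt(a^2 + (b*f)^2) * e (here c = 1).
rng = random.Random(0)
a, b, f = 3.0, 0.4, 10.0
draws = [a * rng.gauss(0.0, 1.0) + b * f * rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(residual_term("combined2", f, a=a, b=b))  # 5.0
print(round(statistics.pstdev(draws), 1))       # close to 5.0
```

With c = 1 the combined2 term is $\sqrt{3^2 + 4^2} = 5$, and the empirical standard deviation of the two-noise form converges to the same value.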

Positive gain on the error model
The second parameter b in the observation models comb1 and comb1c (combined1, without and with the power c) can be forced to remain positive by selecting b>0.
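
One way to see why the sign of b is relevant for combined1 but not for combined2 is to compare the two error terms directly. The sketch below uses hypothetical values a = 1 and b = -0.2: with a negative b, the combined1 term a + b f^c decreases as f grows and vanishes at f = (a/|b|)^(1/c) (here f = 5), where the residual variance would collapse to zero, whereas combined2 depends only on b^2. This illustrates the formulas only, not how Monolix enforces the b>0 option.

```python
import math

def g_combined1(f: float, a: float, b: float, c: float = 1.0) -> float:
    """combined1 term: a + b * f**c (assumes f > 0)."""
    return a + b * f ** c

def g_combined2(f: float, a: float, b: float, c: float = 1.0) -> float:
    """combined2 term: sqrt(a**2 + (b * f**c)**2) (assumes f > 0)."""
    return math.sqrt(a ** 2 + (b * f ** c) ** 2)

a, b = 1.0, -0.2  # hypothetical values with a negative gain b
for f in (1.0, 5.0, 10.0):
    # combined1 shrinks with f and reaches zero at f = a / |b| = 5;
    # combined2 gives the same value for b and -b.
    print(f, g_combined1(f, a, b), g_combined2(f, a, b) == g_combined2(f, a, -b))
```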

Distributions

normal: u(y) = y. This is equivalent to no transformation.
lognormal: u(y) = log(y). For a combined error model, for example, the corresponding observation model is $\log(y) = \log(f) + (a + b\log(f))\,e$. This distribution requires all observations to be strictly positive; otherwise, an error message is raised. For censored data, the censoring limit must also be strictly positive.
logitnormal: u(y) = log(y/(1-y)). For a combined error model, for example, the corresponding observation model is $\log(y/(1-y)) = \log(f/(1-f)) + (a + b\log(f/(1-f)))\,e$. This distribution requires all observations to lie strictly between 0 and 1. These bounds can be changed to define the logit function between a minimum $y_{min}$ and a maximum $y_{max}$, in which case $u(y) = \log((y-y_{min})/(y_{max}-y))$. Again, for censored data, the censoring limits must also lie strictly within this interval.
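
The general form u(y) = u(f) + g e can be simulated by transforming the prediction, adding noise on the transformed scale, and mapping back with the inverse of u. The sketch below is a hypothetical illustration (transform_pair and simulate_observation are not Monolix functions); g is passed in as a number, assumed to have been computed from the chosen residual error model on the transformed scale. The bounded logit and its inverse show why logitnormal draws always stay inside (y_min, y_max).

```python
import math
import random

def log_positive(y: float) -> float:
    """u(y) = log(y) for the lognormal distribution; requires y > 0."""
    if y <= 0:
        raise ValueError("lognormal requires strictly positive observations")
    return math.log(y)

def logit_bounded(y: float, y_min: float = 0.0, y_max: float = 1.0) -> float:
    """u(y) = log((y - y_min) / (y_max - y)); requires y_min < y < y_max."""
    if not y_min < y < y_max:
        raise ValueError("logitnormal requires y_min < y < y_max")
    return math.log((y - y_min) / (y_max - y))

def inv_logit_bounded(z: float, y_min: float = 0.0, y_max: float = 1.0) -> float:
    """Inverse of logit_bounded: maps the real line back onto (y_min, y_max)."""
    return y_min + (y_max - y_min) / (1.0 + math.exp(-z))

def transform_pair(distribution: str, y_min: float = 0.0, y_max: float = 1.0):
    """Return (u, u_inverse) for one of the three distributions."""
    if distribution == "normal":
        return (lambda y: y), (lambda z: z)
    if distribution == "lognormal":
        return log_positive, math.exp
    if distribution == "logitnormal":
        return (lambda y: logit_bounded(y, y_min, y_max),
                lambda z: inv_logit_bounded(z, y_min, y_max))
    raise ValueError(f"unknown distribution: {distribution!r}")

def simulate_observation(distribution, f, g, rng, y_min=0.0, y_max=1.0):
    """Draw y from u(y) = u(f) + g*e, e ~ N(0, 1), then map back with u^-1."""
    u, u_inv = transform_pair(distribution, y_min, y_max)
    return u_inv(u(f) + g * rng.gauss(0.0, 1.0))

# Example: a score bounded between 0 and 10 (logitnormal with y_min=0, y_max=10)
rng = random.Random(42)
scores = [simulate_observation("logitnormal", 2.5, 0.5, rng, 0.0, 10.0) for _ in range(5)]
print(scores)  # every value lies strictly between 0 and 10
```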

Hence, the following observation models can be defined by combining a distribution with a residual error model:

exponential error model: $u(y) = \log(y)$ and $y = fe^{ae}$ → constant error model and a lognormal distribution
logit error model: $u(y) = \log(\frac{y}{1-y})$ → constant error model and a logitnormal distribution
band(0,10): $u(y) = \log(\frac{y}{10-y})$ → constant error model and a logitnormal distribution with min and max at 0 and 10 respectively
band(0,100): $u(y) = \log(\frac{y}{100-y})$ → constant error model and a logitnormal distribution with min and max at 0 and 100 respectively
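
The equivalence between the exponential error model and a lognormal distribution with a constant error model follows from exponentiating both sides of $\log(y) = \log(f) + a e$. The sketch below checks it numerically for a few hypothetical values of e and evaluates the band transformations at the midpoint of their intervals; the function names are illustrative only.

```python
import math

def lognormal_constant(f: float, a: float, e: float) -> float:
    """Lognormal distribution + constant error: log(y) = log(f) + a*e, solved for y."""
    return math.exp(math.log(f) + a * e)

def exponential_error(f: float, a: float, e: float) -> float:
    """Exponential error model written directly: y = f * exp(a*e)."""
    return f * math.exp(a * e)

def band(y: float, upper: float) -> float:
    """band(0, upper): u(y) = log(y / (upper - y)), defined for 0 < y < upper."""
    return math.log(y / (upper - y))

# The two forms of the exponential error model agree for any draw e.
f, a = 12.0, 0.3  # hypothetical prediction and error parameter
for e in (-1.5, 0.0, 0.7, 2.0):
    assert math.isclose(lognormal_constant(f, a, e), exponential_error(f, a, e))

# band(0,10) and band(0,100) map the middle of their interval to 0.
print(band(5.0, 10.0), band(50.0, 100.0))  # 0.0 0.0
```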

Mlxtran observational model syntax

The DEFINITION: block in the [LONGITUDINAL] section is used to define the observational model:

```
DEFINITION:
observationName = {distribution = distributionType, prediction = predictionName, errorModel = errorModel(param)}
```

Note that type = continuous can be used instead of distribution = distributionType.

For example, if the observation is a concentration with a combined error model (Concentration = Cc + (a+b*Cc)*e), the observational error model is written as

```
DEFINITION:
Concentration = {distribution = normal, prediction = Cc, errorModel = combined1(a, b)}
```

When the observational error is defined in the Mlxtran model file, the user must declare the observational model parameters (a and b in the presented example) as inputs.

Rules and best practices

The arguments of the error model cannot be calculations, only input names.
In Monolix, the user can choose the error model through the interface.
In Monolix, the input parameters of the error models cannot be given arbitrary names:
- The names of the inputs should match the definition of the error model (e.g., a for a constant error model, b for a proportional error model, (a,b) for a combined1 error model, …)
- If there are several continuous outputs, the names of the error model input parameters should carry the index of the corresponding output (1 for the first error model, 2 for the second, …)
- For example, for a single output, a combined error model is written without any index, as follows
```
DEFINITION: 
Concentration = {distribution = normal, prediction = Cc, errorModel=combined1(a, b)}
```
- For example, for two outputs, a combined error model and a constant error model are written as follows
```
DEFINITION: 
Concentration = {distribution = normal, prediction = Cc, errorModel=combined1(a1, b1)}
PCA = {distribution = normal, prediction = E, errorModel=constant(a2)}
```
A parameter cannot be shared by two error models. For example, in the Concentration/PCA example above, a2 cannot be replaced by a1.
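
The naming rules above can be summarized as a small offline check. The sketch below is hypothetical: check_error_models and EXPECTED_NAMES are illustration names, not part of Monolix or Mlxtran, and only the four basic error models are covered. It takes each output with its error model and argument names, in output order, and reports arguments that are not plain input names, names that do not follow the a/b and index convention, and parameters shared by two error models.

```python
import re

# Hypothetical table of the expected parameter names, following the rules above
# (only the four basic error models are covered; power variants are left out).
EXPECTED_NAMES = {
    "constant": ("a",),
    "proportional": ("b",),
    "combined1": ("a", "b"),
    "combined2": ("a", "b"),
}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def check_error_models(outputs):
    """Check (output name, error model, argument names) triples, given in output order.

    Returns a list of problem descriptions; an empty list means the rules are met.
    """
    problems = []
    owner = {}  # argument name -> first output that used it
    suffix_needed = len(outputs) > 1
    for index, (name, model, args) in enumerate(outputs, start=1):
        if model not in EXPECTED_NAMES:
            problems.append(f"{name}: unsupported error model {model!r}")
            continue
        suffix = str(index) if suffix_needed else ""
        expected = [base + suffix for base in EXPECTED_NAMES[model]]
        for arg in args:
            if not IDENTIFIER.fullmatch(arg):
                problems.append(f"{name}: {arg!r} is not a plain input name")
            elif arg in owner:
                problems.append(f"{name}: {arg!r} is already used by {owner[arg]}")
            else:
                owner[arg] = name
        if list(args) != expected:
            problems.append(f"{name}: expected {expected}, got {list(args)}")
    return problems

# Single output: names without index
print(check_error_models([("Concentration", "combined1", ["a", "b"])]))  # []
# Two outputs sharing a1: reported as a problem
print(check_error_models([("Concentration", "combined1", ["a1", "b1"]),
                          ("PCA", "constant", ["a1"])]))
```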