More general statistical models for individual or covariate parameters

Introduction

Mlxtran can encode very general linear and nonlinear statistical models that involve covariates and are based on probability distributions.
Linear Gaussian model: a linear Gaussian statistical model for the variable X assumes that there exist a transformation h, a typical value X_{\rm pop}, a vector of individual covariates (c_{1}, \ldots, c_{L}), a vector of coefficients (\beta_1, \ldots, \beta_L) and a normally distributed random variable \eta such that

 h(X) = h(X_{\rm pop}) + \sum_{\ell=1}^L \beta_\ell \, c_\ell + \eta

This model can be implemented with Mlxtran, using the keywords typical, covariate and coefficient.

input = {Xpop, beta1, beta2, c1, c2, omega}
DEFINITION:
X = {distribution=lognormal, typical=Xpop, covariate={c1,c2}, coefficient={beta1,beta2}, sd=omega}

The keyword covariate lists the covariates that enter the linear model, and the keyword coefficient lists the coefficients that multiply them, in the same order. The number of coefficients must therefore be equal to the number of covariates.
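To make the meaning of this definition concrete, the linear Gaussian model can be simulated outside Mlxtran. The following is a minimal Python sketch, not Mlxtran code; it assumes that distribution=lognormal corresponds to the transformation h = log, and the function name sample_linear_gaussian is hypothetical.

```python
import math
import random


def sample_linear_gaussian(x_pop, covariates, coefficients, omega, rng):
    """Draw one X such that log(X) = log(x_pop) + sum(beta_l * c_l) + eta.

    Assumption: h = log, i.e. the lognormal case of the linear Gaussian model.
    """
    if len(covariates) != len(coefficients):
        raise ValueError("need exactly one coefficient per covariate")
    eta = rng.gauss(0.0, omega)  # eta ~ N(0, omega^2)
    linear_part = sum(b * c for b, c in zip(coefficients, covariates))
    return math.exp(math.log(x_pop) + linear_part + eta)


rng = random.Random(0)  # fixed seed so the draw is reproducible
x = sample_linear_gaussian(
    x_pop=10.0,
    covariates=[0.5, -1.0],    # c1, c2
    coefficients=[0.2, 0.1],   # beta1, beta2
    omega=0.3,
    rng=rng,
)
print(x)
```

With omega set to 0 the random effect vanishes and the function returns the deterministic value X_{\rm pop} \exp(\beta_1 c_1 + \beta_2 c_2), which is a convenient sanity check.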

Nonlinear Gaussian model: a nonlinear Gaussian statistical model for the variable X assumes that there exist a transformation h, a vector of individual covariates (c_{1}, \ldots, c_{L}), a vector of coefficients (\beta_1, \ldots, \beta_M), a function \mu and a normally distributed random variable \eta such that

 h(X) = \mu(c_{1}, \ldots c_{L},\beta_1, \ldots, \beta_M) + \eta

The mean of h(X) can be computed in a block EQUATION: and then passed to the block DEFINITION: with the keyword mean. For example, with \mu(\beta_1,\beta_2,c_1,c_2)=\frac{\beta_1c_1}{\beta_2+c_2}:

input = {beta1, beta2, c1, c2, omega}
EQUATION:
mu = beta1*c1/(beta2 + c2)

DEFINITION:
X = {distribution=lognormal, mean=mu, sd=omega}
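The same nonlinear Gaussian model can be sketched in Python to show how the equation and the definition fit together. This is an illustration only, not Mlxtran code; it assumes that, for distribution=lognormal, the keyword mean gives the mean of log(X), and the function names are hypothetical.

```python
import math
import random


def mu(beta1, beta2, c1, c2):
    """Mean of h(X), as computed in the EQUATION: block of the example."""
    return beta1 * c1 / (beta2 + c2)


def sample_nonlinear_gaussian(beta1, beta2, c1, c2, omega, rng):
    """Draw one X such that log(X) = mu(...) + eta (assumption: h = log)."""
    eta = rng.gauss(0.0, omega)  # eta ~ N(0, omega^2)
    return math.exp(mu(beta1, beta2, c1, c2) + eta)


rng = random.Random(0)  # fixed seed so the draw is reproducible
x = sample_nonlinear_gaussian(beta1=2.0, beta2=1.0, c1=3.0, c2=2.0, omega=0.2, rng=rng)
print(x)
```

The split mirrors the Mlxtran code: the deterministic part lives in one function (the EQUATION: block) and the random part is added on the transformed scale (the DEFINITION: block).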

Non-Gaussian model: a non-Gaussian model for X can be defined, provided that X can be written as a nonlinear function of normally distributed random variables. For example, let

X = \frac{\beta_1 + \eta_1}{1+ \beta_2 \, e^{\eta_2}}

The distribution of X cannot be expressed explicitly as a transformation of a normal distribution. The random variables \eta_1 and \eta_2 are therefore defined in a block DEFINITION:, and a block EQUATION: is needed to build X from them:

input = {beta1, beta2, omega1, omega2}
DEFINITION:
eta1 = {distribution=normal, mean=0, sd=omega1}
eta2 = {distribution=normal, mean=0, sd=omega2}

EQUATION:
X = (beta1 + eta1)/(1+beta2*exp(eta2))
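As an illustration of why this model is handled by simulating the normal variables first and then transforming them, here is a minimal Python sketch (not Mlxtran code). The function name sample_non_gaussian and the numerical values are hypothetical.

```python
import math
import random


def sample_non_gaussian(beta1, beta2, omega1, omega2, rng):
    """Draw one X = (beta1 + eta1) / (1 + beta2 * exp(eta2)).

    eta1 and eta2 are independent, centered normal variables, as in the
    DEFINITION: block of the example.
    """
    eta1 = rng.gauss(0.0, omega1)
    eta2 = rng.gauss(0.0, omega2)
    return (beta1 + eta1) / (1.0 + beta2 * math.exp(eta2))


rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [sample_non_gaussian(3.0, 0.5, 0.4, 0.3, rng) for _ in range(1000)]
print(sum(draws) / len(draws))  # empirical mean of X
```

No closed-form density of X is required here: each draw only needs the two normal variables and the equation that combines them.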

Best practices

  • The distribution is defined by its mean and standard deviation using the keywords mean (or typical) and sd. It can also be defined by its mean and its variance, in which case the keyword var is used instead of sd. However, we encourage the user to always use the standard deviation for uniformity and simplicity.
  • When defining a distribution with covariates, the coefficients cannot be given as numerical values directly in the definition. For example, for the model X=X_{\rm pop}+c+\eta, one should write
    input = {Xpop, c, omega}
    EQUATION:
    beta = 1
    DEFINITION:
    X = {distribution=normal, typical=Xpop, covariate=c, coefficient=beta, sd=omega}

    that is, define \beta = 1 in an EQUATION: block as above, or declare beta as an input and set its value in the section <PARAMETER>. Writing 1 directly in place of beta in the distribution definition leads to an error.

  • We strongly advise defining the distribution in the most concise way. If, for example, you want to define a log-normally distributed volume that depends on the weight through V=V_{\rm pop}(w/70)^{\beta}, we encourage you not to write many intermediate equations but to put the covariate model in the definition itself. Since the model is linear on the log scale, the covariate is \log(w/70):
    input = {Vpop, w, beta, omega}
    EQUATION:
    cov = log(w/70)
    
    DEFINITION:
    V = {distribution=lognormal, typical=Vpop, covariate=cov, coefficient=beta, sd=omega}
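
For the weight example, it can help to write out the linear Gaussian form explicitly. This is a short worked derivation, assuming that V is log-normally distributed, so that h = \log, and that the random effect enters multiplicatively as e^{\eta}. Taking the logarithm of V = V_{\rm pop}(w/70)^{\beta} e^{\eta} gives

 \log(V) = \log(V_{\rm pop}) + \beta \, \log(w/70) + \eta

This matches h(X) = h(X_{\rm pop}) + \beta \, c + \eta with the covariate c = \log(w/70) rather than w/70 itself. At the reference weight w = 70 the covariate is \log(1) = 0, so V reduces to V_{\rm pop} e^{\eta} and V_{\rm pop} is the typical volume of a 70 kg individual.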