23 brms süntaks

Symbol Example Meaning
+ + x include this variable
- - x delete this variable
: x : z include the interaction between these variables
* x * z include these variables and the interactions between them
/ x / z nesting: include z nested within x
| x | z conditioning: include x given z
^ (u + v + w + z)^3 include these variables and all interactions up to three way
poly poly(x,3) polynomial regression: orthogonal polynomials
Error Error(a/b) specify an error term
I I(x*z) as is: include a new variable consisting of these variables multiplied. (x^2) means include this variable squared, etc. In other words I( ) isolates the mathematic operations inside it.
1 - 1 intercept: delete the intercept (regress through the origin)

üldine vorm:

response ~ pterms + (gterms | group)

kasuta g1:g2 või g1/g2, kui nii g1 kui g2 on sobilikud grupeerivad faktorid.
operaator loob uue grupeeriva faktori, mis kombineerib g1 ja g2 tasemed.

/ operaator viitab nested struktuurile (kool - koolitüüp)

(1 | g1/g2), tähendab tegelikult (1 | g1) + (1 | g1:g2).

(1 | g1 + g2) on sama, mis (1 | g1) + (1 | g2).

|| kasutades (x || g1) ei modelleeri me grupi-taseme korrelatsioone. See on hea, kui mudeli fittimine muidu ei tööta.

kuidas mudeldada sama grupeeriva faktori korrelatsioone üle mitme regressioonivõrrandi? Selleks laiendame | operaatori ||, kus on suvaline väärtus. Näiteks kui termid (x1|ID|g1) and (x2|ID|g1) esinevad kuskil samas või erinevates võrrandites, modelleeritakse need kui korreleeritud.

alternatiivseid grupeeivaid struktuure saab väljendada nii:

(gterms | fun(group)).

Hetkel on meil 2 sellist fun-i: gr() annab default käitumise ja mm() annab multi-membership termid. Näiteks brm(y ~ 1 + (1 | mm(s1, s2)) modelleerib seda, kuidas lapsed võivad õppida kahes koolis (s1 ja s2) eri aegadel

gr() lisatakse muidu automaatselt, aga seda spetsifitseerides saab kirjutada

y ~ x + (1|gr(g1, by = g2)), mis tähendab, et grupeeriva muutja g1 sees lahutatakse veel gruppidesse g2 muutuja tasemete järgi - ja iga g2 grupp modelleeritakse iseseisvalt (ilma shrinkageta)

Mittelineaarne mudel:

y = b1(1 - exp(-(x/b2)**b3 )

y ja x seos parameetritega b1..b3 Oletame, et tahame kõik parameetrid fittida grupeeriva muutuja g tasemete järgi ja et grupi tasmel efektid oleks omavahel korreleeritud. Lisaks ennustame me b1-e kovariaat z järgi. See kõik läheb järgmisesse võrrandisüsteemi kus, lisaks mitte-lineaarsele võrrandile, igale parameetrile b1…b3 vastab oma lineaarne võrrand:

y ~ b1 * (1 - exp(-(x / b2) ^ b3) b1 ~ z + (1|ID|g) b2 ~ (1|ID|g) b3 ~ (1|ID|g)

lisaks on mudeli keeles silumistermid s() ehk spline ja t2() ehk bivariate tensor spline, mis tulevad mgcv paketist. Näiteks rentsqm ~ t2(area, yearc) + (1|district)

category specific effects cs,

monotonic effects mo,

noise-free effects me, or

Gaussian process terms gp.

additional information on the response variable may be specified via response | aterms ~ <predictor terms>. The aterms part may contain multiple terms of the form fun() separated by + each providing special information on the response variable. This allows among others to weight observations, provide known standard errors for meta-analysis, or model censored or truncated data.

23.1 General formula structure

response | aterms ~ pterms + (gterms | group)

The pterms - effects that are assumed to be the same across observations (‘population-level’ effects).

gterms - effects that are assumed to vary across grouping variables specified in group (‘group-level’ effects).

###Group-level terms: x||g, 1|ID|g, gr(x, by = ), mm(g1, g2)

Multiple grouping factors each with multiple group-level effects are possible.

Use || in grouping terms to prevent correlations from being modeled.

It is possible to model different group-level terms of the same grouping factor as correlated (even across different formulas, e.g., in non-linear models) by using || instead of |. All group-level terms sharing the same ID will be modeled as correlated. for instance, the terms (1+x|2|g) and (1+z|2|g) mean that correlations between x and z will be estimated. correlated group-level effects across parameters: bf(y ~ a1 - a2^x, a1 ~ 1 + (1|2|g), a2 ~ x + (x|2|g), nl = TRUE)

If levels of the grouping factor belong to different sub-populations, it may be reasonable to assume a different covariance matrix for each of the sub-populations. For instance, the variation within the treatment group and within the control group in a randomized control trial might differ. Suppose that y is the outcome, and x is the factor indicating the treatment and control group. Then, we could estimate different hyper-parameters of the varying effects (in this case a varying intercept) for treatment and control group via y ~ x + (1 | gr(subject, by = x)).

A multi-membership term with two members: (1 | mm(g1, g2)), where g1 and g2 specify the first and second member, respectively. If x varies across the levels of g1 and g2, we can save the respective x values in the variables x1 and x2 and then model the varying effect as (1 + mmc(x1, x2) | mm(g1, g2)).

multilevel w smoothing terms brmsformula(y ~ x1*x2 + s(z) + (1+x1|1) + (1|g2))

additionally predict ‘sigma’ brmsformula(y ~ x1*x2 + s(z) + (1+x1|1) + (1|g2), sigma ~ x1 + (1|g2))

23.1.1 Special predictor terms (s, t2, gp(), mo(), me(), mi(), cs())

Smoothing terms are modeled using s() and t2() in the pterms part of the formula.

Gaussian process terms can be fitted using gp() in the pterms. Similar to smooth terms, Gaussian processes can be used to model complex non-linear relationships, for instance temporal or spatial autocorrelation. However, they are computationally demanding and are thus not recommended for very large datasets.

The pterms and gterms parts may contain non-standard effect types: monotonic mo(predictor), measurement error me(predictor, sd_predictor), missing value mi(predictor), and category specific effects cs().

Category specific effects can only be estimated in ordinal models and are explained in more detail in the package’s main vignette (type vignette(“brms_overview”)).

A monotonic predictor must either be integer valued or an ordered factor. Predictor categories (or integers) are not assumed to be equidistant with respect to their effect on the response variable. The distance between adjacent categories is estimated from the data and may vary across categories. One parameter takes care of the direction and size of the effect similar to an ordinary regression parameter, while an additional parameter vector estimates the normalized distances between consecutive predictor categories.

Often predictors contain measurement error. Effects of noise-free predictors can be modeled using the me (for ‘measurement error’) function. If x is a measured predictor with known measurement error sdx, we can include it via y ~ me(x, sdx). If x2 is another measured predictor with corresponding error sdx2 and z is a predictor without error (e.g., an experimental setting), we can model all main effects and interactions: \(y = me(x, sdx) * me(x2, sdx2) * z\).

Using a variable with missing values as predictors requires two things, First, we need to specify that the predictor contains missings that should to be imputed. If x is a predictor with missings and z is a predictor without missings, we go for y ~ mi(x) + z. Second, we need to model x as an additional response with corresponding predictors and the addition term mi(). In our example, we could write x | mi() ~ z.

bf(bmi ~ age * mi(chl)) + bf(chl | mi() ~ age) + set_rescor(FALSE).

23.1.2 Additional response information

aterms part may contain multiple terms of the form fun() separated by + each providing special information on the response variable. fun can be replaced with either se, weights, cens, trunc, trials, cat, or dec.

For families gaussian, student and skew_normal, it is possible to specify standard errors of the observations, thus allowing to perform meta-analysis. Suppose that the variable y_i contains the effect sizes from the studies and se_i the corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted using the formulas y_i | se(se_i) ~ 1 and y_i | se(se_i) ~ 1 + (1|study), respectively, where study is a variable uniquely identifying every study.

Meta-regression can be performed via y_i | se(se_i) ~ 1 + mod1 + mod2 + (1|study) or y_i | se(se_i) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study), where mod1 and mod2 represent moderator variables. By default, the standard errors replace the parameter sigma. To model sigma in addition to the known standard errors, set argument sigma in function se to TRUE, for instance, y_i | se(se_i, sigma = TRUE) ~ 1.

For all families, weighted regression may be performed using weights in the aterms part. Internally, this is implemented by multiplying the log-posterior values of each observation by their corresponding weights. Suppose that variable we_i contains the weights and that yi is the response variable. Then, formula y_i | weights(we_i) ~ predictors implements a weighted regression.

With the exception of categorical, ordinal, and mixture families, left, right, and interval censoring can be modeled through y | cens(censored) ~ predictors. The censoring variable (named censored in this example) should contain the values ‘left’, ‘none’, ‘right’, and ‘interval’ (or equivalently -1, 0, 1, and 2) to indicate that the corresponding observation is left censored, not censored, right censored, or interval censored. For interval censored data, a second variable (let’s call it y2) has to be passed to cens. In this case, the formula has the structure y | cens(censored, y2) ~ predictors. While the lower bounds are given in y, the upper bounds are given in y2 for interval censored data. Intervals are assumed to be open on the left and closed on the right: (y, y2].

With the exception of categorical, ordinal, and mixture families, the response distribution can be truncated using the trunc function in the addition part. If the response variable is truncated between, say, 0 and 100, we can specify this via yi | trunc(lb = 0, ub = 100) ~ predictors. Instead of numbers, variables in the data set can also be passed allowing for varying truncation points across observations. Defining only one of the two arguments in trunc leads to one-sided truncation.

For all continuous families, missing values in the responses can be imputed within Stan by using the addition term mi. This is mostly useful in combination with mi predictor terms as explained above under ‘Special predictor terms’.

For families binomial and zero_inflated_binomial, addition should contain a variable indicating the number of trials underlying each observation: success | trials(n). If the number of trials is constant across all observations, say 10, we may also write success | trials(10).

For all ordinal families, aterms may contain a term cat(number) to specify the number of categories (e.g, cat(7)). If not given, the number of categories is calculated from the data.

Multiple addition terms may be specified at the same time using the + operator, for instance formula = y_i | se(se_i) + cens(censored) ~ 1 for a censored meta-analytic model.

23.1.3 Formula syntax for non-linear models

The non-linear model can be specified within the formula argument. Suppose, that we want a non-linear model being defined via formula = y ~ alpha - beta * lambda^x. To tell brms that this is a non-linear model, argument nl = TRUE. Now we have to specify a model for each of the non-linear parameters. Let’s say we just want to estimate those three parameters with no further covariates or random effects. Then we can pass alpha + beta + lambda ~ 1 or equivalently (and more flexible) alpha ~ 1, beta ~ 1, lambda ~ 1 to the … argument. If we have another predictor z and observations nested within the grouping factor g, we may write for instance alpha ~ 1, beta ~ 1 + z + (1|g), lambda ~ 1. We are using z and g only for the prediction of beta, but we might also use them for the other non-linear parameters.

Non-linear models may not be uniquely identified and / or show bad convergence. For this reason it is mandatory to specify priors on the non-linear parameters.

bf(y ~ a1 - a2^x, a1 + a2 ~ 1, nl = TRUE) simple nl model

23.1.4 Formula syntax for predicting distributional parameters

It is also possible to predict parameters of the response distribution such as the residual standard deviation sigma in gaussian models or the hurdle probability hu in hurdle models. The syntax closely resembles that of a non-linear parameter, for instance sigma ~ x + s(z) + (1+x|g).

Alternatively, one may fix distributional parameters to certain values. This is useful when models become too complicated and otherwise have convergence issues. The quantile parameter of the asym_laplace distribution is a good example where it is useful. By fixing quantile, one can perform quantile regression for the specified quantile. For instance, quantile = 0.25 allows predicting the 25%-quantile. Furthermore, the bias parameter in drift-diffusion models, is assumed to be 0.5 (i.e. no bias) in many applications. To achieve this, simply write bias = 0.5. Other possible applications are the Cauchy distribution as a special case of the Student-t distribution with nu = 1, or the geometric distribution as a special case of the negative binomial distribution with shape = 1. Furthermore, the parameter disc (‘discrimination’) in ordinal models is fixed to 1 by default and not estimated, but may be modeled as any other distributional parameter if desired (‘disc’ can only be positive, which is achieved by applying the log-link).

In categorical models, distributional parameters do not have fixed names. Instead, they are named after the response categories (excluding the first one, which serves as the reference category), with the prefix ‘mu’. If, for instance, categories are named cat1, cat2, and cat3, the distributional parameters will be named mucat2 and mucat3.

Some distributional parameters currently supported by brmsformula have to be positive (a negative standard deviation or precision parameter does not make any sense) or are bounded between 0 and 1 (for zero-inflated / hurdle probabilities, quantiles, or the initial bias parameter of drift-diffusion models). However, linear predictors can be positive or negative, and thus the log link (for positive parameters) or logit link (for probability parameters) are used by default to ensure that distributional parameters are within their valid intervals. This implies that, by default, effects for such distributional parameters are estimated on the log / logit scale and one has to apply the inverse link function to get to the effects on the original scale. Alternatively, it is possible to use the identity link to predict parameters on their original scale, directly. However, this is much more likely to lead to problems in the model fitting, if the parameter actually has a restricted range.

23.1.5 Formula syntax for mixture models

If not specified otherwise, all mean parameters of the mixture components are predicted using the right-hand side of formula.

distributional parameters of mixture distributions have the same name as those of the corresponding ordinary distributions, but with a number at the end to indicate the mixture component. For instance, if you use family mixture(gaussian, gaussian), the distributional parameters are sigma1 and sigma2. distributional parameters of the same class can be fixed to the same value. For the above example, we could write sigma2 = “sigma1” to make sure that both components have the same residual standard deviation, which is in turn estimated from the data.

specify different predictors for different mixture components mix <- mixture(gaussian, gaussian) bf(y ~ 1, mu1 ~ x, mu2 ~ z, family = mix)

fix both residual standard deviations to the same value bf(y ~ x, sigma2 = “sigma1”, family = mix)

In addition, there are two types of special distributional parameters. The first are named mu, that allow for modeling different predictors for the mean parameters of different mixture components. For instance, if you want to predict the mean of the first component using predictor x and the mean of the second component using predictor z, you can write mu1 ~ x as well as mu2 ~ z. The second are named theta, which constitute the mixing proportions. If the mixing proportions are fixed to certain values, they are internally normalized to form a probability vector. If one seeks to predict the mixing proportions, all but one of the them has to be predicted, while the remaining one is used as the reference category to identify the model. The softmax function is applied on the linear predictor terms to form a probability vector.

23.1.6 Formula syntax for multivariate models

Multivariate models may be specified using cbind notation or with help of the mvbf function. Suppose that y1 and y2 are response variables and x is a predictor. Then cbind(y1, y2) ~ x specifies a multivariate model, The effects of all terms specified at the RHS of the formula are assumed to vary across response variables. For instance, two parameters will be estimated for x, one for the effect on y1 and another for the effect on y2. This is also true for group-level effects. When writing, for instance, cbind(y1, y2) ~ x + (1+x|g), group-level effects will be estimated separately for each response. To model these effects as correlated across responses, use the ID syntax (see above). For the present example, this would look as follows: cbind(y1, y2) ~ x + (1+x|2|g). Of course, you could also use any value other than 2 as ID.

It is also possible to specify different formulas for different responses. If, for instance, y1 should be predicted by x and y2 should be predicted by z, we could write mvbf(y1 ~ x, y2 ~ z). Alternatively, multiple brmsformula objects can be added to specify a joint multivariate model (see ‘Examples’).

23.2 brms likelihoods

A character string naming the distribution of the response variable.

student(link = “identity”, link_sigma = “log”, link_nu = “logm1”)

bernoulli(link = “logit”)

negbinomial(link = “log”, link_shape = “log”)

geometric(link = “log”)

lognormal(link = “identity”, link_sigma = “log”)

shifted_lognormal(link = “identity”, link_sigma = “log”, link_ndt = “log”)

skew_normal(link = “identity”, link_sigma = “log”, link_alpha = “identity”)

exponential(link = “log”)

weibull(link = “log”, link_shape = “log”)

exgaussian(link = “identity”, link_sigma = “log”, link_beta = “log”)

Beta(link = “logit”, link_phi = “log”)

hurdle_poisson(link = “log”)

hurdle_negbinomial(link = “log”, link_shape = “log”, link_hu = “logit”)

hurdle_gamma(link = “log”, link_shape = “log”, link_hu = “logit”)

hurdle_lognormal(link = “identity”, link_sigma = “log”, link_hu = “logit”)

zero_inflated_beta(link = “logit”, link_phi = “log”, link_zi = “logit”)

zero_one_inflated_beta(link = “logit”, link_phi = “log”, link_zoi = “logit”, link_coi = “logit”)

zero_inflated_poisson(link = “log”, link_zi = “logit”)

zero_inflated_negbinomial(link = “log”, link_shape = “log”, link_zi = “logit”)

zero_inflated_binomial(link = “logit”, link_zi = “logit”)

categorical(link = “logit”)

cumulative(link = “logit”, link_disc = “log”, threshold = c(“flexible”, “equidistant”))

  • Family gaussian with identity link leads to linear regression.

  • Family student with identity link leads to robust linear regression that is less influenced by outliers.

  • Family skew_normal can handle skewed responses in linear regression.

  • Families poisson, negbinomial, and geometric with log link lead to regression models for count data.

  • Families binomial and bernoulli with logit link leads to logistic regression and family categorical to multi-logistic regression when there are more than two possible outcomes.

  • Families cumulative, cratio (‘continuation ratio’), sratio (‘stopping ratio’), and acat (‘adjacent category’) leads to ordinal regression.

  • Families Gamma, weibull, exponential, lognormal, frechet, and inverse.gaussian can be used (among others) for survival regression.

  • Families weibull, frechet, and gen_extreme_value (‘generalized extreme value’) allow for modeling extremes.

  • Family asym_laplace allows for quantile regression when fixing the auxiliary quantile parameter to the quantile of interest.

  • Family exgaussian (‘exponentially modified Gaussian’) and shifted_lognormal are especially suited to model reaction times.

  • The wiener family provides an implementation of the Wiener diffusion model. For this family, the main formula predicts the drift parameter ‘delta’ and all other parameters are modeled as auxiliary parameters (see brmsformula for details).

  • Families hurdle_poisson, hurdle_negbinomial, hurdle_gamma, hurdle_lognormal, zero_inflated_poisson, zero_inflated_negbinomial, zero_inflated_binomial, zero_inflated_beta, and zero_one_inflated_beta allow to estimate zero-inflated and hurdle models. These models can be very helpful when there are many zeros in the data (or ones in case of one-inflated models) that cannot be explained by the primary distribution of the response. Families hurdle_lognormal and hurdle_gamma are especially useful, as traditional lognormal or Gamma models cannot be reasonably fitted for data containing zeros in the response.

A list of all possible links for each family:

  • The families gaussian, student, skew_normal, exgaussian, asym_laplace, and gen_extreme_value accept the links (as names) identity, log, and inverse;

  • families poisson, negbinomial, geometric, zero_inflated_poisson, zero_inflated_negbinomial, hurdle_poisson, and hurdle_negbinomial the links log, identity, and sqrt;

  • families binomial, bernoulli, Beta, zero_inflated_binomial, zero_inflated_beta, and zero_one_inflated_beta the links logit, probit, probit_approx, cloglog, cauchit, and identity;

  • families cumulative, cratio, sratio, and acat the links logit, probit, probit_approx, cloglog, and cauchit; family categorical the link logit;

  • families Gamma, weibull, exponential, frechet, and hurdle_gamma the links log, identity, and inverse; families lognormal and hurdle_lognormal the links identity and inverse;

  • family inverse.gaussian the links 1/mu^2, inverse, identity and log; family von_mises the link tan_half; family wiener the link identity.

The first link mentioned for each family is the default.

23.3 priors

23.3.1 Population-level (‘fixed’) effects

If y is predicted by x1 and x2 then x1 and x2 have regression parameters b_x1 and b_x2. The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior. Other common options are normal priors or student-t priors. If we want to have a normal prior with mean 0 and standard deviation 5 for x1, and a unit student-t prior with 10 degrees of freedom for x2, we can specify this via set_prior(“normal(0,5)”, class = “b”, coef = “x1”) and set_prior(“student_t(10,0,1)”, class = “b”, coef = “x2”). To put the same prior on all population-level effects at once, set_prior(“”, class = “b”). This also leads to faster sampling, because priors can be vectorized in this case. Both ways of defining priors can be combined using for instance set_prior(“normal(0,2)”, class = “b”) and set_prior(“normal(0,10)”, class = “b”, coef = “x1”) at the same time. This will set a normal(0,10) prior on the effect of x1 and a normal(0,2) prior on all other population-level effects. However, this will break vectorization.

The intercept has its own parameter class named “Intercept” and priors can thus be specified via set_prior(“”, class = “Intercept”). Setting a prior on the intercept will not break vectorization of the other population-level effects. Note that technically, this prior is set on an intercept that results when internally centering all population-level predictors around zero to improve sampling efficiency. On this centered intercept, specifying a prior is actually much easier and intuitive than on the original intercept, since the former represents the expected response value when all predictors are at their means. To treat the intercept as an ordinary population-level effect and avoid the centering parameterization, use 0 + intercept on the right-hand side of the model formula.

A special shrinkage prior to be applied on population-level effects is the horseshoe prior. See horseshoe for details. Another shrinkage prior is the so-called lasso prior. See lasso for details.

In non-linear models, population-level effects are defined separately for each non-linear parameter. Accordingly, it is necessary to specify the non-linear parameter in set_prior so that priors can be assigned correctly. If, for instance, alpha is the parameter and x the predictor for which we want to define the prior, we can write set_prior(“”, coef = “x”, nlpar = “alpha”). As a shortcut we can use set_prior(“”, nlpar = “alpha”) to set the same prior on all population-level effects of alpha at once.

If desired, population-level effects can be restricted to fall only within a certain interval using the lb and ub arguments of set_prior. This is often required when defining priors that are not defined everywhere on the real line, such as uniform or gamma priors. When defining a uniform(2,4) prior, you should write set_prior(“uniform(2,4)”, lb = 2, ub = 4). When using a prior that is defined on the positive reals only (such as a gamma prior) set lb = 0. In most situations, it is not useful to restrict population-level parameters through bounded priors (non-linear models are an important exception), but if you really want to this is the way to go.

23.3.2 Standard deviations of group-level (‘random’) effects

Each group-level effect of each grouping factor has a standard deviation named sd__. Consider, for instance, the formula y ~ x1 + x2 + (1 + x1 | g). We see that the intercept as well as x1 are group-level effects nested in the grouping factor g. The corresponding standard deviation parameters are named as sd_g_Intercept and sd_g_x1 respectively. These parameters are restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10. This prior is used (a) to be only very weakly informative in order to influence results as few as possible, while (b) providing at least some regularization to considerably improve convergence and sampling efficiency. To define a prior distribution only for standard deviations of a specific grouping factor, use set_prior(“”, class = “sd”, group = “”). To define a prior distribution only for a specific standard deviation of a specific grouping factor, you may write set_prior(“”, class = “sd”, group = “”, coef = “”). When defining priors on group-level parameters in non-linear models, please make sure to specify the corresponding non-linear parameter through the nlpar argument in the same way as for population-level effects.

23.3.3 Correlations of group-level (‘random’) effects

If there is more than one group-level effect per grouping factor, the correlations between those effects have to be estimated. The prior “lkj_corr_cholesky(eta)” or in short “lkj(eta)” with eta > 0 is essentially the only prior for (Cholesky factors) of correlation matrices. If eta = 1 (the default) all correlations matrices are equally likely a priori. If eta > 1, extreme correlations become less likely, whereas 0 < eta < 1 results in higher probabilities for extreme correlations. Correlation matrix parameters in brms models are named as cor_, (e.g., cor_g if g is the grouping factor). To set the same prior on every correlation matrix, use for instance set_prior(“lkj(2)”, class = “cor”).

23.3.4 Splines

Each spline has its corresponding standard deviations modeling the variability within this term. This parameter class is called sds and priors can be specified via set_prior(“”, class = “sds”, coef = “”). The default prior is the same as for standard deviations of group-level effects.

23.3.5 Gaussian processes

Gaussian processes have two parameters, the standard deviation parameter sdgp, and characteristic length-scale parameter lscale (see gp for more details). The default prior of sdgp is the same as for sd-s of group-level effects. The default prior of lscale is an informative inverse-gamma prior specifically tuned to the covariates of the Gaussian process (for more details see https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html). This tuned prior may be overly informative in some cases, so please consider other priors as well to make sure inference is robust to the prior specification. If tuning fails, a half-normal prior is used instead.

23.3.6 Autocorrelation parameters

The autocorrelation parameters are named ar (autoregression), ma (moving average), arr (autoregression of the response), car (spatial conditional autoregression), as well as lagsar and errorsar (Spatial simultaneous autoregression). Priors can be defined by set_prior(“”, class = “ar”) for ar and similar for other autocorrelation parameters. By default, ar and ma are bounded between -1 and 1, car, lagsar, and errorsar are bounded between 0, and 1, and arr is unbounded (you may change this by using the arguments lb and ub). The default prior is flat over the definition area.

23.3.7 Parameters for specific families

Families gaussian, student, skew_normal, lognormal, and gen_extreme_value need the parameter sigma to account for the residual standard deviation. By default, sigma has a half student-t prior that scales in the same way as the group-level standard deviations. Family student needs the parameter nu representing the degrees of freedom of students-t distribution. By default, nu has prior “gamma(2, 0.1)” and a fixed lower bound of 1.

Families gamma, weibull, inverse.gaussian, and negbinomial need a shape parameter that has a “gamma(0.01, 0.01)” prior by default.

Every family specific parameter has its own prior class, so that set_prior(“”, class = “”) is the right way to go. All of these priors are chosen to be weakly informative, having only minimal influence on the estimations, while improving convergence and sampling efficiency. Often, it may not be immediately clear, which parameters are present in the model. To get a full list of parameters and parameter classes for which priors can be specified use function get_prior.

library(tidyverse)
library(mice)