Analysis of threshold traits


Some important traits observed in forestry are dichotomous (0 or 1). They have only two categories; such as survival (alive or dead), disease incidence (infected or not infected), forkness (forked or not forked), etc. When a randomize complete block field design is used, the analysis of binomial traits can be carried out using arcsin transformed plot percentages. However, this analysis method may not be applicable when other plot configurations are used such as single-tree plots.

Measuring a phenotypic expression on the 0-1 scale produces a large amount of measurement error (environmental variance). The error is the least when the proportion of the incidence is p=0.5 and becomes larger when p increases or decreases (see Falconer and Mackay 1996).

where m= population mean, x= the distance of the threshold value from m, or deviation of the threshold value from the mean in standard deviation, T= threshold value, i= mean of the individuals affected and p= proportion of individuals that are affected.

Given a p value, x and i can be obtained from a standardized normal distribution. For example, if p=24% of the individuals are affected, then x=2.820 and i=3.117.

Heritability estimated on the observed scale (0-1) needs to be transformed to continuous scale in order to remove the scale effect by the following equation.

Lynch and Walsh sugegsted below equation for estimation of heritability on the observed scale.

where, SigmaA(0,1) is additive genetic variance, Phi(1-Phi) is the phenotypic variance on the observed scale. Phi_p is the frequency of incidence in the population. The relationship between observed and underlying scale heritabilities is given as follows.

where the numerator [p(xp)]^2 is the mean of affected individuals of the population.

See Falconer and Mackay 1996, page 299-311, and Lynch and Walsh 1998, page 738-744 for details.


 

Example 1 (fitting the model with ASReml)

Fusiform rust incidence in a clonally replicated trial of loblolly pine was observed as infected (1) and not infected (0). The probability of disease (p) was modeled with the generalized linear mixed model (GLMM).

log[p/(1-p)]=µ+r+f+c.f+e

Since the disease incidence was observed on the 0-1 scale, incidence frequency was not used in the analysis but 0-1 values. The GLMM was fitted using the logit function and fitting the model with ASReml program.

!part rust #univariate model for rust at age 3
rust3 !BIN !LOGIT ~ mu rep !r family clone.family
0

The proportion of infected individuals is p=(538/2369)= 0.23. Using this proportion, we can easily obtain deviation of threshold value from the mean as a standard deviation (x=2.860) and the mean of individuals affected (i=3.125). Variance components from the output:

Source Estimate
Family 0.58
Clone.Family 2.71
Error 1.00

 



Example 2 (fitting the model with SAS GLIMMIX)

Analysis of binomial traits can be carried out by a SAS macro code called GLIMMIX.SAS. The code can be downloaded from the SAS web site. Just type 'glimmix' in the search window and you will find several macro programs for different versions of SAS.

*Call the macro program ;
%inc 'c:\mysasfiles\glmmix macro.sas' ;
run;

* run the macro ;
%glimmix(data=a, procopt=method=reml,
stmts=%str( class rep family clone;
model y = block / s;
random family clone(family)/s ;
),
error=binomial,
link=logit );
run;

You may change the class variables, fixed and random effects according to your model. If you are not interested in the Best Linear Unbiased Predictors of the random factors in your model, you may remove the / and 's'.

The output includes model information, variance components as well as solutions for fixed effects (BLUE) and solutions for random effects (BLUPs). Click here to see an output. Be aware that linear predictors for rust incidence were computed on logit scale. To calculate predicted probability (p) of clones or families, one needs to apply the inverse link function. For example, probability of a genotype being infected by the disease can be calculated as follows:

p = exp[BLUP(genotype)]/[1+exp(BLUP(genotype)]

Predicted probability values range between 0-1 and they are more meaningful to interpret. For example, a p=0.16 value indicates that probability of a genotype being infected is 16%.


Caution: SAS GLMMIX codes are still experimental. ASReml and GLIMMIX analysis of the same threshold trait gave somewhat different error variances. ASReml by default sets the environmenatl error to '1.00'.

last moified: April 27, 2004

Fikret ISIK fisik@ncsu.edu