Return to Fikret Isik's home page

Bivariate/multivariate analysis of progeny data using SAS Proc Mixed
-Calculation of genetic correlation and its standard error

SAS Proc Mixed can be used to estimate genetic correlations among traits and their standard errors. Lets say we have 120 half-sib families of loblolly pine and they were tested at two locations using RCB design single-tree plots. Two traits were measured e.g., var1 var2. We want to estimate genetic correlation between var1 and var2. The original data are as follows;

data B;
input location $ rep family $ tree var1 var2 ;
datalines ;

AL 1 F1 1 128 214
AL 1 F1 2 107 165
AL 1 F1 3 187 162
. . . . .
;

For bivariate/multivariate analysis, a special data structure is needed. First, variables for each individual tree should be put into one column and must be named as Y . Second, a dummy variable named TRAIT needs to be created .

data A;
set B(rename=(var1=y))
b(rename=(var2=y));
if var1=. then trait=1 ;
if var2=. then trait=2 ;
drop var1 var2;

run;

When a portion of the data is printed using the Proc PRINT of SAS, the output will be as follows:

proc print data=A (obs=6); run;
title 'Arranged data';

Arranged data

Obs location rep family tree y trait
1 AL 1 F1 1 128 1
2 AL 1 F1 2 107 1
3 AL 1 F1 3 187 1
4 AL 1 F1 1 214 2
5 AL 1 F1 2 165 2
6 AL 1 F1 3 162 2

As seen above, each tree of a family has two observations rather than one, e.g. tree 1 of family F1 has 128 and 214. For simplicity, I included 2 traits, however, you may include 3 or more traits.

The following code fits a mixed model to data.


proc mixed data=A covtest asycov ;
Class trait location rep family tree;
model y =trait location rep(location) ;
random trait /type=un sub=family g gcorr;
repeated /type=UN sub=family*location*tree r rcorr;
ods output covparms=_varcomp asycov=_cov ;
run;

  1. The fixed effect class (independent) variables (trait location rep(location) are listed after the dependent variable Y.
  2. The variables TRAIT (which is a dummy), FAMILY and FAMILY*LOCATION*TREE are random.
  3. The G option requests genetic covariance-variance matrix and the GCORR requests genetic correlation matrix between two traits.
  4. The R matrix requests the first block diagonal matrix and RCORR requests within family correlation matrix.
  5. Finally, two output files (variance components, covariance matrix) were requested using the ODS of SAS. Here, COVPARMS is the SAS table name of the variance components, _VARCOMP is a given name (you may give a different name). ASYCOV is the variance-covariance matrix of variance components.
  6. The TYPE=UN option requests an unstructured covariance matrix for each SUBJECT=FAMILY. This structure does not make any assumptions of covariances. The covariance matrix for three traits would be as follows:
cov={
11
12
13
12
22
23
13
23
33
};

Depending on the data, you may fit different covariance structures, such as compound symmetry (CS) or autoregressive (AR) etc.

A partial output of the Mixed Procedure is given below:

Estimated R Matrix for family F1
Row Col1 Col2
1 568.75 570.64
2 570.64 878.04

Estimated R Correlation Matrix for family F1
Row Col1 Col2
1 1 0.8075
2 0.8075 1

The above two tables are for within family or Error covariance and correlation matrices. 570.64 is Error covariance between traits var1 and var2, 568.75 is Error variance of trait var1, 878.04 is error variance of trait var2. In the R CORRELATION MATRIX, 0.8075 is the Error Correlation between two traits, which can also be obtained from R MATRIX as r(E)=570.64 / sqrt (568.75*878.04).

 


Estimated G Matrix
Row Effect family trait Col1 Col2
1 trait F1 1 37.96 48.32
2 trait F1 2 48.32 68.71

The elements of the G MATRIX is genetic (family) are variances and the covariance of two variables. 37.96 is the genetic variance of trait var1, 48.32 is the genetic covariance between two traits, and 68.71 is the genetic variance of trait var2.



Estimated G Correlation Matrix
Row Effect family trait Col1 Col2
1 trait F1 1 1.00 0.9461
2 trait F1 2 0.9461 1.00

By requesting GCORR option after the RANDOM statement, the code produces genetic correlation between two traits. Here, genetic correlation is 0.9461. We can calculate genetic correlation using the G MATRIX components as r(G) = 48.32 / sqrt(37.96*68.71).

 



Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error Z Value Pr Z var
UN(1,1) family 37.96 6.56 5.79 <.0001 43.05
UN(2,1) family 48.32 8.31 5.81 <.0001 69.04
UN(2,2) family 68.71 11.5 5.99 <.0001 131.45

The above table is covariance parameters ((37.96=1/4 additive genetic variance for var1, 48.32=1/4 additive genetic covariance between var1 and var2, and 68.71=1/4 additive genetic variance for var2). The approximate standard errors of genetic variances and the covariance were given after the estimates, and the variances (VAR) of the estimates were given in the last column.



Asymptotic Covariance Matrix of Estimates
Row Cov Parm CovP1 CovP2 CovP3
1 UN(1,1) 43.05 52.02 62.84
2 UN(2,1) 52.02 69.04 90.88
3 UN(2,2) 62.84 90.88 131.45
....

Variances of variance components and variance of genetic covariance is given in the table of Asymptotic Covariance Matrix above. The diagonal elements are the variances of var1, cov(var1,var2) and var2. The offdiagonal elements are covariances between var1, cov(var1,var2), var2. For example. 52.02 is the covariance between the variance of genetic variance of trait var1 and the variance of genetic covariance between traits var1 and var2.

For more details, see : Littell et al. 1996. SAS System for mixed models. SAS Institute Inc. Cary, NC.

Using the Covariance Parameters and Asymptotic Covariance Matrix Tables we can easily calculate standard error of genetic correlation. The method used here is called the DELTA, which is simply taking the variances of functions (see Lynch and Walsh 1998 for details).

 

/* You do NOT need to change the following lines, except, If the name of the genotype in your data is not 'family' then change it accordingly */

/* Start IML */

proc iml ;

/* Go to output file '_varcomp', and create a 3-row vector for 'family' group only. The 1st estimate (row) is genetic variance of Trait1, the 2nd is the genetic Covariance, and the 3rd is genetic variance of Trait2 */

use _varcomp;
read all var {Estimate} where(Subject="family") into _varcomp;
close _varcomp;

/* Create Asymptotic Covariance Matrix for 'family' group, select the first 3x3 block matrix of variances of estimates and covariances between estimates */

use _cov;
read all var {CovP1 CovP2 CovP3} into _cov;
close _cov;

/* Genetic correlation */

r = _varcomp[2,1]/sqrt(_varcomp[1,1]*_varcomp[3,1]);

/* Standard error of genetic correlation */

a =_cov[1,1]/(4*(_varcomp[1,1])**2) ;
ab=_cov[2,2]/((_varcomp[2,1])**2) ;
b =_cov[3,3]/(4*(_varcomp[3,1])**2) ;

c1=(2*_cov[1,3])/ (4*(_varcomp[1,1]*_varcomp[3,1]));
c2=(2*_cov[1,2])/ (2*(_varcomp[1,1]*_varcomp[2,1]));
c3=(2*_cov[2,3])/ (2*(_varcomp[2,1]*_varcomp[3,1]));

var_r=(r*r)*(a+b+ab+c1-c2-c3); ** Variance of genetic correlation ;
SE_r =sqrt(var_r) ; ** Standard error of genetic correlation ;

print r var_r SE_r ;


run; quit;

 

OUTPUT of the IML code

R
VAR_R
SE_R
0.946
0.0002798
0.0167276