Fikret Isik's home pageSAS Codes for Progeny Test Data Analysis
Disclaimer
Sometimes I use SAS software for progeny data analyses, because it is a flexible and a powerful program. Though, sometimes it is painstaking too. Please be aware that when using the SAS code examples given here, you may need to modify them according to your model. Any feedback will be appreciated. The ultimate source of the SAS procedures is SAS Manuals and related publications.
Fikret IsikLatest updates:
May 10, 2004. SAS/IML code to estimate genetic parameters was expanded and simplified.
May 18, 2004. SAS code for multivariate analysis of progeny data and calculation of genetic correlation (SE) was expanded.
1. Using a SAS/IML code to calculate genetic parameters (heritability, standard error etc...)
2. Bivariate/multivariate analysis of progeny data using SAS Proc Mixed and calculation of Genetic Correlation (Std. Error)
3. A SAS Macro code to Analyze Diallels
4. A SAS Macro code to analyze cloned factorial and cloned diallel mating designs
5. Analysis of Threshold Traits
Using IML to estimate genetic parameters
Estimation of genetic parameters (heritabilities, genetic correlations etc.) and their standard errors can be tedious. I sometimes use the following SAS IML code to estimate genetic parameters and their SE. The green text is my comment about the each step of the code. You may modify the code easily for different linear models (Credit to Bin Xiang, for writing the original SAS IML code)./*----------------------------------------------------------------------|
| We have data from a half-diallel design (no selfs and reciprocals).
| The progeny were tested at two locations using a RCB field design.
| The Linear Mixed Model is;
| Y = u + site + gca + sca + ge + plot + e, ge=gca by location interaction.
| We have variance components (VC) and covariance matrix (COV) from analysis
| e.g., from SAS Mixed Procedure output, and we want to calculate;
| (1) Percent of variance explained by each random factor in the model,
| (2) Standard Errors of variances,
| (3) Narrow-sense heritability, h2_i= 4gca/( 2gca+sca+2ge+plot+e )
| (4) Full-sib family heritability, h2_f= 2gca/(2gca+sca+2ge/s+plot/sb+e/sbn)
| (5) Standard error of heritabilities (SE) using the DELTA method.
|----------------------------------------------------------------------*//* Start IML */
proc iml ;
/*---------------------------------------------------------------------|
| Input Variance Components vector VC (e.g. from SAS MIXED ouput).
| The order of the variances is {GCA, SCA, GE, PLOT, E}
----------------------------------------------------------------------*/VC={91.74, 0.87, 8.92, 46.49, 432.5};
rown={GCA SCA GE Plot E};
coln={Estimates};/*----------------------------------------------------------------------|
| Input covariance matrix COV. The diagonal numbers are
| the variances of variance components and the off diagonal numbers are
| the covariances between random factors in the model.
|----------------------------------------------------------------------*/
COV={ 3674 -1.32 -6.01 -1.82 0.91 , -1.32 14.78 0.85 -7.31 -0.40 , -6.01 0.85 35.79 -8.57 -0.06 , -1.82 -7.31 -8.57 163.72 -67.49 , 0.91 -0.40 -0.06 -67.49 327.13 };/*----------------------------------------------------------------------|
| Input numerator row vector AU to create a matrix to initialize a
| coefficient vector for numerator.
| For NS heritability, the first term in the AU vector (GCA) is
| multiplied by 4, the rest terms by 0. -------------------------------*/
/*----------------------------------------------------------------------|
| OR we could have created Coefficients for the Numerator Vector AU using
| the SHAPE function of SAS/IML. e.g. Create a matrix named AU,
| with NROW (number of rows in the VC vector and 1 column) and
| make the all elements zero. AU=shape(0,nrow(VC),1). Then multiply GCA
| scalar by 4. AU=[1,1]=1*4. ------------------------------------------*/AU={4, 0, 0, 0, 0} ;
/*----------------------------------------------------------------------|
| Input denominator row vector AV. GCA multiplied by 2, the rest by 1.
| OR you may use the SHAPE function again to initialize a coefficient
| vector for denominator. AV=shape(1,nrow(VC),1). All elements are 1.
| Then multiply the first element (GCA) by 2. AV=[1,1]=1*2.-------------*/AV={2, 1, 2, 1, 1} ;
/* ----------------------------------------------------------------------|
| Estimate parameters. You do not need to change anything below this.
|-----------------------------------------------------------------------*/Total=VC[+,1]; *<-- Take SUM of VC column vector;
phen=AV`*VC ; *<-- Phenotypic variance. ` is transpose sign;
h2_i=AU`*VC/Phen ; *<-- NS heritability =Additive/Phen;
VC_pct=VC/Total*100; *<-- % of variances for each factor over Total;
Var_VC=vecdiag(Cov); *<-- Variances of variances. Diagonal elements of COV;
SE_VC=sqrt(Var_VC); *<-- Standard Errors of variances ;* Using Delta method to estimate standard error of heritability ;
* See an Apendix in Lynch and Walsh (1998) for details of DELTA method ;
var_U =AU`*Cov*AU ; *<---variance of numerator ;
var_V =AV`*Cov*AV ; *<---variance of denominator ;
cov_UV=AU`*Cov*AV ; *<--covariance of two traits;
seh2_i=sqrt( (h2_i*h2_i) * ((var_U/(AU`*VC)**2)+(var_V/(AV`*VC)**2)
-(2*cov_UV/(AU`*VC)/(AV`*VC))));print VC[rowname=rown colname=coln]
SE_VC[format=6.3]
VC_pct[format=6.1]
phen h2_i[format=6.2]
seh2_i[format=6.3] ;OUTPUT
VC ESTIMATES SE_VC VC_PCT PHEN H2_I SEH2_IGCA 91.74 60.61 16 672.26 0.55 0.26SCA 0.87 3.84 0GE 8.92 5.98 2PLOT 46.49 12.80 8E 432.5 18.09 75
/* ----------------------------------------------------------------------|
| Full-sib Family Mean heritability.
| h2_f = 2gca/(2gca + sca + 2ge/s + plot/sb + e/sbn).
|-----------------------------------------------------------------------*/
/* ----------------------------------------------------------------------|
| Numerator of heritability. Multiply GCA by 2, the rest terms by 0.
| if it was Half-Sib family then, AU={1,0,0,0,0}------------------------*/AU={2, 0, 0, 0, 0} ;
/*----------------------------------------------------------------------|
| S= # of locations, B= # of reps, N= Average # of trees/rep/family.
| The above coefficients are needed for family mean phenotypic variance.
|----------------------------------------------------------------------*/S=4; B=6; N=4.3;
/* ----------------------------------------------------------------------|
| Denominator of heritability or Full-sib Family Phenotypic variance.
| Create row vector AV from VC vector. All elements are 0.
| Then, make all elements equals to 1 and multiply each with different
| coefficients (e.g., 1/S, 1*2) for full-sib family phenotypic variance.---*/AV=shape(0,nrow(VC),1);
AV[1,1]=1*2; *<--assign coeff. For first Variance (gca) in the model;
AV[2,1]=1;
AV[3,1]=1*2/S; *<---assign coefficient for GE Variance in model etc.;
AV[4,1]=1/S/B;
AV[5,1]=1/S/B/N;
* If you print AV vector, the results would be ={2, 1, 0.25, 0.0417, 0.00969} ;/* ---- You do not need to change anything below this line ----*/
Phen_f= AV`*VC; *<---Family Mean Phenotypic variance ;
h2_f=AU`*VC/(AV`*VC); *<---Family Mean heritability =2GCA/Phen_f;var_U=au`*cov*au ; *<---variance of GCA variance (var(GCA));
var_V=av`*cov*av ; *<---variance of family means variance. Sigma(f) ;
cov_UV=au`*cov*av ;seh2_f=sqrt( (h2_f*h2_f) * ((var_u/(au`*vc)**2)+(var_v/(av`*vc)**2)
-(2*cov_UV/(au`*vc)/(av`*vc))));print AV h2_f[format=6.2] seh2_f[format=6.3];
quit ;
OUTPUT
AV PHEN_F h2_f SEh2_f2 192.70797 0.95 0.0371 0.25 0.0416667 0.0096899
3. Using a Macro SAS code to analyze diallel tests
Diallel mating designs are commonly used in forestry. The analyses of diallel designs are not straightforward and different from nested or factorial mating designs, because the same individual contains two levels of the same effect. A new SAS code was developed to analyze diallels (For details: Xiang, B. and Li, B. 2001. A new mixed analytical method for genetic analysis of diallel data. Can. J. For. Res. 31:2252-2259). The program can be obtained from the co-author Dr. Bailian LI.
Here, I explained how to use the macro code to analyse diallels.
The data set must have the following factors:
AGE SER $ DL $ FEMALE $ MALE $ CROSS $ TEST REP TREE TRAITwhere SER is diallel serial number, DL is diallel number. SER, DL, FEMALE, MALE and CROSS must be character, the rest can be either numeric or character. If the data set does not include some of the above factors, they should be created as dummy variables e.g.
data a ; set a ;
dl='1' ; age='-1' ; ser='0' ;*The program can be called by %inc statement;
%include 'd:\myfolder\Macro_Diall_Mixed_IBV 0502.sas' ;*The name of the macro to run the code is %mixeddial.;
%mixeddial(serid=0,dlset=mydata,Y=volume,
outpath=d:\myfolder\, mtest=N);DLSET is the data set to be analyzed. The programs assumes multiple test sites. If there is only 1 site, then create the variable as TEST=1, and use mtest=N option. If not, remove the mtest=N. The program analyze one trait at each time. Change Y= to analyze another trait. Along with GCA and SCA variances, the program estimates GCA values of parent trees, SCA values of CROSS and Individual tree breeding and genetic values. if there is no individual tree ID numbers in the data set, then suppress TREE by IBV=N. The program can estimate BLUE (Best Linear Unbiased Estimates) of fixed effects (TEST and REP) and BLUP (Best Linear Unbiased Predictors) of random genetic effects (female, male, cross, interactions).
Cloned factorial and cloned diallel mating designs.
%inc 'D:\myfolder\clonal_bv.SAS';
%MixedFactCln(dset=mydata, y=volume, female=mother, male=father,
rep=block, msite=N);%MixedFactCln is for factorial mating and %MixedDialCln is for diallel mating. Change the macro name accordingly. clone=clone, clone variable name (coding must be unique for each). Data set should have the following variables: FEMALE MALE CROSS CLONE SITE REP Y.
/*---------------------------------------------------------|
Once the analysis is completed, Adjusted BV and GV values
can be calculated from the output file=CLONEBV.
Definition of the columns in the CLONEBV output file:
AF=GCA_Female, AM=GCA_Male, AW=within family BV,
GW=Within family genetic value,
CBV=Clonal Breeding Value, SE_CBV(SE of CBV),
CGV=Clonal Genetic Value, SE_CGV (SE of CGV),
CBV=AF+AM+AW <--clonal breeding value
CGV=AF+AM+SCA+GW <---clonal genetic value
GRANDMEAN= from fixed effects solution (S_f)
---------------------------------------------------------*/data clonebv ; set work.Clonebv ;
Adj_CBV=cbv + grandmean ;
Adj_CGV=cgv + grandmean ;
run;The macro sas code assumes balanced Female, Male, Clone and Reps.
Thus, factors having blank or missing values should be deleted from the data set.
Last updated, June 8, 2007