Fikret Isik's home pageDisclaimer
In general, I use SAS software for analyses, because it is a flexible and a powerful program to manage data. Though, sometimes it is painstaking too. I decided to put some useful SAS statements in my home page, for myself, but also for students who might find those statements useful. Please be aware that when using the SAS code examples given here, you may need to modify them according to your model. Some of the codes may not work for your case. I will be happy if you let me know any mistakes or unclear explanations. Always keep in mind that the ultimate source of the SAS functions, operators and procedures is SAS Manuals.
Fikret Isik
SAS Tips for data handling
Most of the examples, particularly the ones about the analysis are for users who has some experience with SAS, but as well as familiar with statistics and quantitative genetics. If you are a beginner for SAS, I recommend you to start with some other publications. 'The Little SAS Book, a primer' by Lora D. Delwiche and Susan J. Slaughter is an excellent book for the SAS beginners.
- Importing external data files (Excel, Access and Delimited)
- PUT/INPUT: Converting a numeric variable to a character or vice versa
- COMPRESS: Creating a new variable by merging several variables
- SUBSTR: Creating new variables by partitioning a variable
- Removing the blank from a variable and aligning
- TRANSLATE: Converting a decimal point from comma (,) to period (.)
- Using the AND and OR operators
- Using DIF and LAG functions to create a new variable
- LOWECASE/UPCASE: Converting a variable into Lowercase or Uppercase
- Arranging 3 variables for a given individual into one column
- Creating an output text data file
- Changing observations to variables (Transpose)
- Grouping observations with IF/THEN ELSE statements
- Using WILDCARDS (* and ?) to concatonate multiple external files
- Adding the overall mean to every observation in the original data set
Data Modification Tips
There are many data tips in SAS. Here, I put some of them that
I frequently use.
1. Importing external data files (Excel, Access and delimited). See PROC IMPORT for details.
Importing an excel file
Proc import datafile='d:\myfiles\Excel File.xls'
out=mydata replace ;
sheet='sheet1';
run;Importing Access table
proc import table="test1"
out=mydata dbms=access;
database="c:\myfiles\testdata.mdb";
run;Importing an external delimited file
Region,State,Month,Expenses,Revenue
Southern,GA,JAN97,2000,8000
Southern,GA,FEB97,1200,6000
Northern,NY,MAR97,6000,5000
proc import datafile="\myfiles\mydata"
out=mydata dbms=dlm replace;
delimiter=',';
getnames=yes;
run;
2. Converting a numeric variable to a character or vice versa
Var_cha=put(test, $2.); * to convert to character;
Var_num=input(test,4.1); * to convert to numeric;
3. Creating a new variable by compressing several variables
test family rep female male tree 1 A 1 F43 M28 1 1 B 1 F22 M02 2 1 B 2 F22 M02 4data a ; set a;
ID=compress(test||family||rep||tree); *Combine factors using ||, remove any trailing blank using COMPRESS;
CROSSno=trim(female||male); *Combine FEMALE and MALE variables with ||, remove any trialing blank using TRIM;
CROSSblank=female||male ; * Trim or Compress is not used. New variable may contain blanks;
Proc print data=a ; run;
Result:
test family rep female male tree ID CROSSno CROSSblank 1 A 1 F43 M28 1 1A11 F43M28 F43 M28 1 B 1 F22 M02 2 1B12 F22M02 F22 M02 1 B 2 F22 M02 4 1B24 F22M02 F22 M02
4. Creating new variables by partitioning a variable using the SUBSTR
* Let tree ID is combination of test#=1, family=A, rep#=2, tree#=3;
ID vol dbh ht 1A23 12.1 4.5 2.5 1A24 14.5 5.2 2.7 1B15 11.8 3.9 2.1data a ;
set a;
tree=substr(ID,4,1); *Use ID, take 4th character to create TREE number;
family=substr(ID,2,1); * create family code ;
reptree=substr(ID,3,2); *Start from 3rd column, pick 3rd and the next;
Result:
proc print data=a; run;
treeID vol dbh ht tree family reptree 1A23 12.1 4.5 2.5 3 A 23 1A24 14.5 5.2 2.7 4 A 24 1B15 11.8 3.9 2.1 5 B 15
5. Using the TRIM to remove trailing blank
*TRIM copies a character argument, removes all trailing blanks, and returns the trimmed argument as a result;p1=trim(left(p1)); *trim romeves blank, left aligns to left;
p2=trim(left(p2));* or ;
data a;
input part1 $ 1-10 part2 $ 11-20;
hasblank=part1||part2;
noblank=trim(part1)||part2;
put hasblank;
put noblank;
datalines;* SAS Statements Results;
x="A"||trim(" ")||"B";
put x;
Result: A B
x=" ";
y=">"||trim(x)||"<";
put y;
Result: > <
6. Removing the blank and aligning LEFT
a=' DUE DATE';
b=left(a);
put a;
Result: a='DUE DATE'
7. Converting a period separated value to a comma separated value (TRANSLATE)
dep=input(translate(depcomma,'.',','),8.3);The Translate statement converts trait depcomma from 12.45 to 12,45. Input statement converts it to numeric.
8. Using the AND and OR operators,
You can combine two or more conditions. AND combines two conditions by selecting values that satisfy both conditions, and OR combines two conditions by selecting values that satisfy either or both conditions.
Data a ;
set b (keep=ID distance dens ringno type);
*where ID='4N14' | ID ='4N15' | ID ='4N21' ;
where ID in ('4N14', '4N15', '4N21');
* where type=E & experience=4 ; *&=AND ;
run;
9. Using DIF and LAG functionsDIF function creates a new variable (D) by taking the difference of observations of another variable (X) (See SAS manual). Syntax, DIF<n>(argument); *<-- n specifies number of LAGS
data two;
input X @@;
Z=lag(x);
D=dif(x);
datalines;
2 6 4 7;
proc print data=two;
run;
Results of the PROC PRINT step follow:
X Z D
1 . .
2 1 1
6 2 4
4 6 - 2
10. Converting a variable into Lowcase or Uppercasedata a; set a;
x='INTRODUCTION';
y=lowcase(x);
put y;
run;
11. How to put 3 variables of a genotype given in different columns into one column
fam tree ht dbh vol 1 1 12 4.5 125 1 2 10 3.9 109data a;
set b(rename=(ht=y))
b(rename=(dbh=y))
b(rename=(vol=y));
run;
Result:
fam tree y 1 1 12 1 1 4.5 1 1 125 1 2 10 1 2 3.9 1 2 10912. Creating an OUTPUT Text data file (ASCII) from SAS file
* The first part creates a file putting Column HEADINGS only;
data _null_ ;
file 'c:\myfolder\data.txt' ;
put 'factor var1 var2 var3' ; run;
* 2nd part puts data in columns, appends this file to the first;
data _null_ ;
set mydata;
file 'c:\myfolder\data.txt' mod ;
put factor 1-8 (var1-var2) (7.1) ; run;* PC: the 'mod' statement appends the second file to the first.
Delete the output file (data.txt) before running the code again;
data a ;
input chain $ stat $ g1 g2 g3 ;
datalines;
378oneI1 MEAN 0.5130 0.1976 0.7106
378oneI1 MEDIAN 0.4675 0.1703 0.6712
378oneI1 STD 0.3235 0.1339 0.3634
378oneI2 MEAN 0.5257 0.1977 0.7233
378oneI2 MEDIAN 0.4810 0.1707 0.6818
378oneI2 STD 0.3267 0.1309 0.3699
;
proc sort data=a ; by chain ; run;
* If you have Gruoping (here Chain) use BY statement. It is not transposed.
the ID statement names the variable whose values become new variable names
VAR variables are the transposed values;
proc transpose data=a out=b;
by chain;
var g1 g2 g3;
id stat;
run;
data c; set b;
meanstd=input(compress(mean||'±'||std), $12.) ;
run;
proc print data=c noobs; run;
chain _NAME_ MEAN MEDIAN STD meanstd 378oneI1 g1 0.5130 0.4675 0.3235 0.513±0.3235 378oneI1 g2 0.1976 0.1703 0.1339 0.1976±0.133 378oneI1 g3 0.7106 0.6712 0.3634 0.7106±0.363 378oneI2 g1 0.5257 0.4810 0.3267 0.5257±0.326 378oneI2 g2 0.1977 0.1707 0.1309 0.1977±0.130 378oneI2 g3 0.7233 0.6818 0.3699 0.7233±0.369 14. Grouping observations with IF/THEN ELSE statements
* grouping variables or creating a new variable using
IF/THEN ELSE statement;data a;
input dens percent @@;
datalines;
0.453 0.42..0.470 0.46..0.396 0.39 .0.430 0.46
;
run;data a; set a ;
if dens>=0.450 and percent>=0.40 then type='T';
else type='J' ;
run;
Result:dens . percent type
0.453... 0.42... T
0.470... 0.46... T
0.396... 0.39 ...J
0.430... 0.46... J
15. Using * or ? WILDCARDS to read many external files into one file
The * and ? wildcards can be used in either the external file name or file extension for matching input file names. Use ? to match a single character and the * to match one or more characters. The below code combines all the data sets starting with 'wild' in the 'myfolder'.Example1:
filename wild 'd:\myfolder\wild*.dat';
data a;
infile wild;
input var1 var2 var3;
run;'wild2?4.dat' links all the data sets 'wild204.dat', 'wild254.dat', 'wild2A4.dat', but not 'wild104.dat'
Example2: This code combines all the data sets in the 'myfolder' subdirectory regardles of their extention.
data a;
infile 'myfolder/*.*' dlm='09'x dsd missover firstobs=2;
input station week time tmax tmin prec snow;
run;Example3: This code combines all the data sets having .txt extention in the 'myfolder' subdirectory
data a;
infile 'myfolder/*.txt' dlm='09'x dsd missover firstobs=2;
input station week time tmax tmin prec snow;
run;
16. Adding overall mean to a data set
data a;
input x @@;
cards;
5 3 4 2 4 ;proc means data=a noprint;
output out=sumdata(keep=xmean) mean=xmean;data b;
if _n_=1 then set sumdata;
set a;
Last updated, October 31, 2006