Today's focus is on working with variables.
SAS supports the basic arithmetic operators (addition, subtraction, multiplication, division). Worth noting: exponentiation is written **, not ^.
* Modify homegarden data set with assignment statements;
DATA homegarden;
INFILE 'c:\MyRawData\Garden.dat';
INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;
Zone = 14;
Type = 'home';
Zucchini = Zucchini * 10;
Total = Tomato + Zucchini + Peas + Grapes;
PerTom = (Tomato / Total) * 100;
RUN;
PROC PRINT DATA = homegarden;
TITLE 'Home Gardening Survey';
RUN;
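If any operand is missing, however, SAS addition produces a missing value instead of treating the missing operand as 0. To avoid this, call the SUM() function rather than writing + directly.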
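A minimal sketch of the difference, using made-up values (the variable names a, b, plus_total and sum_total are illustrative and not from the garden data):

```sas
* Compare the + operator with SUM() when one value is missing;
DATA _NULL_;
   a = 5;
   b = .;                    /* numeric missing value */
   plus_total = a + b;       /* + propagates the missing value */
   sum_total  = SUM(a, b);   /* SUM() skips missing arguments */
   PUT plus_total= sum_total=;
RUN;
```

The log should show plus_total=. and sum_total=5. Note that SUM() still returns missing when every argument is missing.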
Calling a function in SAS is straightforward:
AvgScore = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5);
DayEntered = DAY(Date);
Type = UPCASE(Type);
Functions fall into several categories: character, numeric, date, and so on.
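As a rough illustration of one function from each category, here is a sketch with hard-coded inputs (the literal values are assumptions chosen for the example):

```sas
* One numeric, one date, and one character function;
DATA _NULL_;
   AvgScore   = MEAN(80, 90, .);    /* numeric: MEAN ignores the missing value */
   Date       = '15MAR2024'D;       /* SAS date literal */
   DayEntered = DAY(Date);          /* date: day of the month */
   Type       = UPCASE('home');     /* character: convert to uppercase */
   PUT AvgScore= DayEntered= Type=;
RUN;
```

The log should show AvgScore=85 DayEntered=15 Type=HOME.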
Conditional logic uses IF-THEN:
IF Model = 'Mustang' THEN Make = 'Ford';
An IF-THEN can also run several statements by wrapping them in a DO ... END block, and conditions can be combined with AND and OR:
IF Year ...
IF Model = 'Corvette' OR Model = 'Camaro' THEN Make = 'Chevy';
IF Model = 'Miata' THEN DO;
Make = 'Mazda';
Seats = 2;
END;
IF can also be chained with ELSE:
IF Cost = . THEN CostGroup = 'missing';
ELSE IF Cost ...
ELSE IF Cost ...
ELSE CostGroup = 'high';
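As a sketch of a complete IF / ELSE IF chain, the step below uses hypothetical cutoffs of 2000 and 10000 and made-up Cost values; none of these numbers come from the text above. The missing-value test goes first because SAS treats a numeric missing value as smaller than any number, so a missing Cost would otherwise fall into the lowest group.

```sas
* Group Cost into categories (the cutoffs are illustrative assumptions);
DATA costgroups;
   INPUT Cost;
   LENGTH CostGroup $ 7;            /* long enough for 'missing' */
   IF Cost = . THEN CostGroup = 'missing';
   ELSE IF Cost < 2000 THEN CostGroup = 'low';
   ELSE IF Cost < 10000 THEN CostGroup = 'medium';
   ELSE CostGroup = 'high';
   DATALINES;
.
1500
5000
25000
;
RUN;
PROC PRINT DATA = costgroups;
RUN;
```

Using ELSE means SAS stops checking once one condition is true, so each observation lands in exactly one group.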
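IF can also select a subset of the data. The first statement keeps only the observations where Sex is 'f'; the second drops the observations where Sex is 'm':
IF Sex = 'f';
IF Sex = 'm' THEN DELETE;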
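A small self-contained sketch of the subsetting IF, with inline data invented for the example (the data set name females and the three records are assumptions):

```sas
* Keep only the observations where Sex is f;
DATA females;
   INPUT Name $ Sex $;
   IF Sex = 'f';          /* observations failing the test are not written out */
   DATALINES;
Ann f
Bob m
Cat f
;
RUN;
PROC PRINT DATA = females;
RUN;
```

The resulting data set should contain two observations, Ann and Cat. The comparison is case-sensitive, so a value of 'F' would not match.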
RETAIN and sum statements in SAS
For example, to compute a running total (the equivalent of cumsum in R), you need:
* Using RETAIN and sum statements to find most runs and total runs;
DATA gamestats;
INFILE 'c:\MyRawData\Games.dat';
INPUT Month 1 Day 3-4 Team $ 6-25 Hits 27-28 Runs 30-31;
RETAIN MaxRuns;
MaxRuns = MAX(MaxRuns, Runs);
RunsToDate + Runs;
RUN;
PROC PRINT DATA = gamestats;
TITLE "Season's Record to Date";
RUN;
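To see the mechanics without the external file, here is a reduced sketch with three made-up Runs values. RETAIN keeps MaxRuns from one observation to the next instead of resetting it to missing, and the sum statement (variable + expression;) retains its variable automatically and starts it at 0.

```sas
* Running maximum and running total on inline data;
DATA running;
   INPUT Runs;
   RETAIN MaxRuns;                  /* keep the value across observations */
   MaxRuns = MAX(MaxRuns, Runs);    /* MAX ignores the initial missing value */
   RunsToDate + Runs;               /* sum statement: implicit RETAIN, starts at 0 */
   DATALINES;
4
7
2
;
RUN;
PROC PRINT DATA = running;
RUN;
```

With these inputs, MaxRuns should be 4, 7, 7 and RunsToDate should be 4, 11, 13.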
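In the resulting data set the running total shows up, along with a column holding the maximum so far. This is one of the places where I find SAS quite different from R: SAS works like a pointer, reading the data one row at a time, whereas in R we mostly work with vector or matrix operations, so the two feel rather different...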
Arrays feel a bit more matrix-like, though they still strike me as odd now and then... When it comes to cleaning and manipulating data, SAS still seems less flexible than R...
The example below replaces values with missing values:
* Change all 9s to missing values;
DATA songs;
INFILE 'c:\MyRawData\WBRK.dat';
INPUT City $ 1-15 Age domk wj hwow simbh kt aomm libm tr filp ttr;
ARRAY song (10) domk wj hwow simbh kt aomm libm tr filp ttr;
DO i = 1 TO 10;
IF song(i) = 9 THEN song(i) = .;
END;
RUN;
PROC PRINT DATA = songs;
TITLE 'WBRK Song Survey';
RUN;
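This replaces every 9 with a missing value. The last 10 columns are treated as one array and can be operated on directly.
SAS also has several shortcuts for lists of variable names, which I won't go into here...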
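The same pattern on a smaller scale, as a sketch with invented data (the data set scores and the variables id, q1 to q3 are assumptions). The array is only a temporary alias for existing variables during the DATA step; it is not stored in the data set.

```sas
* Recode 9 to missing across three variables with an array;
DATA scores;
   INPUT id q1 q2 q3;
   ARRAY q (3) q1 q2 q3;            /* q(1) refers to q1, q(2) to q2, and so on */
   DO i = 1 TO 3;
      IF q(i) = 9 THEN q(i) = .;
   END;
   DROP i;                          /* keep the loop counter out of the output */
   DATALINES;
1 3 9 5
2 9 9 1
;
RUN;
PROC PRINT DATA = scores;
RUN;
```

Without the DROP statement, the index variable i would be written to the data set as an extra column.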