About the Data Step

A Data step begins with the statement "DATA aDataSetName;", where aDataSetName can be any name you choose. If you create several datasets in a single program, you need these names to tell the PROC statements which data to use.

The Data step can be used to

1. Organize the input of numbers from within the program itself or from another file.

2. Recode, assign missing values, or create new variables.

3. Create randomly distributed variables for use in Monte Carlo projects or other simulations.
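
As a rough sketch of uses 2 and 3, the two Data steps below recode a value, assign a missing value, create a new variable, and draw random numbers. The dataset names (recoded, sim), the variable names, the use of -9 as a missing-value code, and the seed 12345 are all made up for illustration.

data recoded;
input x y;
if y = -9 then y = .;        /* treat the code -9 as missing */
if x > 10 then big = 1;      /* recode x into a 0/1 indicator */
else big = 0;
total = x + y;               /* new variable; missing when y is missing */
cards;
2 32
44 -9
;
data sim;
do i = 1 to 100;
  e = rannor(12345);         /* one standard normal draw */
  x = 5 + 2*e;               /* shift and rescale the draw */
  output;                    /* write one observation per pass */
end;
run;

The second step needs no input at all. The do loop together with output is what generates the 100 observations.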

Free Field or Fixed Field Format?

Data can be either "free field", meaning the numbers are separated by spaces, commas, or some other delimiter, or "fixed field", meaning each value sits in a fixed column position: column 1 holds one value, columns 2-3 hold another, and so forth.
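
A minimal sketch of fixed field input, assuming each line holds an id in column 1 and a two-digit score in columns 2-3 (the names fixedDemo, id, and score are invented here):

data fixedDemo;
input id 1 score 2-3;        /* column 1 is id, columns 2-3 are score */
cards;
123
245
;
proc print data=fixedDemo;
run;

No spaces are needed between the values, because SAS finds each one by its column position. The first data line gives id=1 and score=23. Note that the data lines must start in column 1 for the positions to line up.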

Free Field Format Example

If you have raw numbers in free field format, separated by spaces, then you need a Data step that combines "input" and "cards". Here is a skeleton of a SAS program that would work:

data whatever;
input  x y z z1 z2;
cards;
    2 32 55 22 55
    44 11 44 113133 11
   (**put more numbers here ***)
;

proc print data=whatever;
run;

The name of the dataset, the name that other steps in the program need to use when they refer to this data, is "whatever". You could change that to something descriptive to make it easier to remember.

The input statement lists the names of the variables. You have to use these names to refer to the variables when you have procs that do something to the data.

The term cards is old-fashioned. It refers to the olden days when we used Hollerith punched cards, one for each line of data. It is your way of telling SAS "here come some lines of data, get ready".

The semicolon on the line after the last row of numbers is required. It tells SAS that the data has ended.
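
As a variation on the skeleton above, here is a sketch that uses datalines, the newer synonym for cards, and reads values separated by commas instead of spaces. The dataset name commaDemo is made up. The infile datalines statement with the dlm= option is what tells SAS that the separator is a comma.

data commaDemo;
infile datalines dlm=',';    /* values are separated by commas */
input x y z;
datalines;
2,32,55
44,11,44
;
proc print data=commaDemo;
run;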

Read Data in from a Text File

Suppose you have data in a "flat" text file. It's just numbers: no formatting, no compression, nothing in it but numbers with spaces between them.

You could just copy the numbers into the middle of a SAS program, and then you would have some card-image numbers and you could proceed. That can get a bit messy, however, if you are writing ten or twenty programs that use the same data. Think of all those copies of a dataset using up your precious space!

Lucky for you, SAS has an "infile" statement that can get the data and read it in for you. The exact form of this statement, especially the way the file name is written, depends on which sort of computer you have, but it works for me like this.

Suppose you want to create a SAS dataset called "powerHit". You have the numbers stored in a text file called "hitter.data" in the SAME DIRECTORY where you have saved the SAS program.

data powerHit;
infile 'hitter.data';
input varName1 varName2;

proc reg data=powerHit;
model varName1 = varName2;
run;

I just threw in the proc to emphasize that the dataset named powerHit is accessed by that name, and that the model statement uses the variable names declared in the input statement.
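
If the data file lives somewhere else, one approach is to give the full path, either directly in the infile statement or through a filename statement. This is only a sketch: the path /home/me/data/hitter.data and the nickname hits are hypothetical, and the way a path is written depends on your operating system. The firstobs=2 option assumes the file has a header line to skip; drop it if the numbers start on line 1.

filename hits '/home/me/data/hitter.data';   /* hypothetical path */
data powerHit;
infile hits firstobs=2;      /* start reading at line 2 */
input varName1 varName2;
run;

With the filename statement at the top, a change in the file's location means editing only one line, which helps when many programs share the same data.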

It is Vital To Study Some Examples

In the Example Code directory, I have sample SAS programs that you should look over. Please scan carefully for dataset "names" and see how they are used.

  • introductory_example.sas: This file shows the fundamentals of a free-field dataset. There is a readme file with it. Copy it to your account, run it, have fun with it!

  • crime.sas: This is a fixed field dataset with a README file.

  • bank.sas: This shows examples of how to create new variables and recalculate values with IF-THEN statements.

-- PaulJohnson - 10 Dec 2002

 