Using the SAS System and SAS Enterprise Miner for Data Mining: A Study of Cancer Survival at Mayo Clinic
This paper uses data-mining techniques in SAS to evaluate and predict an epidemiological outcome: cancer survival. The data set, which contains information about the survival of lung-cancer patients in a study at the Mayo Clinic, was extracted from the R survival package. Several data-mining techniques (linear and logistic regression models, regression and classification trees, and nearest-neighbor analysis) are compared to see which method is best for predicting cancer survival. Patient survival is evaluated with both a continuous response variable and a dichotomous response variable. Linear regression, regression trees, and nearest-neighbor analysis are used to analyze the continuous response variable; logistic regression, classification trees, and nearest-neighbor analysis are used for the dichotomous response variable.