Data science with SAS Base and SAS Enterprise Miner
- Module 1: Modifying and Combining SAS Data Sets
Modifying a data set using the SET statement
Using the DATA step to stack multiple data sets
Using the DATA step to merge multiple data sets
- Module 2: Manipulating and Transforming Data
Informats and Formats
Data Set Options (DROP=, KEEP=, FIRSTOBS=, OBS=)
PROC step
- Module 3: Introduction to Data Mining Using SAS Enterprise Miner
Data Mining Methodologies
Definition, Description and Business Application
Best Practice for Data Mining
Node Descriptions and Modelling Steps in SAS Enterprise Miner
- Module 4: Getting Started with Predictive Modelling
Opening SAS Enterprise Miner
Creating a New Project in SAS Enterprise Miner
The SAS Enterprise Miner Window
Creating a Data Source for Modelling
Creating a Process Flow Diagram
Create Data Source from SAS Tables
Use the Basic Metadata Advisor
Set Role and Level metadata for data source variables
Set the Role of the table (raw, scoring, transactional, etc.)
- Module 5: Explore and Assess Data Source for Modelling
Create and interpret plots, including histograms, pie charts, scatter plots, time series plots, and box plots
Identify distributions
Find outlying observations
Find number (or %) of missing observations
Find levels of nominal variables
Explore associations between variables using plots by highlighting and selecting data
Compare balanced and actual response rates when oversampling has been performed
Explore data with the StatExplorer node
Explore input variable sample statistics
Browse data set observations (cases)
- Module 6: Modify Data Source During Model Building
Replace zero values with missing indicators using the REPLACEMENT node
Use the TRANSFORM VARIABLES node to correct problems with input data sources, such as skewed variable distributions or outliers
Use the IMPUTE node to impute missing values and create missing value indicators
Reduce the levels of a categorical variable
Use the FILTER node to remove cases
- Module 7: Describe Key Predictive Modelling Terms and Concepts
Data partitioning: training, validation, test data sets
Observations (cases), independent (input) variables, dependent (target) variables
Measurement scales: Interval, ordinal, nominal (categorical), binary variables
Prediction types: decisions, rankings, estimates
Dimensionality, redundancy, irrelevancy
Decision trees, neural networks, regression models
Model optimization, overfitting, underfitting, model selection
- Module 8: Build Predictive Model using Decision Trees
Explain how decision trees identify split points
Build Decision Trees in interactive mode
Change splitting rules
Explain how missing values can be handled by decision trees
Assess probability using a decision tree
Prune decision trees
Interpret results of the decision tree node, including: trees, leaf statistics, treemaps, score rankings overlay, fit statistics, output, variable importance, subtree assessment plots
Explore model output (exported) data sets
- Module 9: Build Predictive Model using Regression
Explain the relationship between target variable and regression technique.
Explain linear regression.
Explain logistic regression (Logit link function, maximum likelihood)
Explain the impact of missing values on regression models.
Select inputs for regression models using forward, backward, stepwise selection techniques.
Adjust thresholds for including variables in a model.
Interpret a logistic regression model using log odds.
Interpret the results of a REGRESSION node (Output, Fit Statistics, Score Rankings Overlay charts).
Use fit statistics and iteration plots to select the optimum regression model for different decision types.
- Module 10: Predictive Model Assessment
Explain reasons for oversampling data.
Adjust prior probabilities.
Build a profit/loss matrix.
Add a profit/loss matrix to a predictive model.
Determine an appropriate value to use for expected profit/loss for primary outcome (from the data, possibly a mean value).
Optimize models based on expected profit/loss.
Compare models using model assessment statistics.
ROC Chart.
Score Rankings Chart, including (cumulative) % response chart, (cumulative) Lift chart, gains chart.
Total expected profit.
Effect of oversampling.
- Module 11: Model Implementation
Score data sets within SAS Enterprise Miner
Configure a data set to be scored in Enterprise Miner
Use the SCORE node to score new data
Save scored data to an external location with the SAVE DATA node