Data science with SAS Base and SAS Enterprise Miner

  • Module 1: Modifying and Combining SAS Data Sets
    Modifying a data set using the SET statement
    Using the DATA statement to stack multiple data sets
    Using the DATA statement to merge multiple data sets
  • Module 2: Manipulating and Transforming Data
    Informats and Formats
    Data Set Options (Drop, Keep, Firstobs, Obs)
    PROC STEP
  • Module 3: Introduction to Data Mining Using SAS Miner
    Data Mining Methodologies
    Definition, Description and Business Application
    Best Practice for Data Mining
    Nodes Description and Modelling Steps in SAS Miner
  • Module 4: Getting Started with Predictive Modelling
    Opening SAS Enterprise Miner
    Creating a New Project in SAS Enterprise Miner
    The SAS Enterprise Miner Window
    Creating a Data Source for Modelling
    Creating a Process Flow Diagram
    Create Data Source from SAS Tables
    Use the Basic Metadata Advisor
    Set Role and Level metadata for data source variables
    Set the Role of the table (raw, scoring, transactional, etc.)
  • Module 5: Explore and Assess Data Source for Modelling
    Create and interpret plots, including histograms, pie charts, scatter plots, time series plots, and box plots
    Identify distributions
    Find outlying observations
    Find number (or %) of missing observations
    Find levels of nominal variables
    Explore associations between variables using plots by highlighting and selecting data
    Compare balanced and actual response rates when oversampling has been performed
    Explore data with the STATEXPLORE node
    Explore input variable sample statistics
    Browse data set observations (cases)
  • Module 6: Modify Data Source During Model Building
    Replace zero values with missing indicators using the REPLACEMENT node
    Use the TRANSFORM VARIABLES node to correct problems with input data sources, such as variable distribution or outliers
    Use the IMPUTE node to impute missing values and create missing value indicators
    Reduce the levels of a categorical variable
    Use the FILTER node to remove cases
  • Module 7: Describe Key Predictive Modelling Terms and Concepts
    Data partitioning: training, validation, test data sets
    Observations (cases), independent (input) variables, dependent (target) variables
    Measurement scales: Interval, ordinal, nominal (categorical), binary variables
    Prediction types: decisions, rankings, estimates
    Dimensionality, redundancy, irrelevancy
    Decision trees, neural networks, regression models
    Model optimization, overfitting, underfitting, model selection
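
The data partitioning topic in Module 7 can be illustrated with a small sketch. It is written in Python rather than SAS purely for illustration and is not how SAS Enterprise Miner implements partitioning. The function name `partition`, the 50/30/20 split, and the seed value are assumptions chosen for the example.

```python
import random


def partition(rows, train=0.5, validation=0.3, seed=12345):
    """Shuffle rows reproducibly and split them into three lists.

    The test set receives whatever remains after the training and
    validation shares are taken. All names and fractions here are
    hypothetical, chosen only to illustrate the idea.
    """
    if train < 0 or validation < 0 or train + validation > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


cases = list(range(100))
train_set, valid_set, test_set = partition(cases)
print(len(train_set), len(valid_set), len(test_set))  # 50 30 20
```

The training set is used to fit candidate models, the validation set to choose among them and guard against overfitting, and the test set to give a final estimate of performance on unseen cases.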


  • Module 8: Build Predictive Model using Decision Trees
    Explain how decision trees identify split points
    Build Decision Trees in interactive mode
    Change splitting rules
    Explain how missing values can be handled by decision trees
    Assess probability using a decision tree
    Prune decision trees
    Interpret results of the decision tree node, including: trees, leaf statistics, treemaps, score rankings overlay, fit statistics, output, variable importance, subtree assessment plots
    Explore model output (exported) data sets
  • Module 9: Build Predictive Model using Regression
    Explain the relationship between target variable and regression technique.
    Explain linear regression.
    Explain logistic regression (Logit link function, maximum likelihood)
    Explain the impact of missing values on regression models.
    Select inputs for regression models using forward, backward, stepwise selection techniques.
    Adjust thresholds for including variables in a model.
    Interpret a logistic regression model using log odds.
    Interpret the results of a REGRESSION node (Output, Fit Statistics, Score Rankings Overlay charts).
    Use fit statistics and iteration plots to select the optimum regression model for different decision types.
  • Module 10: Predictive Model Assessment
    Explain reasons for oversampling data.
    Adjust prior probabilities.
    Build a profit/loss matrix.
    Add a profit/loss matrix to a predictive model.
    Determine an appropriate value to use for expected profit/loss for primary outcome (from the data, possibly a mean value).
    Optimize models based on expected profit/loss.
    Compare models using model assessment statistics.
    ROC Chart.
    Score Rankings Chart, including (cumulative) % response chart, (cumulative) Lift chart, gains chart.
    Total expected profit.
    Effect of oversampling.
  • Module 11: Model Implementation
    Score data sets within SAS Miner
    Configure a data set to be scored in Enterprise Miner
    Use the SCORE node to score new data
    Save scored data to an external location with the SAVE DATA node
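
The profit/loss matrix topics in Module 10 can be made concrete with a small worked example. This is a minimal sketch in Python, not SAS Enterprise Miner output. The outcome labels, decision labels, and profit values are all hypothetical. For each case, the expected profit of a decision is the sum, over the possible outcomes, of the outcome's probability times the profit of that decision under that outcome, and the decision with the highest expected profit is chosen.

```python
# Hypothetical profit matrix: PROFIT[actual outcome][decision].
# Values are illustrative only (for example, a mean profit per
# responder taken from the data, minus a contact cost).
PROFIT = {
    "responder": {"solicit": 14.5, "ignore": 0.0},
    "non_responder": {"solicit": -0.5, "ignore": 0.0},
}
DECISIONS = ("solicit", "ignore")


def expected_profit(p_responder, decision):
    """Expected profit of one decision given the predicted response probability."""
    return (
        p_responder * PROFIT["responder"][decision]
        + (1.0 - p_responder) * PROFIT["non_responder"][decision]
    )


def best_decision(p_responder):
    """Pick the decision with the highest expected profit."""
    return max(DECISIONS, key=lambda d: expected_profit(p_responder, d))


for p in (0.02, 0.05):
    print(p, best_decision(p), round(expected_profit(p, "solicit"), 3))
```

With these assumed values the break-even probability is 0.5 / 15, roughly 0.033, so a case with a predicted probability of 0.05 is solicited and a case with 0.02 is not.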
