What you'll learn
- Basic statistical skills (fundamental theories, terminology)
- Confidence in using SAS as a tool for statistical analysis
- Understand and apply statistical models
- Understand results and know how to convey them to the intended audience
Diploma in Probability and Statistical Analysis
Introduction to Statistical Analysis and SAS
In this lesson, you will be introduced to statistical analysis. We will discuss how this course differs from the data analysis course and who it is aimed at. Thereafter, we will introduce SAS, the tool we will use throughout this module. The lesson ends with a fun practical demonstration in SAS.
Getting to Know Your Data
This lesson is all about study design, data types and a sneak peek into summarizing data. Understanding the study design and the types of scales used to measure the data is a crucial step in analysing the data accurately. Only once we know the pros and cons of the way the data was gathered can we start describing the data.
This lesson will mainly focus on the different methods of summarising data. The previous lesson introduced measures of central tendency to the student. Lesson 3 will elaborate on that concept with measures of spread as well as various ways of visualising data through plots in SAS Studio. Each topic will be consolidated through a practical demonstration in SAS Studio.
This lesson introduces the student to the concepts of probability theory, including samples and populations. It sets out the basic definitions and rules of probability and ends by touching on more advanced concepts: mutually exclusive and non-mutually exclusive events, and independent and non-independent events.
Lesson 5 introduces the student to random variables and a number of well-known probability distributions. Alongside these concepts, the lesson showcases the famous Central Limit Theorem, a fundamental concept for understanding the sample sizes discussed in the next lesson.
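The Central Limit Theorem says that, for a population with finite variance, the mean of a large sample is approximately normally distributed with standard deviation σ/√n. A small simulation makes this concrete. The sketch below assumes an Exponential(1) population (mean 1, standard deviation 1, strongly right-skewed) and uses hypothetical helper names `sample_means` and `skewness`; it is an illustration, not the course's SAS demonstration.

```python
import random
import statistics

def sample_means(n, n_samples=5000, seed=1):
    """Return the means of n_samples independent samples of size n, each drawn
    from an Exponential(1) population (mean 1, standard deviation 1, skewed right)."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(n_samples)]

def skewness(xs):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# For each sample size, print the mean, the spread (compare with 1/sqrt(n))
# and the skewness of the simulated sample means.
for n in (1, 5, 30):
    means = sample_means(n)
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3),
          round(1 / n ** 0.5, 3), round(skewness(means), 2))
```

As n grows, the skewness of the sample means should fall towards zero (for this population the theoretical value is 2/√n), which is the Central Limit Theorem at work.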
Sample Sizes and Sampling
This lesson answers the well-known question: "How many observations does a study need to detect a statistically significant result?" Many studies skip this fundamental step and, as a result, are unable to demonstrate statistical significance. Lesson 6 explains what it means for a result to be statistically significant and why it is so important for the sample to be large enough.
Lesson 7 focuses on questions about a single group. This lesson starts to uncover the concepts of inferential statistics: statistical methods used to draw conclusions about a population from a sample. Prior lessons focused on descriptive statistics, which help the student describe and summarise the data through methods such as plots and summary statistics.
Practical Module Recap
The last lesson in module 1 recaps all the principles covered in module 1 through a practical example.
Intermediate in Probability and Statistical Analysis
Cleaning and Merging Data
The first lesson of this module focuses on enhancing your knowledge of cleaning and merging data. Lesson 1 returns to more basic principles, but nonetheless some of the most important in an analyst's work cycle. It is estimated that up to 60% of an analyst's time is spent cleaning data, so managing this step efficiently is crucial.
Lesson 2 teaches you how to manage your data effectively. Creating new variables and transforming existing ones both form part of this step of the process.
Working with Dates and Times
Working with dates and times is a vital skill in your Probability and Statistical Analysis journey, and this lesson is geared to help you master it. Dates and times appear in most data sets, and some might consider working with them an art form. In lesson 3, the student will learn tips and tricks for handling dates and times in SAS Studio as painlessly as possible.
Introducing Linear Regression
Halfway through module 2, our focus shifts to modelling data by introducing linear regression, perhaps the best-known modelling procedure.
Linear Regression Continued
Understanding linear regression takes a little more time, so in part 2 of this lesson we delve deeper into this complex topic.
Multiple Linear Regression
Multiple linear regression extends simple linear regression to several explanatory variables; this lesson examines it in more detail.
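For orientation, the multiple linear regression model with p explanatory variables is commonly written as below. The notation (β_j for the coefficients, ε for the random error) is the conventional textbook form and is assumed here rather than taken from the course material.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon
```

Each β_j is interpreted as the average change in Y for a one-unit increase in X_j, holding the other explanatory variables fixed; with p = 1 the model reduces to simple linear regression.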
Introducing Logistic Regression
In its basic form, logistic regression uses a logistic function to model a binary dependent variable. We explore this concept in this lesson and identify where more complex extensions exist.
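As a sketch of what "a logistic function" means here, the single-predictor model in its usual notation (assumed, not taken from the course) maps a linear predictor onto a probability between 0 and 1; equivalently, the log-odds are linear in X.

```latex
p(X) = \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}},
\qquad
\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X
```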
Logistic Regression Continued
We end module 2 by delving into the finer details of logistic regression before moving on to the more advanced concepts of module 3.
Advanced in Probability and Statistical Analysis
Linear and Quadratic Discriminant Analysis
We start module 3 by identifying and describing linear and quadratic discriminant analysis. This lesson also investigates the well-known Bayes' theorem and its use in classification.
Lesson 2 delves deeper into cross-validation, covering the validation set approach, leave-one-out cross-validation and k-fold cross-validation, to name but a few.
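The core of k-fold cross-validation is how the rows are partitioned: each row is held out for testing exactly once, and the model is trained on the remaining folds. The minimal sketch below assumes contiguous folds for readability (real analyses usually shuffle the rows first); the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds of near-equal size.

    Returns a list of (train, test) index lists; every index appears in
    exactly one test fold. Real analyses usually shuffle the rows first.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more row
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

# 3-fold split of 7 rows; leave-one-out cross-validation is the case k = n.
for train, test in k_fold_indices(7, 3):
    print("train:", train, "test:", test)
```

The validation set approach corresponds to a single train/test split, while setting k equal to the number of rows gives leave-one-out cross-validation.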
Bootstrapping estimates the properties of an estimator (such as its standard error) by measuring those properties across repeated resamples drawn, with replacement, from the observed sample. Lesson 3 considers this powerful statistical tool in more depth.
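The resampling idea can be sketched in a few lines. The example below is a minimal nonparametric bootstrap of a standard error; the helper name `bootstrap_se`, the number of resamples and the data values are illustrative assumptions, not course material.

```python
import random
import statistics

def bootstrap_se(sample, estimator, n_resamples=2000, seed=0):
    """Estimate the standard error of `estimator` by the nonparametric bootstrap.

    Each resample draws len(sample) observations with replacement from the
    original sample; the spread of the estimator across resamples
    approximates its sampling variability.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]  # illustrative values only
print(bootstrap_se(data, statistics.mean))
print(bootstrap_se(data, statistics.median))
```

The same function works for any estimator passed in, which is the main appeal of the bootstrap: the median has no simple standard-error formula, yet it is handled exactly like the mean.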
Subset selection methods, such as best subset selection and stepwise selection, are the focus of lesson 4.
Ridge regression and the lasso are two of the best-known shrinkage methods, which shrink the regression coefficients towards zero.
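To see what "shrinking" means, consider the simplest possible case: one centred predictor and no intercept, with the penalty added directly to the residual sum of squares. Both estimates then have closed forms, sketched below under those assumptions (software packages may scale the tuning parameter differently); `ridge_coef` and `lasso_coef` are hypothetical names.

```python
def ridge_coef(x, y, lam):
    """Ridge estimate for y ~ beta * x (one centred predictor, no intercept).

    Minimises sum((y - beta*x)**2) + lam * beta**2, whose closed form is
    sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)

def lasso_coef(x, y, lam):
    """Lasso estimate for the same one-predictor model.

    Minimises sum((y - beta*x)**2) + lam * abs(beta); the solution
    soft-thresholds sum(x*y) by lam / 2, so it can be exactly zero.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

x = [-2, -1, 0, 1, 2]   # centred predictor (mean 0)
y = [-3, -2, 1, 2, 4]
for lam in (0, 10, 36):
    print(lam, ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

With lam = 0 both reduce to least squares (1.8 here). At lam = 10 ridge gives 0.9 and the lasso 1.3, and at lam = 36 the lasso coefficient is exactly 0.0 while ridge only shrinks it; this is why the lasso also performs variable selection.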
Dimension Reduction Methods
High-dimensional data sometimes needs to be reduced to a lower-dimensional form so that only meaningful variables are included in the results. The dimension reduction methods discussed in this lesson will help the student understand these techniques and apply them to high-dimensional data.
The field of high-dimensional statistics studies data whose dimension is larger than the dimensions typically considered in classical multivariate analysis, and it relies on the theory of random vectors. This lesson explores the limitations of this type of data and how you can go about interpreting it.
Tying it All Together
Lesson 8 of the penultimate module once again revisits all the concepts and skills gained so far through a comprehensive practical demonstration.
Proficient in Probability and Statistical Analysis
Not normal? What now?
What do we do when the data is not normally distributed? Although the normal distribution is common, there are cases where the data follows another distribution, for example the exponential distribution. Lesson 1 of the final module focuses on non-normal distributions. By the end of this lesson, the student will be able to identify non-normality.
Breaking the Function
Our lesson on Breaking the Function is all about polynomial regression and step functions.
How do you know which basis function to use? This lesson breaks these important concepts down into understandable portions.
Regression splines are a non-linear approach that fits piecewise linear or polynomial functions, joined at points called knots, to the data. This lesson is geared to help you understand where they are applicable.
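One common way to write a regression spline is a cubic spline with K knots ξ_1, …, ξ_K in the truncated power basis, shown below. This particular basis is an assumption for illustration; other bases (such as B-splines) describe the same family of fitted curves.

```latex
f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{k=1}^{K} \beta_{3+k}\,(x - \xi_k)_+^3,
\qquad (x - \xi_k)_+ = \max(x - \xi_k,\, 0)
```

Each truncated term is zero to the left of its knot, so the curve is a separate cubic between consecutive knots while remaining continuous, with continuous first and second derivatives, at every knot.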
Smoothing the Data
Now it's time to learn about smoothing splines and how to choose their smoothing parameters.
Nearing the end of our final module, we learn more about local polynomial regression and moving regression.
GAMs, or Generalized Additive Models, are a useful way of fitting flexible non-linear relationships between variables.
Reporting and Presenting
The final lesson of the Probability and Statistical Analysis course celebrates all the skills gained throughout the course with a fun-filled practical experiment in SAS Studio. The lesson ends by going over some vital principles for reporting and presenting the results.