Math 5305

Detailed Syllabus

(Taught in summer)

 

1.     Overview of Statistics

      Deterministic vs. Statistical questions

      Statistics process

           Problem conceptualization

               populations, samples, parameters

           Data collection

                Types of studies:  experiments and observational studies

                Sampling:  SRS, bias

           Descriptive statistics

           Formal inference

                Why statistics works

       (Tutorial on using the SAS/Windows system)

 

2.     Descriptive Statistics for Univariate Data

      Population Distribution of a response variable

          Qualitative vs. Quantitative (discrete, continuous)

          How we represent it

          What it means

          How to estimate it from data

      Descriptive parameters of the population distribution

           Location measures (mean and median)

               Estimation based on raw, grouped data

           Dispersion measures (variance, standard deviation)

               The empirical rule for the normal shape

           Measures of position (quantiles)

               Estimation based on raw data

               5-number summary, Box and whisker plot

     (SAS tutorial 1.  Basic SAS data management maneuvers, univariate

      descriptive statistics from SAS)  

     (HW#1.  Using SAS to do univariate descriptive statistics)

 

3.     Descriptive Statistics for Bivariate Data

      Population Joint Distribution of two response variables

          How this distribution is described for qualitative, quantitative discrete

             and continuous variables (bivariate frequency distributions, bivariate

             histograms)

          How this distribution is used to describe joint variation of the responses

          Parameters to describe this distribution

              Location and dispersion

              Correlation

      Conditional Distributions – studying relationships between a Dependent

              variable and an Independent variable

          How to estimate these conditional distributions for the different types of

              response variables

              Both variables qualitative:  using bivariate frequency distributions

              Dependent variable quantitative, independent variable qualitative:  comparative

                    box and whisker plots

              Dependent variable qualitative, independent variable quantitative:  grouping the

                    independent variable

              Both variables quantitative:  regression line, slope, conditional variance

        (SAS tutorial 2.  Bivariate statistics from SAS)

       (HW#2.  Bivariate data analysis)

 

3.     Statistical Reporting

      Journal articles on writing technical reports, numeracy, displaying data in graphs

      Exploratory descriptive reports

(Project 1:  SENIC data analysis – exploratory descriptive report – requires extensive

use of SAS on a large data set for discovering and describing relationships with a

quantitative dependent variable, dealing with issues of causality (can we infer it?),

confounders, etc.)

 

 

4.   Probability Pre-requisites for Inference

   The Normal Distribution

     Calculating Normal probabilities – the N(0,1) table and standardization

     Obtaining the quantiles of a general normal distribution

     Q-Q plots for determining if data are normal

   Sampling distributions

     Central Limit Theorem
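
   Illustration (not part of the course notes): a minimal sketch of the
   standardization and quantile calculations listed under the Normal
   Distribution. It uses Python's standard library in place of the N(0,1)
   table; the function names and the numbers 100, 15 and 115 are hypothetical
   and chosen only for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # N(0,1): plays the role of the printed table


def normal_prob_below(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return STANDARD_NORMAL.cdf(z)


def normal_quantile(p, mu, sigma):
    """p-th quantile of N(mu, sigma^2), obtained as mu + sigma * z_p."""
    z_p = STANDARD_NORMAL.inv_cdf(p)
    return mu + sigma * z_p


# Hypothetical example values (assumptions, not course data).
prob = normal_prob_below(115.0, mu=100.0, sigma=15.0)  # equals P(Z <= 1)
q95 = normal_quantile(0.95, mu=100.0, sigma=15.0)
print(round(prob, 4), round(q95, 2))
```

   Both functions reduce a general normal calculation to one on N(0,1),
   which is the same step performed by hand when using the table.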

 

5.  Introduction to Statistical Inference

     The main ideas introduced assuming data are N(mu, sigma^2), where sigma^2 is known

        point estimation of mu

            bias and variance

            optimality of sample mean

            standard error of sample mean and its interpretation

       confidence interval for mu

            derivation using the properties of the normal distribution

            factors affecting its length (n, sigma, etc.)

       hypothesis testing (introduced in the context of a real problem)

           motivation for decision rule

            Type I and II errors

            operational procedure of doing a test

            discussion of the power function of the test
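
     Illustration (not part of the course notes): a minimal sketch of the
     known-sigma confidence interval for mu, xbar +/- z * sigma / sqrt(n),
     showing how its length depends on n. The function name z_interval and
     the values 50, 10, 25 and 100 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided confidence interval for mu when sigma is known."""
    # Upper critical value of N(0,1) that leaves (1 - conf)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / sqrt(n)  # z times the standard error of xbar
    return xbar - half_width, xbar + half_width


# Hypothetical example values (assumptions, not course data).
lower_25, upper_25 = z_interval(xbar=50.0, sigma=10.0, n=25)
lower_100, upper_100 = z_interval(xbar=50.0, sigma=10.0, n=100)

length_25 = upper_25 - lower_25
length_100 = upper_100 - lower_100
print(round(length_25, 2), round(length_100, 2))
```

     Quadrupling n halves the length of the interval, because the standard
     error sigma / sqrt(n) is halved; a larger sigma or a higher confidence
     level lengthens it.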

 

 

6.   Inference for Normally distributed data (mu and sigma unknown)

      One-sample problem

          point estimation of mu and sigma

          confidence intervals for mu and sigma

          hypothesis tests about mu and sigma

      Two independent samples

          point estimates, confidence intervals and t-test for difference

             in the means

          test for equality of variances

          large-sample test for means when variances are unequal

      Paired data

          reduction to a single sample by differences

          point estimation, confidence interval and paired t-test for mean difference

    (HW#4.  Normal inference – practice with the calculations)

 

7.  Experimental Design

      Introduction

        problems with confounders – randomization

        problems with variability – how to defeat it

      Survey of basic treatment structures in completely randomized (CR) and randomized complete block (RCB) designs

          Two-treatment experiments

               In CR design

                    Sample size calculations (power)

                   How to randomize

                   How to analyze with normal data

               In Matched-pairs Design

                   When is matched-pairs better than CR?

                   How to randomize the matched-pairs design

                   How to analyze the matched-pairs design with normal data

     (Project 2.   Design and analysis of a two-treatment experiment in both CR

                        and Matched-pairs designs, do power calculations for both with a

                        specified power constraint, collect data and analyze, dealing with

                        outliers, etc.)

           One-way layout treatment structure

                    How to randomize the CR and RCB

                   ANOVA, multiple comparisons (LSD) for CR

                   ANOVA, multiple comparisons (LSD) for RCB

            Two-factor factorial treatment structure

                    How to randomize in CR and RCB

                   ANOVA in CR

                       Analysis / interpretation of main effects and interaction

                       cell mean diagrams

                   ANOVA in RCB

        (Project 3.  Design and analysis of a two-factor factorial experiment, including

                          randomization of experimental units, cell mean diagrams, dot plots, outlier

                          assessment)
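
      Illustration (not part of the course notes): a minimal sketch of a
      sample size calculation for a two-treatment CR experiment, as listed
      under "Sample size calculations (power)". It assumes normal data with a
      known common sigma and a two-sided test, and uses the usual normal
      approximation n = 2 * sigma^2 * (z_(1-alpha/2) + z_power)^2 / delta^2
      per group. The course may use a different (for example t-based) method;
      the function name and all numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate units per treatment needed to detect a mean difference delta.

    Normal approximation for a two-sided, two-sample comparison with a
    known common standard deviation sigma (an assumption of this sketch).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    n = 2 * (sigma ** 2) * (z_alpha + z_power) ** 2 / delta ** 2
    return ceil(n)                      # round up to a whole unit


# Hypothetical example values (assumptions, not course data).
n_large_effect = n_per_group(delta=1.0, sigma=1.0)
n_small_effect = n_per_group(delta=0.5, sigma=1.0)
print(n_large_effect, n_small_effect)
```

      Halving the difference to be detected roughly quadruples the required
      sample size, which is one reason reducing variability (for example by
      matching or blocking) can matter so much in the design.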

 

 

8.  Inference for Dichotomous Data

     One-sample problem (inference about p = probability of success)

     Two Independent samples

     Paired Data

     One-way layout

         In CR

         In RCB

 

 

Textbook:    The material in the course is not contained in any one textbook.

                    Much of the material is contained in a note set available from the

                    Copy Center for about $15.00.

                    Recommended references are:

 

1.      Introduction to the Practice of Statistics by D.S. Moore

2.      Statistics for Experimenters by Box, Hunter and Hunter