Suppose that we draw all possible samples of size n from a given population. Suppose further that we compute a statistic (e.g., a mean, proportion, standard deviation) for each sample. The probability distribution of this statistic is called a sampling distribution.
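To make the definition concrete, here is a minimal sketch that enumerates every possible sample from a tiny population and tabulates the sample mean. The population values and the sample size are hypothetical, chosen only so that the enumeration stays small; samples are drawn without replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical population, small enough to enumerate every sample.
population = [1, 2, 3, 4, 5]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# The statistic of interest (here, the mean) for each sample.
sample_means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each possible value of the mean and its probability.
counts = Counter(sample_means)
sampling_distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"mean = {float(mean):.1f}  probability = {float(prob):.1f}")
```

For this population, the ten possible samples yield seven distinct means, and the mean of the sampling distribution equals the population mean of 3.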
Sampling distributions play a critical role in inferential statistics (e.g., testing hypotheses, defining confidence intervals). To make use of a sampling distribution, analysts must understand the variability of the distribution and the shape of the distribution. This lesson introduces those topics.
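The variability of a sampling distribution depends on three factors: the number of observations in the population (N), the number of observations in the sample (n), and the way that the random sample is chosen (for example, with or without replacement).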
When the population standard deviation is known, the standard deviation of a sampling distribution can be computed. When the population standard deviation is not known, the standard deviation of a sampling distribution can be estimated from sample data. The estimate of the standard deviation of a sampling distribution is called the standard error.
Formulas for computing the standard deviation and the standard error of a sampling distribution differ, depending on the statistic in question. In future lessons, we present these formulas for different kinds of statistics.
Note: If the population size is much larger than the sample size, then the sampling distribution has roughly the same standard deviation and the same standard error, whether we sample with or without replacement. On the other hand, if the sample represents a substantial fraction (say, 1/20) of the population size, the standard deviation and the standard error will be meaningfully smaller when we sample without replacement.
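As an illustration of the note, consider the sample mean. A standard result (not derived in this lesson) is that its standard deviation is sigma / sqrt(n) when sampling with replacement, and that sampling without replacement multiplies this by the finite population correction sqrt((N - n) / (N - 1)). The sketch below applies that result; the function name `sd_of_mean` and the numbers are hypothetical.

```python
import math


def sd_of_mean(sigma, n, N=None):
    """Standard deviation of the sampling distribution of the mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size; None means sampling with replacement
           (or from an effectively infinite population)
    """
    sd = sigma / math.sqrt(n)
    if N is not None:
        # Finite population correction for sampling without replacement.
        sd *= math.sqrt((N - n) / (N - 1))
    return sd


# Hypothetical values chosen for illustration.
sigma, n = 10.0, 50
with_replacement = sd_of_mean(sigma, n)
large_population = sd_of_mean(sigma, n, N=1_000_000)  # n is a tiny fraction of N
small_population = sd_of_mean(sigma, n, N=1_000)      # n is 1/20 of N

print(f"with replacement:          {with_replacement:.4f}")
print(f"without, N = 1,000,000:    {large_population:.4f}")
print(f"without, N = 1,000:        {small_population:.4f}")
```

With n = 50 and N = 1,000, the correction shrinks the standard deviation by about 2.5%; with N = 1,000,000 the difference is negligible.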
In some situations, a sampling distribution will be approximately normal in shape. In those situations, a researcher can use the normal distribution for analysis. In other situations, a sampling distribution will more closely follow a t distribution; and a researcher can use the t distribution for analysis.
Let's look at some guidelines for determining when a sampling distribution will be shaped like a normal distribution or a t distribution.
The central limit theorem states that the sampling distribution of the mean of independent random observations, drawn from a population with finite variance, will be normal or nearly normal if the sample size is large enough.
How large is "large enough"? The answer depends on two factors.
In practice, some statisticians say that a sample size of 20 is large enough when the population distribution is roughly bell-shaped. Others recommend a sample size of at least 30. But if the original population is distinctly not normal (e.g., is badly skewed, has multiple peaks, and/or has outliers), researchers like the sample size to be even larger.
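The effect of sample size can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a rule: it uses an exponential population (badly skewed), draws many samples for several hypothetical sample sizes, and measures how skewed the resulting sample means are. A skewness near zero is consistent with a roughly normal shape. The helper names, seed, and replicate count are arbitrary choices.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulated_sample_means(n, replicates, rng):
    """Means of `replicates` samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {}
for n in (2, 10, 50):
    means = simulated_sample_means(n, replicates=5000, rng=rng)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:3d}  skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness of the simulated sample means should shrink toward zero as n grows, which is the behavior the central limit theorem describes.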
If the underlying population is normally distributed, the sample mean, when standardized using the sample standard deviation, follows a t distribution with n - 1 degrees of freedom. This is true even when the sample size is small.
In practice, many statisticians relax the normality requirement. They are comfortable using the t distribution when the population distribution is roughly bell-shaped, even if it is not exactly normal.
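To see why the t distribution matters for small samples, the following sketch simulates small samples from a normal population and standardizes each sample mean two ways: with the known population standard deviation (z) and with the sample standard deviation (t). The population parameters, sample size, seed, and the 1.96 cutoff are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical normal population and a small sample size.
mu, sigma, n, replicates = 100.0, 15.0, 5, 20000

z_exceed = 0  # count of |z| > 1.96, using the known sigma
t_exceed = 0  # count of |t| > 1.96, using the sample standard deviation

for _ in range(replicates):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    z = (xbar - mu) / (sigma / math.sqrt(n))
    t = (xbar - mu) / (s / math.sqrt(n))
    z_exceed += abs(z) > 1.96
    t_exceed += abs(t) > 1.96

z_fraction = z_exceed / replicates
t_fraction = t_exceed / replicates
print(f"fraction with |z| > 1.96: {z_fraction:.3f}")
print(f"fraction with |t| > 1.96: {t_fraction:.3f}")
```

The z-based fraction should land near 0.05, as the normal distribution predicts. The t-based fraction should be noticeably larger, reflecting the heavier tails of the t distribution when the standard deviation is estimated from a small sample.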
The t distribution and the normal distribution are both bell-shaped distributions. This suggests that we might use either the t distribution or the normal distribution to analyze sampling distributions that are roughly bell-shaped. Which should we choose?
Guidelines exist to help you make that choice. Some focus on the population standard deviation.
Other guidelines focus on sample size.
In practice, researchers employ a mix of the above guidelines. On this site, we use the normal distribution when the population standard deviation is known and the sample size is large. We use the t distribution when the population standard deviation is unknown, although the t distribution and the normal distribution are nearly identical when the sample size is very large. We also use the t distribution when the sample size is small, unless the underlying distribution is distinctly not normal. The t distribution should not be used with small samples from populations that are not approximately normal.
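The site's rules of thumb can be written down as a small decision function. This is only a sketch of one reading of the paragraph above: the function name `choose_distribution` is hypothetical, and the cutoff of 30 for a "large" sample is an assumption, since the lesson does not fix a single number.

```python
def choose_distribution(sigma_known, n, roughly_normal, large_n=30):
    """Return "normal", "t", or None (neither is appropriate).

    sigma_known:    True if the population standard deviation is known
    n:              sample size
    roughly_normal: True if the population is approximately bell-shaped
    large_n:        assumed cutoff for a "large" sample (not from the lesson)
    """
    small_sample = n < large_n

    # Small samples from distinctly non-normal populations: use neither.
    if small_sample and not roughly_normal:
        return None

    # Known standard deviation and a large sample: normal distribution.
    if sigma_known and not small_sample:
        return "normal"

    # Unknown standard deviation, or a small sample from a roughly
    # normal population: t distribution.
    return "t"


print(choose_distribution(sigma_known=True, n=100, roughly_normal=True))
print(choose_distribution(sigma_known=False, n=100, roughly_normal=False))
print(choose_distribution(sigma_known=False, n=12, roughly_normal=True))
print(choose_distribution(sigma_known=False, n=12, roughly_normal=False))
```

Returning None for the last case mirrors the warning that the t distribution should not be used with small samples from populations that are not approximately normal.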