1 Summary of discussion

1. Quantitative (numerical variable) vs qualitative (categorical variable)

2. Discrete distribution vs continuous distribution

3. Mean vs median

4. Mean vs proportion

5. Population vs sample

6. Some issues that you need to be aware of

(a) Is the sample representative?

(b) Is there measurement error?

2 Task: learn population, (random) sample

Goal: understand the difference between population and sample.

Reading: Appendix C.1 of the textbook

Discuss

1. What is a population?

2. How should we define the population if our goal is to show the relationship between class size (number of students in a class) and the rating of the eco201 instructor?

3. What is a sample?

4. Is our sample, which is the Excel file I provided, representative if the population is all the Miami students who have taken eco201?

5. Is our sample representative if the population is all the Miami econ-major students who have taken eco201?

Define your population appropriately. Do not over-generalize the result based on your sample.

6. If the population is all the Miami students who have taken eco201, how can we design a new survey so that a more representative sample can be obtained? Comment on the following ideas

(a) Go to the Rec Center, and do the new survey

(b) Go to the main lobby of FSB building, and do the new survey

You can assume the sample is random, or obtain a random sample using random numbers.

Discuss

1. Where can we find random numbers? How about using phone numbers or SSNs?

2. Can we use a computer to generate random numbers?
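On the second question, a computer can generate (pseudo-)random numbers. The sketch below is one possible way to do it in Python, not part of the course materials. It assumes a hypothetical roster in which every student in the population has an ID number from 1 to 500, and it draws 30 distinct IDs so that each student is equally likely to be selected.

```python
import random

# Hypothetical roster: ID numbers 1..500 stand in for every student
# in the population (an assumption made only for this illustration).
population_ids = list(range(1, 501))

# A fixed seed makes the draw reproducible; any integer would do.
rng = random.Random(311)

# Draw 30 distinct IDs; every ID has the same chance of being chosen.
sample_ids = rng.sample(population_ids, k=30)

print(sorted(sample_ids))
```

Unlike phone numbers or SSNs, these draws are not tied to where or when a person registered, so they are a more natural source of randomness for selecting a sample.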

Math: population and random sample

Goal: understand the properties of a random sample.

Reading: Appendix B.1, B.2, B.3, B.4 of the textbook

1. Statistically speaking, a population is an (unknown) distribution of a certain variable.

2. For example, the population for our purpose may be the distribution of the Miami econ-major students’ ratings of their eco201 instructors.

3. The population distribution can be characterized by parameters such as the population mean (expected value), denoted by $E(y)$ or $\mu$, the population variance, denoted by $\mathrm{var}(y)$ or $\sigma^2$, etc. Usually those parameters are unknown, and we want to estimate them.

4. A sample is a part (portion or subset) of the population.

5. For example, one sample is the ratings provided by the students in this eco311 class (section). Another class may provide a different sample.

6. Obtaining a sample is much easier than obtaining a population.

7. Statistics is about using the sample to estimate (make inferences about) the unknown population distribution and its parameters.

8. Intuitively, the estimate is “good” if the sample is “good.” A random sample is such a good sample.

9. A random sample $\{y_1, y_2, \ldots, y_n\}$, or $\{y_i\}_{i=1}^{n}$, is a special sample with nice properties

(a) $E(y_i) = \mu$, $(i = 1, 2, \ldots, n)$. In words, all observations have identical mean.

(b) $\mathrm{var}(y_i) = \sigma^2$, $(i = 1, 2, \ldots, n)$. In words, all observations have identical variance.

(c) $\mathrm{cov}(y_i, y_j) = 0$, $\forall i \neq j$. In words, all observations are independent, so they have zero covariance with each other.

10. Put differently, a random sample is an i.i.d. sample, where i.i.d. stands for independently and identically distributed.

11. A biased (non-random) sample typically arises because people choose (or select) to be in the sample.

12. One example of a biased sample is the sample of students who finish the online course evaluations. Those students choose to do so because they either like the instructor or hate the instructor. This biased sample cannot represent the students who do not do the evaluation. Mathematically, the students who finish and those who do not finish the evaluations follow different distributions. So “identically distributed” is violated.
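The course-evaluation example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: ratings are on a 1 to 5 scale with each value equally likely in the population, and students with extreme opinions (1 or 5) respond with probability 0.8 while everyone else responds with probability 0.1. None of these numbers come from real evaluation data.

```python
import random

rng = random.Random(201)

# Assumed population: 10,000 ratings on a 1-5 scale, each value equally likely.
population = [rng.randint(1, 5) for _ in range(10_000)]

# Random sample: every student is equally likely to be picked.
random_sample = rng.sample(population, k=500)

def responds(rating):
    """Assumed response rule: extreme opinions are far more likely to respond."""
    prob = 0.8 if rating in (1, 5) else 0.1
    return rng.random() < prob

# Self-selected sample: only the students who choose to respond.
self_selected = [r for r in population if responds(r)]

def share_extreme(ratings):
    """Fraction of ratings equal to 1 or 5."""
    return sum(r in (1, 5) for r in ratings) / len(ratings)

print("population   :", share_extreme(population))
print("random sample:", share_extreme(random_sample))
print("self-selected:", share_extreme(self_selected))
```

Under these assumptions the random sample reproduces the population share of extreme ratings reasonably well, while the self-selected sample is dominated by extreme ratings. The respondents and the population follow different distributions, which is exactly how “identically distributed” fails.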

Critical thinking:

1. How do we get a random sample?

2. Is the time series of US GDP from 2001 to 2012 a random sample?

3. Is the sample of econ honors students a random sample for estimating the average GPA of econ-major students?

4. Is the sample of Miami students a random sample for estimating the average family income of all US college students?

Check “identically” and “independently” for the sample that you intend to use.

Define or choose the population appropriately. Do not over-generalize your result.

3 Task: Estimation

Goal: understand what an estimator is and the properties of its sampling distribution.

Reading: Appendix C.2, C.3, C.4, C.5 of the textbook

Key points

1. We use a sample to estimate the population parameter. For instance, we use the sample mean, sample variance, etc., to estimate the population mean, population variance, etc.

2. The sample mean and sample variance are examples of estimators. The value of the estimator obtained from a given sample is called an estimate.

3. We do NOT expect the sample mean to be the same as the population mean because by definition a sample is just part of the population. The difference between sample and population gives rise to sampling error. The sampling error is random because different samples can be used.

4. People can use different samples, and obtain different sample means (for the same population mean). For example, you may use the men in Alabama (sample one), or the men in Ohio (sample two), to estimate the average height of US men. This fact highlights that

(a) The sample mean is a random variable. It is random due to the sampling error.

(b) The distribution of the sample mean is called the sampling distribution. Do not confuse the sampling distribution with the population distribution.

5. Discuss

(a) Compute the sample mean of the family income y using the Excel file I give you. The Stata command to get the sample mean (and other descriptive statistics) is

sum y

(b) There are other sections of ECO 311 being taught. Do you think the other sections will get the same sample mean of the family income as our section?

(d) Is the population mean a random variable?

6. The sample mean is just one estimator for the population mean. There are other estimators.

7. The sample mean is the most popular estimator for the population mean because its sampling distribution has some nice properties

(a) (unbiasedness): the average of (indefinitely many) sample means obtained from different samples is the same as the population mean

(b) (efficiency): the variance of the sample mean is smaller than that of some other estimators. This means the sample mean does not vary much across different samples.

(c) (consistency): as the sample size grows, the sample mean gets close to the population mean. This result is called the law of large numbers.
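The idea of a sampling distribution can be made concrete by drawing many random samples and recording the sample mean of each. The sketch below is an illustration only. It assumes a population that is uniform on [0, 100], so the population mean is 50 and the population variance is 100²/12. The sample sizes (10 and 100) and the number of repeated samples (5000) are arbitrary choices.

```python
import random
import statistics

rng = random.Random(311)

# Assumed population: uniform on [0, 100], so the population mean is 50.
POP_MEAN = 50.0

def sample_mean(n):
    """Draw one random sample of size n and return its sample mean."""
    return statistics.fmean(rng.uniform(0, 100) for _ in range(n))

# Many sample means of the same size approximate the sampling distribution.
means_n10 = [sample_mean(10) for _ in range(5000)]
means_n100 = [sample_mean(100) for _ in range(5000)]

# Unbiasedness: the average of the sample means is close to the population mean.
avg_of_means = statistics.fmean(means_n10)

# Larger samples: the sample mean varies less from sample to sample.
var_n10 = statistics.pvariance(means_n10)
var_n100 = statistics.pvariance(means_n100)

print("average of sample means (n=10):", avg_of_means)
print("variance of sample means, n=10 :", var_n10)
print("variance of sample means, n=100:", var_n100)
```

Each individual sample mean differs from 50, but their average is very close to 50, and the spread of the sample means shrinks as the sample size grows from 10 to 100.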

Math: sample mean and its sampling distribution

Goal: understand the mean and variance of the sample mean

Reading: Appendix A.1, C.2, C.3, C.4, C.5 of the textbook

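1. We want to show the properties of the sample mean obtained from a random sample.

2. Random sample means $E(y_i) = \mu$, $\mathrm{var}(y_i) = \sigma^2$, and $\mathrm{cov}(y_i, y_j) = 0$ for all $i \neq j$.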

3. The formula for the sample mean $\bar{y}$ is

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \equiv \frac{y_1 + y_2 + \ldots + y_n}{n} \qquad (1)$$

where the sigma notation is the shorthand for sum (summation operator).

4. The sample mean obtained from the random sample is an unbiased estimator for the population mean because

$$E(\bar{y}) = E\left(\frac{y_1 + y_2 + \ldots + y_n}{n}\right) = \frac{\mu + \mu + \ldots + \mu}{n} = \mu \qquad (2)$$

where $E$ is the expectation operator. We use the property that the expectation of a sum is the sum of expectations:

$$E(y_1 + y_2) = E(y_1) + E(y_2) \qquad (3)$$

5. Result (2) implies the center (or mean) of the sampling distribution of $\bar{y}$ is the population mean $\mu$. In short, the average of the sample mean is the population mean. In a particular sample, the sample mean can be different from the population mean.

6. Discuss

(a) Why do we emphasize the random sample?

(b) Please find $E(\bar{y})$ if $E(y_1) \neq \mu$, $E(y_i) = \mu$, $(\forall i \geq 2)$. Is this a random sample? Is the sample mean unbiased?

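7. The variance of the sample mean (based on a random sample) is

$$\mathrm{var}(\bar{y}) = \mathrm{var}\left(\frac{y_1 + y_2 + \ldots + y_n}{n}\right) = \frac{\sigma^2 + \sigma^2 + \ldots + \sigma^2}{n^2} = \frac{\sigma^2}{n} \qquad (4)$$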

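Here we use the facts that

$$\mathrm{var}(c y_i) = c^2 \, \mathrm{var}(y_i) \qquad (5)$$

$$\mathrm{var}(y_1 + y_2) = \mathrm{var}(y_1) + \mathrm{var}(y_2) + 2\,\mathrm{cov}(y_1, y_2) \qquad (6)$$

$$\mathrm{cov}(y_i, y_j) = 0, \ i \neq j \ \text{(for random sample)} \qquad (7)$$

See equation [C.6] on page 760 for more details. Remarks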

(a) Formula (5) shows we need to square the constant if taking it out of the variance.

(b) Formula (6) shows the variance of a sum equals the sum of the variances plus twice the covariance.

(c) Formula (7) shows we can drop the covariance term if observations are independent.

8. Result (4) shows that as the sample size $n$ rises, the variance of the sample mean falls. Results (2) and (4) jointly explain why the sample mean is a consistent estimator.

9. Discuss. Suppose another (bad) estimator for the population mean is $\tilde{y} = \frac{y_1 + y_2}{2}$. This estimator only uses the first two observations in the sample (and ignores the other observations). By contrast, the sample mean uses all observations. Please show

(a) $\tilde{y}$ is an unbiased estimator

(b) $\mathrm{var}(\tilde{y}) > \mathrm{var}(\bar{y})$ when $n > 2$. This fact shows the bad estimator is less efficient than the sample mean because its variance is bigger.

10. Discuss. What happens to the mean and variance of the sample mean when $n \to \infty$?

$$\lim_{n \to \infty} E(\bar{y}) = $$

$$\lim_{n \to \infty} \mathrm{var}(\bar{y}) = $$

What does the sampling distribution of the sample mean look like when the sample size rises?

11. An estimator is consistent if (1) it is (asymptotically) unbiased; (2) its variance goes to zero as $n$ rises. The sample mean is an example of a consistent estimator. A consistent estimator is desirable because it will get close to the true value of the parameter as the sample gets larger.
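The comparison between the sample mean and the estimator that averages only the first two observations can also be checked numerically. The following is a rough simulation sketch, not a proof. It assumes a normal population with mean 10 and standard deviation 2 (so the variance is 4), and the sample sizes and the number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(2012)
MU, SIGMA = 10.0, 2.0  # assumed population mean and standard deviation

def draw_estimates(n):
    """Draw one random sample of size n; return (sample mean, two-observation mean)."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    y_bar = statistics.fmean(sample)
    y_tilde = (sample[0] + sample[1]) / 2  # ignores observations 3..n
    return y_bar, y_tilde

def simulate(n, reps=2000):
    """Repeat the draw many times and summarize both estimators."""
    pairs = [draw_estimates(n) for _ in range(reps)]
    bars = [p[0] for p in pairs]
    tildes = [p[1] for p in pairs]
    return {
        "mean_bar": statistics.fmean(bars),
        "var_bar": statistics.pvariance(bars),
        "mean_tilde": statistics.fmean(tildes),
        "var_tilde": statistics.pvariance(tildes),
    }

results = {n: simulate(n) for n in (5, 50, 500)}
for n, r in results.items():
    print(n, r)
```

Under these assumptions both estimators average out close to 10, which is consistent with unbiasedness. The variance of the sample mean shrinks as n grows, while the variance of the two-observation estimator stays near 4/2 = 2 no matter how large the sample is.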

Critical thinking:

1. Does formula (4) hold for time series data? How should formula (4) be modified for time series data? The situation where $\mathrm{cov}(y_i, y_j) \neq 0$ for time series data is called serial correlation.

2. Does formula (4) hold if $\mathrm{var}(y_i) \neq \mathrm{var}(y_j)$? The situation where variances are unequal (non-constant) is called heteroskedasticity.

3. Please show $\tilde{y} = \frac{y_1 + y_2}{2}$ is an inconsistent estimator. What is the intuition?

4. Why do we prefer a big sample over a small sample?
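As a hint for questions 1 and 2, it may help to write the variance of the sample mean without assuming identical variances or zero covariances. The expression below is the standard identity obtained by applying the constant rule (5) and the sum rule (6) to all $n$ observations. It is offered as a sketch to start from, not as the textbook's own derivation.

```latex
\mathrm{var}(\bar{y})
  = \frac{1}{n^2}\left[\sum_{i=1}^{n} \mathrm{var}(y_i)
  + 2\sum_{i<j} \mathrm{cov}(y_i, y_j)\right]
```

With a random sample every variance equals $\sigma^2$ and every covariance is zero, so the expression collapses to $\sigma^2/n$ as in (4). Serial correlation keeps the covariance terms in the sum, and heteroskedasticity prevents replacing the individual variances with a common $\sigma^2$.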