Showing posts with label sample size. Show all posts

Wednesday, June 01, 2011

Ask CRRC | Population Sizes and Sample Sizes

Q: The 2010 Caucasus Barometer includes about 2,000 completed interviews in each country: Armenia, Azerbaijan, and Georgia. However, the three countries vary in size; the population of Armenia is just under 3 million, Georgia has a population of about 4.6 million, and the population of Azerbaijan is about 8.4 million (according to the CIA World Factbook). How can the same or a similar sample size be appropriate for each country?


A: Great question! Contrary to popular belief, the total population size has little effect on the necessary sample size. Necessary sample size is more dependent on the amount of variability between members of a population. Only one person would need to be sampled if there were no variability in a population and every member would give identical answers.


Let’s use a physical example to make this clearer:



The two populations above have the same average height, but the members of Population B vary much more in height than the members of Population A. Thus, if you were sampling Population B, you would need a much larger sample than if you were sampling Population A in order to reach the same level of certainty about the population’s average height. In short, the greater the variability in the population, the larger your sample needs to be in order to capture that variability.
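To put rough numbers on this, here is a minimal Python sketch. It assumes simple random sampling and uses the textbook formula for the sample size needed to estimate a mean, n = (z × sigma / margin)². The standard deviations (5 cm and 15 cm) and the 1 cm margin are made-up values for illustration only; they are not taken from the figure.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Sample size to estimate a mean within +/- margin (simple random sampling).

    sigma  -- assumed standard deviation of the population (hypothetical here)
    margin -- desired margin of error, in the same units as sigma
    z      -- critical value; 1.96 corresponds to roughly 95% confidence
    """
    n = (z * sigma / margin) ** 2
    # At least one person must be sampled, even if there is no variability.
    return max(1, math.ceil(n))

# Hypothetical spreads: Population A is tightly clustered, B is widely spread.
n_a = sample_size_for_mean(sigma=5, margin=1)
n_b = sample_size_for_mean(sigma=15, margin=1)
n_none = sample_size_for_mean(sigma=0, margin=1)
print(n_a, n_b, n_none)
```

Under these assumptions, tripling the spread multiplies the required sample by about nine, because the sample size grows with the square of the standard deviation. With no variability at all, one respondent is enough.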

Other issues that affect sample size include how precise you want the conclusions drawn from the sample to be and how certain you want to be about them. In making a precise statement, you could say, for example, that “from the 2010 Caucasus Barometer, our best estimate of the proportion of Tbilisi residents who have travelled to another country is 18.5%, and we are 95% sure that the true value is between 15.5% and 21.5%.” Technically speaking, 95% is our confidence level and our margin of error is 3 percentage points. Therefore, we are 95% sure that the true value lies within the range of our best estimate plus or minus 3 percentage points. To increase your level of confidence or reduce the margin of error, you would need a larger sample size, and more money to pay for the extra interviews.
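As a rough illustration of that trade-off, the sketch below uses the standard normal-approximation formula for a proportion, n = z² × p × (1 − p) / margin². It assumes simple random sampling, which the Caucasus Barometer does not use, so the results are not the survey's actual sample sizes. The function name and the alternative settings (99% confidence, a 2-point margin) are chosen here for illustration.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin, confidence=0.95):
    """Approximate sample size for estimating a proportion.

    Assumes simple random sampling and the normal approximation.
    p          -- expected proportion (0.185 in the example above)
    margin     -- desired margin of error as a fraction (0.03 = 3 points)
    confidence -- confidence level as a fraction (0.95 = 95%)
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

baseline = sample_size_for_proportion(0.185, 0.03)                 # 95%, +/- 3 points
more_confident = sample_size_for_proportion(0.185, 0.03, 0.99)     # 99%, +/- 3 points
tighter_margin = sample_size_for_proportion(0.185, 0.02)           # 95%, +/- 2 points
print(baseline, more_confident, tighter_margin)
```

Both raising the confidence level and narrowing the margin push the required sample up, and the margin has the stronger effect because it enters the formula squared.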

Here is one more thing worth knowing about sampling. Imagine a country of 5 million people, and a village of 500 inhabitants (both with the same amount of variability). Let’s say you require a sample of 200 from the country to reach a 95% level of confidence and a 3% margin of error. How many inhabitants of the village should be sampled to reach that same level of confidence and the same margin of error? Take a guess.

Done? The number is surprisingly high: we still need to sample 143 inhabitants of the village. So while the country is 10,000 times the size of the village, it only requires an extra 57 people in the sample to achieve the same margin of error at the same level of confidence. In other words, one entirely counter-intuitive aspect of sampling is that a small population may still require a large proportion of its members to be sampled to get representative findings.
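The village figure can be reproduced with a common finite population correction, n = n0 / (1 + (n0 − 1) / N), where n0 is the sample size needed for a very large population and N is the actual population size. This is a sketch: the post does not say which formula was used, but this one yields the same 143. The function name is our own.

```python
def adjust_for_population(n0, population):
    """Apply a finite population correction to a sample size.

    n0         -- sample size required for a very large population
    population -- actual size of the population being sampled
    """
    return round(n0 / (1 + (n0 - 1) / population))

village = adjust_for_population(200, 500)          # village of 500 inhabitants
country = adjust_for_population(200, 5_000_000)    # country of 5 million people
print(village, country)
```

For the country the correction is so small that the answer stays at 200, while for the village it brings the requirement down to 143, which is still more than a quarter of all inhabitants.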

In summary, population size is one of the four factors that influence the necessary sample size for any survey, alongside the variability in the population, the margin of error, and the confidence level (and even more factors have to be considered for complex surveys like the CB). However, unless the population is small, its influence is negligible.

Do you have further questions? Write a comment and let us know.

Wednesday, March 02, 2011

Ask CRRC | Sample Size

Q: In the last posting you said that in order for the sample to be representative of the entire population, every member of the population had to have some chance of being selected for the sample. However, you didn’t say anything about sample size. Doesn’t sample size matter?

A: As long as the sample size is not tiny, then the sample can be representative of the population – having 200 respondents or 2,000 respondents does not make a difference in whether you can call the sample representative of the population. Where sample size does make a difference is in how accurate your conclusions about the population of interest will be. Let’s explain what that means with an example:

Suppose we are interested in the population of voters in Rustavi and that we are interested in the proportion of residents who find the availability of gas to be an important local issue. We take a list of the 98,492 registered voters in Rustavi and randomly select a sample for interview. Now, let’s imagine two different scenarios: In the first, we randomly select 200 respondents and interview them. In the second, we randomly select 2,000 respondents and interview them. Now, imagine that in the first scenario, 64 respondents mentioned the availability of gas as an important local issue and 136 did not. Imagine that in the second scenario 640 respondents mentioned it and 1,360 did not. Because 64/200=0.32 and 640/2,000=0.32, in both scenarios exactly 32% of the respondents said that the availability of gas is an important local issue.

Both of these samples are representative of the population of registered voters in Rustavi because every registered voter had a chance to be in the sample. In both cases, our best estimate of the proportion who consider the availability of gas to be an important issue is the same. This is the proportion that we encountered in each sample: 32%.

However, the two different sample sizes allow us to say two different things about the greater population of Rustavi. This is because, in general, the larger the sample size, the smaller the margin of error. The margin of error tells us how wide the range is within which we are confident, at a stated confidence level, that the true value for the entire population lies. For example, in the first scenario, using statistical formulas we can calculate that we can be 95% confident that the proportion of the entire population of 98,492 registered voters that considers the availability of gas to be an important issue is between 25.5% and 38.5%. However, in the second scenario, our calculations will tell us that we can be 95% confident that the proportion is between 30% and 34%.

That is, in the first scenario, we were 95% confident that the proportion was between 32% - 6.5% and 32% + 6.5%. In the second scenario, we were 95% confident that the proportion was between 32% - 2% and 32% + 2%. In other words, in the first scenario the margin of error is 6.5 percentage points, and in the second scenario it is 2 percentage points. To conclude, samples of different sizes can both be representative of a population. However, the margin of error depends on the sample size and tells us how precise our conclusions about the population of interest are.
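For readers who want to check the two margins of error, here is a minimal sketch using the standard normal-approximation formula for a proportion, margin = z × sqrt(p × (1 − p) / n). It assumes simple random sampling and ignores the finite size of the voter list, which makes almost no difference with 98,492 voters. The function name is our own.

```python
import math
from statistics import NormalDist

def margin_of_error(successes, n, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation).

    successes  -- respondents who mentioned the issue
    n          -- total respondents in the sample
    confidence -- confidence level as a fraction (0.95 = 95%)
    """
    p = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# The two Rustavi scenarios: same 32% estimate, different sample sizes.
for successes, n in [(64, 200), (640, 2000)]:
    moe = margin_of_error(successes, n)
    print(f"n={n}: {successes / n:.0%} +/- {moe:.1%}")
```

The results round to about 6.5 and 2 percentage points. A sample ten times larger shrinks the margin by a factor of about 3.2 (the square root of 10), not by a factor of 10.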