There are lies, damn lies – and statistics.
-Mark Twain

### 1. Biased Sample Fallacy

Statistical fallacies appear often on the GRE as a form of causal fallacy. The biased sample fallacy is committed whenever the data for a statistical inference is drawn from a sample that is not representative of the population under consideration.

A recent study showed that over 60% of Oregon residents watched cartoons. Based on this study, executives at Cartoon Channel spent \$10 million to expand their access to Oregonians, who appear to be avid fans of cartoons.

Note that the study says nothing about which Oregon residents were polled. Were they school children? A figure that high suggests so. To support a general conclusion, a sample must be representative of the overall population we want to study.

Here is another example:

In a recent survey conducted by Wall Street Weekly of its readers, 80% of the respondents indicated their strong disapproval of increased capital gains taxes. This survey clearly shows that increased capital gains taxes will be met with strong opposition from the electorate.

The data for the conclusion of this argument is drawn from a sample that is not representative of the entire electorate. The survey polled readers of an investment publication, not random members of the electorate. People who read about investing are more likely to have an opinion on the topic of taxes on investments than the population at large.
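To see how far a biased sample can drift from the population, here is a minimal sketch in Python. Every number in it is invented for illustration: it assumes investors make up 10% of a hypothetical electorate and oppose the tax at 80%, while everyone else opposes it at 30%.

```python
# Hypothetical electorate: all counts and rates are invented for illustration.
investors = {"count": 100, "oppose_rate": 0.80}      # e.g. readers of an investing weekly
non_investors = {"count": 900, "oppose_rate": 0.30}  # everyone else

def opposition_share(groups):
    """Fraction of people across the given groups who oppose the tax."""
    total = sum(g["count"] for g in groups)
    opposed = sum(g["count"] * g["oppose_rate"] for g in groups)
    return opposed / total

reader_survey = opposition_share([investors])                    # what the survey sees
whole_electorate = opposition_share([investors, non_investors])  # what it claims to describe

print(f"Survey of investors only: {reader_survey:.0%}")
print(f"Entire electorate:        {whole_electorate:.0%}")
```

Under these assumed numbers, the survey result is accurate for the readers but says little about the electorate, which is the population the conclusion is about.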

John: I don’t want to die in an accident. Every few days on the TV news, I hear of a major plane crash somewhere in the world. I would never fly in planes; they are too dangerous.

Ted: Nonsense, statistics show that airplanes are the safest mode of transportation on a per-mile basis.

John: The answer then is not to travel such long distances.

### Analysis

John points out that plane crashes are always in the news and concludes that planes must be very dangerous. The TV news, however, is a biased sample of all accidents: traffic fatalities around the world rarely make the news, but plane crashes do.

Ted makes the obvious point that planes are safer on a per-mile basis. Yet a plane can travel ten thousand miles, so a long trip still entails risk. John’s final answer is to play it safe and not travel long distances at all.
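The gap between per-mile risk and per-trip risk can be sketched with simple arithmetic. The rates below are made up purely for illustration (they assume a plane is 100 times safer per mile than a car); they are not real accident statistics.

```python
# Hypothetical fatality rates per mile; both numbers are invented for illustration.
RISK_PER_MILE = {"plane": 1e-10, "car": 1e-8}  # plane assumed 100x safer per mile

def trip_risk(mode, miles):
    """Approximate risk of one trip: per-mile rate times distance.

    This simple product is a reasonable approximation only while the result is small.
    """
    return RISK_PER_MILE[mode] * miles

long_flight = trip_risk("plane", 10_000)       # a ten-thousand-mile flight
short_drive = trip_risk("car", 10)             # a ten-mile drive
same_distance_drive = trip_risk("car", 10_000) # driving the same ten thousand miles

print(f"10,000-mile flight: {long_flight:.1e}")
print(f"10-mile drive:      {short_drive:.1e}")
print(f"10,000-mile drive:  {same_distance_drive:.1e}")
```

With these assumed rates, the long flight carries more risk than the short drive, yet far less than driving the same distance. Ted's statistic compares modes over equal distances; John's reply is about the distance itself.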

### 2. The Texas Sharpshooter

The Texas sharpshooter fallacy is cherry-picking a data cluster to suit an argument, or finding a pattern to fit a presumption after the fact (another type of biased sampling fallacy). You come up with a theory and then scour the data for the pieces that appear to establish a causal relationship. This is teleological thinking, that is, working backwards from something that happened to find its cause. A good example is a scandal involving a professor who combed through data after the fact in order to reach remarkable conclusions.
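A small simulation shows why hunting for patterns after the fact is dangerous. The sketch below assumes nothing but random noise: one random "outcome" and 200 unrelated random "candidate causes" (the sizes and the seed are arbitrary choices). It then compares the correlation of a variable chosen in advance with the best one picked after looking.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Pure noise: none of the candidates has any real link to the outcome.
outcome = [rng.gauss(0, 1) for _ in range(20)]
candidates = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(200)]

correlations = [pearson(c, outcome) for c in candidates]
prespecified = abs(correlations[0])               # hypothesis chosen before looking
cherry_picked = max(abs(r) for r in correlations) # "target" drawn around the best cluster

print(f"Variable chosen in advance: |r| = {prespecified:.2f}")
print(f"Best of 200 after the fact: |r| = {cherry_picked:.2f}")
```

The cherry-picked correlation can never be weaker than the pre-chosen one, and with enough candidates it will typically look impressive even though every variable here is noise.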

### 3. Insufficient Sample Fallacy

The Fallacy of the Insufficient Sample (also called the “hasty generalization”) is committed whenever an inadequate sample is used to justify the conclusion drawn. In a Biased Sample, people are pulled from a non-representative group; in an Insufficient Sample, not enough people are polled to yield a statistically significant result.

I have worked with three people from New York City and found them to be obnoxious, pushy, and rude. It is obvious that people from New York City have a bad attitude.

Observations of three people are not sufficient to support a conclusion about 10 million. Bad luck could account for meeting three bad people. Try this one:

After living and working in New York City for 12 years, I have met thousands of people and, with very rare exceptions, I have found them to be obnoxious, pushy, and rude. It is obvious that people from New York City have a bad attitude.

This latter argument is something to be taken more seriously given the larger pool from which the observation is drawn.
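One rough way to put numbers on "not enough people" is the standard margin of error for a sample proportion. The sketch below uses the usual normal approximation at the 95% level with the worst-case proportion of 0.5; the approximation is crude for very small samples, so the first row should be read only as "enormous."

```python
from math import sqrt

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)

for n in (3, 30, 300, 3000):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1%}")
```

A larger sample only fixes the size problem. If the people sampled are not representative, the biased sample fallacy still applies no matter how many of them there are.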

Anecdotal
Using a personal experience or an isolated example instead of a sound argument or compelling evidence.

The Gambler’s Fallacy

Believing that past outcomes of independent events, such as dice rolls, make a streak more or less likely to continue.
December 2000: S1, Q24
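The independence of dice rolls can be checked exactly by listing every possible sequence of four rolls of a fair die. This sketch compares the chance of a six on the fourth roll after three sixes in a row with the chance of a six on the fourth roll in general.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
four_rolls = list(product(faces, repeat=4))  # all 6**4 equally likely sequences

# Keep only the sequences that begin with a streak of three sixes.
after_streak = [seq for seq in four_rolls if seq[:3] == (6, 6, 6)]

six_after_streak = Fraction(sum(seq[3] == 6 for seq in after_streak), len(after_streak))
six_overall = Fraction(sum(seq[3] == 6 for seq in four_rolls), len(four_rolls))

print("P(six on roll 4 | three sixes first):", six_after_streak)
print("P(six on roll 4):                    ", six_overall)
```

Both probabilities come out to 1/6: the streak tells you nothing about the next roll.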

### 4. Correlation vs. Causation

A correlation is a statistical link between two items that tend to move in parallel. One of the GMAT’s “Greatest Hits,” which you will see time and time again, is the attempt to take two separate items that correlate statistically and then establish one of the two as the “cause.”

Distinguishing a cause from a mere association is difficult.

1. Heavier people tend to be taller.
2. Weight is correlated with height.
3. Gaining weight will make you taller.

This argument assumes a causal relationship between correlated data and thus concludes that by changing one element, you can change the other.

Another obvious one:

1. More fire trucks tend to be at more serious fires.
2. We can reduce the severity of fires by reducing the number of fire trucks.

Here is a more challenging example:

1. Young people who watch more TV violence are more likely to engage in violence.
2. The recent increase in TV violence is associated with an increase in violence society-wide.
3. If children would watch less TV, they would be less violent.

### Explanation

This one seems intuitive enough and it’s the “sentimental favorite”, but the reality is that (3) can’t be proven from (1) and/or (2). You can’t assume that just because things correlate you can change one factor and it will automatically change the other. Children who watch large amounts of TV may have inattentive parents, and this may be the underlying hidden causal factor — not watching too much TV violence in itself. This argument could use more evidence, like a study showing that violent children are more successfully rehabilitated by cutting off violent shows.
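The hidden-cause scenario can be worked through with exact fractions. The model below is entirely hypothetical: it assumes 40% of children have inattentive parents, that such children are more likely to watch heavy TV, and that violence depends only on the parents, never on TV.

```python
from fractions import Fraction as F

# Hypothetical model; every probability here is invented for illustration.
p_inattentive = F(2, 5)
p_heavy_tv = {True: F(3, 4), False: F(1, 4)}   # keyed by "are the parents inattentive?"
p_violent = {True: F(3, 10), False: F(1, 10)}  # depends on parents only, not on TV

def p_parent(inattentive):
    """Probability of each parent type."""
    return p_inattentive if inattentive else 1 - p_inattentive

def violence_rate_given_tv(heavy):
    """Observed P(violent | TV habit), mixing both parent types."""
    joint = total = F(0)
    for inattentive in (True, False):
        p_tv = p_heavy_tv[inattentive] if heavy else 1 - p_heavy_tv[inattentive]
        weight = p_parent(inattentive) * p_tv
        total += weight
        joint += weight * p_violent[inattentive]
    return joint / total

# Violence rate for all children. Cutting TV does not change the parent mix,
# so under this model the rate stays the same after any TV intervention.
overall = sum(p_parent(i) * p_violent[i] for i in (True, False))

print(f"Heavy TV watchers: {float(violence_rate_given_tv(True)):.1%} violent")
print(f"Light TV watchers: {float(violence_rate_given_tv(False)):.1%} violent")
print(f"All children, before or after cutting TV: {float(overall):.1%} violent")
```

In this made-up world, heavy watchers really are more violent on average, so statements (1) and (2) hold, yet reducing TV changes nothing, so (3) fails.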

Example

Studies have shown that men aged 18-27 who have owned a pet for at least 2 years before marrying are 35% less likely to divorce. Researchers conclude that caring for a pet prepares men for long-term, healthy relationships in marriage.

Which of the following, if true, most strengthens the conclusion that men who have owned pets are prepared for healthy marriages?

1. Studies have shown that pet ownership drastically reduces daily stress levels.
2. Many successful marriages are based on emotional investment in a common interest, such as a pet.
3. Many men who have been married for 25 years or more continue to own pets.
4. Men who have not owned pets for at least two years before marrying are more likely to divorce.
5. Men whose wives owned a pet for at least two years are equally unlikely to divorce.

### Explanation

Situation: Researchers have concluded that men who have owned a pet for at least 2 years are prepared for healthy marriages.

Reasoning: Which option most strengthens the conclusion? Researchers base their conclusion on an assumed connection between sustained care for a pet and care for a spouse. Men who care for pets before marriage, the argument runs, are also statistically more likely to sustain marriage relationships. The problem is that correlation doesn’t prove causality so that link alone is not enough.

1. While this may be true, it does not introduce additional evidence to support the conclusion.
2. This option does not address the question of why men who own pets are less likely to divorce.
3. The question concerns men who have owned pets before marrying, not after.
4. Correct. This option provides additional evidence of a causal correlation between pet ownership and the likelihood of divorce.
5. The question concerns men, not their wives.

### 5. Confounding Factors

A confounding factor (also called a “lurking variable”) is an additional factor that may be responsible for a correlation. “Confound” comes from the Latin confundere, “to pour together” or “to mix up”: a confounding factor is mixed in with the factors being studied.

Does marriage cause happiness just because the two are correlated?

Or are people who get married already happy (a confounding factor)? Confounding factors weaken the causal conclusions drawn from correlations.

Example 1: The Miracle Hospital

A sports injury treatment center in New York has the lowest rate of recovery for sports injuries. A treatment center in rural Pennsylvania has the highest and quickest recovery rate.

If you have just been severely injured while playing softball, should you go to Pennsylvania?

### Explanation Example 1

In this example, it appears pretty obvious that the treatment center in New York is bad for your health. So you follow the statistics and go to Pennsylvania, right? In fact, the treatment center in New York is the option of last resort for patients with serious sports injuries like yours. The Pennsylvania center is so poor that no one with a serious injury ever goes there; its patients are those with minor injuries, who recover quickly. The hidden confounding factor is injury severity: people with more severe injuries choose New York, so each center's recovery rate is drawn from a biased sample of patients.

Example 2: The Secret Conspiracy Against Men

The Apex Institute of Technology had a much lower acceptance rate for men than for women, and administrators could not determine why, since the male applicants had higher SAT scores and grades.

Are the lower admissions rates of men a result of systematic bias?

### Explanation Example 2

Looking at the information, it appears that someone in the admissions department doesn’t like men and has been secretly rejecting their applications.

A closer look at the data shows that men were much more likely to apply to the highly competitive engineering program. As a result, men had a lower rate of admission overall at Apex Institute of Technology. In non-engineering programs, however, the acceptance rates were identical. So gender played no direct role in admissions rates; the deciding factor was the major chosen by the applicants.
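The arithmetic behind this effect is easy to reproduce. The example gives no figures, so the counts below are invented, and they assume that men and women are accepted at the same rate within each program (20% in engineering, 60% elsewhere).

```python
# Hypothetical applicant counts, invented for illustration.
# Each entry is (applicants, admitted). Acceptance rates are assumed identical
# for men and women within each program: 20% in engineering, 60% elsewhere.
applications = {
    "men":   {"engineering": (800, 160), "other": (200, 120)},
    "women": {"engineering": (200, 40),  "other": (800, 480)},
}

def acceptance_rate(pairs):
    """Admitted divided by applicants, pooled over (applicants, admitted) pairs."""
    applicants = sum(a for a, _ in pairs)
    admitted = sum(ad for _, ad in pairs)
    return admitted / applicants

for group, programs in applications.items():
    per_program = {name: f"{acceptance_rate([counts]):.0%}" for name, counts in programs.items()}
    overall = acceptance_rate(programs.values())
    print(f"{group:>5}: {per_program}, overall {overall:.0%}")
```

With these assumed counts, each program treats both groups identically, but the overall rate for men is much lower simply because most men applied to the harder program.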