One of the things that I notice about this generation is the mass fragmentation that happens. When I was a child or a teenager, we did not have the internet. Therefore, we got our data from common sources. You could not go to Pandora, therefore, you listened to a few radio stations that were available. This means that groups would be divided into maybe 10 major chunks, and at school you would find the group that listened to the same station.

The same was true for television. There were very few channels, and cable was just starting. This means that everybody would pretty much watch the same type of shows and have a common base of knowledge and ability to relate.

This gave everybody a common thread or base of understanding. This meant that there was less fracturing and, in many ways, more heroes that everybody knew.

What happened if you were in a niche? Even the niches were much more common. If you were into science, you did not have many different sources. Because my father was an engineer, we had a subscription to Popular Science. Although my father was not much of a physicist, from time to time he would subscribe to Scientific American. Scientific American was designed to take the difficult and theoretical subjects of the day and create articles that could be read by those who may not want to wade through the math, but did want an understanding of what these new theories might say. It was never simple, but it was never out of reach.

When the magazine came, I would often open it, flip through it, then find myself at the back of the magazine, right by the section written by Martin Gardner, the person in the black and white photo above. All of the nerds and science geeks knew Gardner because of his playful column in Scientific American. He was also known as a magician, which attracted me to him because I was heavily into illusions and magic as a child. Much of what he liked to do was show readers how they could play around with numbers.

What does this have to do with conditional probability or Thomas Bayes? Gardner famously created (or told) the two child problem, which we want to go through today in this post. If you can get a sense of the two child problem, you will start to understand why probability is so hard to grasp.

We are going to tell the problem in a slightly different form to prevent some of the problems (which Gardner admits to) in his first writing.

Here is the problem: You are an accountant who works at Megacorporation. Megacorporation asks you and your computer operator mysterious questions that you must answer. Many times you don’t know why they ask you the questions; you only know that they ask in mysterious ways. Because you are secretly trying to find the power behind Megacorporation for a book you are working on, you are happy with your job, as it lets you gather secret notes.

One day, a command comes down from on high asking you to pull all the demographic data for a town called Childville. In particular, it wants the data for all families that have two children where at least one of the children is a girl. Your computer operator says that the query will take a while to run, and he’d like to go to lunch. You agree to go with him and to take a look at the data when you get back.

When you get back, the query has stopped running. You walk over to the database, but before you can look at the data, you get a phone call. A mysterious person on the other end of the phone says, “Of the new data set, what is the chance that both children will be girls?”

What will your answer be?

What is the answer?

If you have one in your mind, you probably have the wrong one. If you remember the dice problem from the last blog post, you might think of this as a real-life example of it. Now here is where the beauty of Bayes comes in. It is very tricky for us to figure out the chance that both children are girls when all we know is that at least one of the children is a girl. However, we can easily see the reverse: if both children are girls, then at least one of them is obviously a girl.

Let’s say that Mr. Jones is walking down the street. There are two children that he is pushing in an extended baby carriage. Mr. Jones says to you, “My first child is a boy, what is the chance that the second child is a boy?”

Since you know the first child is a boy, you know that the chance of the second child being a boy is right around 50% (and for purposes of this post, let’s say that boy and girls are always born at the 50% level). If you didn’t know that the first child was a boy, the chances of having two boys would be 25%. As soon as you know the first child, however, then it becomes a coin flip to get the second child.

Therefore, it is very tempting to tell the mysterious voice on the telephone that the answer is 50%, because you have pulled all of the pairs of children with at least one girl. This must mean the other child has a 50% chance of being a girl also. Right?

The answer is wrong.

This feels very wrong, and Gardner decided to point this out. To understand why 50% is wrong, the quickest way to see the answer is to simply write out the possible solution set, similar to the solution set we showed in the last post on dice.

Luckily for you, this problem is so well known that it is on Wikipedia. Here is the way to think about Mr. Jones. You start by listing out the solution set of all the children.

| First Child | Second Child |
| --- | --- |
| Girl | Girl |
| Girl | Boy |
| Boy | Girl |
| Boy | Boy |

Take the Mr. Jones case, but suppose he had told you that his first child is a girl (by symmetry, the reasoning is the same as for a boy). Therefore, you get to knock out the cells where the first child is a boy.

| First Child | Second Child |
| --- | --- |
| Girl | Girl |
| Girl | Boy |

Thus there are only two options for the second child, boy or girl.

However, the computer data set was picked differently. The query sorted through all the families that had two children, so the beginning solution set looks the same as the one we started with for Mr. Jones.

| First Child | Second Child |
| --- | --- |
| Girl | Girl |
| Girl | Boy |
| Boy | Girl |
| Boy | Boy |

However, the computer doesn’t know about first child versus second child. Remember, the data that you pulled is for all families where at least one of the children is a girl. This means that your solution set will look like the following:

| First Child | Second Child |
| --- | --- |
| Girl | Girl |
| Girl | Boy |
| Boy | Girl |

What you have done is remove only the families that had two boys. In the Mr. Jones case, the very moment you knew that the first child was a girl, you removed not only the families with two boys but every family where the first child was a boy.

This should make you feel uncomfortable. In our brains, we should say, “Wait a minute, in both cases we find out that one of the children is a girl. Why should knowing which one of them is a girl change everything?” Everybody reacts this way. It is well documented that this is a cognitive hole in our brains.

If you look at the charts above and think about them for a while, they should yield an “Ah hah!” moment. You’ll start to understand that the two questions filtered the original set in two different ways.

In the computer situation, we only filtered out the boy-boy pairs. In the Mr. Jones situation, we filtered out all of the answers where the first child was a boy. Although they look very similar, in reality, the data has been filtered in two very different ways.
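The two filters can be sketched in a few lines of Python. This is only an illustration of the tables above, assuming (as this post does) that boys and girls are equally likely; the names `families` and `chance_both_girls` are made up for the example.

```python
from fractions import Fraction
from itertools import product

# All equally likely (first child, second child) combinations,
# assuming boys and girls are each born half of the time.
families = list(product(["Girl", "Boy"], repeat=2))

def chance_both_girls(kept):
    """Fraction of the kept families in which both children are girls."""
    both = [f for f in kept if f == ("Girl", "Girl")]
    return Fraction(len(both), len(kept))

# Mr. Jones filter: keep only families whose FIRST child is a girl.
first_is_girl = [f for f in families if f[0] == "Girl"]

# Computer query filter: keep families with AT LEAST ONE girl,
# which only throws away the boy-boy pair.
at_least_one_girl = [f for f in families if "Girl" in f]

print(chance_both_girls(first_is_girl))      # 1/2
print(chance_both_girls(at_least_one_girl))  # 1/3
```

The starting list is the same in both cases; only the filter changes, and that alone moves the answer from 1/2 to 1/3.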

This filtering of data is what Bayesian Statistics is all about. Because we often make mistakes in probability, by using Bayesian tools, we can formally solve these filtering problems.

This filtering of probability is called conditional probability. Basically, it means that something happens to cause the data to be filtered one way or the other. Once you have filtered the data, you have a new solution set that you can pick from. The normal way of writing this conditional probability is the probability of event A given event B, or simply P( A | B ). Bayes said that this probability is derived as follows, which Wikipedia nicely draws:

With some work, we can create a different form of Bayes theorem that is as follows:
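The equation images from the original post are not reproduced here. As a hedged reconstruction, consistent with the calculation that follows, the definition of conditional probability and the usual form of Bayes' theorem that it rearranges into would read:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\qquad\Longrightarrow\qquad
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

The second form follows because P(A ∩ B) can also be written as P(B | A) P(A).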

We can set up two cases:

P(A) = the chance that both children are girls = 1/4

P(B) = the chance that at least one of the children is a girl = 3/4

If both children are girls, then the chance that at least one of the children is a girl is 100%, or 1.

In Bayes’ equation, this is shown by P(B|A) = 100% or 1.

Now we can fill in our equation: P(A|B), which means “the chance that both children are girls if at least one child is a girl,” is equal to the following:

= 1 (the chance that at least one child is a girl if both are girls) * 1/4 (the chance that both are girls) divided by 3/4 (the chance that at least one of the children is a girl) = 1/3.
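The same arithmetic can be checked with exact fractions. This is a minimal sketch; the helper name `bayes` is made up for the example, and the three inputs are the numbers worked out above.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a = Fraction(1, 4)       # both children are girls
p_b = Fraction(3, 4)       # at least one child is a girl
p_b_given_a = Fraction(1)  # if both are girls, at least one surely is

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)  # 1/3
```

Using `Fraction` instead of floating point keeps the result as an exact 1/3, which matches the count from the filtered table.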

You should be able to see the advantage of Bayes. You can take any probabilities, and with a little work, you can quickly find an equation that allows you to figure out the conditional probability. However, from an intuitive standpoint, I like the filtering idea because it helps us to understand what is happening.

Leonard Mlodinow wrote The Drunkard’s Walk to describe the history and the subjects that we discussed in this blog post. I would recommend this book: he not only gives examples of probability gone wrong, but also a fascinating history of the subject. In one case, he describes being given a blood test for his life insurance. When the test came back showing him as having HIV, he was told that the test was 99.9% accurate, and he was sure to die.

It turns out that a positive result is only that reliable for a community where HIV is very common (homosexual men). For other communities, such as white heterosexual men in married relationships, the incidence of HIV is exceedingly low.

This is a common error in testing, and well worth spending a moment on.

If you have a test that is 99.9% accurate, it sounds like it is perfect. Anybody that gets a positive on the test is going to be infected, right? The answer is that you don’t even have the ability to answer the question. What does 99.9% accurate mean?

You need to know how often the test finds the disease when somebody has it, and how often it wrongly identifies a disease even though the person doesn’t have it. Finding a disease when none exists (a false positive) is often called “overkill” because it says that something is dead that isn’t. For instance, a test may find the disease 99.9% of the time, but it may also overkill 50% of the time.

In this case for HIV, if you gave the test to 1000 healthy people, it would incorrectly “overkill” about one person. So now, all you need to know is the base rate of the population. (Base rate is an extremely important concept to understand; Daniel Kahneman talks about it a lot in Thinking, Fast and Slow, and it can also be called “unconditional probability.”) If the base rate of the population is 2 sick people in every thousand, the test should identify the sick people 99.9% of the time, and it will identify about 1 non-sick person as HIV positive. This means that about 3 people in every thousand test positive, and only 2 of them are sick. If you got identified as having HIV in this scenario, your chance of being sick is about 2 in 3, so your chance of being healthy is about 1 in 3. This is much better than 1 in 1000.
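To see how much the base rate matters, here is a small sketch that runs the test numbers through Bayes' theorem. The function name is made up for the example, the first call uses the figures from this post (2 sick per 1000, 99.9% detection, 1 false positive per 1000), and the second call uses a purely hypothetical base rate of 1 in 10,000 for comparison.

```python
def chance_sick_given_positive(base_rate, detection_rate, false_positive_rate):
    """P(sick | positive test) via Bayes' theorem."""
    true_positives = base_rate * detection_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Numbers from the post: about 2 in 3 positives are really sick,
# so about 1 in 3 positives is a false alarm.
post_example = chance_sick_given_positive(2 / 1000, 0.999, 1 / 1000)
print(round(post_example, 3))  # 0.667

# Hypothetical low-risk population (1 sick person in 10,000):
# the very same test now gives mostly false alarms.
low_risk = chance_sick_given_positive(1 / 10_000, 0.999, 1 / 1000)
print(round(low_risk, 3))  # 0.091
```

The test itself never changes between the two calls. Only the base rate does, and that is enough to turn a positive result from "probably sick" into "probably healthy."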

The question is always: given that A has happened, what is the probability of B? Mlodinow (as well as Taleb in Fooled by Randomness) points out that Alan Dershowitz, in the OJ Simpson trial, used the fact that only 1 in 2500 abused wives is murdered by her husband. This sounds like a very unlikely event, so somebody may say, “Well then, there is very little chance that OJ did the crime.” The problem with this statement is that an event being extremely unlikely doesn’t mean that it hasn’t happened. Therefore, when an extremely unlikely event happens, the first thing you say is not “that event did not happen because it is very unlikely.”

No, in unlikely events, the first thing you say is “an unlikely event happened, how did it happen?” Among abused women who were murdered, 90% were killed by the abusing husband. In other words, the right question is “given the fact that 90% of abused women who are murdered are murdered by their abusing husband, what is the chance that OJ Simpson did the murder?” This means that the base rate in this case is 90% pointing toward the ex-husband. Base rates are extremely important in real-world use of conditional probability.

The incorrect and fallacious use of base rates is covered in the Wikipedia article on the base rate fallacy. I can heartily suggest reading the article, as it should shape how you think.

Conditional probability is something that our minds don’t deal with well. To be a critical thinker, you will need to understand that our brain doesn’t deal well with probability. If you have the advantages of mathematical leanings, I would strongly suggest spending some time with Bayes. However, the math is very tough.

To get around this, Gigerenzer and Hoffrage published a marvelous little paper in 1995 in Psychological Review. Rather than show the results as probabilities, you can take all this math and put it in a natural frequency table. This is shown below, and it should be easy to see why this pulls the confusion out. Gigerenzer goes through all of this on his website.
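As a rough illustration of the natural frequency idea (not a reproduction of Gigerenzer and Hoffrage's own table), the HIV test numbers from earlier in this post can be turned into head counts. The population of 1,000,000 is an assumption chosen only so that every count comes out as a whole number of people.

```python
# Assumed population size, picked so the counts are whole people.
population = 1_000_000

sick = population * 2 // 1000          # base rate: 2 in every 1000
healthy = population - sick

sick_positive = sick * 999 // 1000     # test finds the disease 99.9% of the time
healthy_positive = healthy // 1000     # 1 in 1000 healthy people is "overkilled"

all_positive = sick_positive + healthy_positive

print(f"{'Group':<10}{'People':>10}{'Test positive':>15}")
print(f"{'Sick':<10}{sick:>10}{sick_positive:>15}")
print(f"{'Healthy':<10}{healthy:>10}{healthy_positive:>15}")
print(f"Of {all_positive} positive tests, {sick_positive} belong to sick people.")
```

Read as counts, the answer needs no formula: 1998 of the 2996 people who test positive are sick, which is about 2 in 3.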

I will conclude this post with a short story. I am good friends with a person who is very good with math and has a Ph.D., and he has spent time publishing articles using some of the mathematical extensions of Bayes’ work. He is very sharp, and he is a person I can have a conversation with on this. I can only do a rudimentary level of stats with Bayes’ theorem, but I think I understand the principles behind Bayes and how to apply them to real life. Much of what I know is written down in this post. He, on the other hand, had overlooked the practical application of Bayes’ theorem in a recent conversation. He finally came back and admitted that he had an oversight about the use of Bayes’ theorem in practical matters.

The moral is that if knowing the math and being extremely bright still leads to misses, you and I won’t always have the ability to pick this up. The key is to remember that in unlikely scenarios, you need to be very careful. Or, as my nephew once said my brother-in-law, who does research in physics, would tell him as a child, “You need to cherish your anomalies.”

Cherish and be careful in understanding these results.