Sunday, August 24, 2014

“Mind and Spirit”–> Conditional Probability Part 2

One of the things I notice about this generation is its mass fragmentation.  When I was a child or a teenager, we did not have the internet, so we got our information from common sources.  You could not go to Pandora; you listened to the few radio stations that were available.  This meant that kids divided into maybe 10 major chunks, and at school you would find the group that listened to the same station.

The same was true for television.  There were very few channels, and cable was just starting.  This meant that everybody watched pretty much the same type of shows and shared a common base of knowledge and ability to relate.

This gave everybody a common thread or base of understanding.  There was less fracturing and, in many ways, more heroes that everybody knew.

What if you were in a niche?  Even the niches were shared.  If you were into science, you did not have many different sources.  Because my father was an engineer, we had a subscription to Popular Science.  Although my father was not much of a physicist, from time to time he would subscribe to Scientific American.  Scientific American was designed to take the difficult and theoretical subjects of the day and create articles that could be read by those who did not want to wade through the math but did want an understanding of what the new theories might say.  It was never simple, but it was never out of reach.

When the magazine came, I would often open it, flip through it, and find myself at the back of the magazine, right by the section written by Martin Gardner, the person in the black and white photo above.  All of the nerds and science geeks knew Gardner through his playful column in Scientific American.  He was also known as a magician, which attracted me to him because I was heavily into illusions and magic as a child.  Much of what he liked to do was show readers how they could play around with numbers.

What does this have to do with conditional probability or Thomas Bayes?  Gardner famously created (or retold) the two-child problem, which we want to go through in this post.  If you can get a sense of the two-child problem, you will start to understand why probability is so hard to grasp.

We are going to tell the problem in a slightly different form to avoid some of the ambiguities (which Gardner himself admitted) in his original wording.

Here is the problem:  You are an accountant who works at Megacorporation.  Megacorporation asks you and your computer operator mysterious questions that you must answer.  Many times you don’t know why they ask; you only know that they ask in mysterious ways.  Because you are secretly trying to find the power behind Megacorporation for a book you are working on, you are happy in your job as you gather secret notes.

One day, a command comes down from on high asking you to pull all the demographic data for a town called Childville.  In particular, it wants the data for all families that have two children where at least one of the children is a girl.  Your computer operator says the query will take a while to run and that he’d like to go to lunch.  You agree to join him and to take a look at the data when you get back.

When you get back, the query has finished running.  You walk over to the database, but before you can look at the data, you get a phone call.  A mysterious person on the other end says, “Of the new data set, what is the chance that both children will be girls?”

What will your answer be?

What is the answer?

If you have one in your mind, you probably have the wrong one.  If you remember the dice problem from the last blog post, you might think of this as a real-life example of it.  Now here is where the beauty of Bayes comes in.  It is very tricky to figure out the chance that both children are girls when all we know is that at least one of them is a girl.  However, we can easily see the reverse: if both children are girls, then at least one of them is obviously a girl.

Let’s say that Mr. Jones is walking down the street, pushing two children in an extended baby carriage.  Mr. Jones says to you, “My first child is a girl.  What is the chance that my second child is also a girl?”

Since you know the first child is a girl, you know that the chance of the second child being a girl is right around 50% (and for purposes of this post, let’s say that boys and girls are always born at the 50% level).  If you didn’t know anything about the first child, the chance of having two girls would be 25%.  As soon as you know the first child, however, the second child becomes a coin flip.

Therefore, it is very tempting to tell the mysterious voice on the telephone that the answer is 50%.  You have pulled all of the pairs of children with at least one girl, so the other child must have a 50% chance of being a girl too.  Right?

That answer is wrong.

This feels very wrong, and Gardner enjoyed pointing that out.  The quickest way to see why it is wrong is to simply write out the possible solution set, similar to the solution set we showed in the last post on dice.

Luckily for you, this problem is so well known that it is on Wikipedia.  Here is the way to think about Mr. Jones.  You start by listing out the solution set of all the children.

First Child | Second Child
Girl | Girl
Girl | Boy
Boy | Girl
Boy | Boy

With Mr. Jones, you know that the first child is a girl.  Therefore, you get to knock out the cells where the first child is a boy.

First Child | Second Child
Girl | Girl
Girl | Boy
Boy | Girl   (eliminated)
Boy | Boy   (eliminated)

Thus there are only two options left for the second child, boy or girl, and each is equally likely: 50%.

However, the computer data set was picked differently.  The query sorted through all the families that had two children, so the beginning solution set looks the same as Mr. Jones’s starting solution set.

First Child | Second Child
Girl | Girl
Girl | Boy
Boy | Girl
Boy | Boy

However, the computer doesn’t know about first child versus second child.  Remember, the data you pulled is for all families where at least one of the children is a girl.  This means that your solution set will look like the following:

First Child | Second Child
Girl | Girl
Girl | Boy
Boy | Girl
Boy | Boy   (eliminated)

All you have done is remove the families that had two boys.  In the Mr. Jones case, the very moment you knew that the first child was a girl, you removed not only the two-boy families but also every case where the first child was a boy.

This should make you feel uncomfortable.  Our brains want to say, “wait a minute, in both cases we find out that one of the children is a girl.  Why should knowing the position of the girl change everything?”  This is true of everybody.  It is well documented that this is a cognitive hole in our brains.

The more you look at the charts above and think about them, the closer you get to an “Ah hah!” moment.  If you think about it for a while, you’ll start to understand that the two questions filtered the original set in two different ways.

In the computer situation, we only filtered out the boy-boy pairs.  In the Mr. Jones situation, we filtered out all of the answers where the first child was a boy.  Although they look very similar, in reality, the data has been filtered in two very different ways. 
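
If the difference still feels slippery, a few lines of code make it concrete.  Here is a minimal sketch (in Python; the names are mine, not from any library) that enumerates the four equally likely families and applies each filter:

```python
# A minimal sketch contrasting the two filters.
from itertools import product

families = list(product(["Girl", "Boy"], repeat=2))  # the four equally likely families

# Mr. Jones filter: the FIRST child is a girl.
jones = [f for f in families if f[0] == "Girl"]

# Computer filter: AT LEAST ONE child is a girl.
computer = [f for f in families if "Girl" in f]

both = ("Girl", "Girl")
print(jones.count(both) / len(jones))        # 0.5
print(computer.count(both) / len(computer))  # 0.3333...
```

The same four families go in; the two questions throw away different rows, and the answers come out different.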

This filtering of data is what Bayesian statistics is all about.  Because we often make mistakes in probability, Bayesian tools let us solve these filtering problems formally.

This filtering of probability is called conditional probability.  Basically, it means that something has happened to filter the data one way or the other.  Once you have filtered the data, you have a new solution set to pick from.  The normal way of writing this conditional probability is the probability of event A given event B, or simply P( A | B ).  Bayes showed that this probability is derived as follows, which Wikipedia nicely draws as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

With some work, we can derive a different form of Bayes’ theorem, as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}$$

We can define our two events:

P(A) = both children are girls, with probability 1/4

P(B) = at least one of the children is a girl, with probability 3/4

If both children are girls, then the chance that at least one of the children is a girl is obviously 100%, or 1.

In Bayes’ equation, this is shown by P(B|A) = 1.

Now we can fill in our equation.  P(A|B), which means “the chance that both children are girls given that at least one child is a girl,” is equal to the following:

= 1 (the chance that at least one child is a girl if both are girls) × 1/4 (the chance that both are girls), divided by 3/4 (the chance that at least one of the children is a girl) = 1/3.
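
As a quick sanity check, here is the same arithmetic as a short Python snippet (my own sketch of the formula above):

```python
# Bayes' theorem with the two-child numbers plugged in.
p_A = 1 / 4        # both children are girls
p_B = 3 / 4        # at least one child is a girl
p_B_given_A = 1.0  # if both are girls, at least one certainly is

print(p_B_given_A * p_A / p_B)  # 0.3333...
```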

You should be able to see the advantage of Bayes.  You can take any probabilities, and with a little work, you can quickly find an equation that allows you to figure out the conditional probability.  However, from an intuitive standpoint, I like the filtering idea because it helps us to understand what is happening.

Leonard Mlodinow wrote The Drunkard’s Walk to describe the history of the subjects we have discussed in this blog post.  I would recommend this book; he not only gives examples of probability gone wrong, but also a fascinating history of the subject.  In one case, he describes being given a blood test for his life insurance.  When the test came back showing him as HIV positive, he was told that the test was 99.9% accurate and that he was sure to die.

It turns out that the 99.9% figure was quoted for a community (gay men) in which the incidence of HIV was relatively high.  For other communities, such as married white heterosexual men, the incidence of HIV is exceedingly low, and that changes what a positive result means.

This is a common error in testing, and well worth spending a moment on.

If you have a test that is 99.9% accurate, it sounds like it is perfect.  Anybody who gets a positive on the test is going to be infected, right?  The truth is that you don’t even have enough information to answer the question.  What does 99.9% accurate mean?

You need to know two numbers: how often the test finds the disease when somebody actually has it, and how often it wrongly flags the disease in a person who doesn’t have it.  Finding a disease when none exists is sometimes called “overkill” because the test declares something dead that isn’t.  For instance, a test may find the disease 99.9% of the time, yet also “overkill” healthy people 50% of the time.

In Mlodinow’s HIV case, if you gave the test to 1000 healthy people, it would incorrectly “overkill” one person.  So now all you need to know is the base rate of the population.  (The base rate is an extremely important cognitive concept; Daniel Kahneman talks about it a lot in Thinking, Fast and Slow, and it can also be called the “unconditional probability.”)  If the base rate of the population is 2 sick people in every thousand, the test will identify those 2 sick people 99.9% of the time, and it will also flag 1 non-sick person as HIV positive.  That is 3 positives, of which 2 are really sick.  This means that if you were identified as having HIV in this scenario, your chance of actually being sick is 2 in 3, and your chance of a false alarm is 1 in 3, which is enormously higher than the test’s 1-in-1000 error rate suggests.
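
Here is that base-rate arithmetic as a small Python sketch (using the illustrative numbers above, not Mlodinow’s exact figures):

```python
# Posterior probability of being sick given a positive test.
base_rate = 2 / 1000        # 2 sick people per thousand
sensitivity = 0.999         # test finds a real case 99.9% of the time
false_positive = 1 / 1000   # test wrongly flags 1 healthy person in 1000

p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)
print(sensitivity * base_rate / p_positive)  # ~0.667: 2 of every 3 positives are real
```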

The question is always: given that one event has happened, what is the probability of the other?  Mlodinow (as well as Taleb in Fooled by Randomness) points out that Alan Dershowitz, in the OJ Simpson trial, used the fact that only 1 in 2500 wives who are abused are murdered by their husbands.  This sounds like a very unlikely event, so somebody may say, “well then, there is very little chance that OJ did the crime.”  The problem with this argument is that an event being extremely unlikely doesn’t mean the event hasn’t happened.  When an extremely unlikely event happens, the first thing you don’t say is “that event did not happen because it is very unlikely.”

No, with unlikely events, the first thing you say is “an unlikely event happened; how did it happen?”  In the case of abused women who are murdered, roughly 90% of those murders were committed by the abusing husband.  In other words, the right question is “given that 90% of abused women who are murdered are murdered by their abusing husband, what is the chance that OJ Simpson did the murder?”  The base rate in this case is 90% pointing toward the ex-husband.  Base rates are extremely important in the real-world use of conditional probability.

The incorrect and fallacious use of base rates is covered in the Wikipedia article “Base rate fallacy.”  I heartily recommend reading it, as it should shape your life.

Conditional probability is something that our minds don’t deal with well.  To be a critical thinker, you will need to accept that your brain doesn’t handle probability well.  If you have the advantage of mathematical leanings, I would strongly suggest spending some time with Bayes.  However, the math can be tough.

To get around this, Gigerenzer and Hoffrage published a marvelous little paper in 1995 in Psychological Review.  Rather than showing the results as probabilities, you can take all this math and put it in a natural frequency table.  This is shown below, and it should make it easy to see why this pulls the confusion out.  Gigerenzer goes through all of this on his website.

[Image: natural frequency table from Gigerenzer’s website]
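
In case the image does not come through, here is my own reconstruction of the idea in code, counting people instead of multiplying probabilities, using the same illustrative HIV numbers as above:

```python
# Natural frequencies: restate the test in terms of 1000 concrete people.
population = 1000
sick = 2                                  # base rate of 2 per 1000
healthy = population - sick               # 998

true_positives = round(sick * 0.999)      # 2 sick people test positive
false_positives = round(healthy * 0.001)  # 1 healthy person tests positive

print(f"{true_positives} of {true_positives + false_positives} positives are really sick")
```

Stated this way, “2 out of 3” is obvious, with no algebra required.  That is Gigerenzer’s whole point.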

I will conclude this post with a short story.  I am good friends with a person who is very good at math, holds a Ph.D., and has published articles using some of the mathematical extensions of Bayes’ work.  He is very sharp and a person I enjoy having these conversations with.  I can only do a rudimentary level of statistics with Bayes’ theorem, but I think I understand the principles behind Bayes and how to apply them to real life.  Much of what I know is written down in this posting.  He, on the other hand, had overlooked the practical application of Bayes’ theorem in a recent conversation.  He finally came back and admitted that he had an oversight about the use of Bayes’ theorem in practical matters.

The moral is that even knowing the math and being extremely bright still leads to misses; you and I won’t always catch these errors either.  The key is to remember that in unlikely scenarios, you need to be very careful.  Or, as my nephew recalls being told as a child by my brother-in-law, who does research in physics, “You need to cherish your anomalies.”

Cherish and be careful in understanding these results.

Sunday, August 17, 2014

“Mind and Spirit”–> Conditional Probability Part I

[Image: portrait traditionally presented as Thomas Bayes]

As much as I might wish it, this is not a picture of Thomas Bayes, although it is purported to be one.  The picture comes from Wikipedia, which reports that the best guess is that it was created for a 1936 book called History of Life Insurance as a stand-in for anyone wanting to picture Thomas Bayes.

As far as we know, there are no surviving portraits of this famous clergyman, but his work has inspired a great body of learning, one that extends the subjects we covered when discussing Type 1 and Type 2 thinking and Nassim Nicholas Taleb’s writings.

Born in 1701, Bayes was the son of a clergyman in a family of nonconformists.  In his day, being a nonconformist meant that you did not necessarily buy into the Church of England.  In today’s vernacular, many of the churches that came out of this movement are known as the Free Churches, which is to say they believe the Church should be separated from the control of the state.  The history of Protestantism is a history of fracture.  Rather than being a hindrance, this fracturing can result in new discovery and stronger thought.  I would like to think that Bayes’ belief in our Lord Jesus Christ and his familiarity with nonconformist thought generated what I consider one of the greatest branches of mathematics.  Bayes gave the world an understanding of how to deal with randomness and statistics.

While Bayes published only one mathematical work in his lifetime, what the world now calls “Bayesian statistics” was left in a document willed to a friend, Richard Price, along with 100 pounds at Bayes’ death.  Price, being a man of good intent, made sure the paper was first read to the Royal Society in London and then published.  In this paper, An Essay towards solving a Problem in the Doctrine of Chances, Bayes attacks how to understand conditional probability.

If you have taken any probability courses, one of the first things you are taught is that a random event does not care about what happened before it.  Almost all probability theory will point out that if you have flipped a coin and heads has come up 5 times, the next time you flip the coin it still has a 50/50 chance of coming up heads.  This fact disturbs most of us, and it should to some extent.  The reason is that, in the long run, we know the coin should come up heads around 50 percent of the time and tails around 50 percent of the time.  Therefore, we feel that the coin should somehow be “building up” a tails after it has come up heads 5 times.  Our intuition says, “there must be some type of mechanism to now start pushing the coin back to the tails side.”

Now, over the long term of flipping, we do expect the results to be very close to 50 percent tails and 50 percent heads, and this long-term trend is called regression to the mean, which is often not understood correctly.  In the simplest case, it means that any random or even partially random process may have outcomes that deviate from the underlying mean, but eventually the outcomes will center around that mean.  However, regression to the mean is accomplished only by a fresh 50 percent chance at every flip of our coin.  This idea that each flip is independent of the last means that we will often have streaks where the coin comes up heads many times in a row.

If you don’t understand this, a deviation from the mean will seem to be a pattern, even though none exists.  Taleb has argued that most stock pickers get confused between these streaks (which we might call “good luck”) and real skill in picking stocks.  I completely agree with him.  Many things in our lives are governed by this streaking nature.

[Chart: my golf scores per round while learning to play left handed, 2013]

However, let us look at the chart above.  This is a chart showing my progress from trying to learn how to golf left handed in 2013.  You can see that I started off very poorly, with scores around 145 strokes per round.  However, in April of that year, I had a round at a course where everything went right, and I shot a 102.  Once I had shot this 102, I could have said, “Oh look, I now know how to shoot a 102, so I’ll shoot this from now on.”  However, because my game had a large element of random chance, it took me 36 more rounds, or approximately 200 hours of playing golf, before I could equal this score.  If you look at the general trend, which is the solid red line, you can see that my mean results were slowly getting better.  Rounds above or below the mean were simply deviations; I would eventually return to the underlying trend from either direction.

On a coin flip, we can see the exact same idea.  The graph to the right is from a Penn State course on probability; it shows the proportion of heads as one tosses a fair coin up to 500 times, and as the number of tosses increases, the proportion of heads stabilizes around 0.5.  As you can see, during the first 100 flips the coin appeared to be broken, and during the first 50 flips it was very broken: somehow, the coin was coming up heads only 40% of the time.  If you had been betting on heads, you would have lost a lot of money.  However, the coin later reverses itself, and the line trends to the high side until it gets very close to 50 percent heads.  So, while there is a trend under the data, there is no mechanism under the coin flip to push it one way or the other.  In truth, many random events are extremely lumpy.  In other words, these events seem to push one way for quite a while.  It is impossible to predict when a truly random event will regress to the mean, and certain trends will stay present for a very long time.
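
You can reproduce the shape of that graph yourself.  Here is a minimal simulation (mine, not Penn State’s):

```python
# Running proportion of heads over 500 fair coin flips.
import random

heads = 0
for flip in range(1, 501):
    heads += random.random() < 0.5   # True counts as 1
    if flip in (10, 50, 100, 500):
        print(flip, heads / flip)    # early values wander; later ones hug 0.5
```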

From a theological standpoint, how should we as Christians think about probability? 

There is remarkably little on this in the Bible other than Proverbs 16:33:

The lot is cast into the lap,
    but its every decision is from the Lord.

What we do know from the Bible is that God has clearly used what appear to be random events to make decisions at particular moments.  Several times in the Bible, lots (or dice) were used to determine a path that God’s people should take, and God did not criticize them for using this method.  However, I am going to submit that God also allows truly random events to happen.  If you read the scripture above from Proverbs, you can see that the language is specific and unique.  The lot is cast.  The lot’s decision is from the Lord.  It does not say, “The Lord causes the lots to fall in a particular way.”  The Lord may choose a particular number to come up, or the number may be truly random.  In other words, I believe there are two cases:

1. Random events that are not random because God is actually controlling the exact turn of the die or lot.  These can be thought of as miracles, where God intervenes in the machinery that he has set up to break his own physical laws.

2. Random events that are random because God is allowing the random events to happen.

Although it is beyond the scope of this blog, you cannot have free will without having randomness.  The two are linked and tied together.  Much of our Christian life is our ability to understand that the world we live in is random and filled with probabilities.  It is our mission to bring our lives to a place where God breaks down the rule of randomness, and he plays with his own machinery to get the results that glorify him most.  Probability exists, and we are called to embrace it and attack it through prayer and supplication.

However, let us turn from this sidebar on chance, free will, and probability and get back on subject.

So after taking multiple courses in statistics and probability, you’ll start to understand this idea that “random is random.”  If you are very lucky, somebody will explain that these random events regress to a mean, and you will feel that you really understand randomness.

The truth is that we have only gone half way in our journey of understanding probability. 

What we have discussed so far is random independent events.  As we discussed, it is easy to get misled when thinking about these events.

The reason these are called independent events is that the first coin flip has nothing to do with the second coin flip.  There is nothing tying the two together.  When two events are truly independent, we say that there are no conditions on them.

We have talked about coin flips, but let’s make things just a little more complicated.  Let’s talk about dice, and more specifically, rolling two dice.  The graph to the left, which I took from the internet, shows the results of rolling a black die and a white die together.  Because each side has exactly a 1 out of 6 chance of coming up, we can simply list all of the combinations that might occur when we roll the dice together.  There are 36 different combinations, made up of the values shown in the graphic.  This graphic is called our “solution space” or “solution set.”  In other words, each die may come up 1 of 6 ways, and when we put the two together, there are 36 possible combinations.  However, not all totals are of equal weight.

Why?  To get a 2 on the dice, you need each die to come up as a 1.  This means you must have the lucky chance of the white die coming up a 1 (which happens only 1 out of 6 times), and then the black die must also come up as a 1.  On average, you must roll the dice 36 times: the white die comes up 1 about one time in six, and on those rolls the black die also comes up 1 only one time in six.  Therefore, we need about 36 rolls to get double 1s.  (And remember, I am simplifying, because these results assume a regression to the mean; in any particular run, it may take many times more or fewer rolls to get our result of 2.)

However, getting a 7 is much easier.  If we look at our solution set, we can see that we would expect a 7 to come up 6 times in 36 rolls.  Getting a 7 is six times more likely than getting a 2.  In other words, when we roll the dice 36 times, we might expect the 2 to come up 1 time but the 7 to come up 6 times.
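
Writing out the solution set is easy to do in code.  Here is a short sketch that counts every cell:

```python
# Enumerate the 36-cell solution set for two dice and count each total.
from collections import Counter
from itertools import product

totals = Counter(white + black for white, black in product(range(1, 7), repeat=2))
print(totals[2], "out of 36")  # 1 way to roll a 2
print(totals[7], "out of 36")  # 6 ways to roll a 7
```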

Once you go through some basic statistics and probability courses, you might heuristically start to recognize shortcuts to the desired results.  In our dice story, a graduate of a Stats 101 course might say, “The chance of getting a 2 is 1/6 × 1/6, or 1/36.”  This is correct, but the shortcut will fail them when they try to quickly calculate the chance of getting a 10, because you cannot simply take two simple outlier conditions and multiply them together.  Instead, you must write out the solution space, think through how each solution could happen, and add up the chances.  (A 10 can happen three ways, 4+6, 5+5, and 6+4, so its chance is 3/36.)

Now imagine that the dice are loaded, or have a slight weighting toward one of the numbers.  This weighting will now influence the randomness.  If each die is weighted toward showing a 1 maybe 10% more often, your chance of double 1s becomes (1.1/6) × (1.1/6), or roughly 1/36 × 1.21, because each die contributes its own factor of 1.1.  You would then need to go back to your solution set and adjust all of the solutions up (if they had a 1 in them) or down (if they had no 1 in them).
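
Here is that adjustment spelled out (the 10% weighting is the made-up example above):

```python
# A loaded die where 1 comes up 10% more often than fair.
fair = 1 / 6
p_one = fair * 1.1      # probability of a 1 on each loaded die
print(p_one * p_one)    # ~0.0336 = 1/36 * 1.21, not 1/36 * 1.10
print((1 / 36) * 1.21)  # same number, showing the squared factor
```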

Here is the kicker: life is loaded with loaded dice.  As a matter of fact, random numbers are extremely useful for things like encoding information.  If you are into encryption, you will find that the most powerful way to encrypt something is to have a list of random numbers that only you and a companion have.  You then encode your message by shifting each letter by the corresponding random number.  If the numbers are truly random, nobody can guess what the underlying letters are.  However, if you use a pair of dice to generate the numbers (where 7 is much more popular than 2), then somebody simply needs to start trying a lot of 7s to figure out much of your secret message.
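
For the curious, here is a toy version of that scheme (a sketch only; it assumes capital A–Z messages and is not real cryptography):

```python
# Toy one-time pad: shift each letter by a truly random amount.
import secrets

def encrypt(message, key):
    return "".join(chr((ord(c) - 65 + k) % 26 + 65) for c, k in zip(message, key))

msg = "ATTACK"
key = [secrets.randbelow(26) for _ in msg]  # uniform shifts: no shift is more likely
print(encrypt(msg, key))
# If the key came from summing two dice, shifts of 7 would dominate,
# and an attacker who tries 7 everywhere recovers much of the message.
```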

It turns out that creating truly random numbers is so challenging that many programming languages and/or chips create “pseudo-random” numbers.  These numbers look a lot like random numbers, but if we look at them long enough, we can find some type of pattern in them.
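
A classic example is the linear congruential generator, which is pure arithmetic dressed up as chance (the constants below are ones traditionally associated with C’s rand()):

```python
# A linear congruential generator: deterministic, but looks random at a glance.
def lcg(seed, n):
    x = seed
    for _ in range(n):
        x = (1103515245 * x + 12345) % 2**31
        yield x % 6 + 1  # fold each value into a "die roll"

print(list(lcg(seed=42, n=10)))  # the same seed gives the same "random" rolls
```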

So we come to the end of our blog post on random numbers.  We have found that predicting even simple random events is very difficult.  Our brains are not wired to deal even with simple random events.  Many gamblers have lost a lot of money because they kept thinking “my luck has to change.”  All of us gamble in one form or another every day.  Having some idea of randomness can help us understand the world we are in and not take too much (or too little) responsibility for what happens in our lives.

What we have looked at is the chance of two dice rolling a number.  Now we will conclude with what Bayes tried to describe: what happens when the events are no longer independent?

Let’s say that we don’t throw the dice at the same time.  We throw one die, then the other.  The chance of both dice coming up 1 is 1/36.  However, let’s say that we throw the first die and it comes up as a 1.  What is the chance that the other die will also come up as a 1?  It should be obvious that the odds have gone from 1 out of 36 to 1 out of 6.  Once the first die came up as a 1, our chances of getting a 2 from the pair improved by a factor of 6.
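
The same enumeration from before shows the jump (a minimal sketch):

```python
# Conditioning the 36-cell solution set on the first die showing a 1.
from itertools import product

rolls = list(product(range(1, 7), repeat=2))
first_is_one = [r for r in rolls if r[0] == 1]

print(rolls.count((1, 1)) / len(rolls))                # 1/36 unconditionally
print(first_is_one.count((1, 1)) / len(first_is_one))  # 1/6 given the first die is 1
```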

This is conditional probability.  The chance of getting a 2 changed dramatically because of the conditions.  Once you established that the white die came up a 1, getting a 2 became much, much more probable.  This change of probability is understandable in our dice context, but it turns out that changes in conditions that impact the underlying probability are something our brains simply don’t process well.  As a matter of fact, conditional probability is prone to errors, and it is an area where I am sure you have made mistakes in the past.  It is this change in conditions that we will attack in the next post, where we’ll discuss how to think about it.