Wednesday, March 31, 2010

Tom Izzo: Is he the real March Magician?



This past week much has been written and said about Tom Izzo at Michigan State being the March Magician. The Spartans and Tom Izzo have delivered 6 Final Fours in their last 13 straight NCAA appearances. Tom Izzo has been head coach at MSU for only 15 years. The real question I want to answer is: how unlikely is this success?

According to Nate Silver at FiveThirtyEight.com, Tom Izzo should have reached the Final Four only twice based on seedings. Nate presents a very interesting logistic regression that ties the seeds to the probability of reaching the Final Four. According to the seedings the Spartans have overperformed, which makes 6 Final Fours in the last 13 years an unusual event.

I did some analysis myself and came to some slightly different conclusions:

First, you might think going to the NCAA tournament 13 of 15 years is a difficult accomplishment. However, with 347 teams in NCAA Div I someone will do this just by chance, and here is why. The selection process gives 30 bids to conference winners and 34 at-large bids split among the 8 major conferences. This gives the Big Ten conference about 5.25 bids each year on average. For any given coach the odds are against you, but remember that in the Big Ten there are 11 coaches playing the game, and in the 8 conferences about 80 coaches competing against each other. As such, the probability that one of these coaches makes 13 out of 15 years is about 0.5102. This is about 1:1 odds. In the end there is a good likelihood that somebody makes 13 out of 15 years in any 30 year time period.

Now the problem is conditional. Once you are the one who makes the tournament 13 of 15 years, what are your chances of making the Final Four on 6 or more occasions? It is important to know that of the 347 schools, only 43 have made the Final Four in the past 26 years. Since there are four Final Four slots each year, on average these 43 schools have had multiple Final Four appearances. (Following list is shortened. Data since 1985.)

The average probability of making the Final Four for schools in this select group of 43 is approximately 0.205. Using the binomial distribution function in Excel again for 6 or more Final Four appearances in 13 trials, the probability of this occurrence is 0.0317.
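The Excel step can be reproduced in a few lines of Python. This is only a minimal sketch, assuming each of the 13 tournament appearances is an independent trial with the same 0.205 chance of reaching the Final Four; the function name `binomial_tail` is my own and not part of the original spreadsheet.

```python
from math import comb


def binomial_tail(n, k_min, p):
    """Probability of at least k_min successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumed inputs taken from the post: 13 appearances, 0.205 average chance each
p_six_or_more = binomial_tail(13, 6, 0.205)
print(round(p_six_or_more, 4))
```

With the rounded 0.205 the result comes out a little above 0.03, in line with the 0.0317 figure; the exact digits depend on how the average probability is rounded.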

Since we have two independent events, 1) making the NCAA tournament 13 of 15 years = 0.5102 and 2) reaching 6 Final Fours in 13 appearances = 0.0317, we multiply the two to obtain our final probability for any coach matching Izzo's accomplishment.

Prob(total) = 0.5102*0.0317=0.0162
Odds = 100/1.62-1 = 60.7:1
(Recognize that my figures are an estimate, not the result of detailed analysis.)
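The multiplication and the conversion to odds can be checked with a short Python sketch. The helper name `odds_against` is hypothetical, and the product is only valid under the post's assumption that the two events are independent.

```python
def odds_against(probability):
    """Convert a probability into 'X to 1' odds against the event."""
    return 1 / probability - 1


p_tournament = 0.5102  # from the post: some coach makes 13 of 15 tournaments
p_final_four = 0.0317  # from the post: 6 or more Final Fours in 13 appearances

# Multiplying is justified only if the two events are independent (assumed here)
p_total = p_tournament * p_final_four
print(round(p_total, 4), round(odds_against(p_total), 1))
```

The odds computed this way land very close to the 60.7:1 above; the small gap arises because the post rounds the probability to 1.62% before dividing.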

In conclusion, Tom Izzo did not really accomplish a very unusual event: only about 1 in 61. This tells us that across the 15 year careers of 60-70 coaches, this accomplishment is bound to happen to one of them. It is a good start to Izzo's head coaching career. In fact, it would be the career start of a lifetime for any coach. However, it is within the realm of feasibility that it happened. After all, 1 in 61 events do occur. If you look at the average frequency with which Duke or North Carolina reach the Final Four, MSU still lags and is not the best performer by this metric.

So is Tom Izzo the March Magician? Of course he is! Certainly he benefits from some luck, but during March he also finds ways to get more out of his team. MSU does still have Magic! Good luck on Saturday against Butler.

Friday, March 26, 2010

User Review: Using Google Analytics

It's been a year now since I started using Google Analytics to track the blog and I am impressed with the quality and level of information obtained.

In the past year "The Improvement Guru" Blog has:
  • Received visits from 2,335 unique visitors
  • Received visits from 83 countries and territories across the globe
  • Received visits from all 50 states within the United States
  • Seen a rate of 33% for returning visitors - loyalty is increasing
  • Had an average time on site of 1 minute and 31 seconds
For the most part referrals to the blog come from Google, Twitter, Blogger, LinkedIn, Bing, Digg and Facebook, in that order. I have only recently started using Facebook, so it's no surprise that it is low on the list. I was really surprised to see Technorati 19th on the list; I was expecting more from the site dedicated to cataloging bloggers.

As for content, clearly "Best Numbers for that SuperBowl Pool" was the most popular post of the year with 514 page views. Not surprising, since the Super Bowl is a popular event. In second place was the post on "Wendy's vs McDonalds"; probably these get googled quite often. User Reviews of software on Risk Engines and Berkeley Madonna software came in third and fourth respectively.

Overall, Google Analytics is easy to use. After signing up, you place some tracking HTML code in your blog and Google does the rest. Reports are easy to manage and review, and there are good print report features as well. It will also benchmark your site relative to similar web sites. A new feature I have yet to put into play is Intelligence and Alerts about usage.

If you like statistics and have an interest in finding out how your website compares to others, Google Analytics is a great FREE tool that can be used to obtain some primary data on traffic and content on your website.

Sunday, March 21, 2010

About False Positives. Do you have XYZ disease?

We spend a good portion of the Research Statistics classes discussing Type I and Type II errors. A Type I error is also known as a False Positive. If you go to the doctor and are tested for a rare disease and the results are positive, does this necessarily mean you have the disease? Sometimes the tests are in error and will give a positive answer when they should give a negative result.

Here is a brain teaser I offer to my classes:

"Let's discuss a situation that can occur in real life in the medical field. Let's assume you go in for a test for a certain disease and the test returns positive. It could be cancer, HIV or other things. Suppose the lab test has a false positive (type I) rate probability of 0.05 or 5% of the time it returns a false positive. Now considering this the incidence of the disease in the general population is 0.01 or 1% or about 1 in 100. Can you calculate the probability that you actually have the disease? Incidentally, most Doctors are not good with probabilities and are likely to give you the wrong information in this situation."

Most people will see the false positive rate as 5% and assume there is only a 5% chance the test was wrong. Doctors will often advise you there is 95% certainty you have the disease. They will get it wrong also.

The answer is best seen through the use of a Venn Diagram:

Let's assume 100 people are tested for a certain disease - XYZ.
  • Knowing we have a 5% false positive rate, we will have 5 people who test positive but do not have the disease.
  • Knowing the incidence rate is 1 in 100 or 1%, we will have 1 person who tests positive and actually has the disease.
  • 94 people will test negative for XYZ.


The probability of actually having the disease is then relatively easy to calculate. 6 people tested positive and 1 person had the disease. The probability is 1/6, or about a 16.7% chance of having the disease.
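The same count can be written as a Bayes' rule calculation. This is a minimal sketch, assuming the test never misses a true case (a sensitivity of 1.0, which the 100-person count above takes for granted); the function and parameter names are my own.

```python
def prob_disease_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """Bayes' rule: chance of having the disease after a positive test."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Numbers from the brain teaser: 1% incidence, 5% false positive rate
posterior = prob_disease_given_positive(prevalence=0.01, false_positive_rate=0.05)
print(round(posterior, 3))
```

The exact figure is a touch higher than 1/6 because the 5% false positive rate applies only to the 99 healthy people rather than to all 100, but the conclusion is the same: a positive result here means roughly a 1 in 6 chance of having the disease.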

The conclusion is that whenever we test positive for a medical issue it is important to know three things.

  1. What is the false positive rate for the test?
  2. What is the incidence of the disease in the general population?
  3. If I am not in the high risk group, what is the incidence of the disease outside of the high risk group?
It's an interesting problem for students to think about especially when studying probabilities and statistics.