Quote:
Originally Posted by Cali Panthers Fan
Well I don't agree that there are far too many outliers, but that's me. You're free to find fault if you require more data points, that's your prerogative. Even if there are outliers, they usually correct themselves by next season if the same pattern exists. You don't see teams that are continuously in the outlier area, so that suggests that over small sample sizes (and I would qualify an entire season as a small sample) that you can have teams that buck the trend.
Let's analyze this for a second. What are outliers? Teams that have bad corsi but win a lot (let's call this type A), and teams that have good corsi but don't win a lot (type B).
What are "not outliers"? Teams that have good corsi and win a lot (type C), and teams that have bad corsi and lose a lot (type D).
So when you're saying that these outliers "usually correct themselves IF the same pattern exists", what you're saying is that "outliers" will "usually" become "not outliers", or in other words:
- Type A teams become C or D
- Type B teams will become C or D. (For example: Oilers will either start doing better or their corsi numbers will go down.)
Well, which is it?
I think we're already starting to see the problem with this claim: when you say "if they keep having this corsi they'll get better results", you're not actually predicting the thing that's interesting, which is whether this team is likely to get better within an interesting timeframe.
But this isn't the end of it, because even if we assume that shots towards the net have absolutely nothing to do with winning hockey games (or: corsi and winning are completely unrelated), the claim "outliers usually correct themselves" would still be statistically true simply because that's how odds work. (And thus it's not a sign of predictive power.)
For example: if we assume that a team has a 50/50 chance of having a good season and, independently, a 50/50 chance of having good corsi, then a team that had a bad season with good corsi has a 75% chance of EITHER having a good season OR having bad corsi next season. The only way to miss is to repeat "bad season, good corsi", which is 0.5 x 0.5 = 25%.
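If you want to check that number yourself, here's a quick sketch that just lists the four possible next seasons. It assumes, as in the example above, that season result and corsi are independent coin flips; the variable names are mine.

```python
from itertools import product

# Assumption: next season's result and next season's corsi are each a fair
# coin flip, independent of each other and of what happened last season.
outcomes = list(product(["good season", "bad season"],
                        ["good corsi", "bad corsi"]))

# A team that had (bad season, good corsi) fails to "change" only if it
# repeats exactly that combination. Every other outcome has a good season
# or bad corsi (or both).
changed = [o for o in outcomes if o != ("bad season", "good corsi")]

# All four outcomes are equally likely, so the probability is a simple count.
p_good_season_or_bad_corsi = len(changed) / len(outcomes)
print(p_good_season_or_bad_corsi)  # 0.75
```

Three of the four equally likely outcomes count, with no relationship between corsi and winning built in anywhere.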
Also, let's remember that a good statistical analysis usually starts with eliminating outliers from the data, not by only looking at the outliers, such as the no-good Oilers or any cup winner.
The worse a team is, the more likely it is to do better simply because it's not likely to do worse (well this of course doesn't really apply to the Oilers, but you know what I mean). Also if we assume that there is some correlation between shots and winning (which there probably is), then the more extreme an outlier you pick as your example (for example a cup winner), the more likely it is that even weak and generally irrelevant correlations will start showing up.
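The first point (the worst team is likely to improve no matter what) can be shown with a toy simulation. This is only a sketch: it assumes every season is pure luck, and the league size and number of runs are numbers I picked for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_TEAMS = 30      # hypothetical league size
N_TRIALS = 2000   # number of simulated pairs of seasons

improved = 0
for _ in range(N_TRIALS):
    # Assumption: each season's result is pure luck, unrelated to the last.
    season_1 = [random.random() for _ in range(N_TEAMS)]
    season_2 = [random.random() for _ in range(N_TEAMS)]

    # Find last season's worst team and see if it did better this time.
    worst = min(range(N_TEAMS), key=lambda i: season_1[i])
    if season_2[worst] > season_1[worst]:
        improved += 1

share_improved = improved / N_TRIALS
print(f"worst team improved in {share_improved:.0%} of simulated leagues")
```

The worst team improves in the vast majority of runs even though nothing in the simulation knows anything about shots, so "the outlier got better" doesn't tell you much on its own.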
If you want to use outliers to prove your point, you would really need to show that EVERY outlier CLEARLY shows what you're talking about. Vice versa, if even the extreme outliers (the best and worst teams in the league) fail to line up with your hypothesis, then your hypothesis isn't doing very well.
Here's a practical exercise: pick your next playoff brackets simply with corsi stats. If there's a notable correlation, you should have good results. I've done this, and I didn't even get half the first round winners right.
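For anyone who wants to run the same exercise, here's roughly how I'd score it. This is a sketch only: the `Series` layout and function names are made up for illustration, and the example round is placeholder data, not real teams or results.

```python
from typing import NamedTuple

class Series(NamedTuple):
    team_a: str
    corsi_a: float  # regular-season corsi %, known BEFORE the series starts
    team_b: str
    corsi_b: float
    winner: str     # filled in only after the series has been played

def pick_by_corsi(s: Series) -> str:
    """Pick whichever team had the higher corsi going in."""
    return s.team_a if s.corsi_a >= s.corsi_b else s.team_b

def hit_rate(bracket: list[Series]) -> float:
    """Share of series where the corsi pick matched the actual winner."""
    hits = sum(pick_by_corsi(s) == s.winner for s in bracket)
    return hits / len(bracket)

# Made-up placeholder round, NOT real teams or results.
example_round = [
    Series("Team A", 52.1, "Team B", 49.3, winner="Team B"),
    Series("Team C", 50.8, "Team D", 47.5, winner="Team C"),
    Series("Team E", 48.9, "Team F", 51.2, winner="Team F"),
    Series("Team G", 53.0, "Team H", 50.1, winner="Team H"),
]

print(hit_rate(example_round))  # 0.5
```

The important part is that the picks are locked in before the results exist. Compare the hit rate against a coin flip (0.5) or against simply picking the higher seed.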
Which brings us to another problem that regularly shows up in pro-corsi argumentation: using past corsi to "predict" past results. First of all, this habit is extremely prone to cherry-picking (in other words, cognitive bias). Second, this isn't predicting anything. If you say that "cup winner C had good corsi", this is not a prediction. It's not in any way interesting. Past cup winners are likely to have all sorts of good numbers. That doesn't say anything about the predictive power of those numbers for the next season.
If you assume that corsi and winning have some correlation, then of course you will have a situation where teams that did well during a certain stretch also had a good corsi during that stretch. This is not interesting information. It's like saying that there's often a lot of water on the ground when it rains.
If you say that corsi is a good predictor of future success, you have to use past corsi to predict future success. For example, by making a playoff bracket based on corsi, or at the very least by developing a system that is very good at predicting future results and in which corsi is a significant part of the equation.