08-09-2021, 01:19 PM
|
#301
|
Franchise Player
|
Quote:
Originally Posted by CorsiHockeyLeague
But you just keep saying "I'm not convinced", in different words:
Why not? You just keep saying "you simply can't, it's impossible". Well, these models generally try to do so, and use a lot of data to increase their accuracy. I think everyone acknowledges that it's impossible to be perfect because of factors like the ones I mentioned with Tanev, but they end up being fairly accurate. So if you're going to just say "no, not good enough, you can't do it", the burden is really on you to show that their conclusions aren't useful enough for us to rely on to compare players, even though we acknowledge that there's going to be some error.
I mean, in this thread we have Nurse at about 25% projected WAR while Jones is at 3%. 25% is about 8 times higher than 3%, but that doesn't mean Nurse is 8 times the player. It simply suggests that neither of them is worth their contract and that Nurse might be slightly more likely to be worth his. That's a fair thing to conclude based on the data. We don't need the math to be that precise to use these metrics as a helpful basis to make comparisons and predictions about the future.
|
LOL
The burden of proof is on the model.
And you appear to be accepting it simply because it is there.
|
|
|
08-09-2021, 01:41 PM
|
#302
|
Franchise Player
|
Well, no, I'm accepting it because the guy who created it explained in great detail what his process was, and the testing he performed to try to make it as accurate as possible, leading to a very high R2 value compared to standings points over a number of years... just like every other statistical model in every other area where statistics is used. They're not banging rocks together; there are accepted methods of doing this type of work.
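For anyone wondering what that R2 number actually measures, here is a rough sketch, not the model author's code. It assumes a plain one-variable linear fit, where R2 is the squared correlation between a team's summed model value and its standings points. The function name and the toy numbers are made up for illustration.

```python
def r_squared(xs, ys):
    """R^2 of a one-variable linear fit: the squared Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical team totals (made-up numbers, not real model output).
team_model_value = [1.0, 2.0, 3.0]
standings_points = [2.0, 4.0, 6.0]   # moves in lockstep with the model value
unrelated_points = [1.0, 0.0, 1.0]   # no linear relationship at all

perfect_fit = r_squared(team_model_value, standings_points)
no_fit = r_squared(team_model_value, unrelated_points)
```

A value near 1 means the team-level totals track standings points closely. It is a team-level check, so by itself it doesn't say how the credit was split among individual players.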
Quote:
The burden of proof is on the model.
|
Well, the model author has explained why he thinks his model is reasonably accurate and predictive. He also has a lot of education in this area that you lack (as far as I know; maybe you have some education in stats that you haven't disclosed so correct me if I'm wrong), and while it's a bit of an appeal to authority I'm always inclined to listen to the expert who has provided a long detailed explanation about why he's right over the layperson who has provided no justification for his beliefs.
So I disagree. I would say that the burden is definitely on you as to why we should believe you.
__________________
"The great promise of the Internet was that more information would automatically yield better decisions. The great disappointment is that more information actually yields more possibilities to confirm what you already believed anyway." - Brian Eno
|
|
|
08-09-2021, 01:48 PM
|
#303
|
Owner
Join Date: Dec 2001
Location: Calgary
|
When data becomes a model you have assumptions. Those create leaps, and with that a challenge from time to time.
Some make leaps that make sense, others make leaps that don't. Heck, some make leaps to hide what they don't want to see.
So I can never speak to a guy's model as I'm not in the know on what those leaps are.
But in the case of the underlying data, it's pretty hard to run from what a guy is.
Nurse has terrible simple counting stats (no model required) for shot attempts against, and shots against. Those are simple counts.
He also has terrible summaries for scoring chances against and high danger chances against. These aren't as simple, since both are defined terms and therefore not just simple counting. If, for example, I said that a scoring chance would include any shot I take from my own zone 150 feet down the ice, I'd get a lot of opposition. If you agree with the definition, though, you're on board, and once again it's simple counting.
He's also terrible in xGA, which is another step. This is now a model that comes up with an expected total of goals against based on what the player gives up on average compared to his peers. And this isn't just, say, JFresh's number; many use it.
So Nurse and his WAR ranking etc. can be challenged, but it's pretty hard to challenge the simple notion that you don't want a highly paid player to bleed shot attempts, shots, scoring chances and high danger chances against.
|
|
|
08-09-2021, 01:52 PM
|
#304
|
Franchise Player
|
Corsi: have you ever built statistical models? Real, large-scope models used by the public?
Because I have. And most people have no idea how difficult it actually is. I take that back - they are easy to create - what's difficult is creating useful models that do what you want them to do.
Any model will spit out results. And, assuming your inputs and variables are relevant to the subject, the model will spit out results that appear to be doing what the model is intended to do.
However, proving that the results are valid is the hard part.
I am not going to go into an essay on how to stress-test statistical results - there is an internet for that. But I can say this much with complete accuracy and justification: the burden of proof is on the model. And knowing A) that there is really not very much data when it comes to hockey, and B) the data that we have is almost impossible to completely isolate (which is what we really need), the likelihood of obtaining truly valid results for individual player performance that we can accurately and fairly compare against other players' performance is quite low.
And then there is the added problem that, with hockey stats, the vast majority of the users of the data aren't in a position to properly filter and interpret the results.
|
|
|
08-09-2021, 01:57 PM
|
#305
|
Franchise Player
|
Quote:
Originally Posted by CorsiHockeyLeague
Well, no, I'm accepting it because the guy who created it explained in great detail what his process was, and the testing he performed to try to make it as accurate as possible, leading to a very high R2 value compared to standings points over a number of years... just like every other statistical model in every other area where statistics is used. They're not banging rocks together; there are accepted methods of doing this type of work.
Well, the model author has explained why he thinks his model is reasonably accurate and predictive. He also has a lot of education in this area that you lack (as far as I know; maybe you have some education in stats that you haven't disclosed so correct me if I'm wrong), and while it's a bit of an appeal to authority I'm always inclined to listen to the expert who has provided a long detailed explanation about why he's right over the layperson who has provided no justification for his beliefs.
So I disagree. I would say that the burden is definitely on you as to why we should believe you.
|
We were typing at the same time.
Yes, I have a fair bit of training and experience with statistical modelling and analysis.
And again, no, the burden is always on the model. Even if the author has followed proper modelling techniques (they have), that still doesn't ensure valid results. What we have with hockey stats is 'well, this is the best we can do at this point'. That is not a criticism of those producing them; it is simply the limitation that they face.
|
|
|
08-09-2021, 02:14 PM
|
#307
|
Scoring Winger
|
When you build a model, you make assumptions, as has been mentioned here. But if the fundamentals change, all of a sudden you get an outlier, and if everyone adopts the change and it becomes the standard, the model no longer predicts reality and needs to be changed to reflect the new standard. A model I build will also reflect what I, or those who helped me put the criteria into it, think a "good player" should be, so my biases are reflected in it. Some players have a few bad habits that would give them a lower score, but they also have a few "tricks" that erase those bad habits, and somehow it works for them. Then there are players who do most things perfectly and are predictable, but make one or two bad mistakes that really cost their team. These little nuances are what make modelling extremely hard.
Patrick Roy is a great example of a player who would have broken the model at the time. Hasek was a guy I cringed at, but he stopped pucks somehow, flailing around on the ice. And obviously Wayne Gretzky and the Oilers changed the game back in the '80s, and rules were even changed to compensate for them (4-on-4 play, for example).
Models are good as a resource, but you need to watch someone's game if you're going to find those really special (elite) players. To me, elite players are somehow better than the sum of their skills (parts).
|
|
|
08-09-2021, 02:24 PM
|
#308
|
Owner
Join Date: Dec 2001
Location: Calgary
|
Quote:
Originally Posted by Oil Stain
Hockey Analytics has always seemed to have a tough time adequately quantifying defenceman performance.
Oilers fans have been in on it from the start, and a few of the earlier pioneers now work in analytics for teams like the Capitals and Devils.
We saw statements like Marc-Andre Bergeron was better than Matt Greene.
Martin Marincin was going to be sorely missed.
Jeff Petry was a great defender and it would be dumb to trade him.
Matt Benning was better than Darnell Nurse.
Now they were right about Petry, but they got a lot of other stuff very wrong. I didn't see any better predictive power when it comes to defencemen than random fans giving their opinions.
Dellow wrote a bit on why he thought the analytics might be boosting d-men that were actually subpar. He observed through watching game tape that some coaches would throw their bottom pairing d-men over the boards against elite opposition when the puck was going towards the opponents' end of the ice and the enemy elites were late in their shifts.
So now these bottom pairing defenders boosted their TOI against elites and also their corsi numbers through no ability of their own. They just jumped over the boards and benefited from the coaches' usage.
I think these JFresh cards are neat and interesting, but I personally wouldn't trust it over the opinion of a fellow hockey enjoyer that I trust.
P.S. The Nurse contract is gross, but it also seems to be the new norm for D-men. I believe that D-men in general were underpaid relative to forwards up until the last 3-4 seasons and this is part of a natural swing to bring top pairing defencemen in line with top line forwards. Also it seems Canadian teams pay something of a premium on every contract these days.
Given Nurse's natural athleticism, work ethic, and the probable soaring inflation coming down the pipe I think this contract will end up looking palatable over time, but who really knows. My investment account tells me I'm not that great at predicting the future.....
|
But you don't need a JFresh card to look up Nurse's simple frequency of events in his own zone.
You don't need an eye test or a model to want a defenseman to not be under siege more than his counterparts.
|
|
|
08-09-2021, 02:30 PM
|
#309
|
Franchise Player
|
Quote:
Originally Posted by Enoch Root
Because I have. And most people have no idea how difficult it actually is. I take that back - they are easy to create - what's difficult is creating useful models that do what you want them to do.
Any model will spit out results. And, assuming your inputs and variables are relevant to the subject, the model will spit out results that appear to be doing what the model is intended to do.
However, proving that the results are valid is the hard part.
|
Good stuff. Modeling chaos is difficult to impossible. The problem is that the inputs and variables used in hockey statistics are many times not representative of the behavior being modeled and as a result produce inconsistent or invalid results. Providing consistency and validity is the toughest part of modeling any behavior and why so many studies and models fail.
|
|
|
08-09-2021, 02:31 PM
|
#310
|
Franchise Player
|
Quote:
Originally Posted by Bingo
But you don't need a JFresh card to look up Nurse's simple frequency of events in his own zone.
You don't need an eye test or a model to want a defenseman to not be under siege more than his counterparts.
|
I would say you need an eye test if you're watching Darnell Nurse play defense and you consider him good in his own end.
|
|
|
08-09-2021, 03:02 PM
|
#311
|
First Line Centre
|
Quote:
Originally Posted by Lanny_McDonald
I would say you need an eye test if you're watching Darnell Nurse play defense and you consider him good in his own end.
|
Darnell Nurse is a tire fire in his own end…..I love the pages and pages of posts trying to justify him being a number 1 dman. Completely ridiculous.
How many points would Rasmus Andersson or Noah Hanifin have playing 20-25 mins a game and full pp mins with Mavi and Raisatl?? He is a very average dman, you paid $2-$2.5MM too much for him.
|
|
|
08-09-2021, 03:05 PM
|
#312
|
Franchise Player
|
He is Zadorov with a more accurate shot and a better opportunity
|
|
|
08-09-2021, 03:29 PM
|
#313
|
Franchise Player
|
Came to this thread to dunk on the horrible Nurse deal, stayed for the insufferable argument about data and modelling efficacy.
|
|
|
08-09-2021, 03:39 PM
|
#315
|
Franchise Player
Join Date: Mar 2007
Location: Income Tax Central
|
Quote:
Originally Posted by Sofa GM
The numbers say he is a bad defenceman, his shooting % says he had a career goal-scoring year and will likely come back down to earth, but the fan boys keep speaking of him as a top 5 dman in the league….. someone needs to help me understand……
|
Again....he generated those stats playing in the abysmal 'North Division.'
There were teams that were there just to make up the numbers.
__________________
The Beatings Shall Continue Until Morale Improves!
This Post Has Been Distilled for the Eradication of Seemingly Incurable Sadness.
The World Ends when you're dead. Until then, you've got more punishment in store. - Flames Fans
If you thought this season would have a happy ending, you haven't been paying attention.
|
|
|
08-09-2021, 04:24 PM
|
#316
|
Crash and Bang Winger
|
Quote:
Originally Posted by Locke
Again....he generated those stats playing in the abysmal 'North Division.'
There were teams that were there just to make up the numbers.
|
How many times has Edmonton done this though? Where someone has a career year with numbers way outside of their expected output, and then they just assume that that is now the new standard and they will never deviate from that. It seems like every time they have anything in their season that counts as a success of some sort they start acting like they are one tiny piece away and begin booking Whyte for the parade.
|
|
|
08-09-2021, 04:26 PM
|
#317
|
Franchise Player
|
Quote:
Originally Posted by Enoch Root
But I can say this much with complete accuracy and justification: the burden of proof is on the model. And knowing A) that there is really not very much data when it comes to hockey, and B) the data that we have is almost impossible to completely isolate (which is what we really need), the likelihood of obtaining truly valid results for individual player performance that we can accurately and fairly compare against other players' performance is quite low.
|
Well, maybe I'm expressing this incorrectly. I guess what I mean is that the burden has been met, once the person who's created the model has explained how the model has been developed, how it has been tested, and how it has performed. They've done that. So now it would be up to whoever is attempting to discredit the model and claim that it isn't actually useful in the ways that the developer says it is to explain why they're wrong, and why the degree of accuracy they claim to have achieved is a mirage. As you say, you have some experience here, which would seem to suggest that you're qualified to do just that.
I obviously can't explain why you, as one person with some knowledge and expertise on statistical models, have come to the conclusion that the data available for hockey is inadequate to yield results that can be fairly used to compare one player to another, while another person with knowledge and expertise has come to the opposite conclusion. You'd have to explain where the difference of opinion lies.
I certainly see the challenge you're highlighting when talking about reliably assigning outcomes (data) to individual players. I think most hockey analytics people would agree that that's the main source of error in their predictions. But it seems to me that there are enough data points to narrow the error bars to an extent where the model output is reliable enough to be useful and worthwhile. Again, the bar to clear here is nowhere near perfection.
Quote:
And then there is the added problem that, with hockey stats, the vast majority of the users of the data aren't in a position to properly filter and interpret the results.
|
Well, that's definitely true, though it isn't the fault of the person doing the analysis. If you want to tell me I'm misinterpreting their results and how I'm doing that I'm all ears.
|
|
|
08-09-2021, 04:30 PM
|
#318
|
First Line Centre
|
Quote:
Originally Posted by GhostCookie
How many times has Edmonton done this though? Where someone has a career year with numbers way outside of their expected output, and then they just assume that that is now the new standard and they will never deviate from that. It seems like every time they have anything in their season that counts as a success of some sort they start acting like they are one tiny piece away and begin booking Whyte for the parade.
|
I’m hoping Smith falls apart and Koskinen plays decently, puts up around a 0.908-0.910 SV%, and they bring him back on a 3 x $5.5MM deal.
|
|
|
08-09-2021, 05:25 PM
|
#319
|
Franchise Player
|
Quote:
Originally Posted by CorsiHockeyLeague
Well, maybe I'm expressing this incorrectly. I guess what I mean is that the burden has been met, once the person who's created the model has explained how the model has been developed, how it has been tested, and how it has performed. They've done that. So now it would be up to whoever is attempting to discredit the model and claim that it isn't actually useful in the ways that the developer says it is to explain why they're wrong, and why the degree of accuracy they claim to have achieved is a mirage. As you say, you have some experience here, which would seem to suggest that you're qualified to do just that.
I obviously can't explain why you, as one person with some knowledge and expertise on statistical models, have come to the conclusion that the data available for hockey is inadequate to yield results that can be fairly used to compare one player to another, while another person with knowledge and expertise has come to the opposite conclusion. You'd have to explain where the difference of opinion lies.
I certainly see the challenge you're highlighting when talking about reliably assigning outcomes (data) to individual players. I think most hockey analytics people would agree that that's the main source of error in their predictions. But it seems to me that there are enough data points to narrow the error bars to an extent where the model output is reliable enough to be useful and worthwhile. Again, the bar to clear here is nowhere near perfection.
Well, that's definitely true, though it isn't the fault of the person doing the analysis. If you want to tell me I'm misinterpreting their results and how I'm doing that I'm all ears.
|
This is where I disagree.
Their model may say that this player is better than that player at X (defensive zone play, for example), but they haven't, and most likely can't, demonstrate that the results actually prove this (valid output).
We want to know who is better defensively, Smith or Jones. The model tells us that Smith has a WAR of 18% and Jones has a WAR of 28%. Can we conclude that Jones is better defensively than Smith?
We can conclude (obviously) that Jones scored higher on the inputs that the model is using to try to illustrate who is better defensively, but taking that to the next step, they can't demonstrate that Jones is a better player. We have to assume that the input scores will determine the desired conclusion in order to bother using the model. But that assumption is a giant leap of faith and very difficult to demonstrate (especially with the unique challenges that hockey presents).
Anyway, I have derailed the thread enough for one day. And this is a very challenging thing to discuss on a message board with short, two-paragraph replies. We can agree to disagree.
|
|
|
08-09-2021, 05:44 PM
|
#320
|
Franchise Player
|
Quote:
Originally Posted by Enoch Root
Their model may say that this player is better than that player at X (defensive zone play, for example), but they haven't, and most likely can't, demonstrate that the results actually prove this (valid output).
|
Okay, I am with you but waiting to hear why they haven't and most likely can't demonstrate that.
Quote:
We want to know who is better defensively, Smith or Jones. The model tells us that Smith has a WAR of 18% and Jones has a WAR of 28%. Can we conclude that Jones is better defensively than Smith?
|
Well, no. They have a defensive WAR of some number, generally rounded to the nearest tenth. For Nurse, it's -1.6. WAR stands for Wins Above Replacement - it's a number of won games added by the player in that area. The percentages are intended to express what percentage of active players are below the player in question in that stat.
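To make the WAR-versus-percentile distinction concrete, here is a minimal sketch. It assumes the percentage on the card is simply the share of players whose WAR is below yours. Only the -1.6 comes from this thread; the other league values and the function name are made up for illustration.

```python
from bisect import bisect_left

def percentile_rank(value, league_values):
    """Percentage of league values strictly below `value`."""
    ordered = sorted(league_values)
    below = bisect_left(ordered, value)  # count of entries smaller than value
    return 100.0 * below / len(ordered)

# Hypothetical defensive WAR values for a ten-player league (illustration only).
league_war = [-1.6, -0.9, -0.4, 0.0, 0.2, 0.5, 0.8, 1.1, 1.5, 2.3]

lowest_pct = percentile_rank(-1.6, league_war)  # nobody is below the worst value
middle_pct = percentile_rank(0.5, league_war)   # five of ten players are below
```

The percentile only says where a player ranks. It says nothing about how far apart two players' underlying WAR values are, which is why 25% versus 3% doesn't mean "8 times the player".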
Quote:
We can conclude (obviously) that Jones scored higher on the inputs that the model is using in order to try and illustrate that they are better defensively, but taking that to the next step, they can't demonstrate that Jones is a better player. We have to assume that the input scores will determine the desired conclusion in order to bother using the model.
|
This seems like the issue is defining a "better player". The model developer is saying, look, we have thousands and thousands of shots of data. X% of shots in this area near the slot result in goals, Y% of shots over here on the half wall result in goals, and we can see, compared to a replacement level defenseman, where players playing against Darnell Nurse take their shots from, and how many they get. We can adjust that rate by looking at where they get those shots when he's playing with teammate A, B, C or D, and at those teammates' own individual results with and without Nurse on the ice. We can do the same for opposing players, normalizing by how good they are at getting high danger shots, and the volume of them, against other competition league wide. We can then factor in other data we have access to, such as turnovers, possession exits and completed passes out of the zone, to account for how those affect the frequency and dangerousness of chances given up by his team when Darnell Nurse is on the ice, again adjusted according to how good his teammates are at those things and how good the competition he plays against is at securing turnovers and preventing controlled zone exits. And taken together, all of those things tell you a lot about whether Darnell Nurse is good at playing defense or not.
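As a very rough sketch of the first step described above (shot location mapped to goal probability, summed over the shots allowed), something like this is the general idea. The probabilities, zone names and shot lists are all made up, and it deliberately leaves out the teammate and competition adjustments, so it is not any real model's method.

```python
# Hypothetical goal probabilities by shot location (made-up numbers).
GOAL_PROB = {"slot": 0.15, "half_wall": 0.03, "point": 0.02}

def expected_goals_against(shots):
    """Sum the goal probability of every shot allowed while a player is on the ice."""
    return sum(GOAL_PROB[zone] for zone in shots)

# Hypothetical shots allowed over the same ice time (illustration only).
shots_with_player = ["slot", "slot", "half_wall", "point"]
shots_with_replacement = ["slot", "half_wall", "point", "point"]

xga_player = expected_goals_against(shots_with_player)
xga_replacement = expected_goals_against(shots_with_replacement)

# Positive means the player gives up more expected goals than the replacement.
xga_diff = xga_player - xga_replacement
```

Both players allow four shots here, but the one giving up an extra slot shot comes out worse, which is the whole point of weighting by location instead of just counting.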
I gather that what you are saying is that those things can't actually tell us much that's useful about whether a player is good at defense. If so, I guess that's where we'll have to disagree. I think those things (particularly shot volume, shot location and the ability to get the puck out of the zone in transition) are the most important skills and outcomes when it comes to keeping the puck out of your team's net. I base that view on a large amount of analysis that has been done over many years by a wide variety of analysts.
Quote:
Anyway, I have derailed the thread enough for one day. And this is a very challenging thing to discuss on a message board with short, two-paragraph replies. We can agree to disagree.
|
Sure, no worries. I do understand where your skepticism is coming from.
|
|
|