Wednesday, January 16, 2013

...Wrapped in an Enigma, Smothered in Secret Sauce

So dad, seeing that list of WAR leaders for the Yankees is pretty cool. It's cool when things you think you "know" about baseball tumble out of the stats.

I had two things I wanted to add: the first is about Joe D and the Mick, and the second is about how I finally was able to wrap my head fully around your "'stats' are outcomes" notion.

Starting broadly, Bill James, the Sabermetric Moses, when discussing his background assumptions for his Win Shares system of statistical analysis, discussed Joe DiMaggio and Rusty Staub. Nobody would mistake Rusty for Joe, but since their raw Win Share values were nearly identical, James worked out how to finagle the numbers to help them correspond with people's observations. He arrived at a system that mixed a player's raw values, their three best seasons, and their best five-year stretch, with minor adjustments for league averages and park effects.

One thing he says about DiMaggio is that he was a great player during those three years he spent in the military; he just couldn't play. I think everyone would agree. I also think that if you were to claim that during those three years he could have been good for an average of 6 WAR a year, nobody would say that's totally out of the question. Some may say that it's too low. But if you add 18 wins to his score, it'll be closer to 100, and his raw WAR value will be closer to Mickey's.

Another thing that Bill James claimed is that league averages show that the game in the late '30s and early '40s was a much higher scoring affair than the late '50s and early '60s, and that this leads to the result that runs were at a premium during Mantle's era, and his run production was more significant to the team and the American League than DiMaggio's. You can take that leap if you buy the stats-based argument, but you have to convince yourself that 107 RBIs were more significant than 147.

Maybe the raw WAR values bear out certain feelings Bill James has about Mantle and DiMaggio (besides the missing 18-20 wins for Joe D lost during his military service).

When you first brought up the outcomes discussion with me on the phone, I felt it, but I wasn't fully immersed in how different a point of view it demands. I think I may have it, though. Correct me, please, if you feel I'm way off.

Let's say we create a player, last name Schmoe, first name Joseph. Let's say Joseph starts the season on a tear, killing the ball. In the first five games almost nobody can get him out, and he goes 16 for 21, for a .762 average. ESPN does opposing segments about Schmoe: one about how it's still early in the season and we should all just wait, and another asking where this guy came from, whether he can sustain it, and when it stops being "too early" to start thinking about .400...

The .400 talk is obviously stupid, since our Joseph ball-player falls into a slump, and over the next 10 games goes hitless, say an 0 for 42 fortnight. Now Mr. Schmoe, through less than ten percent of the season, is 16 for 63, which is good for a .254 clip.
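Just to check my own arithmetic on Schmoe, here's a minimal Python sketch. The function names (`batting_average`, `fmt_avg`) are my own inventions for illustration, and I'm assuming a 162-game season for the "less than ten percent" part.

```python
def batting_average(hits, at_bats):
    """Plain hits divided by at-bats: a record of what already happened."""
    return hits / at_bats


def fmt_avg(avg):
    """Format the way a box score would: three decimals, no leading zero."""
    return f"{avg:.3f}".lstrip("0")


# Schmoe's made-up lines from the story above
hot_hits, hot_ab = 16, 21      # first five games
slump_hits, slump_ab = 0, 42   # next ten games

total_hits = hot_hits + slump_hits
total_ab = hot_ab + slump_ab

print(fmt_avg(batting_average(hot_hits, hot_ab)))      # .762
print(fmt_avg(batting_average(total_hits, total_ab)))  # .254

# 15 games played, assuming a 162-game season
print(f"{15 / 162:.1%} of the season")                 # 9.3% of the season
```

All this does is add up what already happened. Nothing in it says a word about game 16.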

These are statements of outcomes, not necessarily bankable expectations. You know that if you roll a die you've got a fifty percent chance to get an even number. On game 16 of Schmoe's season we're discussing, his chances of going 0 for 4 or going 5 for 5 have far more to do with his mental state that day, whether the pitcher he's facing is on his game, whether the game is outside and cold, whether or not he's loosened up by a dirty joke...many possible things that have nothing to do with his .254 average.

In that Game 16 he's not more likely to get 1 hit in 4 at-bats (a .250 average) because of his average. He'll go 1 for 4 because the pitcher got him to dribble the ball over to the shortstop on a second inning sinker; then he hung a curve ball that Schmoe roped to center; later Schmoe smashed a ball deep that was caught on the warning track; and then he struck out looking at pitch number 17 from a reliever late in the game. Or any other possible set of circumstances that lead to any other set of outcomes.

The point is that his success in ending a slump rests almost more with the pitcher he's facing that day than with himself, unless his mechanics are a total mess.

Now what if we switch the timing on Joseph's hot streak and slump? Say in late July, maybe game 116, Schmoe starts a five-game tear, killing the ball, going 16 for 21 and raising his average a few points. He's proclaimed player of the week. Then, say, he goes into the ten-game slump, and we're getting ready for game 132 in August. Can we say that Joseph's chances of getting a hit in game 132 are based on his batting average?

Just like in game 16, it seems like his chances of getting a hit might have more to do with the pitcher he'll be facing than whatever stats he's built up to that time. Right? Isn't that the kernel of the idea that using outcomes as a method of extrapolating future performances is inherently flawed?

Isn't baseball awesome?

Okay, one more thing: Who are the two best Yankee pitchers? There are many candidates, but let's say, for the sake of what I'm trying to illustrate, we go with Whitey and Mo. Plenty of Yankee fans younger than me would have no problem putting Rivera on that list, but they may not even know about Whitey.

I, too, am a huge fan of Big Mo, and have seen him save games in person more than once. And I, too, feel kinda funny mentioning a guy who has pitched just over 1200 innings in the same breath as a guy who pitched almost 3200. The word that's used is "leveraged" innings, as if a closer has more to do with the final outcome of a game than a starter. In what stat, besides saves, does Mariano have the lead over Whitey Ford? The lead is not trivial either; it's more than a factor of two: games.

Mo appeared in 1051 games to Whitey's 498. This may lead the public to see modern-day relievers as something closer to position players. What does this mean? Can we draw any conclusions?

Two things to end with: 1) A starting pitcher can give up seven runs, leave with his team behind, and the team can still win the game--did that starter have more to do with the outcome of that game than the last pitcher up there closing the door on the other team? And 2) Between 1961 and 1965, Whitey Ford faced more batters than Mariano Rivera has faced in his 18-year career, and we're somehow supposed to rank Mo on the same level as Whitey?

I love baseball...

1 comment:

  1. Pat, outstanding post, definitely better than mine.

    And you hit on exactly some of the points I was thinking about when drafting it. Here's a further attempt at one of them: I'm not sure where I'm coming down on this one. I stated that many new sabermetric stats value accumulation, but I said that without a lot of expertise. If Mr. Schmoe doubles and homers on a given day, and drives in three runs, that doesn't mean jack for the next day's game. The increased probability of good production from a hot-hitting Joe Schmoe comes from his physical fitness and mental readiness, not from his numbers, a fact that I think many number-crunchers lose sight of.

    Now, on pitchers: I think I posited a starter going 7 innings in a five-day period having a 90% say in, or control over, the game he started. Well, if he only went 7 innings, I'm going to have to amend his influence to only 70%, right? That's a lot closer to the position player's 56%. Now, however, let's say the team's closer had a busy week and appeared in 3 of the five games, one inning each. Our thesis says that, okay, 90% control over one inning gives him 10% control over the game. If he does it three times in the five days, that gives him a cumulative 30%. However, when you add the leverage factor, which drives these numbers quite a bit higher, maybe the closer deserves a pretty substantial WAR.

    People look at me strange when I say there's only one true stat, and that's winning percentage. Everything else in every sport serves it, and serves to varying degrees to illuminate it.

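The starter-versus-closer arithmetic in the comment above can be sketched in a few lines of Python. This is only a rough sketch of the comment's thesis, not anybody's real formula: `game_share` and its `leverage` parameter are hypothetical names, the 90% figure comes straight from the comment, and the leverage value of 2.0 is an arbitrary number picked purely to show the direction of the effect.

```python
def game_share(innings, control=0.90, game_innings=9, leverage=1.0):
    """Share of one game a pitcher 'controls' under the comment's thesis.

    control  : the 90% say a pitcher has over the innings he pitches
    leverage : hypothetical multiplier for high-pressure innings (1.0 = none)
    """
    return control * (innings / game_innings) * leverage


starter = game_share(7)            # one 7-inning start in a five-day stretch
closer_once = game_share(1)        # one 1-inning appearance
closer_week = 3 * closer_once      # three appearances in those five days

print(f"starter:          {starter:.0%}")      # 70%
print(f"closer, one game: {closer_once:.0%}")  # 10%
print(f"closer, the week: {closer_week:.0%}")  # 30%

# Arbitrary leverage of 2.0, only to illustrate how the total moves
leveraged_week = 3 * game_share(1, leverage=2.0)
print(f"with leverage:    {leveraged_week:.0%}")  # 60%
```

How big the leverage multiplier ought to be is exactly the open question; the sketch only shows that any value above 1 pulls the closer's week closer to the starter's.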