Saturday, April 16, 2011

What's a muscle up worth?

Not much, compared to the last OHS!

Thanks to BJ's observation below (see comments), the analysis of the original post is pretty much wrong.  Even though the muscle up should have been worth quite a bit (considering the percentile chart below), HQ is awarding tied individuals the best (highest) ranking instead of the lowest.  So if you score a 90 (i.e., where the big vertical line sits for both men and women in the plot below), you are awarded the percentile score at the top of the vertical line.  I had incorrectly assumed that the rank score for 90 would be at the bottom of the vertical line.

The corollary of this: the last OHS is one of the most important reps you can complete!
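To make the "top versus bottom of the vertical line" distinction concrete, here is a minimal Python sketch. It assumes rank 1 is the highest rep count and converts a rank to a percentile as the share of the field ranked below you. The function name `percentile_for` and the ten-athlete field are hypothetical illustrations, not HQ's code or data.

```python
from bisect import bisect_left, bisect_right


def percentile_for(score, all_scores, ties="best"):
    """Percentile (0-100, higher is better) for `score` within `all_scores`.

    Rank 1 is the highest rep count. ties="best" gives every tied athlete
    the best rank in the group (top of the vertical line, as HQ appears to
    do); ties="worst" gives them the worst rank (bottom of the line, as the
    original post assumed).
    """
    ordered = sorted(all_scores)
    n = len(ordered)
    below = bisect_left(ordered, score)       # athletes with fewer reps
    above = n - bisect_right(ordered, score)  # athletes with more reps
    if ties == "best":
        rank = above + 1
    elif ties == "worst":
        rank = n - below
    else:
        raise ValueError(f"unknown tie convention: {ties!r}")
    # Share of the field ranked strictly below this rank.
    return 100 * (n - rank) / n


# Hypothetical toy field: 2 athletes stop at 89 reps, 6 finish the OHS (90),
# and 2 get the first muscle-up (91).
field = [89] * 2 + [90] * 6 + [91] * 2
for ties in ("best", "worst"):
    print(ties, [percentile_for(s, field, ties) for s in (89, 90, 91)])
# best [10.0, 70.0, 90.0]
# worst [0.0, 20.0, 80.0]
```

In this toy field, the best-rank convention makes the step from 89 to 90 (the last OHS) worth 60 points and the first muscle-up only 20; under the worst-rank convention the two values swap, mirroring the reversal BJ pointed out.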

Thanks again, BJ, for pointing this out.  I have left the evidence of the previous post so people can make fun of me.

The muscle-up is the most valuable rep in the open thus far, gaining 13 percentile pts on the men's side, and a staggering 20 percentile pts on the women's side.  To put this in absolute terms, if all the competitors from week 3 continue through week 4, a male competitor completing a muscle up will beat an additional ~1150 competitors, while on the women's side the same will beat another ~950 competitors.
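The conversion from percentile points to head counts above is just a fraction of the field. A small sketch, assuming the field size stays fixed from week 3 to week 4; working backwards from the post's own approximate numbers gives the field sizes those estimates imply (roughly 8,850 men and 4,750 women). These are derived figures, not official entry counts, and the function names are hypothetical.

```python
def competitors_beaten(percentile_gain, field_size):
    """Extra competitors passed for a gain of `percentile_gain` points."""
    return percentile_gain * field_size / 100


def implied_field_size(percentile_gain, extra_beaten):
    """Field size implied by 'a gain of X percentile points beats Y people'."""
    return extra_beaten * 100 / percentile_gain


print(round(implied_field_size(13, 1150)))  # men:   8846
print(round(implied_field_size(20, 950)))   # women: 4750
```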

The first muscle-up is the single most valuable rep in the open thus far. But the way HQ is scoring the open, the last OHS is one of the most valuable reps.  In fact, if you take everything I previously said about "the first muscle up" and replace it with "the last OHS," the post would be all good.

Since the percentile score at the point of the muscle-up starts around the 70th percentile for women, this calls for a critical strategy for the gals. A single muscle up may mean the difference between getting to regionals and not (also see chart for percentile scores of the top 100 athletes in each region).  If I were (1) a woman, (2) on the edge of making regionals, and (3) able to get through the first two phases of the workout (all things I am not even remotely close to), I would do my very best to rest all muscles after the OHS and give the muscle-up my best shot in the last 20 seconds of the workout.  The second muscle up has little value compared to the first, and it doesn't make any sense to burn energy on early attempts if you have a few minutes left.

For the men, the same things hold, but the end result has far fewer implications since the muscle-up starts around the 42nd percentile.  The muscle-up would clearly be for pure glory!

Also, by popular request I have started to look into the age vs performance question, and yikes... the plots are pretty grim, especially on the men's side.  Stay tuned...

This post discusses the fourth 2011 CrossFit Open workout, described here.


  1. How are you getting accurate data? The leaderboards on the games site have been so broken that scraping them would be useless. If you've got access to the raw data is there a way for others to get it so that we can generate sane leaderboards?

  2. Great, I'm really looking forward to the additional breakdowns. And I must say that through WOD 3 I thought the programming was great, but WOD 4 IMO is flawed.

    If WOD 3 was used to eliminate anyone who couldn't clean 165/110, then WOD 4 should have been a lot harder on anyone who couldn't do an M/U. So for example they could have reversed the order of the AMRAP (starting with the M/U's), or, less harsh, they could have made it 30/15/5. As it is, this WOD again favors the big guys who can get through the OHS with relative ease. (Of course none of this is to say that it would have changed the overall top results, as the leaders can do everything well.)

  3. Could somebody explain to me the big problem with the website data? I'm fully aware that HQ hasn't done a good job 'sorting' the scores, but that shouldn't affect my plots, unless there are problems with the accuracy of the raw data.

  4. Not sure I agree with your analysis. Since scores are given based upon a top down ordinal ranking, the difference between a person doing 90 reps and 91 reps is rather modest. On the other hand, if you don't get all 30 OHS completed, you will be severely penalized. As an example, as of this writing, here were the men's rank scores for 89, 90, and 91 reps.

    89 reps: 4360
    90 reps: 3415
    91 reps: 3144

  5. ... and for women, the first muscle up is even less valuable. Here are their scores for 89, 90, and 91 reps:

    89 reps: 1041
    90 reps: 379
    91 reps: 324

  6. Whoa... BJ, you're totally right and I didn't even bother checking the leaderboard to see if the results matched up that way. The way HQ is scoring makes the last OHS the most important rep, not the first muscle up. Yikes, I will totally update the post to reflect that.

    On a broader note though, doesn't this seem like a strange way to score the open? Suppose a workout is designed such that out of 10000 competitors, 9999 score a zero and a single person scores a 1. I think most people would agree that the single person must be a superhero and should be rewarded as such.

    The way HQ would score this is that the single person would get a rank score of 1, and everybody else would get a rank score of 2, which hardly seems right to me. Thoughts?

  7. This comment has been removed by the author.

  8. This comment has been removed by the author.

  9. I've always thought they should award people who tie the mean value of the ranks they share. (This would have been pretty important during the DL competition in the 2009(?) games.) So if one person gets 100 reps, 48 get 99 and 1 gets 98, the ranking would be 1 for first, 25.5 for the next 48 people and then 50 for the last competitor. As it is, the ranks are 1 for first, 2 for the next 48, and 50 for the last. Seems kind of stupid and easy to fix...

  10. At the moment, HQ isn't just having trouble sorting the scores, they're currently issuing different "scores" for the same number of reps for the burpee/ohs/mu WOD, at least on the regional boards.

    So, I guess it would work to scrape the raw rep numbers for each competitor and then implement the scoring algorithm yourself and properly account for ties (unlike the games site). Though, they appear to have changed the mechanism they're using for scoring within regions. They used to calculate regional placing by summing your individual workout scores relative to the worldwide field, now they're using your score within the region.

    That also doesn't account for the cases where people are reporting that their scores are never actually being shown anywhere on the leaderboards, but those are probably few enough to not affect your plots much.

  11. Thanks for the update. I agree with you 100% that HQ has adopted a bizarre scoring method that has deemphasized the value of having a muscle up. Furthermore, I would guess this is exactly the opposite of their intentions. Btw, this site is awesome; I'm really enjoying the analysis.

  12. Agreed, this workout should have been done backwards. And there must be something weird about the points making it so that the 90th rep was the most important, because that doesn't make any sense. If you're getting 29 OHS you are definitely gonna get 30. I think your analysis is correct.

  13. Just a thought: have you tried sorting these numbers using multi-factor analysis? I imagine height/weight ratios would be really interesting to see and probably much more helpful. A shorter 135 lb individual will (most likely) perform much better than a taller one, for instance.

    You could probably even manage to do this with BMI given the sample size and the likelihood that many CFers in the games are relatively healthy body-comp wise. This would give a good idea to the best "thickness" of a CFer.

  14. The more I think about it, what Amit says makes more sense to me. If everyone who ties is awarded the lowest ranking score, there would be a pretty large incentive to get the next rep in as quick as possible in order to separate yourself from the group. I'm thinking especially of WOD#1, where many people ended up finishing at the end of the round. Even though a single double under is not difficult at all, it still would have been worth a fair amount, especially in the middle percentile scores.

    @Jeff- Working on it! It'll probably be one of the last posts I'll do, consider it a thesis of sorts.

  15. @Cameron: Thanks for the heads up. The inaccurate raw score conversion is definitely troubling, and I'll look into that. Curiously, with the new way they're calculating region ranking, have you been able to find an example where person A is ranked higher than person B on the overall leaderboard, but person B is ranked higher on the regional board?

    I can think of strange circumstances where it might occur, but I have trouble believing it makes a huge impact.

  16. You don't have to look far, check out Rich Froning and Dan Bailey in the overall vs central east. Bailey leads overall but Froning is one point ahead in the region (assuming they've done the math right which is far from certain).

  17. I would be really interested in looking at this data set myself if it is available. Especially once the last WOD results are in. Is that possible? I just started learning R (a data analysis/statistics language if you don't know) and would love to use this data to play around in it.

  18. If HQ does change their ranking algorithm so that the lowest/bottom percentile rank is awarded for all those tied (and they could still decide to change it), then getting that 1st muscle up would be the most important rep, right? I suspect that was their intention: that having a single muscle up proved valuable, just as having a single OHS proved valuable at the bottom end of the range.
    I don't think the workout was flawed, in that MUs were included, could be made important with a rank algorithm change, and they were not required to get on the board.

  19. The raw score to ranking conversion is not inaccurate -- it's just that the score used to rank is done with relative ranks, and thus can produce results that are unusual. However, I think their formula is reasonable. If there were a physical site for, say, Central East, Bailey's and Froning's scores would be relative to the Central East competitors only. The results would be exactly as described by -- Froning would be winning. Similarly, if all 12K males were to compete against each other, the results would be exactly as described by -- Bailey would be winning.

  20. Amit's suggestion of ties being scored as "mean placement" is a sound one. It preserves the sum of scores you get when there are no ties. That is, for 100 athletes, the sum of 1 to 100 will be the same regardless of the number of ties. As pointed out, in the extreme case of being ahead of a number of ties, it gives you the equivalent lead that you would have had had there been no ties.

  21. Mr. Young, I'm in CrossFit's technology group. I'd like to chat with you about data analysis. Hit me up.

    I have some interesting questions.

    Great work.
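The comments above debate two ways of scoring ties and how overall and regional placings are built from per-workout ranks. Below is a hypothetical Python sketch under these assumptions: each workout is ranked by reps (more is better, rank 1 is best); ties get either the best rank in the group (the games-site behaviour described in comments 6 and 9) or the mean of the ranks they span (Amit's suggestion); and the overall score is the sum of workout ranks, lowest wins, re-ranked against only a region's athletes for the regional board as Cameron describes. The function names and the four-athlete data set are made up for illustration; this is not HQ's code.

```python
from collections import defaultdict


def rank_scores(reps, ties="min"):
    """Rank one workout: athlete -> rank score (1 is best, more reps is better).

    ties="min":     every tied athlete gets the best rank in the group.
    ties="average": tied athletes share the mean of the ranks they span,
                    so the total over a field of n is always n(n+1)/2.
    """
    ordered = sorted(reps, key=reps.get, reverse=True)
    scores = {}
    i = 0
    while i < len(ordered):
        # Advance j past every athlete tied with ordered[i].
        j = i
        while j < len(ordered) and reps[ordered[j]] == reps[ordered[i]]:
            j += 1
        # This tied group occupies 1-based ranks i+1 .. j.
        if ties == "min":
            score = i + 1
        elif ties == "average":
            score = (i + 1 + j) / 2
        else:
            raise ValueError(f"unknown tie method: {ties!r}")
        for name in ordered[i:j]:
            scores[name] = score
        i = j
    return scores


def overall(workouts, athletes=None, ties="min"):
    """Sum per-workout rank scores (lower is better).

    If `athletes` is given (e.g. one region), each workout is re-ranked
    against that subset only.
    """
    totals = defaultdict(int)
    for reps in workouts:
        if athletes is not None:
            reps = {a: r for a, r in reps.items() if a in athletes}
        for name, score in rank_scores(reps, ties).items():
            totals[name] += score
    return dict(totals)


# Comment 9's example: one athlete at 100 reps, 48 tied at 99, one at 98.
field = {"first": 100, **{f"tied{k}": 99 for k in range(48)}, "last": 98}
best = rank_scores(field)
print(best["tied0"], best["last"])  # 2 50
avg = rank_scores(field, "average")
print(avg["tied0"], avg["last"])  # 25.5 50.0

# Hypothetical data: A and B share a region, C and D do not.
workouts = [
    {"A": 90, "B": 100, "C": 50, "D": 40},
    {"A": 90, "B": 100, "C": 50, "D": 40},
    {"A": 100, "B": 70, "C": 90, "D": 80},
]
print(overall(workouts))  # {'B': 6, 'A': 5, 'C': 8, 'D': 11}
print(overall(workouts, athletes={"A", "B"}))  # {'B': 4, 'A': 5}
```

In the toy data, A leads B worldwide (5 vs 6), but re-ranked within their region B leads A (4 vs 5), the same kind of flip comment 16 reports for Bailey and Froning. The averaged ranks in comment 9's example also sum to 1275 = 50·51/2, the no-ties total that comment 20 says is preserved.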