Wednesday, March 21, 2012

Week 4 data download and Mike's analysis

Mike has been tearing it up this year doing some great plots.  Head on over his way and enjoy his commentary.

Quick note for analyzers out there:  Jeff has scraped the latest data (thanks, Jeff!).  His comment on the dataset and the link follow:

From Jeff:
This update includes several new features:

- Regional Ranks per event in addition to Overall Ranks per event.
- Masters Divisions now properly account for bubble ages (e.g., 44 years old now, but turning 45 in July).
- Age07 is the athlete's age in July. It differs from "current age" only for the 390 bubble Masters.
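
The bubble-age rule in the list above can be sketched in a few lines of Python. This is only an illustration, not Jeff's scraping code: the note says "July" without giving a day, so the cutoff date here is an assumption, the minimum Masters age of 45 is taken from the example in the note, and the helper names `age_on` and `is_bubble_master` are hypothetical.

```python
from datetime import date

# Assumed cutoff: the note only says "July", so the exact day is a guess.
CUTOFF = date(2012, 7, 15)


def age_on(birth_date, on_date):
    """Return the number of whole years completed by on_date."""
    years = on_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_bubble_master(birth_date, today, cutoff=CUTOFF, masters_min_age=45):
    """True if the athlete is under the Masters minimum today but reaches it by the cutoff."""
    return age_on(birth_date, today) < masters_min_age <= age_on(birth_date, cutoff)
```

An athlete born in June 1967, for example, is 44 on the date of this post but 45 at the assumed cutoff, so the sketch would flag them as a bubble Master.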

Data file can be found here:


  1. Nice analysis Mike. I'd love to see a scatter of 'changed place' based on cfdev's "fixed" method (for ties and attrition).
    I agree that the Open does a good job for the top athletes. Where it falls apart is, as you pointed out, in the middle of the pack. This may not matter for choosing the fittest man and woman, but it will make a big difference for affiliates trying to find the fairest way to choose members to send to regionals on the team. I've experimented with different scoring for my own affiliate members, and each method changes who would be selected based on regional scores.

  2. It seems like a good idea to do a results scrape for 12.5 later in the week, but not too late, so as to answer the question:
    "I still have time to do 12.5, what score will I need to get into XX percentile or a certain place?"

    While one can look at the current scores at any moment and see the current rankings and scores, I'd think that a plot of results would show how the overall field is doing and so, therefore, what the rankings will look like at the end.

    This assumes, I guess, that athletes who post late are typically distributed. That is, it is NOT only firebreathers who post late, nor only the bottom end of the scale.

    So, what day/time might be a good time to scrape and plot "results so far"?
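
The "what score will I need" question in this comment can be sketched from a mid-week scrape. The Python below is a rough illustration under the same assumption the comment makes (late posters are distributed like early ones). It also assumes a higher score is better; the function names and the expected field size are hypothetical inputs, not anything from the dataset.

```python
import math


def score_needed(scores_so_far, percentile):
    """Smallest posted score with at least `percentile` percent of posted scores at or below it."""
    if not scores_so_far:
        raise ValueError("no scores posted yet")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(scores_so_far)
    # Nearest-rank method: take the k-th smallest score, k = ceil(p * n / 100).
    k = math.ceil(percentile * len(ordered) / 100)
    return ordered[k - 1]


def projected_place(score, scores_so_far, expected_field_size):
    """Scale the share of posted scores that beat `score` up to the expected final field."""
    better = sum(1 for s in scores_so_far if s > score)
    return round(better / len(scores_so_far) * expected_field_size) + 1
```

If firebreathers (or the bottom of the field) do post disproportionately late, both numbers will be biased, which is exactly the caveat raised above.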

  3. I'm having some trouble with your wk4 csv Jeff. It's reading into R as a vector of long strings. I know that it is | delimited, not comma, so that isn't the problem (I had no trouble reading in the wk 1 data using read.csv(...., sep="|") ). Any thoughts on what's changed this time?

  4. I looked at the file with a plain text editor, and it's just as documented: mostly integers delimited with pipes, while things like name, URL, division, etc. are surrounded by double quotes and delimited by pipes.

    I used same code to create this file as prior weeks, no change to end of line terminator or anything. So, no, I have no ideas.

    It's almost the same as the wk1 file was. Does that R command expect a specific column format? If so, the number of columns has been tweaked over the weeks, adding regional ranks, etc.
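
The thread never pins down the cause, so the following is only a diagnostic sketch, written in Python rather than R. It parses pipe-delimited text with double-quoted fields, as described above, and reports any line whose field count differs from the header's, which is one way to spot a row that breaks a reader. The column names in the sample are hypothetical.

```python
import csv
import io


def read_pipe_delimited(text):
    """Parse pipe-delimited text with double-quoted fields.

    Returns (header, records, bad_lines), where bad_lines holds the 1-based
    line numbers of records whose field count differs from the header's.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="|", quotechar='"')
    rows = list(reader)
    header, records = rows[0], rows[1:]
    expected = len(header)
    # Data starts on line 2 because line 1 is the header.
    bad_lines = [n for n, row in enumerate(records, start=2) if len(row) != expected]
    return header, records, bad_lines


# Hypothetical sample: the last row is missing its score field.
sample = (
    'rank|"name"|"division"|score\n'
    '1|"A. Athlete"|"Men"|150\n'
    '2|"B. O\'Brien"|"Masters 45-49"|140\n'
    '3|"C. Short"|"Women"\n'
)
header, records, bad_lines = read_pipe_delimited(sample)
```

In R, the `quote` argument of `read.csv` controls the same quoting behavior, so it may be worth checking whether an apostrophe or stray quote in a name field is involved; that is a guess, not something confirmed in this thread.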

  5. Week 5 of CrossFit Open 2012 data is now available here:

    1. Sweet! Thanks, wish I'd seen this earlier.

    2. This comment has been removed by the author.

    3. One issue I've noticed with the file--not a big deal, but figured I'd mention it in case you're not aware of it--scores for WODs 1 and 2 for some athletes do not seem to have made it into the file. About 12% of the 12.1 scores are missing, and about 6% of the 12.2 scores. The missing scores seem to be those of athletes who did not complete all the workouts.

  6. Thanks Jeff! I was waiting for it! Next week, I have a bunch of downtime, so I'll hopefully be able to do a bunch of analyses.

    Anyone is free to suggest questions they want answers to.

  7. Preliminary final scoring fairness assessment is up:

    Still have lots to do, but wanted to keep the info coming while people are still interested in the Open.

  8. Hi guys --

    I just started a site to do some of my own analyses of the Open data. I've put up a couple of interactive tools that aren't actually analyses, but rather scratch an itch.

    I'm not completely sure yet that it won't crash :), but if you're interested in checking it out, please do, and leave feedback if you like.

    Yours, Mike

  9. I just wanted to post a link to an analysis I've done on the regional competitions, comparing scores worldwide while taking into account the advantage gained by competing in later weeks. There hopefully will be more to come on this blog in the future.

  10. Jeff: did you ever compile data for athletes' benchmarks? I'd like to play around with it if you have. Thanks!