Group thought: Questions to ask during the 2012 Open

Jeff is doing a great job providing a complete dataset this year (see last post for a link), which will allow many people to play with the data and come up with interesting analysis.  It's a ton of hard work and we're very fortunate to have him donating his time to this effort.

That said, what are some of the burning questions people want to ask of the dataset?  Hopefully by getting the questions out, we can have folks discussing the data and questions together.

Here are some of the things I'd like to see during the games.

1)  I'm fond of the height/weight plot for each workout, as I think it gives a quick, informative picture of each WOD.  By week 3 or week 4, is it possible to predict what the last WOD should be?  One of my standing predictions is that wall ball will be a part of the Open.  I think it's one of the few exercises that helps taller athletes.  The erg likely falls into that category too, but I have trouble believing it would be required.

2)  At the end of the Games, will it be possible to answer how much a year's worth of training helps?  This is a bit tricky, because we'll need some way of understanding whether the overall performance of Open participants has changed over the years.  Has CrossFit's emergence and popularity shifted the performance curve down?

Feel free to post your own questions, about a specific WOD or about the Games overall.


  1. Awesome. Will the box link eventually include some of the 2012 data?

    Thanks for sharing the MATLAB scripts. Useful.

    2. I'd like to compare the men's and women's score distributions. I'd assume that for AMRAPs with properly scaled weights, the scores should be the same. If not, that would give us a hint as to how to set up standard scaling rules to achieve parity.
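One possible way to make that comparison concrete is to line up the deciles of the two score distributions and look at the gaps. This is only a minimal sketch: the function name `decile_gaps` and the sample numbers are made up for illustration, and the inputs are assumed to be plain lists of rep counts for a single AMRAP.

```python
from statistics import quantiles

def decile_gaps(men_scores, women_scores):
    """Return men's decile minus women's decile for each of the 9 cut points.

    Gaps near zero at every decile would suggest the scaling gives parity;
    a consistent sign would suggest one division has the easier load.
    """
    men = quantiles(men_scores, n=10)      # 9 cut points: 10th..90th percentile
    women = quantiles(women_scores, n=10)
    return [m - w for m, w in zip(men, women)]

# Made-up example: the women's scores are the men's scores shifted up by 5 reps.
men = list(range(1, 101))
women = [score + 5 for score in men]
gaps = decile_gaps(men, women)
```

In this toy case every gap comes out at about -5 reps, which would point to a load that is uniformly lighter for one division rather than a difference in spread.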

  2. I have harvested the scores and ranks for Event 1. I began the harvest several hours after Event 1 was defined as closed. It is still possible for athletes to sign up and submit results even after the event closes. Plus, it's possible that scores and ranks will change as corrections and edits are made.

    I added a "Division" column with approx. age divisions (see notes).
    I added overall rank values.
    I added columns for Events 2-5. So, the format should be close to final format.
    I removed the quotes around numbers.
    I formatted English heights as "69 in", so all feet/inch values are gone.
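For anyone cleaning their own copy of the data, the height change could look roughly like the sketch below. The raw input formats handled here (`5'9"` and `5 ft 9 in`) are assumptions, since the thread does not show what the site actually returns, and `height_to_inches` is a hypothetical helper, not the harvesting script itself.

```python
import re

# Assumed raw formats: 5'9"  or  5 ft 9 in  (inches part optional).
_FEET_INCHES = re.compile(r"\s*(\d+)\s*(?:'|ft)\s*(\d+)?\s*(?:\"|in)?\s*")

def height_to_inches(text):
    """Rewrite a feet/inch height as '<total> in'; leave anything else alone."""
    match = _FEET_INCHES.fullmatch(text)
    if match is None:
        return text.strip()          # e.g. '175 cm' or an already-clean '69 in'
    feet = int(match.group(1))
    inches = int(match.group(2) or 0)
    return f"{feet * 12 + inches} in"

print(height_to_inches("5'9\""))     # 69 in
print(height_to_inches("175 cm"))    # 175 cm
```

Metric heights and values already in inches pass through unchanged, so the function can be run over the whole column more than once without harm.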

    The file is located:

    1. Awesome! Thanks for doing this. I got it downloaded and it all seems to be opening up correctly on this end. You all seem like Matlab nerds, but I work for the company that makes IDL (a big Matlab competitor), so I'll be releasing my IDL scripts to do processing as I get them done.

    2. One more request: could you update your screen scraper to grab the supplemental optional information, like Fran time, C&J, etc.? It'd be cool to see how different skills translate into performance in different workouts.

    3. Thanks Jeff!!
      I'm confused about the overall results column. I assume it is meant to be the points used to determine overall ranking, in which case it should be the sum of the rankings from the 5 workouts, but currently I think it's the sum of the reps from the 5 workouts?
      Also, is it easy to grab the regional rankings too? If not, I'll write some code to calculate it myself.

      Your efforts are MUCH appreciated, I have no idea how to do this myself.
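To illustrate the difference between the two totals discussed in this comment, here is a small sketch that sums per-event ranks instead of reps. The athlete names and scores are made up, ties are assumed to share the better rank, and every athlete is assumed to have a score in every event; none of that is confirmed by the dataset.

```python
def rank_scores(scores):
    """Rank one event: highest score gets rank 1, tied scores share a rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(score) + 1 for name, score in scores.items()}

def overall_points(events):
    """Sum each athlete's rank across events; the lowest total is best."""
    totals = {}
    for scores in events:
        for name, rank in rank_scores(scores).items():
            totals[name] = totals.get(name, 0) + rank
    return totals

# Made-up rep counts for three athletes over two events.
events = [
    {"A": 120, "B": 100, "C": 100},
    {"A": 80, "B": 95, "C": 90},
]
points = overall_points(events)
rep_totals = {name: sum(e[name] for e in events) for name in events[0]}
```

Here athlete A has the most total reps (200) but athlete B has the best rank sum (3 points against 4), so the two columns can order athletes differently. A regional ranking could be computed the same way after filtering each event down to one region.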

  3. I'm not sure if this information can be captured, but I suspect that training volume will need to be at least an average of two workouts per day for those athletes who make it past the regionals.

  4. Had a few minutes down time at work. Here's where I'll be posting my analysis. Starting off with just boring histograms. Will do more as time goes on.

    1. Mike,

      Those are gorgeous! Are there only two? There are some weird icons on the screen.

      Are those fitted bell curves? -X^2?


    2. Cool, miked. Thanks for playing around with the data! It's great to see some early plots. Did you want me to link to your site with any sort of write-up? I think viewers would love to read discussion points you bring up. Feel free to contact me...

      ...and oddly, my wife was one of the very few people at her company that could be rightly called an IDL expert. She used it pretty extensively through grad school and loved it!

  5. According to a recent article on the Games site, scores are in flux until 5pm Tuesdays. I suppose they can still change and have new ones added even after that deadline. But for the purposes of advancing to Regionals, the scores must be approved and I'd guess any disputes/issues resolved by Tuesday 5pm PT. As such, I will not begin my harvest of Event 2 data until Tuesday night. I will not repeat it this week for Event 1. But I'll wait the designated period for Events 2-5.

    As for including the Stats data and the questionnaire content, I am going to create separate files for these values. The number of valid responses varies (an athlete can enter none or as many as they wish) and the values are longer, so it can get messy. I'd rather not mess up the nice, clean format I have for results at this point.

    I am also looking into somehow combining the Open 2011 data with the Open 2012 data, matching on some combination of name/age/region.
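One way the name/age/region match might be approached is sketched below. The field names (`name`, `age`, `region`), the case-insensitive comparison, and the allowed age increase of 0 to 2 years are all assumptions for illustration; athletes who changed regions or spell their name differently would be missed.

```python
def match_athletes(rows_2011, rows_2012):
    """Pair 2011 and 2012 records that share name and region and a plausible age.

    Only unambiguous matches are kept; a 2012 record with zero or several
    2011 candidates is skipped rather than guessed at.
    """
    def key(row):
        return (row["name"].strip().lower(), row["region"].strip().lower())

    index = {}
    for row in rows_2011:
        index.setdefault(key(row), []).append(row)

    matches = []
    for row in rows_2012:
        candidates = [old for old in index.get(key(row), [])
                      if 0 <= row["age"] - old["age"] <= 2]
        if len(candidates) == 1:
            matches.append((candidates[0], row))
    return matches

# Made-up records for illustration.
rows_2011 = [
    {"name": "Jane Doe", "age": 29, "region": "NorCal"},
    {"name": "John Smith", "age": 30, "region": "Europe"},
    {"name": "John Smith", "age": 31, "region": "Europe"},
]
rows_2012 = [
    {"name": "jane doe ", "age": 30, "region": "norcal"},
    {"name": "John Smith", "age": 32, "region": "Europe"},
]
matched = match_athletes(rows_2011, rows_2012)
```

In this example only Jane Doe is paired; the two John Smiths from 2011 are both plausible, so that 2012 record is left unmatched. Skipping ambiguous cases trades coverage for fewer false pairings, which seems the safer default when comparing year-over-year performance.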