Kendall’s Tau and the CrossFit Games

Correlation of Results

Correlation coefficients are a measure of agreement, and Kendall’s Tau measures how closely two rankings agree on the order of the same items. An example would be checking how similarly a group of athletes rank across different workouts.

Suppose Alice was a gymnast and loves everything gymnastics-related. Brenda spent time as a weightlifter and loves throwing heavy things around. Cindy was a runner and swimmer, and excels at anything aerobic. Now suppose they perform three workouts and get these results:

Workout                    Winner   Second   Third
1. Pushups & Deadlifts     Alice    Brenda   Cindy
2. Burpees over the Box    Cindy    Alice    Brenda
3. Rower                   Alice    Cindy    Brenda

Comparing Workouts 1 & 2, we have 3 pairings of our athletes (A-B, A-C, B-C). Only one of these is concordant, meaning the pair finished in the same relative order in both workouts (Alice beat Brenda in both). The other two are discordant. The correlation is defined as:

(concordant_pairs - discordant_pairs) / num_pairs

In this case, it’s (1-2)/3 = -0.33. Similarly, if we compare Workouts 2 & 3, two pairings are concordant (A>B and C>B), so these correlate (2-1)/3 = 0.33. Workouts 1 & 3 also have 2 concordant pairs (A>B and A>C) and correlate (2-1)/3 = 0.33. We might then conclude that Workouts 1 & 2, the pair with the lowest correlation, most differentiate the athlete rankings.
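The pair counting above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis below: the function name kendall_tau and the list-based input format are my own assumptions, and it assumes there are no ties and that both lists contain the same athletes.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau between two finishing orders (hypothetical helper, no ties)."""
    # Map each athlete to a finishing position (0 = winner).
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when x and y finish in the same
        # relative order in both workouts.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

workout_1 = ["Alice", "Brenda", "Cindy"]
workout_2 = ["Cindy", "Alice", "Brenda"]
workout_3 = ["Alice", "Cindy", "Brenda"]

print(round(kendall_tau(workout_1, workout_2), 2))  # -0.33
print(round(kendall_tau(workout_2, workout_3), 2))  # 0.33
print(round(kendall_tau(workout_1, workout_3), 2))  # 0.33
```

For real data, scipy.stats.kendalltau computes the tau-b variant, which gives the same value as this formula when there are no ties.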

Much like regular correlation coefficients, if the athletes finish in exactly the same order, the score is 1.0. And if they finish in exactly reverse order, the correlation is -1.0. BTW, you could also use this to look at sports with subjective judging and see which judges are voting in blocs against the consensus.

2021 CrossFit Games

For this analysis, I took the 20 highest-ranked athletes and the order in which they finished in each of the 15 events in the finals. I chose 20 because that is how many athletes made the final cut and went through all 15 workouts. I typed the data in manually, did the calculations with Python and then built a heat map with Tableau:

In the heat map, Event 0 represents the overall ranking. Blue shows a positive correlation and orange shows a negative correlation, with the intensity of the color showing its strength. The first thing I noticed was that the men’s events correlated a lot less than the women’s events. The women were more likely to stay in the same rank order across all events.

A few more observations, just looking at individual event pairs:

  • For the men, the worst predictor of overall ranking was Event 10 (30 toes-to-bar, 1.5 mile run, 30 toes-to-bar, 1.5 mile run, 30 toes-to-bar). It had a slight negative correlation (-0.147).
  • The strongest positive correlation was between Event 4 (wall walks and thrusters) and Event 14 (deadlifts and handstand pushups).
  • The strongest negative correlation (-0.432) was Event 10 with Event 8 (handstand obstacle course). The women tended to stay in the same order (tau = 0.211).
  • For the women, the strongest correlation of finishing in the same order was between Events 6 & 7. These were basically the same thing, 5 rounds of running and a clean, so this is not surprising. The least correlated events for the women were 9 & 11.
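Observations like these come from computing tau for every pair of events and then looking for the extremes. The sketch below shows one way that step might look, using the three-athlete toy data from earlier rather than the actual Games results. The helper names (kendall_tau, overall_order, tau_matrix) are hypothetical, and the overall ranking here is simply the sum of finishing places, which is an assumption for illustration and not the Games scoring system.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall's tau between two finishing orders (hypothetical helper, no ties)."""
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    score = 0
    pairs = 0
    for x, y in combinations(order_a, 2):
        same_order = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
        score += 1 if same_order else -1  # +1 concordant, -1 discordant
        pairs += 1
    return score / pairs

def overall_order(events):
    """Assumed overall ranking: lowest sum of finishing places wins."""
    totals = {}
    for order in events.values():
        for place, name in enumerate(order, start=1):
            totals[name] = totals.get(name, 0) + place
    return sorted(totals, key=totals.get)

def tau_matrix(events):
    """Return {(event_i, event_j): tau} for every pair of events."""
    return {
        (i, j): kendall_tau(events[i], events[j])
        for i, j in combinations(sorted(events), 2)
    }

# Toy data from the three-athlete example, not real results.
events = {
    1: ["Alice", "Brenda", "Cindy"],
    2: ["Cindy", "Alice", "Brenda"],
    3: ["Alice", "Cindy", "Brenda"],
}
events[0] = overall_order(events)  # Event 0 = overall ranking

taus = tau_matrix(events)
strongest = max(taus, key=taus.get)
weakest = min(taus, key=taus.get)
print(strongest, round(taus[strongest], 2))  # (0, 3) 1.0
print(weakest, round(taus[weakest], 2))      # (1, 2) -0.33
```

The same dictionary of pairs could be exported as rows (event_i, event_j, tau) to feed a heat map tool.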

So, what could we do with this information? I’m still thinking about that. One might do a larger analysis over several years to see what types of events correlate least with final position, then schedule these types of events earlier in the finals. This would create a lot more chaos on the leaderboard 😎

Note: I am not affiliated with CrossFit or sponsored by them; this was just a hobby project. The event descriptions and results are at:
https://games.crossfit.com/workouts/games/
