Sunday, February 3, 2013

Test-run for army judging

I served as an army appearance judge at a local Warhammer 40K tournament on Saturday at Huzzah Hobbies in Ashburn, Virginia.  This event is a qualifying event for NOVA Open 2013.  Registration for NOVA Open 2013 opened just the night before on Friday.  Friday evening was the first milestone for publishing the new judging system that I composed, in collaboration with the NOVA Open CEO, Mike Brandt, and long-serving NOVA Open art judge, Bob Likins.

All three of us formed the 3-judge panel on Saturday.  It was an ideal lab to test out our new judging system.  The system is set up for two rounds of judging.  In the first round, each judge rates the armies quickly using coarse-grained criteria:
  • 0    Unpainted
  • 10  Tabletop Minimum
  • 30  Tabletop Standard
  • 50  Tabletop Ideal
  • 70  Work of Art
  • 90  Masterpiece
Because army appearance at a tournament is expected to be high, the bar is set correspondingly high: the bell curve is centered on 50, and it takes extra effort to reach even that level.  In other words, the average is a "tournament-level" average.

If at least two judges' First Round scores are equal, that score is kept as the base score.  Scores are never averaged.  If there are significant deviations across the judges, the head judge reconciles.
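Here is a rough sketch of that First Round rule in Python, just to make the logic concrete.  The function name `base_score` and the `head_judge_call` parameter are my own inventions for illustration, and I'm assuming a 3-judge panel and that "reconciles" simply means the head judge picks the tier.

```python
from collections import Counter

# The six First Round tiers from the list above.
TIERS = {
    0: "Unpainted",
    10: "Tabletop Minimum",
    30: "Tabletop Standard",
    50: "Tabletop Ideal",
    70: "Work of Art",
    90: "Masterpiece",
}


def base_score(scores, head_judge_call=None):
    """Return the First Round base score for one army.

    scores: the three judges' tier scores.
    head_judge_call: hypothetical fallback, the tier chosen by the head
    judge when no two judges agree.
    """
    if len(scores) != 3:
        raise ValueError("this sketch assumes a 3-judge panel")
    for score in scores:
        if score not in TIERS:
            raise ValueError(f"{score} is not a First Round tier")

    # Keep the score that at least two judges share; never average.
    value, count = Counter(scores).most_common(1)[0]
    if count >= 2:
        return value

    # All three judges differ, so the head judge has to reconcile.
    if head_judge_call not in TIERS:
        raise ValueError("no two judges agree; head judge must pick a tier")
    return head_judge_call


print(base_score([50, 50, 70]))                      # 50
print(base_score([30, 50, 70], head_judge_call=50))  # 50
```

Note that a lone 70 next to two 50s simply drops out, since the agreed score wins and nothing is averaged.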

For the NOVA Open, all scores of 70 and 90 are then reevaluated using fine-grained criteria, where each qualifying army is examined in detail.  There are four criteria categories.  Two are objective (Technical and Artistic) and two are subjective (Creative and Impact).  They are not weighted.  For each of the four criteria categories, each judge gives the 70 or 90 base score a single-point bump-up, bump-down, or neutral adjustment.  With 3 judges each making 4 adjustments, the total can move the score as much as 12 points on either side of the base score.
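The Final Round arithmetic can be sketched the same way.  This is only an illustration under my own assumptions: the name `final_score` and the dictionary layout are made up for the example, and the adjustment values are hypothetical, not anyone's actual scores.

```python
CRITERIA = ("Technical", "Artistic", "Creative", "Impact")


def final_score(base, judge_adjustments):
    """Add every judge's per-criterion adjustment (-1, 0, or +1) to the base."""
    total = 0
    for adjustments in judge_adjustments:
        if set(adjustments) != set(CRITERIA):
            raise ValueError("each judge must adjust all four criteria")
        for criterion in CRITERIA:
            bump = adjustments[criterion]
            if bump not in (-1, 0, 1):
                raise ValueError("adjustments are single-point: -1, 0, or +1")
            total += bump  # unweighted: every criterion counts the same
    return base + total


# Hypothetical adjustments from three judges, not real tournament data.
example = [
    {"Technical": 1, "Artistic": 1, "Creative": 0, "Impact": 1},
    {"Technical": 1, "Artistic": 0, "Creative": 1, "Impact": 0},
    {"Technical": 0, "Artistic": 0, "Creative": 1, "Impact": -1},
]
print(final_score(70, example))  # 75
```

With three judges and four criteria, the result can never land more than 12 points above or below the base score.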

The 2-Round method enables efficient use of two limited resources -- staff and time.  Every army can be quickly scored for the purpose of competition scores, and the art judges can devote an appropriate proportion of time concentrating on the highest-quality candidates.  The system is designed to evaluate literally hundreds of armies.  The 40K tournament alone will have 256 armies!  Then there's Warhammer Fantasy, Warmachine/Hordes, Flames of War, Infinity, and other tournaments on top of that!

For this smaller 14-person tournament, we adjusted the method to apply the Final Round criteria to the 50, 70, and 90 scores.  This adjustment worked well for a smaller field of competitors.

We were very pleased with the system.  The First Round of judging consumed roughly one minute per army.  That pace will likely be sufficient to grade all NOVA Open tournaments across the two days, but we won't have a lot of slack.  We'll probably need to go faster.
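A quick back-of-the-envelope check of that pace, using only the 256-army 40K field and assuming the one-minute-per-army figure holds at that scale (the function name is just for illustration):

```python
def first_round_minutes(num_armies, minutes_per_army=1.0):
    """Estimate First Round judging time for one tournament, in minutes."""
    return num_armies * minutes_per_army


minutes = first_round_minutes(256)
hours, remainder = divmod(minutes, 60)
print(f"{int(hours)} h {int(remainder)} min")  # 4 h 16 min
```

That covers 40K alone, before any of the other game systems or the Final Round are counted.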

Some interesting observations:
  • Every First Round score saw at least two judges agree on the score.  There was no need for head judge reconciliation.  
  • Each judge saw something different or caught something important that the other judges missed. 
  • Each judge brought their own bias to the scoring.  The judge most acquainted with 40K detected creative elements that the other judges missed.  Another judge might have focused more on the Technical or Artistic criteria, while another judge (someone who's a big softie) might have given a little more weight to the Impact criteria.  This observation is particularly interesting to me, because when designing the system, I wondered if the criteria should be weighted.  I decided to see what would happen if the weighting was left open and undefined.  My observation is that each judge self-weights the criteria.  If the criteria were weighted up-front, I think each judge would counter the weighting to reassert their own predisposition, and therefore neutralize the original weighting.  So there's no point in weighting the criteria -- it just makes more work for the judge, who ends up compensating to restore their own bias anyway!  Overall, the biases should level out, more or less, among the three judges.
  • Each judge scored one army a 70.  Each 70-scored army was a different army!  
  • All of the 70s were thrown out, leaving five 50s for the Final Round.  The three winners were closely matched in the final tally, scoring 55, 53, 51.
All in all, I'm very pleased with the results.  Ultimately, though, it's the competitors who should feel good about the system.  We heard some valuable feedback from one person, who helped us realize that we were poorly communicating the difference between the 30 and 50 scores.  That's something we've already gone back to adjust.  Hopefully we'll get some more feedback on the NOVA Open website, forums, or Facebook page.

We wrapped up the evening with about 8 of us eating dinner and drinking beer at a good restaurant.  The game store owner joined us, and it was fascinating to talk with him and get some insight into the retail side of the hobby.

I picked up some more Dust Warfare models, the GW emery boards, some airbrush primer, and some old-skool 80's-era Space Orks.  A great day!
