Thursday, May 22, 2008

Georgian Election | ODIHR Preliminary Report and its Percentages

So the preliminary report on yesterday's Parliamentary Elections, which ODIHR has just released, again notes that the count had problems.

While this, as discussed yesterday, is not a good overall indicator of how the counts went throughout the country, it raises the question of whether we can at least compare this report with the one for the Presidential Election in January. Presumably, if 23% of observers managed to find a bad count in January, and 22% identify problems now, that should mean the number has remained relatively stable. So: in terms of the count, the election is roughly the same.

Right? Actually, no. First, different observers have different standards for what they characterize as "bad". As the ODIHR statistician (a figure fighting for more attention internally, and fortunately making some progress) will tell you, Russian observers, for example, fill out their forms somewhat differently. Since there is no training, there is no calibration of what "bad" means, or of how to distinguish it from "reasonable" or "very bad". Change the composition of the Election Observation Mission, and you may change the results. Although this is the biggest problem when comparing two very different missions (Georgia's numbers, with 22% of counts assessed as bad or very bad, and Armenia's Presidential Election in February, with 16% in that category, just can't be meaningfully compared), it can also affect a comparison of two elections in the same country.
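To make the calibration point concrete, here is a minimal Python sketch with entirely invented numbers: the same underlying counts, scored by two cohorts applying different, untrained thresholds for "bad", produce noticeably different headline percentages. The latent scores, thresholds, and cohort labels are assumptions for illustration only, not anything from the ODIHR forms.

```python
import random

random.seed(1)

# Invented "messiness" scores for 500 observed counts; higher = worse.
counts = [random.gauss(0.0, 1.0) for _ in range(500)]

def share_rated_bad(latent_scores, threshold):
    """Share of counts a cohort would rate 'bad or very bad',
    given its own (uncalibrated) threshold."""
    return sum(1 for x in latent_scores if x > threshold) / len(latent_scores)

strict_cohort = share_rated_bad(counts, threshold=0.7)   # quicker to call a count bad
lenient_cohort = share_rated_bad(counts, threshold=1.1)  # needs more before calling it bad

print(f"strict cohort: {strict_cohort:.0%} bad, lenient cohort: {lenient_cohort:.0%} bad")
# Same counts, different mission composition -> a different headline figure.
```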

A bigger challenge comes from better targeting of observers: since this is a repeat election within a relatively short time frame, ODIHR can target so-called problem districts and precincts much more accurately. More observers in these problem districts means more problems found. It is perfectly possible that a relatively stable number actually hides a marked improvement. Again, that's a sort of non-obvious selection bias.
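A toy simulation makes the point, again with made-up rates and deployment shares rather than ODIHR's actual figures: the true share of bad counts falls everywhere between the two elections, but because more teams now sit in the precincts where problems cluster, the observed share barely moves.

```python
import random

random.seed(2)

def observed_bad_rate(rate_problem, rate_normal, share_in_problem_precincts, n_teams=350):
    """Share of observer teams that see a bad count, given where they are deployed."""
    bad = 0
    for _ in range(n_teams):
        in_problem_precinct = random.random() < share_in_problem_precincts
        p_bad = rate_problem if in_problem_precinct else rate_normal
        bad += random.random() < p_bad
    return bad / n_teams

# January: problems fairly widespread, observers spread roughly evenly.
january = observed_bad_rate(rate_problem=0.40, rate_normal=0.18, share_in_problem_precincts=0.3)

# May: genuine improvement everywhere, but teams now concentrated on known problem precincts.
may = observed_bad_rate(rate_problem=0.30, rate_normal=0.10, share_in_problem_precincts=0.6)

print(f"observed bad-count rate: January {january:.0%}, May {may:.0%}")
# A roughly flat observed rate can coexist with a real improvement on the ground.
```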

Add another curious component: in the January election, at least some teams were ordered to abandon the observation at some point in the night because of rough cold conditions and snowfall ("drive before the driver gets too tired"), and return to their hotels. This time, with better weather, the observation was probably stickier, and more teams stayed until the very end, when some of the problems become really apparent. Again, this could have some impact when comparing the numbers.
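The same kind of sketch works for the weather effect, with invented probabilities: if many count problems only become visible late in the night, then simply having more teams stay to the end pushes the reported figure up, even if nothing about the counting itself has changed.

```python
import random

random.seed(3)

def share_reporting_problems(n_teams, share_staying_to_end,
                             p_visible_early=0.08, p_visible_late=0.22):
    """Share of teams that report count problems, given how many stay to the end."""
    reported = 0
    for _ in range(n_teams):
        stays_to_end = random.random() < share_staying_to_end
        p = p_visible_late if stays_to_end else p_visible_early
        reported += random.random() < p
    return reported / n_teams

january_snow = share_reporting_problems(n_teams=350, share_staying_to_end=0.5)   # teams sent home early
may_fair_weather = share_reporting_problems(n_teams=350, share_staying_to_end=0.9)

print(f"teams reporting count problems: January {january_snow:.0%}, May {may_fair_weather:.0%}")
# Better weather alone raises the reported figure, with no change in how counts are run.
```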

Noting these counterintuitive impacts (some small, some big) on the absolute numbers shouldn't serve to dismiss the observation effort, nor the attempt to quantify. Yes, no count should be bad, and training and everything else should remain as ambitious as possible. We're noting this primarily to contribute to a more careful use of the data, and again to underline the need for a revised observation methodology, one that ideally emphasizes more sophisticated sampling.

1 comment:

Onnik Krikorian said...

You know, it's now extremely difficult to assess what OSCE/ODIHR mean with all their diplomatic speak, which sometimes seems to be based on their own internal requirements, and with nothing to compare a report against.

Couldn't they just come up with some scoring system that would be the same everywhere they monitor, so we can properly assess improvements and regressions, as well as how countries shape up against each other?

Yes, I know the teams differ, but couldn't they have a foolproof methodology with set and measurable requirements? Or at least, please, can they standardize the language they use for all of us out here?