Even though the pairwise comparison method is unfamiliar to most of us, because everyday life usually confronts us with scale-based assessments, it is widely used in practice wherever simpler methods are too imprecise. For example, it is the favoured method in tournament sports such as chess, tennis, fencing and badminton. In market research, it is used to determine consumer preferences. In clinical studies, it is used to assess the effectiveness of treatments or medication. The method is also used in psychology to investigate preferences and decision-making behaviour, in product development to evaluate design prototypes and in web design to optimise the user experience.

Personally, I used the method during my time as a research chemist to analyse how various factors influence complex problems. It is a high-precision statistical method and is recognised as state of the art. I would like to explain how it works using a simple example from photography.

Suppose we are presented with five images that we have to rank:


To make the task challenging, the images were taken from different genres: flowers, people, landscape, sport and animals. They are all high-quality photographs, so technical quality plays no role. The only decisive factors are what makes each shot special and the image's effect on the viewer, and each image has to be judged by different standards.

The task is complex. With a scale-based evaluation (for example, 1 to 5 points), almost every image would most likely receive 5 points from every juror. This may flatter the photographers, but it leaves us unable to rank the images.

This is where pairwise comparison comes in handy: every image competes against every other image. For n images, n×(n-1)/2 comparisons are necessary. For our example, these are the following 5×(5-1)/2 = 10 combinations:


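If you want to generate these pairings yourself, a few lines of Python suffice. This is a minimal sketch: the genre names merely stand in for the five images, and the function name `all_pairs` is my own choice for illustration. The standard-library function `itertools.combinations` returns every unordered pair exactly once, which is exactly the n×(n-1)/2 comparisons mentioned above.

```python
from itertools import combinations

# Placeholder labels for the five images (one per genre).
images = ["Flowers", "People", "Landscape", "Sport", "Animals"]

def all_pairs(items):
    """Return every unordered pair exactly once: n*(n-1)/2 pairs for n items."""
    return list(combinations(items, 2))

pairs = all_pairs(images)
for a, b in pairs:
    print(f"{a} vs {b}")
print(len(pairs), "comparisons")
```

The order of the two images within a pair does not matter, which is why each combination appears only once and not twice.
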
As an example, you could make the following 10 individual decisions (grey = discard, coloured = keep):


These decisions are entered into the blue fields of a matrix (0 = discard, 1 = keep). The white fields are filled in automatically with the opposite value. The column on the far right shows the sum of each row, which automatically yields the ranking:


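The matrix bookkeeping can be sketched in Python. The decisions below are hypothetical (the actual decisions appear only in the figure), and the function names `build_matrix` and `ranking` are illustrative, not taken from any particular tool. Each decision puts a 1 in the cell of the kept image and the opposite value in the mirrored cell; the row sums then give the ranking.

```python
images = ["Flowers", "People", "Landscape", "Sport", "Animals"]

# HYPOTHETICAL decisions: the image kept in each of the 10 pairs.
# These are made up for illustration, not the decisions from the figure.
decisions = {
    ("Flowers", "People"): "People",
    ("Flowers", "Landscape"): "Landscape",
    ("Flowers", "Sport"): "Flowers",
    ("Flowers", "Animals"): "Animals",
    ("People", "Landscape"): "People",
    ("People", "Sport"): "People",
    ("People", "Animals"): "People",
    ("Landscape", "Sport"): "Landscape",
    ("Landscape", "Animals"): "Animals",
    ("Sport", "Animals"): "Animals",
}

def build_matrix(images, decisions):
    """matrix[i][j] = 1 if image i was kept against image j, else 0.
    The mirrored cell matrix[j][i] always holds the opposite value;
    the diagonal (an image against itself) stays 0."""
    index = {name: k for k, name in enumerate(images)}
    n = len(images)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), winner in decisions.items():
        if winner not in (a, b):
            raise ValueError(f"{winner!r} is not part of the pair ({a!r}, {b!r})")
        i, j = index[a], index[b]
        matrix[i][j] = 1 if winner == a else 0
        matrix[j][i] = 1 - matrix[i][j]
    return matrix

def ranking(images, matrix):
    """Sum each row (the right-hand column of the matrix) and sort, best first."""
    totals = {name: sum(row) for name, row in zip(images, matrix)}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

matrix = build_matrix(images, decisions)
for name, row in zip(images, matrix):
    print(f"{name:<10}", row, sum(row))
for place, (name, total) in enumerate(ranking(images, matrix), start=1):
    print(place, name, total)
```

With these made-up decisions, People comes first with 4 wins and Sport last with 0. The row sums always add up to the number of comparisons (here 10), because each comparison awards exactly one point.
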
You immediately recognise the great power of this method, but also its big problem: with many images, the matrix becomes huge and unmanageable by hand, because n×(n-1)/2 comparisons are required for n elements. With 48 images, this means 48×47/2 = 1128 comparisons! Without suitable software, the analysis would be a tedious task. This is why pairwise comparison is not yet widely used in photo evaluation, even though it is the more objective method. Fortunately, suitable software tools are now available, so we can easily use this more effective method for our own needs.
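
The quadratic growth behind the figure of 1128 comparisons is easy to check. A minimal sketch (the function name `comparisons_needed` is hypothetical):

```python
def comparisons_needed(n):
    """Number of distinct pairs among n images: n*(n-1)/2."""
    return n * (n - 1) // 2

for n in (5, 12, 48, 100):
    print(n, comparisons_needed(n))
# 5 10
# 12 66
# 48 1128
# 100 4950
```

Doubling the number of images roughly quadruples the number of comparisons, which is why the workload explodes so quickly.
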

And now to the confusion that Laura noticed: the display of interim results was inadvertently activated while voting was still in progress. (Sorry, that shouldn't have happened. Interim results are normally not shown during an ongoing vote, so as not to influence subsequent voters.)

Laura noted that she was confused because she had not seen all the images. This is understandable, but it is not at all necessary to present every pair of images to every juror. Having to judge such a large number of image pairs would overwhelm all of them.

An African proverb says: ‘How do you eat an elephant? You cut it up and eat it piece by piece.’ In this spirit, the total number of comparisons required can easily be divided among several judges, reducing each judge's workload while still achieving a solid and objective assessment. For example, the comparisons can be divided evenly among the judges using the round robin method, in which they are dealt out in turn. Alternatively, block allocation can be used, in which each judge receives a contiguous block of comparisons. The ‘random assignment’ method I have chosen is, in my view, the best but also the most labour-intensive: the pairwise comparisons are randomly assigned to the jurors so that each juror evaluates a random subset of the total comparisons. This can also be done multiple times so that each comparison is evaluated by more than one judge, which increases reliability.
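
Here is a minimal Python sketch of round robin and random assignment. The function names, the placeholder images and jurors, and the balancing rule (each pair goes to the currently least-busy jurors, ties broken at random) are my own assumptions for illustration, not necessarily the procedure of the voting software.

```python
import random
from itertools import combinations

def round_robin(pairs, jurors):
    """Deal the comparisons out in turn: 1st pair to the 1st juror, 2nd to the 2nd, ..."""
    assignment = {j: [] for j in jurors}
    for k, pair in enumerate(pairs):
        assignment[jurors[k % len(jurors)]].append(pair)
    return assignment

def random_assignment(pairs, jurors, repeats=2, seed=None):
    """Give every pair to `repeats` different jurors, chosen at random.

    Assumed balancing rule: each pair goes to the currently least-loaded
    jurors (ties broken at random), so workloads differ by at most one pair.
    """
    if repeats > len(jurors):
        raise ValueError("repeats cannot exceed the number of jurors")
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)  # the order of the pair list no longer decides who judges what
    assignment = {j: [] for j in jurors}
    for pair in shuffled:
        chosen = sorted(jurors, key=lambda j: (len(assignment[j]), rng.random()))[:repeats]
        for juror in chosen:
            assignment[juror].append(pair)
    return assignment

images = [f"img{k:02d}" for k in range(1, 49)]  # 48 placeholder image names
jurors = [f"juror{k}" for k in range(1, 11)]    # 10 hypothetical jurors
pairs = list(combinations(images, 2))           # 1128 comparisons

plan = random_assignment(pairs, jurors, repeats=3, seed=1)
for juror, tasks in plan.items():
    print(juror, len(tasks))
```

With 48 images, 10 jurors and every comparison judged three times, each juror handles 338 or 339 pairs instead of all 1128, and no juror ever sees the same pair twice.
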

Without software support, all this used to be very laborious: you spent hours working on tables in which a single mistake could spoil the result. Thanks to suitable software, however, the method is now very easy to use.
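
Once the jurors have voted, their individual decisions must be combined into a single ranking. The sketch below is one simple way to do it, not necessarily what any particular tool does: it scores each image by the share of its comparisons that it won, which stays fair even if some pairs were judged more often than others. The votes and the function name `aggregate` are hypothetical; more refined statistical models (for example the Bradley-Terry model) also exist.

```python
from collections import Counter

def aggregate(votes):
    """votes: iterable of (image_a, image_b, winner) tuples from all jurors.
    Returns (image, share of comparisons won) pairs, best first."""
    wins = Counter()
    appearances = Counter()
    for a, b, winner in votes:
        if winner not in (a, b):
            raise ValueError(f"{winner!r} is not part of the pair ({a!r}, {b!r})")
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    scores = {img: wins[img] / appearances[img] for img in appearances}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# HYPOTHETICAL votes from several jurors; the first pair was judged twice.
votes = [
    ("People", "Sport", "People"),
    ("People", "Sport", "People"),
    ("People", "Animals", "Animals"),
    ("Sport", "Animals", "Animals"),
]
for name, score in aggregate(votes):
    print(f"{name:<8} {score:.2f}")
```
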

In summary, pairwise comparison is more reliable and meaningful than the widespread scale-based voting. Because of the amount of work involved, however, it used to be unpopular and was therefore rarely used outside professional settings. Now that suitable software tools are available, these obstacles no longer exist, and non-professionals can also use the method to good advantage.

St. Johann in Tyrol
June 10, 2024
Bernhard

Addendum: This method is also ideal if, for example, we want to select just 12 images from a large collection for an annual photo calendar, or if we want to pick the one portrait out of 50 outstanding shots to include in an application. Try it!