Chapter 6 The ideas of combination in psychology
6.2 Mechanical combination of individual judgments
This review of the mechanical combination of individual judgments is based on, and expands in detail, the review by Clemen (1989). Subjective group consensus is inherently related to the nature of combination, and investigating it could deepen our understanding and appreciation of how combination ideas are applied. For reasons of space, however, that topic is skipped here. All the articles reviewed here combine individual judgments in a mechanical, or mathematical, way, hence the title of this part,
‘Mechanical combination of individual judgments’.
One branch of the mechanical combination of individual judgments is
“concocted group judgments”, a term coined for group judgments that are
formed by calculation rather than by subjective discussion within a group.
6.2.1 Does beauty have a standard?
Are beautiful things really beautiful, or do we only think them so, because they give us a certain kind of pleasurable feeling, feeling which we have been taught to call 'disinterested,' 'immediate,' 'universal,' etc.? (Gordon 1923, 36)
Psychologists were motivated to investigate concocted group consensus by curiosity about how group consensus forms and what properties it has.
Gordon adopted the following method: first, she formed concocted group orders.
“For example a certain rug [the material used] would be given first place by one person, third place by another, twelfth place by another, etc. These numbers were added, and the rug which had the smallest resulting number was assigned first place in the composite order of merit. The rug which had the largest resulting number was called last […] in the composite order” (39). Second, she compared the agreement of group with group. She successively divided the whole body of subjects into, for example, 20, 4, and 2 groups, and compared the order derived from each group with the orders derived from the other groups. For example, with 200 subjects, Gordon might divide them into 4 groups and produce 4 concocted group orders; comparing these orders pair by pair yields 4 × 3 / 2 = 6 correlation coefficients.
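Gordon's rank-sum rule and the count of pairwise group comparisons can be sketched in a few lines of Python. This is a minimal illustration, not Gordon's own procedure in code. The function names are hypothetical. The quoted passage does not say how ties between equal sums were handled, so the tie-breaking rule here (lower item index first) is an assumption.

```python
from itertools import combinations


def composite_order(rankings):
    """Composite order of merit from individual rankings.

    rankings[p][i] is the place (1 = first) that person p gave item i.
    Places are summed per item; the smallest sum gets first place.
    Assumption: ties are broken by item index (stable sort).
    """
    n_items = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n_items)]
    by_total = sorted(range(n_items), key=lambda i: totals[i])
    places = [0] * n_items
    for place, item in enumerate(by_total, start=1):
        places[item] = place
    return places


def group_pairs(n_groups):
    """All pairs of groups whose composite orders would be correlated."""
    return list(combinations(range(n_groups), 2))


# Hypothetical rankings of three rugs by three people.
people = [[1, 3, 2], [3, 1, 2], [1, 2, 3]]
print(composite_order(people))  # sums 5, 6, 7 -> [1, 2, 3]
print(len(group_pairs(4)))      # 4 groups -> 6 comparisons
```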
The results show that groups agree with one another. For example, the average correlation coefficient is 0.76 for groups of 20 persons, 0.87 for groups of 50, and 0.93 for groups of 100. The result is striking: if one group's judgment can be accepted as a standard, then we may collect the judgments of randomly chosen people and form a group judgment that is quite similar to that standard.
The question, however, is this: “[t]he group may agree but are they therefore right”
(Gordon 1924, 398). Gordon was then led to investigate the accuracy problem by using weight as a standard. She asked subjects to order glass bottles from the lightest to the heaviest. Next, she formed concocted groups of 5 persons, 10 persons, and so on, and derived the composite order of each. The result was striking. With only 5 persons in a group, the average correlation between the composite order and the true order was just 0.68, but as she put more members in a group, the composite order approached the true order. The average correlation coefficient
was 0.79 for 10 persons, 0.86 for 20 persons, and 0.94 for 50 persons. Gordon thus found that the estimated order gradually approaches the true order as more individual rankings are pooled. Notably, the average correlation between an individual's order and the true order was only 0.41, so the results for every group size were superior to those of the average single individual. This suggests the advantage of gathering information from many people.
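A toy example makes the mechanism behind this result visible. The data are invented and are not Gordon's. When each judge makes a different small mistake, summing ranks can cancel the mistakes, so the composite beats every individual. The function names are hypothetical. The rank correlation uses the untied-ranks Spearman formula discussed in 6.2.1.2.

```python
def spearman_rho(order, true_order):
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(order)
    d2 = sum((a - b) ** 2 for a, b in zip(order, true_order))
    return 1 - 6 * d2 / (n ** 3 - n)


def composite_order(rankings):
    """Re-rank items by the sum of their ranks (smallest sum = rank 1)."""
    n = len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    places = [0] * n
    for place, item in enumerate(sorted(range(n), key=lambda i: totals[i]), start=1):
        places[item] = place
    return places


# Hypothetical data: five weights whose true order is 1..5; each judge
# swaps a different adjacent pair.
true_order = [1, 2, 3, 4, 5]
judges = [[2, 1, 3, 4, 5], [1, 2, 4, 3, 5], [1, 2, 3, 5, 4]]

individual = [spearman_rho(j, true_order) for j in judges]
group = spearman_rho(composite_order(judges), true_order)
print(individual)  # [0.9, 0.9, 0.9]
print(group)       # 1.0
```

In this example the rank sums are 4, 5, 10, 12 and 14, so the composite recovers the true order exactly. Real judges' errors are neither this tidy nor this independent.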
6.2.1.1 Critique of Gordon and of forecast combination
However, from the perspective of test theory, Gordon's findings were eventually shown to be a statistical result rather than a psychological one. Stroop (1932) argued that the belief in the superior quality of group judgment was merely a statistical consequence of aggregating data. Gordon's experimental setting happened to satisfy the two assumptions required by the Spearman-Brown formula. First, the standard deviations of all rankings are equal. This holds because every ranking consists of the same numbers, one to ten. Second, the correlations between all pairs of rankings are equal. Because Gordon formed grouped rankings from randomly picked individual rankings, this averaging makes the correlations between pairs of grouped rankings approximately equal. Consequently, what Gordon demonstrated simply matches the prediction obtained by applying the Spearman-Brown formula to her results (Kelley 1925). Similarly, some argue that the benefit of forecast combination is merely a statistical result as well. Manski (2011) noted that, by Jensen's inequality, when the loss function is convex the loss of the mean forecast is necessarily less than or equal to the average loss of the individual forecasts (Exhibit 10).
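The Spearman-Brown prediction mentioned above can be written as a short sketch. It assumes the two conditions just listed, equal standard deviations and equal pairwise correlations, and it treats ranks as ordinary scores. Under those assumptions, the correlation between two composites of k rankings each is kr / (1 + (k − 1)r), where r is the correlation between single rankings. The analogous step-up for a composite's correlation with the true order is r_T · sqrt(k / (1 + (k − 1)r)), where r_T is a single ranking's correlation with the truth. The function names and inputs are hypothetical and are not Gordon's or Kelley's figures.

```python
import math


def spearman_brown(r, k):
    """Predicted correlation between two composites of k parallel rankings,
    given correlation r between single rankings (equal SDs, equal pairwise r)."""
    return k * r / (1 + (k - 1) * r)


def composite_validity(r_true, r, k):
    """Predicted correlation between a composite of k rankings and the truth,
    given each ranking's correlation r_true with the truth and the common
    inter-ranking correlation r (same assumptions as above)."""
    return r_true * math.sqrt(k / (1 + (k - 1) * r))


# Hypothetical inputs: r = r_true ** 2 corresponds to judges whose errors
# are mutually independent.
r_true, r = 0.4, 0.16
for k in (1, 5, 20, 50):
    print(k, round(spearman_brown(r, k), 3), round(composite_validity(r_true, r, k), 3))
```

With r = r_T², the composite's correlation with the truth rises toward 1 as k grows. No psychological mechanism is needed for this; it follows from the averaging alone, which is the substance of Stroop's and Kelley's critique.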
Exhibit 10 An illusion of the statistical upshot of forecast combination
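Manski's point can be checked numerically. Suppose a loss L is convex in the forecast, f̄ is the mean of the m individual forecasts f_i, and y is the outcome. Jensen's inequality then gives L(f̄, y) ≤ (1/m) Σ L(f_i, y). The mean forecast therefore never does worse than the average forecaster, whatever the data. The sketch below checks this for squared loss and absolute loss, both of which are convex. The forecasts and outcome are invented.

```python
def mean(xs):
    return sum(xs) / len(xs)


def squared_loss(forecast, actual):
    return (forecast - actual) ** 2


def abs_loss(forecast, actual):
    return abs(forecast - actual)


# Hypothetical forecasts of a single outcome.
forecasts = [1.0, 2.0, 6.0]
actual = 4.0

for loss in (squared_loss, abs_loss):
    combined = loss(mean(forecasts), actual)               # loss of the mean forecast
    average = mean([loss(f, actual) for f in forecasts])   # average individual loss
    print(loss.__name__, combined, average)
    assert combined <= average  # Jensen's inequality for a convex loss
```

Changing the numbers will not break the assertion. That is exactly why the inequality, on its own, says nothing about the forecasters' skill.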
6.2.1.2 Comparing concocted group consensus with forecast combination
To my knowledge, Gordon (1924) is the first paper to focus on the accuracy of combined estimates, and thus the first paper in psychology to employ a concept similar to forecast combination. Its procedures, however, are quite different from those of forecast combination. First, concocted group consensus measures accuracy not by forecast errors or the other measures used in forecast combination, but by Spearman's rank correlation coefficient,
\[ \rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n^{3} - n}, \]
where n is the number of weights whose order the subjects had to assign in the experiment, and d_i is the difference between the individual's order and the true order for weight i.
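The formula can be evaluated directly. In this hypothetical example, a judge swaps the two lightest of five weights. Then d = (1, −1, 0, 0, 0) and ρ = 1 − 6 · 2 / (5³ − 5) = 0.9. As a cross-check, the sketch also computes the ordinary Pearson correlation of the two rank vectors, which coincides with the formula when there are no tied ranks. The function names are illustrative.

```python
def spearman_rho(order, true_order):
    """rho = 1 - 6 * sum(d_i^2) / (n^3 - n); valid when there are no tied ranks."""
    n = len(order)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(order, true_order))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


def pearson(x, y):
    """Ordinary Pearson correlation, used here only as a cross-check."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


true_order = [1, 2, 3, 4, 5]
judged = [2, 1, 3, 4, 5]  # hypothetical judge swaps the two lightest weights
print(spearman_rho(judged, true_order))  # 0.9
print(pearson(judged, true_order))       # 0.9, same value for untied ranks
```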
Second, concocted group consensus combines estimates in a different way.
For example, to form concocted groups of 5 persons, one has to (1) average the positions assigned to each weight by the first 5 persons, (2) re-rank these averages to obtain the
‘concocted’ order, and (3) repeat the same process for the next 5 persons, and so on, until every subject has been assigned to a concocted group. This combining process differs from the simple or weighted average used in forecast combination; it is an application of the pooling idea developed independently by psychologists.
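Steps (1) to (3) can be sketched as a loop over consecutive blocks of subjects. The text does not say what happens to leftover subjects who do not fill a final block, or how equal averages are ordered. This sketch assumes leftovers are dropped and ties are broken by weight index; both are assumptions, and the function name is hypothetical.

```python
def concocted_orders(rankings, group_size):
    """Form concocted group orders from consecutive blocks of subjects.

    rankings[p][i] is the position subject p assigned to weight i.
    For each block of `group_size` subjects: (1) average the positions per
    weight, (2) re-rank the averages (smallest average = position 1),
    (3) move on to the next block.
    Assumptions: leftover subjects that do not fill a block are dropped;
    ties in the averages are broken by weight index.
    """
    n_items = len(rankings[0])
    orders = []
    for start in range(0, len(rankings) - group_size + 1, group_size):
        block = rankings[start:start + group_size]
        means = [sum(r[i] for r in block) / group_size for i in range(n_items)]
        ranked = sorted(range(n_items), key=lambda i: means[i])
        order = [0] * n_items
        for pos, item in enumerate(ranked, start=1):
            order[item] = pos
        orders.append(order)
    return orders


# Hypothetical: four subjects ranking three weights, groups of two.
subjects = [[1, 2, 3], [2, 1, 3], [3, 2, 1], [3, 1, 2]]
print(concocted_orders(subjects, 2))  # [[1, 2, 3], [3, 1, 2]]
```

Unlike a simple average of forecasts, the output here is again a ranking. The averages themselves are discarded after re-ranking.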
6.2.2 Combining multiple judgments
On another front, psychologists have used regression techniques to combine clinical judgments; Clemen (1989) has already provided a brief review. Today, the psychology of human judgment has become an element that cannot be neglected in economic forecasting (Kuo and Liang 2004). To my knowledge, at least three papers published in the International Journal of Forecasting continue the tradition initiated by Meehl (1954) and use forecast combination as their main approach. Fischer and Harvey (1999) borrowed techniques from group consensus research, giving judges feedback to improve their
performance. Their results showed that although feedback on the actual outcomes is helpful, the judges' performance remained, consistent with previous studies, inferior to that of simple averaging. Providing forecasters with mean absolute percentage errors updated each
period improved their performance further, but inconsistency remains a fundamental weakness of human judgment. More recently, Jørgensen (2007) reviewed empirical studies comparing the accuracy of expert judgments, models, and combinations of the two.
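The sketch below computes the mean absolute percentage error using its usual definition, 100/T · Σ |(y_t − f_t) / y_t|, together with an equal-weight simple average of two forecasters. The text does not state which variant of the measure Fischer and Harvey (1999) used, so this definition is an assumption. The data and function names are hypothetical.

```python
def mape(forecasts, actuals):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100 * sum(abs((a - f) / a) for f, a in zip(forecasts, actuals)) / len(actuals)


def simple_average(*forecast_series):
    """Equal-weight combination of several forecasters' series, period by period."""
    return [sum(vals) / len(vals) for vals in zip(*forecast_series)]


# Hypothetical data: two judges forecasting the same three periods.
actuals = [100.0, 200.0, 150.0]
judge_a = [110.0, 180.0, 160.0]
judge_b = [90.0, 230.0, 150.0]
combined = simple_average(judge_a, judge_b)

print(mape(judge_a, actuals), mape(judge_b, actuals), mape(combined, actuals))
```

In this invented case, the judges' errors partly offset each other, so the simple average has a lower MAPE than either judge. This is the benchmark against which the judges' performance was found wanting.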
Of the sixteen relevant studies, ten indicated that expert judgments were, on average, more accurate than models. The paper was accompanied by a comment from Hogarth (2007) in the same journal.