Introduction
Science stands on the process of peer review. Peer review aims to assure reasonable and fair consideration of scientific papers, for publishing in journals but also for research funding, tenure, prize awarding and so forth. Even though among journals (and institutions) there are small differences in the procedure itself, the process is based on the concept of scientific experts analyzing and scrutinizing studies in their specific field, in order to determine which ones are publishable, how revisions should be conducted, or if publication should be supported by a journal.
Such evaluation is carried out to verify whether a study is well conducted, whether its methodology is adequate and, obviously, whether the results are significant. Since the specialists conducting the review are experienced in publishing in the field, it is reasonable to assume that the process enhances the development of specific knowledge. Nevertheless, the equality and transparency of the process are far from achieving a consensus, and questions concerning evaluation biases in traditional peer review have been brought to attention (Giner-Sorolla, 2012; Ietto-Gillies, 2012; Kaatz, Gutierrez & Carnes, 2014; Maner, 2014), amongst those, bias against the author's origin (Rubinstein & Brenner, 2014; Verlegh, 1999; Wolff, 1973).
There are numerous situations in which stereotyping and judgment are connected, and understanding this relation can improve evaluation processes (Kahneman, 2003; Wall, Liefeld & Heslop, 1991). In the present study, we focus on the evaluation of scientific papers, and on how the stereotype of a low-prestige institution, or of a low-prestige country, can affect the evaluation and judgment of a scientific paper. The literature on stereotyping in the judgment and evaluation of scientific papers indicates that scientists at prestigious universities tend to have higher rates of publication and higher rates of citation, in part attributable to the prestige of their institutions (Giner-Sorolla, 2012; Ietto-Gillies, 2012; Calfee, 2010; Wolff, 1973). Given that prestigious institutions may have better research centers and superior infrastructure and may attract better scientists, and that developed countries usually have well-developed scientific incentive programs and higher investments in science, it is reasonable to think that this will have an impact on their scientific production. On the other hand, it should not affect the evaluation of a scientific work, since those aspects are not solely responsible for the quality of research; this kind of judgment bias could easily be defined as stereotyping.
Stereotyping is a cognitive strategy, with evolutionary origins stemming from the beginnings of the human race, to optimize mental processing: after labeling a person, an object, or a situation, it is easier for the brain to identify and react. The interesting fact about stereotyping, and the focus of the present study, is that stereotypes can be activated even when the stimulus presented is unknown, that is, when the brain makes connections with past experiences in order to develop a subsequent coherent response (Shanks et al., 2013). Even when something is shown for the first time, based on our previous experiences, we automatically develop a labeling process which will influence our present and future responses to that stimulus (Loersch & Payne, 2014).
Stereotypes arise from unconscious associations, sometimes through priming effects, which have been attracting more attention in recent years, not only for their potential to influence behavior, but also for their widespread presence. The literature shows that automatic processing can lead to biases and errors in judgment (Morewedge & Kahneman, 2010), and even when we are aware that this is a possibility, it is hard, if not impossible, for individuals to control the prejudicial effects (Holroyd, 2015).
The literature on how priming can influence the basic processes of evaluation and judgment is extensive (Cokely & Feltz, 2009; Henderson & Wakslak, 2010; Smith & Mackie, 2014), even though the cognitive process behind it is not completely described. The standard perspective is that, once a particular stimulus is shown, it activates neural networks through a process called spreading activation: the stimulus activates pieces of information in memory that are related or associated to the content, influencing the response to the stimulus (Molden, 2014). Since the activation of those networks is not conscious, there is no participation of conscious will in the process.
It is well accepted that not only simple and basic responses can be primed, but that even extremely complex cognitive processes can be affected non-consciously through priming, such as goal activation (Marien, Custers, Hassin & Aarts, 2012), observed and simulated responses from others (Smith & Mackie, 2014), and, the focus of the present study, judgment and stereotypes (Allen, Sherman & Klauer, 2010; Rubinstein & Brenner, 2014). Since scientific authors are also victims of prejudice and publication biases (Garfunkel, Ulshen, Hamrick & Lawson, 1994; Lee & Schunn, 2011; Papaioannou, Machaira & Theano, 2013), in the present study we focus on the evaluation of scientific papers, and on how the stereotype of a low-prestige institution, or of a low-prestige country, can affect the evaluation and judgment of a scientific paper through a priming effect. Priming is a recurrent phenomenon in social cognition, and recently, interest in how it can influence evaluation and judgment has been increasing (Chaxel, Russo & Wiggins, 2016; Doyen, Klein, Simons & Cleeremans, 2014; Mohr, Koutrakis & Kuhn, 2015). Beyond the examination of priming effects, recent work is turning towards possible mediators and moderators (Pickering, McLean & Krayeva, 2015; Poehlman, Dhar & Bargh, 2016), including attributions related to the origin of primed information (Loersch & Payne, 2014). The influence of priming on complex processes can be explained through cognitive biases, which can arise from different sources (Hilbert, 2012; Pleggenkuhle-Miles, Khoury, Deeds & Markoczy, 2013), including culture (Grossmann & Jowhari, 2018), hence the need to research and replicate the effects of priming on judgment in different cultural groups, such as Brazilian academics.
In the present study, we focus on two different types of evaluation, one presumably more technical (a scientific paper) and the other more subjective and personal (a chocolate tasting), and how the stereotype of origin can affect evaluation and judgment. Our objective is to study if, through a priming effect, subjects will show a bias in evaluation and judgment on both tasks. In addition, to contribute to the growing body of evidence on the role of moderator variables in priming effects, the moderating effect of academic experience will be researched.
General Method
The present work is based on two studies using the same stimuli, words on a footnote that imply European or African origins. There is consensus in the literature that European and African origins are related, respectively, to positive and negative representations, leading to stereotyping effects in many domains (Rubinstein & Brenner, 2014). Based on a pre-test, the words chosen were Welgesteld-Tijdschrift ("wealthy magazine" in Dutch), and Kuranta-Bothata ("problematic magazine" in Setswana, a Southern African language).
The experimental design followed the same rationale in both studies, which was to subtly introduce the stimuli to the participants, then present the evaluation object (a piece of chocolate for Experiment 1, and a scientific paper for Experiment 2) and finally assess the biases in evaluation and judgment for the experimental groups. In both experiments, the manipulation check consisted of asking the participants if they could remember any information regarding the presented stimuli.
Experiment 1 - Chocolate Testing
Participants
For this experiment, 113 mostly (81.4 %) male graduate students (M = 24.78 years, SD = 7.02) were given an unmarked chocolate, and after tasting, asked to evaluate it using a questionnaire.
Instruments
The chocolate evaluation questionnaire (Valdeci, Bastos, Pereira, Basilio & Leite, 2012) was answered on a Likert scale from 1 (very poor) to 5 (excellent), and included one question asking whether any information on the institution funding the research was remembered, as the manipulation check.
Procedures
Participants were randomly assigned to three groups and led to a room, where they were handed a consent form before being asked to taste and evaluate a piece of chocolate. The manipulated variable was the information presented in a footnote placed on the informed consent form, regarding the institution which funded the research. The institution names were Welgesteld-Tijdschrift (WT Condition) and Kuranta-Bothata (KB Condition), with the control group (CG) lacking a footnote.
After tasting the chocolate, participants received and answered the evaluation questionnaire in the same room, and were then dismissed.
Results
The evaluation scores were summed and the average was considered the General Tasting Score (GTS). A one-way ANOVA indicated that the difference between conditions was significant, F(2, 112) = 5.641, MSE = .28, p = .005, η² = .06, with participants in the KB Condition showing a tendency to evaluate the chocolate more negatively (M = 3.61, SD = 0.61), whereas in the WT Condition the evaluation was more positive (M = 4.02, SD = 0.54). The mean for the CG (M = 3.78, SD = 0.43) was very similar to that of the KB Condition, with no significant difference.
Paired comparisons and the confidence intervals support the initial findings (95 % CIs, WT [3.84, 4.19], KB [3.40, 3.82] and CG [3.63, 3.93]), confirming that the difference between KB (M = 3.61, SD = 0.61) and CG (M = 3.78, SD = 0.43) was not significant, t(70) = 1.37, p = .173, effect size d = 0.332. In contrast, the difference between KB (M = 3.61, SD = 0.61) and WT (M = 4.02, SD = 0.54) was significant, t(75) = 3.10, p = .003, effect size d = 0.712.
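The pairwise comparisons can be approximately reproduced from the reported means and standard deviations. The following is a minimal sketch in plain Python that computes a pooled-SD Cohen's d and an independent-samples t statistic. The group sizes (WT = 41, KB = 36, CG = 36) are not stated in the text; they are an assumption inferred from the reported degrees of freedom and the total of 113 participants, and the function names are illustrative only.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def t_statistic(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic assuming equal variances."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se


# (mean, SD, n): means and SDs as reported; the n values are ASSUMED,
# inferred from t(75), t(70) and N = 113, not taken from the paper.
WT = (4.02, 0.54, 41)
KB = (3.61, 0.61, 36)
CG = (3.78, 0.43, 36)

d_wt_kb = cohens_d(*WT, *KB)
t_wt_kb = t_statistic(*WT, *KB)
d_cg_kb = cohens_d(*CG, *KB)
t_cg_kb = t_statistic(*CG, *KB)

print(f"WT vs KB: d = {d_wt_kb:.3f}, t = {t_wt_kb:.2f}")
print(f"CG vs KB: d = {d_cg_kb:.3f}, t = {t_cg_kb:.2f}")
```

Because the inputs are rounded summary statistics and the group sizes are inferred, the output only approximates the reported values.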
Discussion
The stimuli changed the evaluation of the chocolate and even though the differences between the control group and the others were not significant, the trends were in the expected direction. Furthermore, most of the participants did not remember any information concerning the institutions (92.7 %), which strengthens the assumption that the stimuli were subtle enough.
For the second experiment, there were two conditions, WT and KB. In addition to replicating the priming effect in a different context, the aim of Experiment 2 was to investigate a moderator, academic experience. Presumably, more experienced academics would be less prone to biases, hence less affected by the stimulus.
Experiment 2 - Article Evaluation Task
Participants
During an academic conference in Brazil, 80 participants, mostly doctorate students (63 %) and PhDs (28.4 %), were randomly selected and averaged 5.44 years of academic experience (SD = 4.51), with no significant difference in gender (50.6 % male).
Procedures
Participants were asked to evaluate a scientific paper lacking any identification, except for a footnote regarding the funding institution: Welgesteld-Tijdschrift (WT Condition) or Kuranta-Bothata (KB Condition). After returning the article, they completed an evaluation questionnaire.
Measures
The questionnaire evaluated different aspects of the paper, such as originality, methodology and conclusions. In the first part of the questionnaire, respondents evaluated the paper based on their own opinion and, in the second part, based on how they thought other scientists would evaluate it. The last part asked if they remembered any information on the funding institution.
The scores for each question were averaged, and named General Acceptance Index (GAI) and General Acceptance Index - Others (GAI-O). Scores ranged from -2 to +2, with -2 being the worst possible evaluation.
Results
Concerning the GAI, participants in the WT condition evaluated the article more positively than those in the KB condition (Ms = -.42 and -.05, respectively). A one-way ANOVA confirmed that this difference was significant, F(1, 79) = 11.55, MSE = .39, p = .001, η² = .13. For the GAI-O, the same pattern was found, with the WT condition (M = .29) being significantly different from KB (M = -.14), F(1, 79) = 4.49, MSE = .14, p = .037, η² = .05, effect size d = 0.459.
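For a one-way design, eta squared can be recovered from the F ratio and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the F values and degrees of freedom exactly as reported above; the function name is illustrative and not part of the original analysis.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way ANOVA, derived from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for Experiment 2.
eta_gai = eta_squared_from_f(11.55, 1, 79)    # General Acceptance Index
eta_gai_o = eta_squared_from_f(4.49, 1, 79)   # General Acceptance Index - Others

print(round(eta_gai, 2), round(eta_gai_o, 2))  # prints: 0.13 0.05
```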
Academic experience was measured in years, self-reported in the questionnaire, and a stepwise regression analysis was used to examine it as a moderator of the relation between priming and the GAI. Presumably, as academic experience increases, a confirmatory bias gets stronger, repeating the well-known discriminatory behavior towards European and African origin (de Bruin, Treccani & Della Sala, 2015; Rubinstein & Brenner, 2014).
The slopes of the regression lines are consistent with academic experience being a moderator, with the regression line for the WT Condition showing a positive slope, while the line for the KB Condition showed a negative slope. A multiple regression was conducted, in which the interaction term was inserted, and the result was significant, F(3, 76) = 3.71, p < .05, indicating that the model was a good predictor of the GAI. The total variance explained was relatively low, R² = 0.12, which means that even though the model is adequate, there are other variables that also need to be studied.
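A moderation analysis of this kind is commonly set up as an ordinary least squares regression with a condition × experience interaction term. The sketch below illustrates the idea in plain Python. Everything in it is an assumption made for illustration: the data are synthetic and noise-free (they are not the study's data), the coding KB = 0 and WT = 1 is hypothetical, and the coefficient values are arbitrary. It is not the stepwise procedure used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# SYNTHETIC illustration only: arbitrary coefficients, not estimates from the study.
TRUE_B = [0.0, 0.1, -0.03, 0.06]  # intercept, condition, experience, interaction

# Hypothetical coding: condition 0 = KB, 1 = WT; experience in years (1 to 10).
data = [(cond, years) for cond in (0, 1) for years in range(1, 11)]
rows = [[1.0, cond, years, cond * years] for cond, years in data]
y = [sum(b * x for b, x in zip(TRUE_B, row)) for row in rows]

b0, b_cond, b_exp, b_inter = fit_ols(rows, y)

# Simple slopes of the outcome on experience within each condition.
slope_kb = b_exp            # condition = 0
slope_wt = b_exp + b_inter  # condition = 1
print(f"KB slope = {slope_kb:.3f}, WT slope = {slope_wt:.3f}")
```

In this formulation, a non-zero interaction coefficient is what signals moderation: it is the difference between the two within-condition slopes, so slopes of opposite sign, as described above, correspond to an interaction term larger in magnitude than the experience coefficient.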
Discussion
Results indicated a consistent difference in evaluation and judgment between the conditions; the footnote appeared to have operated as a prime, and triggered a biased evaluation in both groups, even though it was subtle enough not to be remembered.
As for the moderating effect of academic experience, the total variance explained was small, but the regression slopes suggest that as the years of academic experience grew, the differences in the evaluation given to the article became more pronounced, which may be an indication that the bias effect was stronger, with participants in the WT condition giving better evaluations and those in the KB condition giving worse ones. Relatively small effects of priming have often been reported, indicating that the effect itself is subtle enough to make it hard to detect (Cesario, 2014).
General Discussion
In both experiments the priming stimuli were sufficient to affect evaluation and judgment, especially considering that over 90 % of the participants were unable to remember them. The chocolate evaluation (subjective and non-technical) and the article evaluation (more technical) were similarly affected, which strengthens the case for the reliability of this effect.
Individual variables could have affected both experiments, such as personal taste in Experiment 1 and experience as a reviewer in Experiment 2. Both limitations should be addressed in future work. Nonetheless, the change of a mere two words, if these words evoke the origin of the entity being judged, had an effect.
The study of how moderators are related to the effects found would certainly provide important evidence regarding how automatic biases can be primed, thereby affecting peer reviewing, judgment and evaluation.
Considering the central role of peer reviewing in science, and adding to the growing literature on publishing biases, prejudice and questionable practices (Button, Bal, Clark & Shipley, 2016; de Bruin et al., 2015; Koole & Lakens, 2012; Wagenmakers, Dutilh & Sarafoglou, 2018), further work must continue to address those issues, with different and perhaps larger populations, in an effort to understand automatic biases and their mechanisms, and how those effects could influence behavior and the publishing field as well.
Specifically, regarding the peer review system, the results do not prove that there is an established bias, but they indicate that a subtly presented stimulus (such as two words in a footnote) can trigger a priming process that alters the perception of the subject, hence affecting their judgment of an object. Almost all participants (92.3 %) did not remember any information concerning the funding institution; even though the priming stimulus was subtle enough not to be remembered, it did affect their judgment.
In addition, Kuranta-Bothata and Welgesteld-Tijdschrift are not even real research funding institutions, which indicates not only that the subjects were affected by the priming stimulus, but also that they were able to link those words, possibly through spreading activation, to contents that already existed in their cognition. The sound of the words "Kuranta-Bothata" and "Welgesteld-Tijdschrift" was sufficient to activate pre-existing concepts, consequently affecting the perception and judgment of the paper.
Even though the subjects were not all reviewers, all of them had experience in evaluating papers, since it is part of their daily activities, as either post-graduate students or faculty members. In addition, recent studies relating self-regulatory dynamics, construal level (Fujita & Trope, 2014) and constraint and affection (Schroder & Thagard, 2014) bring forth new possibilities of more robust explanations for the mechanism of priming itself. Also, the study of how moderators are related to the effects found (Wheeler, Petty & Al, 2014) would certainly bring important evidence on how priming works.
The literature on implicit biases has been deepening the understanding of this kind of effect (Malouff & Thorsteinsson, 2016), but further work must address, in different and perhaps larger populations, the limitations presented earlier, in an effort to understand the effect and its mechanisms.