Alcohol Cues and their Effects on Sexually Aggressive Thoughts

Companion Piece: Abma (2020), Experiment and Fail
Priming studies, mostly found in the subdiscipline of social psychology, have been the subject of vigorous debates among methodologists, philosophers of science, and priming researchers themselves. This article contributes to the debate about priming studies by carefully examining and dissecting one priming study in particular, namely the 2020 article "Alcohol Cues and their Effects on Sexually Aggressive Thoughts" by Julie Leboeuf, Stine Linden-Andersen, and Jonathan Carriere. By pointing out the flaws of this supposed reproduction study, I reflect on the various levels of complexity that are involved in conducting priming experiments with human subjects. I conclude that the call for more reproductions or replications is worthwhile, but only if the original experiments are solid and theoretically interesting.

Abstract
Alcohol and its effects on aggression have been the subject of many discussions and research papers. Despite this fact, there is still a debate surrounding what it is exactly about alcohol that causes aggression. The current study sought to replicate the past finding by Bartholow & Heinz (2006), that alcohol cues without consumption increase the accessibility of aggressive thoughts, which can then influence aggressive behaviors. In the present study, participants had to complete a lexical decision task that was set up to assess whether aggressive words were detected faster in the presence of alcohol-related pictures compared to neutral pictures. The results of this study did not replicate the expected finding as only a main effect of word type was found in which participants detected neutral words faster than aggressive words. Furthermore, the study aimed to assess the role of gender stereotype acceptance levels in this association, but due to faulty design considerations, such analyses were not possible. The results are discussed in terms of the limitations of the study, and propositions for future directions are addressed.


Introduction
According to the World Health Organization (2018), alcohol abuse results in three million deaths worldwide every year. Wells et al. (2000) found that of all the physical aggression incidents reported by the participants, 67.9% happened while someone involved in the confrontation had been drinking. Similarly, severe alcohol intoxication is believed to play a role in almost half of the sexual aggression cases worldwide (Testa, 2002). Furthermore, Field et al. (2004) found that among all the factors under investigation, what appeared to be the strongest predictor of intimate partner violence was the expectation of aggressive behavior following alcohol consumption. Although the pharmacological effects of alcohol have been well researched (Chermack and Taylor, 1995; Giancola, 2000; Heinz et al., 2011), there is less information available with regard to other factors, such as cognition and expectancies, that can lead to aggression in alcohol-related contexts. Bartholow and Heinz (2006) as well as Subra et al. (2010) researched whether simple exposure to alcohol-related cues unconsciously increases the availability of aggressive thoughts, thus increasing the possibility of subsequent aggressive behaviors. In both studies, the researchers found that participants made faster lexical decisions when aggression-related words were paired with alcohol-related pictures compared with neutral primes. The first goal of the present study was to replicate the finding that alcohol cues increase the accessibility of aggressive thoughts and to extend it to specific types of aggressive thoughts, such as those of a sexual nature. The second aim was to assess the potential role of gender stereotypes in the association between alcohol cues and aggressive thoughts.
Before exploring more thoroughly the body of research pertaining to automatic aggressive cognition associated with a sheer exposure to alcohol cues, it is worthwhile to mention that similar studies have examined how other stimuli can generate aggressive thoughts. In 1998, Anderson, Benjamin Jr and Bartholow reported that simple identification of weapon primes was linked to an increase in aggressive thoughts. In this study, participants named aggressive words faster compared with nonaggressive words when presented with a weapon prime. The authors argued that this increase resulted from the weapon stimuli automatically priming aggression-related thoughts (Anderson et al., 1998). More specifically, Anderson and colleagues referred to the semantic network model of memory to support their claim. This theory posits that words or concepts that are similar in meaning or that repeatedly co-occur are activated simultaneously in the semantic memory and therefore develop strong associations. This model goes as far as to propose that this increase in aggressive thoughts subsequently increases the likelihood that these thoughts will affect behavior (Bartholow and Heinz, 2006). From this model, it would be hypothesized that two concepts associated together, such as weapons and aggression, could develop such a strong association that this association would then increase the probability of someone behaving aggressively.
While weapons have been reported to be linked with an increase in aggression-related thoughts, other elements from the environment that could similarly influence levels of aggressive thoughts, such as alcohol, should also be considered. Although it is generally accepted that alcohol increases aggression, there is still a debate as to what precisely causes or explains this increase (Bartholow and Heinz, 2006), and it is usually best explained through a combination of multiple theories and viewpoints (see Heinz et al. (2011) for a comprehensive review). Some leading theories include the physiological disinhibition hypothesis, the indirect cause hypothesis, and the expectancy hypothesis (Bushman, 2002). Briefly stated, the physiological disinhibition hypothesis states that alcohol intake increases the levels of aggression by anesthetizing the part of our brain that usually keeps our aggressive impulses under control, making people more likely to express aggressive behaviors (Bushman, 2002). Following a similar line of reasoning, the indirect cause explanation proposes that alcohol consumption might increase aggression tendencies by producing changes at the cognitive and emotional levels, such as by affecting intellectual functioning and reducing self-awareness, which then increases the likelihood of aggressive acts being committed (Bushman, 2002). As for the expectancy hypothesis, it holds that alcohol is linked with aggression because people expect it to be that way (Bushman, 2002). This presumed effect/expectancy hypothesis suggests that people tend to associate aggression and alcohol, even if only unconsciously, which potentially accounts for one of the ways in which alcohol intake is linked with aggression (Bartholow and Heinz, 2006). The problem with this hypothesis is that evidence supporting it mostly comes from placebo designs in which participants were led to believe they had consumed alcohol. Thus, it is unclear whether a belief that alcohol consumption has occurred is necessary for this unconscious association to be activated, or whether the presence of alcohol cues alone can increase the accessibility of aggressive thoughts (Bartholow and Heinz, 2006).
To address this methodological limitation, Bartholow and Heinz (2006) conducted a study in which they examined the extent to which alcohol cues without consumption or belief that alcohol had been consumed (i.e., placebo effect) could increase the accessibility of aggressive thoughts. They tested 121 undergraduate students and had them participate in a lexical decision task.
The participants were first primed with a stimulus, were then shown a string of letters, and had to decide whether the letter string presented to them was a [...] showing a link between weapon exposure and aggressive thoughts obtained by Anderson et al. (1998). Results showed that participants identified aggression-related words faster when exposed to alcohol-related pictures compared with neutral images, suggesting that alcohol and aggression are linked together in semantic memory.
In 2010, Subra and colleagues replicated the study by Bartholow and Heinz (2006) with a different population as well as a slight methodological alteration (Subra et al., 2010). First, their study was conducted with a French-speaking population as opposed to an English-speaking one, and the sample was not restricted to undergraduate students. Subra and colleagues used the same stimuli as those used by the authors of the 2006 study, with the exception that they selected different images for the neutral condition. They criticized the ones used previously by Bartholow and Heinz (2006) on the basis that their nature (i.e., plants) was not neutral since nature scenes have been shown to decrease aggressive thoughts (Kuo and Sullivan, 2001; Kuo and Sullivan, 2001). Instead, they used photos displaying non-alcoholic beverages for their neutral condition. As in the study under replication, participants made faster lexical decisions about aggression-related words when exposed to alcohol or weapon primes compared with neutral primes, again supporting the idea that alcohol and aggression might hold strong associations in semantic memory.
The two studies by Bartholow and Heinz (2006) and Subra et al. (2010) suggest that exposure to alcohol cues without consumption is linked with an increase in aggression-related thoughts. Interestingly, the target words that were used for the aggression category were mainly of a physical nature (e.g., punch, assault, murder) and did not specifically assess sexual violence. It would be worth investigating whether this increase in thoughts of an aggressive nature also extends to sexual violence, especially considering that the World Health Organization (n.d.) listed the use of alcohol or drugs as a factor that increases the risk of men committing rape. Additionally, past research has shown that alcohol priming without consumption can increase sexual expectancies. Friedman et al. (2005) reported that men who were exposed to subliminal alcohol-related words rated women as being more sexually attractive, and that this effect was more precisely caused by the sexual expectancies associated with alcohol intake.

[...] that men who were led to believe they had consumed alcohol, regardless of whether this was actually the case, experienced more sexual arousal than men in the control group. This becomes problematic when considering the finding by another group of researchers, that induced sexual arousal increased participants' expectations of their own sexual aggressiveness (Loewenstein et al., 1997). Overall, alcohol cues seem to be linked with increases in both sexual expectancies and sexual arousal, the latter being further associated with elevated expectations of sexual aggressiveness. Therefore, it is of paramount importance to better understand the relationship between alcohol cues and sexually aggressive thoughts.
Another element that was not assessed in the 2006 and 2010 studies is acceptance of gender stereotypes. In one study, Abbey (2002) reported that traditional gender role beliefs about dating and sexuality could be at play in explaining the occurrence of sexual assaults, irrespective of whether alcohol was involved or not. More precisely, endorsing beliefs such as interpreting a woman's refusal as an invitation to be convinced, or that forced sex is sometimes acceptable, was a proposed factor that could increase the likelihood of sexual assaults being committed. While recognizing that this is a good first step into assessing the role of gender stereotypes acceptance with regards to sexual violence, more general gender role beliefs should be investigated to truly understand the role that gender stereotypes play in sexual violence occurrences.
To this effect, Locke and Mahalik (2005) reported that men who showed a problematic use of alcohol and who conformed to such masculine norms as having power over women and being violent were more inclined to score higher on the Rape Myth Acceptance Scale and to report more sexually aggressive behaviors. The first study mentioned here only looked at gender stereotypes with regards to sexuality and dating (Abbey, 2002), and although the second study by Locke and Mahalik (2005) might seem to assess gender stereotypes more globally, it appears biased towards specific stereotypes with regards to power dynamics between men and women. In other words, the scales used in the above-mentioned studies seem to be assessing sexual harassment beliefs and attitudes more than gender stereotypes. Another problem with the literature on gender stereotype beliefs and its relationship to violence is that the vast majority of the articles available only report on the likelihood of men committing sexually violent behaviors, or on the relationship between men's level of gender stereotypes and the perpetration of violent acts, thus leaving women out of the picture (Jakupcak et al., 2002; Gidycz et al., 2007; Abbey et al., 2003). There is a clear need to investigate the relationship between gender stereotype beliefs and sexual violence as it relates not only to men, but to women as well.

Objective and Hypotheses
The objective of this project is to further the line of research on the effects of alcohol cues on aggressive behaviors by testing this relationship specifically with sexually aggressive words and by taking into account gender stereotype beliefs. More precisely, the aggressive words used by Bartholow and Heinz (2006) will be replaced by aggressive words of a sexual nature. The comparison point will be against the two studies that have investigated this before and for which significant results have been reported, namely the one by Bartholow and Heinz (2006) and the one by Subra et al. (2010). Gender stereotypes acceptance will be evaluated through the German Extended Personal Attributes Questionnaire (Runge et al., 1981).
Following from the findings suggesting that alcohol cues increase both sexual expectancies (Friedman et al., 2005) and aggressive thoughts (Bartholow and Heinz, 2006), it is hypothesized that participants will make faster lexical decisions to aggressive words of a sexual nature when paired with alcohol-related primes compared with neutral primes. A similar effect should also be found for the weapon-related primes. Furthermore, it is suggested that for both men and women this association will likely differ depending on one's level of gender stereotype acceptance. Indeed, results are expected to show a non-significant association at low levels of gender stereotype beliefs, a moderate interaction at medium levels of gender stereotype beliefs, and a dramatically significant interaction at high levels of gender stereotype beliefs.
Finally, for replication purposes, it is expected that participants will be slightly more accurate at identifying neutral words compared with aggressive words.

Participants
Sixty participants took part in this study, but two of them were excluded from the analyses, giving a final sample size of 58. One participant was excluded because the experiment failed before the data could be recorded, and the other was excluded because it was clear from the debriefing session that this participant had not understood the computer task properly. Furthermore, this participant's accuracy rate was only 68.54% compared with a mean accuracy rate of 95.08% (SD = 4.62) for the sample. On this distribution, a score of 68.54% represents a z-score of -5.74, providing sufficient grounds for excluding the data from the analyses.
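The exclusion criterion above rests on a simple standardization, which can be reproduced from the reported figures. The following is a minimal sketch; the function name `z_score` is only illustrative, and the mean and standard deviation are taken as reported for the sample.

```python
def z_score(value, mean, sd):
    """Standardized distance of `value` from `mean`, in units of `sd`."""
    return (value - mean) / sd


# Values reported in the Participants section (percent correct).
excluded_accuracy = 68.54
sample_mean = 95.08
sample_sd = 4.62

z = z_score(excluded_accuracy, sample_mean, sample_sd)
print(f"z = {z:.2f}")
```

With these inputs the result agrees with the reported value to two decimal places.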
From the remaining participants, 49 self-identified as women (84.5%) and nine as men (15.5%). No one expressed a mismatch between sex at birth and gender self-identification, and no one selected the 'other' option for their self-identified gender. The age of the participants ranged between 18 years old and 46 years old (M = 21.64, SD = 7.71) and they were all enrolled at Bishop's University. Please refer to Appendix A for a complete list of the different programs represented in this sample.

Stimuli and Task
Questionnaires.
In this experiment two different questionnaires were used: a short demographic questionnaire (see Appendix B) and the German Extended Personal Attributes Questionnaire, a scale evaluating gender stereotype acceptance and beliefs (Runge et al., 1981). This questionnaire includes two subscales both comprising eight items, namely "expressivity" and "instrumentality", which are intended to measure the degree to which someone can be classified according to stereotypically masculine (i.e., instrumentality subscale) or stereotypically feminine (i.e., expressivity subscale) adjectives. In its original form, this questionnaire was designed to assess self-ascribed masculinity or femininity, but for the purpose of this study, it was modified to assess one's view and degree of endorsement of gender stereotypes in general (see Appendix C).

Task
Participants completed a lexical decision task in which they had to decide whether a string of letters presented to them was a legitimate English word.
Prime stimuli consisted of 15 photos: five containing alcohol bottles, five portraying weapons, and five showing non-alcoholic beverages (see Figure 1 for sample pictures). Target words were also divided into three categories, each containing 15 words (see Appendix D for the complete list): aggression-related words of a sexual nature (e.g., grope, rape), neutral words (e.g., observe, vanish), and nonword letter strings (e.g., wenct, jork). Each photo was paired with 3 aggressive words, 3 non-aggressive words, and 3 nonword letter strings for a total of 135 trials. For each trial, an image was presented for 300 ms, followed by a 200 ms interval before the target word appeared; the target stayed on the screen until the participant responded, or for up to 3 seconds. Participants had to indicate by pressing a key whether the letters formed a legitimate English word or not. An interval of 3 seconds separated each trial.
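The trial structure described above (15 photos, each paired with 3 targets from each of 3 categories) can be sketched as follows. The pairing rule is an assumption made for illustration only: the text does not say how words were assigned to photos, so the rotation below, which shows each word once with each prime type, is hypothetical, as are the placeholder photo and word labels.

```python
import random
from collections import Counter

PRIME_TYPES = ("alcohol", "weapon", "neutral")
TARGET_TYPES = ("aggressive", "neutral", "nonword")
PHOTOS_PER_PRIME = 5      # 15 photos in total
WORDS_PER_TARGET = 15     # 15 letter strings per target category
TARGETS_PER_PHOTO = 3     # per target category, i.e. 9 targets per photo

# Timing reported in the text, in milliseconds.
PRIME_MS, GAP_MS, TARGET_TIMEOUT_MS, ITI_MS = 300, 200, 3000, 3000


def build_trials(seed=0):
    """Return a shuffled list of 135 prime-target pairings."""
    photos = [(prime, k) for prime in PRIME_TYPES for k in range(PHOTOS_PER_PRIME)]
    trials = []
    for j, (prime, k) in enumerate(photos):
        for target in TARGET_TYPES:
            for step in range(TARGETS_PER_PHOTO):
                # Hypothetical rotation: word index shifts by 5 per step, so
                # every word meets one photo of each prime type.
                word_index = (j + step * PHOTOS_PER_PRIME) % WORDS_PER_TARGET
                trials.append({
                    "prime_type": prime,
                    "photo": f"{prime}_{k}",
                    "target_type": target,
                    "word": f"{target}_{word_index}",
                })
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
cell_counts = Counter((t["prime_type"], t["target_type"]) for t in trials)
print(len(trials), "trials;", cell_counts[("alcohol", "aggressive")], "per design cell")
```

Under this layout each of the nine prime-by-target cells contains 15 trials, which is the unit over which cell means would later be computed.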

Procedures
Participants were asked to come to the Psychological Health and Well-Being lab on Bishop's University campus for one session lasting between 30 minutes and 45 minutes. Following the procedure by Bartholow and Heinz (2006), partial disclosure was used in that participants were told that the goal of the experiment was to measure the speed of language comprehension in the presence of distractive information (i.e., pictures). Once they had agreed to participate in the study, participants were asked to complete two different paper questionnaires, including a short demographic questionnaire and a questionnaire pertaining to gender stereotype acceptance levels, as mentioned earlier. Next, participants were asked to complete the main task of the study, namely the computer-based lexical decision task described above. After completion of the lexical decision task, debriefing took place: participants were informed of the reasons justifying the use of partial disclosure, and a new consent form was presented to them.

Results
Following the procedure used by Bartholow and Heinz (2006), trials on which the participants' response times (RTs) after the onset of the target word were shorter than 150 ms or longer than 1,500 ms were excluded from analyses (2.68% of all trials). Furthermore, response times to nonwords were not included in the analyses because they were only used in the study for methodological reasons and do not have any bearing on the hypotheses being tested (Bartholow and Heinz, 2006). The data from the demographic questionnaire were not used in the analyses for several reasons. First, since there was no discrepancy between gender self-identification and sex at birth, no comparison was possible in this case. Next, given that there were only nine men in the study, the sample size was not large enough to run analyses based on gender differences. Finally, because every participant was enrolled at Bishop's University, no comparison could be made on that variable, and field of study was not used either considering that most participants were enrolled in psychology.
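The trimming rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis script: the record fields `rt_ms` and `target_type` and the demonstration data are hypothetical, and the trimmed proportion is computed over every trial passed in.

```python
RT_MIN_MS = 150    # responses faster than this are dropped
RT_MAX_MS = 1500   # responses slower than this are dropped


def trim_trials(trials):
    """Apply the RT window, then drop nonword trials.

    Returns the trials kept for analysis and the fraction of all input
    trials that fell outside the RT window.
    """
    in_window = [t for t in trials if RT_MIN_MS <= t["rt_ms"] <= RT_MAX_MS]
    trimmed_fraction = 1 - len(in_window) / len(trials)
    analysed = [t for t in in_window if t["target_type"] != "nonword"]
    return analysed, trimmed_fraction


# Hypothetical records, for illustration only.
demo = [
    {"rt_ms": 120, "target_type": "aggressive"},   # too fast
    {"rt_ms": 640, "target_type": "aggressive"},
    {"rt_ms": 580, "target_type": "neutral"},
    {"rt_ms": 1730, "target_type": "neutral"},     # too slow
    {"rt_ms": 700, "target_type": "nonword"},      # in window, but a nonword
]
kept, dropped = trim_trials(demo)
print(len(kept), "trials kept;", f"{dropped:.0%} outside the RT window")
```

Whether boundary values of exactly 150 ms or 1,500 ms are retained is not stated in the text; the sketch keeps them.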

Gender Stereotypes Acceptance Levels
It is likely that the original questionnaire and the way in which it should be coded were manipulated to the extent that the results were uninterpretable. For this reason, the results of the German Extended Personal Attributes Questionnaire are not reported. The associated design-related issues are further explored in the Limitations section.

Response Times
Only the correct-response trials were kept for the response time analyses. That is, the trials on which participants mistook a nonword for a word, or vice versa, were excluded from the present analyses (5.18% of the remaining trials).
The mean response time values did not show any skewness and did not need to be corrected through a log transformation, contrary to what the two groups of authors had previously found (Bartholow and Heinz, 2006; Subra et al., 2010).

Accuracy
The analyses performed for the accuracy levels are identical to those performed for reaction times, except that the trials on which the participants made a wrong decision were not excluded. Mean accuracy values did not show any kurtosis and did not need to be corrected through an arcsine transformation, contrary to the main study being replicated (Bartholow and Heinz, 2006). The overall accuracy level was 95.08% (SD = 4.62), but this variable was further analyzed and broken down through a 3 (prime type: weapon-related pictures, alcohol-related pictures, neutral pictures) x 2 (target word type: aggression-related words, neutral words) repeated measures ANOVA. Here again, the sphericity assumption was not violated. Replicating the results by Bartholow and Heinz (2006) and Subra et al. (2010), the main effect of target word type was significant, F(1, 55) = 45.591, p < .001. More precisely, neutral words were detected with more accuracy (M = .98, SE = .003) than aggressive words (M = .919, SE = .010). All the other main effects were not statistically significant and neither were the interactions, which supports the hypothesis. The main effect of prime type was not significant (F(2, 110) = 0.101, p = .904), and the interaction between prime type and target word type also did not reach the significance level (F(2, 110) = 0.088, p = .916). Therefore, the results obtained in this study are not biased by a speed-accuracy trade-off.
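For readers less familiar with repeated measures designs, the main effect of a within-subject factor with only two levels (here, target word type) is equivalent to a paired t-test on each participant's marginal means, with F(1, n - 1) = t². The sketch below illustrates that identity only; the function name `paired_f` and the data are hypothetical and are not the study's data.

```python
import math
import statistics


def paired_f(cond_a, cond_b):
    """F statistic and degrees of freedom for a two-level within-subject factor.

    cond_a and cond_b hold one marginal mean per participant, in the same
    participant order. The statistic is the squared paired t.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical per-participant accuracy proportions, for illustration only.
neutral_acc = [1.00, 0.90, 1.00, 0.95]
aggressive_acc = [0.90, 0.85, 0.95, 0.80]
f_value, dof = paired_f(neutral_acc, aggressive_acc)
print(f"F{dof} = {f_value:.2f}")
```

The same equivalence does not hold for the three-level prime factor, which requires the full ANOVA decomposition.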

Discussion
Contrary to what Bartholow and Heinz (2006) and Subra et al. (2010) had previously reported, the results obtained in this study do not support the hypothesis that alcohol cues increase the accessibility of aggressive thoughts.
Whereas the previous research had found that aggressive words were detected faster when preceded by alcohol or weapon pictures compared with neutral pictures, and that aggressive words were detected faster than neutral words, none of those results were replicated in the present study. Indeed, there was no significant difference in the speed of detection of aggressive words across the different types of pictures, and neutral words, as opposed to aggressive ones, were recognized faster by the participants. However, the minor finding by both groups of authors that neutral words were detected with more accuracy was replicated in this study. Finally, the unique hypothesis that was added to this study with regards to gender stereotype acceptance levels and their expected influence on response times could not be properly tested due to conceptual flaws. It is possible that major methodological and design-related limitations played a central role in this failure to replicate. Alternatively, on a conceptual level, there is perhaps a fundamental difference between "physically aggressive thoughts" and "sexually aggressive thoughts", which could be reflected in the types of stimuli that elicit them. The methodological limitations and conceptual alternative will be addressed in turn.

Limitations
The sample of participants in this study was problematic on many levels. First, when dealing with response times and differences in terms of milliseconds, it usually takes a large sample size to maximize statistical power. A sample size of 58 participants was probably not large enough to optimize statistical power, especially when compared to the 121 participants that were recruited by Bartholow and Heinz (2006). The main author of this study initially aimed at recruiting 120 participants to mirror the number of participants tested in the project under replication, but did not achieve this goal due to time constraints.
A stronger conceptual limitation, however, is that no power analysis was done prior to testing to determine the ideal sample size. Therefore, it is difficult to know how many participants would have been needed to optimize the likelihood of detecting an effect. In the future, it would be important to conduct power analyses to determine the optimal sample size, and to ensure that enough participants are recruited in time to meet that standard.
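As an illustration of what such an a priori power analysis involves, the sketch below uses the normal approximation for a two-sided paired comparison, n = ((z_{1-α/2} + z_{power}) / d_z)². The standardized effect sizes passed in are hypothetical placeholders rather than estimates from the studies cited, and the approximation slightly understates the sample size that an exact t-based calculation would give.

```python
import math
from statistics import NormalDist


def required_n(effect_size_dz, alpha=0.05, power=0.80):
    """Approximate number of participants for a two-sided paired comparison.

    effect_size_dz is the standardized mean difference of the paired scores.
    Uses the normal approximation, so treat the result as a lower bound.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size_dz) ** 2)


# Hypothetical effect sizes, for illustration only.
for dz in (0.2, 0.3, 0.5):
    print(f"d_z = {dz}: about {required_n(dz)} participants")
```

Dedicated tools run the exact calculation, including for interaction effects in repeated measures designs, which is the effect of interest here.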
The participants in this study also differed greatly from those recruited by Bartholow and Heinz (2006). Indeed, whereas they had recruited 60 men and 61 women, the present study included 49 women and 9 men. This difference in men-women proportion is problematic considering that past studies reporting a link between drinking and aggression have focused on the perpetration of aggressive acts by men, and not women (Jakupcak et al., 2002; Gidycz et al., 2007; Abbey et al., 2003; Locke and Mahalik, 2005). It is therefore plausible that a greater proportion of men in the current study would have led to different results. Thus, further replication attempts should ensure an equal representation of men and women in the sample.
A further limitation of this study pertains to the word choice for the aggressive target word category. Finding salient aggressive words of a sexual nature proved to be challenging for many reasons. One of them is that most of the words that could be found through internet searches were expressions made up of more than one word and could not be used as word length has an impact on reaction times (Bartholow and Heinz, 2006). Additionally, the line between sexually aggressive words and sexual preferences is somewhat blurry (e.g., sodomy, choke). Therefore, some of the words used might not have been associated with violence for certain participants but instead with pleasure. Finally, the face value of some of the words in the aggressive category was doubtful, the best example being the use of the word 'prey'. During the debriefing session, some participants reported finding it difficult and confusing to assess the letter strings when they were preceded by a photo on which there was some writing. Ensuring that the priming stimuli do not contain any confusing information such as writing is therefore an important consideration for future replications.
Another point that was brought forward by some participants is that the neutral pictures may have been more effective if they had not been beverages.
The argument is that it becomes too obvious that the study is researching the effects of alcohol as it is contrasted with non-alcoholic beverages, despite the fact that weapon pictures are also included. As mentioned earlier, the reason why Subra et al. (2010) chose pictures of non-alcoholic beverages was to address a limitation of the study by Bartholow and Heinz (2006).
The conclusion from this is that neither plants nor non-alcoholic beverages are an optimal neutral category for this study and the ideal neutral prime has yet to be found. It would be recommended to conduct a norming study prior to the experiment to create a list of possible neutral visual stimuli. It is also possible that the issue does not reside in the nature of the neutral stimuli per se, but instead that not enough neutral stimuli were included, making the goal of the study obvious. Adding filler trials with random sets of pictures and words could help blind the participants to the purpose of the study.
Importantly, all these limitations do not offer a justification as to why neutral words were detected faster than aggressive words, contradicting the results obtained in both previous studies. A plausible explanation has to do with the timing of testing, which occurred close to the end of the academic semester at Bishop's University. Indeed, 'conventional wisdom' among researchers is that the quality of participants most likely declines towards the end of the semester (Ebersole et al., 2016). The timing of testing could likely account not only for the low number of participants, but also for a poor quality of answers that further leads to poor results. However, Ebersole and colleagues (Ebersole et al., 2016) tested this hypothesis and found little evidence in its support. Although participants reported declining effort and attentional levels as well as an increase in stress as the semester unfolded, the researchers only found a weak and negligible effect of time of semester on task performance.
Therefore, although it is plausible that testing participants towards the end of the semester had a negative impact on the quality of the results, it cannot fully explain the failure to replicate the main findings obtained by Bartholow and Heinz (2006). Nonetheless, a solution to this problem would be to test participants throughout the semester, and to include a broader sample outside of the university population so that time of semester cannot be a problematic variable.
As was made clear in the introduction, there is a great need to investigate the relationship between the endorsement of gender stereotypes and acts of sexual violence as it relates to both men and women. However, the execution of the hypothesis in the current study was done inadequately, rendering the results uninterpretable. The German Extended Personal Attributes Questionnaire was used because it was the one most closely related to the hypothesis under investigation that was available to the author at that time. However, this questionnaire was originally designed to measure self-ascribed masculinity or femininity, and was thus modified to instead assess a general endorsement of gender stereotypes. What is problematic is that the way in which this was done was completely arbitrary. In the original form of the questionnaire, participants are presented with 16 items representing masculinity and femininity and have to indicate where they think they fall on the scale with regards to each item (Runge et al., 1981). In this study, participants were asked to indicate to what extent they believe that the 16 characteristics are representative of men in general, and were then asked to fill out the questionnaire a second time, but this time by indicating to what extent they believe that the said characteristics are generally representative of women. The validity of those changes was not tested prior to the main experiment and was not based on any empirical evidence. This manipulation was so great that a new method of coding was necessary, which was also based on arbitrary grounds. A score of 3 (middle of a 5-point semantic differential scale) was given a value of 0, scores of 2 or 4 were given a value of 1, and scores of 1 or 5 (extremes of the scale) were given a value of 2. There is no reason to believe that this coding method was valid. Furthermore, participants were divided into 3 equal groups using the visual binning option in SPSS, reflecting their level of gender stereotype acceptance: low, medium, high. Again, the decision to divide participants into three equal groups instead of using pre-established cut-offs did not have empirical support. A further problem with this questionnaire was a lack of variability in the answers as most participants tended to 'sit on the fence', i.e., selecting a score of 3. Some items on the questionnaire are obvious at face value, which may have made the participants answer in a socially desirable way, but there is no way of knowing whether social desirability was at play or if the results truly reflect the participants' beliefs. Including a social desirability scale or other unrelated questions might have solved this problem, as well as using a 6-point Likert-type scale. Alternatively, the questionnaire could have been administered online so as to measure reaction time to stereotype-congruent and stereotype-incongruent adjectives. Nonetheless, the bigger question remains as to whether the questionnaire was a valid and reliable measure to begin with.
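To make the coding scheme concrete, the sketch below reproduces it as a distance from the scale midpoint and then splits participants into three equal-sized groups by rank. It reads the extremes of the 5-point scale as scores of 1 and 5; the rank-based split, its handling of ties, and the demonstration totals are assumptions made for illustration and do not reproduce the SPSS visual binning procedure itself.

```python
def recode(score):
    """Map a 1-5 rating to its distance from the midpoint: 3 -> 0, 2/4 -> 1, 1/5 -> 2."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating out of range: {score}")
    return abs(score - 3)


def tertile_groups(totals):
    """Label each participant low/medium/high by rank, in three equal-sized groups.

    Ties are broken by input order, which is itself an arbitrary choice.
    """
    labels = ("low", "medium", "high")
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    groups = [None] * len(totals)
    for rank, participant in enumerate(order):
        groups[participant] = labels[rank * 3 // len(totals)]
    return groups


# Hypothetical summed scores for six participants, for illustration only.
demo_totals = [10, 2, 7, 4, 9, 1]
demo_groups = tertile_groups(demo_totals)
print([recode(s) for s in (1, 2, 3, 4, 5)], demo_groups)
```

The sketch also shows why the lack of variability matters: when most ratings equal 3, most recoded totals sit at or near zero, and a rank-based split then separates participants whose scores barely differ.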
Two major lessons can be learned from this. First, when designing a study, methodological decisions should be empirically-based as opposed to arbitrary.
Second, it is critical to ensure the validity of the measures to be used before conducting the main experiment. Future studies looking at the relationship between gender stereotypes and sexual violence would greatly benefit from first designing a sound psychometric questionnaire assessing general levels of gender stereotype endorsement.
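To make the coding and grouping steps discussed above concrete, the following is a minimal sketch in Python rather than the SPSS procedure actually used. It assumes that the two extremes of the 5-point scale are the scores 1 and 5, so that each rating is recoded as its distance from the midpoint, and it approximates the three equal groups with tertile cut points. The function names (fold_score, endorsement_total, tertile_groups) and the example data are hypothetical and serve illustration only.

```python
from statistics import quantiles


def fold_score(raw: int) -> int:
    """Recode a 1-5 rating as its distance from the midpoint: 3 -> 0, 2/4 -> 1, 1/5 -> 2."""
    if raw not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be between 1 and 5, got {raw!r}")
    return abs(raw - 3)


def endorsement_total(ratings: list[int]) -> int:
    """Sum the recoded ratings of one participant (hypothetical scoring rule)."""
    return sum(fold_score(r) for r in ratings)


def tertile_groups(totals: list[int]) -> list[str]:
    """Label each total as low, medium or high using two tertile cut points.

    This is only a rough analogue of equal-percentile binning; it is not
    claimed to reproduce the output of SPSS visual binning.
    """
    low_cut, high_cut = quantiles(totals, n=3)
    labels = []
    for total in totals:
        if total <= low_cut:
            labels.append("low")
        elif total <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical totals for nine participants with well-spread answers.
spread_out = list(range(9))
print(tertile_groups(spread_out))

# Hypothetical totals when every participant answers 3 on every item.
fence_sitters = [endorsement_total([3] * 16) for _ in range(4)]
print(tertile_groups(fence_sitters))
```

In this sketch, the second example shows why the lack of variability is a problem for the grouping: when every participant selects the midpoint, all totals are 0, both cut points collapse to 0, and every participant receives the same label, so the three groups no longer distinguish levels of endorsement.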

Future Directions
Replication attempts should target a larger and more representative sample, much like Subra et al. (2010) did, to ensure that the results not only represent the student body accurately, but that they can also be generalized to the population as a whole. Although the goal of this experiment was to study aggressive behaviors in university students, it is still important to look at the bigger picture and remember that undergraduate samples do not represent the general population. What seems to be most problematic here is the lack of male representation and the overrepresentation of psychology students. The composition of this sample is not representative of the general university population, even less so of the overall population, and although it might not entirely explain the failure to replicate past findings, it is important to keep this issue in mind when designing subsequent research protocols. Understandably, a sample of 22-year-olds who mostly major in psychology might not be optimal when studying abusive relationships with alcohol.
Other than methodological considerations, attention should also be paid to conceptual alternatives. As mentioned earlier, "physically aggressive thoughts" and "sexually aggressive thoughts" might be fundamentally different. To this point, White et al. (2008) found that among more than 10 correlates of physical and sexual aggression, the only two that distinguished perpetrators of physical aggression from perpetrators of sexual aggression were motives for sex. More precisely, the participants in the sexual aggression only group scored significantly higher than those in the physical aggression only group on the hedonism and dominance motives for sex (White et al., 2008). However, it is not clear from this study whether the participants had committed their aggressive acts under the influence of alcohol or not. Motive for sex is only one possible factor differentiating sexual aggression from physical aggression: multiple studies have also investigated the predictors of sexual and physical aggression and victimization (James and Young, 2013; Gidycz et al., 2007; Felson and Burchfield, 2004). However, no clear pattern emerges from the literature as to whether there is a fundamental difference between physically and sexually aggressive thoughts, and if so, how it is characterized.
An interesting avenue to resolve this question would be to measure whether there are brain activity differences when participants are processing scenarios involving sexual compared with physical aggression.
Furthermore, future studies would benefit from inquiring about participants' nationalities to explore whether the patterns of responses differ across countries. This idea emerged during the debriefing sessions as multiple participants reported coming from Europe and being raised with an open-minded attitude towards alcohol, further saying that sexual violence was not necessarily associated with alcohol for them. The study by Subra et al. (2010) took place in France, but it was investigating the link between alcohol and physical violence as opposed to sexual violence. Therefore, it would be important to study participants representing a range of different nationalities and look at whether differences emerge in the pattern of responses when it comes to alcohol and sexual violence. Taking it a step further, it would be even more informative to include both sexually aggressive words and physically aggressive words in the same study to explore more directly the possible differences in the relationships between alcohol priming and various types of aggressive thoughts across nationalities.
Another avenue that would be worth investigating is the relationship between aggressive thoughts and subsequent behavior. The studies by Bartholow and Heinz (2006) and Subra et al. (2010) point to the idea that alcohol-related cues can increase the accessibility of aggressive thoughts. How does this translate into behavior, if it does so at all? As discussed earlier, the semantic network model of memory posits that concepts that are similar in meaning or that co-occur are activated concurrently in semantic memory and develop strong associations (Anderson et al., 1998). Therefore, when one concept is activated, it produces a spreading activation process through which other related concepts become more accessible, ultimately increasing the likelihood that those concepts will have an impact on subsequent behavior (Bartholow and Heinz, 2006). Bartholow and Heinz (2006) did follow up on their first experiment with a second one, not under replication here, which aimed at investigating the effects of this increase in aggressive thoughts. However, the results only showed that the increase was related to more aggressive interpretations of the behavior of others, and did not assess whether the participants would also be more likely to commit aggressive acts themselves (2010). Other studies have reported a link between the activation of a specific attitude and subsequent behavior. For example, Bargh et al. (1996) found that participants who were primed with an elderly stereotype took more time to walk out of the experiment center than did control participants. The idea was that their attitude towards older individuals (i.e., that they are slow walkers) influenced their behavior. Similarly, other researchers found that priming participants with the stereotype of 'professor' enhanced their performance on a general knowledge test, whereas priming the trait 'stupid' reduced their performance on said test (Dijksterhuis and Van Knippenberg, 1998).
Although these studies have their own limitations, the results suggest that when an attitude toward an object is activated, it can influence behavior. The ultimate goal would be to establish a direct link between the activation of an attitude towards alcohol and aggressive thoughts, and a subsequent increase in aggressive behaviors. How to conduct this kind of experiment in an ethical manner, however, remains to be answered.

Conclusion
The current research project was undertaken to extend the literature by replicating the evidence that simple exposure to alcohol stimuli, without actual consumption or belief that consumption has occurred, can increase aggressive thoughts. The initial intention of this project was always to complete a replication of the first of two experiments by Bartholow and Heinz (2006). This replication attempt suffered from many methodological and design-related issues. For example, the sample size was most likely too small, although no proper power analysis was run, and the sample further lacked generalizability.
The low number of participants might be explained by the timing of testing, but it remains unclear why so few men participated. Another important impediment to replication relates to the alteration of the aggressive target words since it is doubtful whether the new sexually aggressive words were of a similar salience as the original ones. In other words, the sexually aggressive words might not have been as centrally relevant to the concept of sexual aggression as were the aggressive words employed by Bartholow and Heinz (2006) to the notion of physical aggression. Similarly, the validity of the neutral category of pictures employed remains unclear. Finally, the execution of the hypothesis regarding gender stereotype levels was completely arbitrary and the results uninterpretable.
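As an illustration of the kind of a priori power analysis that was not run, the sketch below estimates the number of participants needed to detect a within-subject difference of a given standardized size. It rests on several assumptions that do not come from the present study: the effect sizes passed in are hypothetical placeholders, the design is simplified to a single paired comparison rather than the full picture type by word type interaction, and a normal approximation is used, which slightly underestimates the sample size a t-based calculation would give. The function name required_n is likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-sided paired comparison.

    Uses the normal approximation n = ((z_(1 - alpha/2) + z_power) / d)^2,
    where d is the standardized mean difference of the paired scores.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes, chosen only to show how quickly the required
# sample grows as the expected effect shrinks.
for d in (0.5, 0.2):
    print(d, required_n(d))
```

A calculation of this kind, based on an effect size taken from the original study or from a meta-analysis, would indicate before data collection whether the planned sample could plausibly detect the effect.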
In conclusion, although the present study failed to replicate the previous finding that alcohol-related cues increase the accessibility of aggressive thoughts (Bartholow and Heinz, 2006;Subra et al., 2010), this does not mean that the effect is non-existent. Rather, it is probable that the methodology employed in this study was significantly flawed and reduced the likelihood of finding