Collaborative open science as a way to reproducibility and new insights in primate cognition research

Department of Psychology, The University of Edinburgh, Edinburgh, UK
Scottish Primate Research Group
Mental Health Data Science Scotland, The University of Edinburgh, UK
Department of Psychology, Language Research Center, Georgia State University, Atlanta, GA, USA
Leipzig University, Leipzig, Germany
Stanford University, Stanford, CA, USA
Department of General Zoology, University of Duisburg-Essen, Essen, Germany
German Primate Center Leibniz Institute for Primate Research, Behavioral Ecology & Sociobiology Unit, Göttingen, Germany
Leibniz ScienceCampus “Primate Cognition”, Göttingen, Germany
Department of Psychology, Ludwig-Maximilians-University, Munich, Germany
Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
Department of Social Psychology and Quantitative Psychology, University of Barcelona, Barcelona, Spain
Shanghai Key Laboratory of Brain Functional Genomics, Key Laboratory of Brain Functional Genomics Ministry of Education, School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai, Shanghai, China
Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, China
Institut de Recerca i Estudis en Psicologia, Facultat d’Educació i Psicologia, Universitat de Girona, Girona, Spain
Department of Early Prehistory and Quaternary Ecology, University of Tübingen, Tübingen, Germany
Florida Institute of Technology, Melbourne, FL, USA
Department of Cognitive Science, University of California San Diego, San Diego, CA, USA
Department of Psychology, University of Miami, Coral Gables, FL, USA
Psychological Science Accelerator
Centre for Comparative and Evolutionary Psychology, Department of Psychology, University of Portsmouth, Portsmouth, UK
Messerli Research Institute, University of Veterinary Medicine Vienna, Medical University of Vienna, University of Vienna, Vienna, Austria


[...] Further, the species that were studied varied widely in how much research attention they received, partly because a small number of test sites contributed most of the studies. These results suggest that the generalizability of primate cognition studies may be severely limited. Publication bias, questionable research practices, and a lack of replication attempts may exacerbate these problems. We describe the ManyPrimates project as one approach to overcoming some of these issues by establishing an infrastructure for large-scale collaboration in primate cognition research. Building on similar initiatives in other areas of psychology, this approach has already yielded one of the largest and most diverse primate samples to date and enables us to ask many research questions that can only be addressed through collaboration.

Introduction
The goal of primate cognition research is to understand how primates acquire, process, store, and use information (Shettleworth, 2010). This is an enterprise rooted in the fields of ethology (de Waal, 2016) and comparative psychology (Roitblat, Bever, & Terrace, 1984). To fully understand primate cognition from an ethological perspective, we must study its mechanisms, ontogeny, function, and phylogeny (Tinbergen, 1963); to address each of these elements, large and diverse samples that adequately reflect the full extent of variation in cognitive traits within and between species are essential (Martins & Hansen, 1996; Tomasello & Call, 2011). Understanding primate cognition from a psychological perspective likewise requires large samples and diverse species (e.g., Beran et al., 2014; Bitterman, 1960; Dewsbury, 1984; Wasserman, 1993, 1997). However, these approaches are often problematic for single research groups with limited access to study populations. Historically, attempts to make inferences about cognitive evolution have been severely limited by reliance on samples that are insufficiently large and diverse to adequately reflect the extent of variation in cognitive traits (Beach, 1950; Shettleworth, 2010). For instance, to make reliable evolutionary inferences, one must account for the degree of shared ancestry between species. Species with more shared ancestry are expected to perform more similarly on cognitive tasks than more distantly related species. Failing to account for shared ancestry among species may heighten the risk of over- and under-interpreting apparent species differences, thereby undermining the reliability of inferences about primate cognitive evolution.
Moreover, even when an adequate number of species is sampled, it is often difficult to rule out alternative explanations for between-species variation in cognitive performance, such as differences in motivation and perceptual ability (Mackintosh, 1988) or training histories. One solution to this task impurity problem (Miyake et al., 2000) is to study first what varies between individuals of a single species, using test batteries aimed at assessing multiple cognitive abilities with multiple tasks for each ability. The shared variance of multiple tasks pertaining to the same ability (but varying in peripheral demands such as perceptual or motoric requirements) might then be compared across species (Völter et al., 2018). Such a psychometric approach, however, also requires large and diverse samples within a given species. Meeting this requirement is challenging because researchers are often limited to small samples in zoo, laboratory, and field settings, and in wild populations where control of extraneous influences is even harder to achieve.
Large and diverse samples are also essential to estimate the replicability of primate cognition research across sites. In human psychological research, the failure to achieve such samples has led to findings that have proven difficult to replicate within and across populations (Henrich, Heine, & Norenzayan, 2010; Open Science Collaboration, 2015), a predicament that is a key contributor to what is widely known as the 'replication crisis' (Lindsay, 2015). Similarly, in primate cognition research, studies using comprehensive cognition test batteries suggest that different populations of the same primate species possess markedly different cognitive profiles (Herrmann et al., 2010; Hopkins, Russell, & Schaeffer, 2014). Such variation in living systems need not represent noise or error. Instead, it might be the outcome of predictable responses to sources of variation across sites (Voelkl & Würbel, 2019), including differences in social environment, ecology, and population-specific histories of participation in other cognitive tasks (Cronin et al., 2017). These influences can be exam- [...] where. At the same time, it has also been suggested that such differences in cognitive profiles might be explained by methodological differences across studies (Völter et al., 2018).
The use of different methods to assess a particular cognitive ability, rather than acting as a challenge to repeatability, might instead offer opportunities for conceptual replications. In fact, this form of repeatability is used to establish the construct validity of cognitive abilities. Recently, Cauchoix et al. (2018) assessed both contextual and temporal repeatability of cognitive measurements at the individual level in non-human animals. In their meta-analysis, the authors found evidence for repeatability of cognitive performance at the individual level across contexts (i.e., different tasks designed to measure the same cognitive trait) and over time (with low to moderate reproducibility estimates).
Consistent and reliable individual differences in cognition are important from an evolutionary point of view, as such traits might confer different fitness benefits (Thornton, Isden, & Madden, 2014). However, datasets suitable for examining the repeatability of cognitive performance in primates specifically are lacking (with some exceptions: e.g., Hopkins et al., 2014). This lack of studies is due to limited access to sufficiently large and diverse samples, and to inadequate coordination across research sites to ensure that cognitive tasks applied across species are directly comparable.
In other areas of psychological science, large-scale collaborations have been adopted to combat challenges to the validity and reliability of research findings (e.g., Open Science Collaboration, 2012; Psychological Science Accelerator: Moshontz et al., 2018). However, primate cognition research has no infrastructure for large-scale collaboration, and consequently many of the key challenges to the integrity of the field outlined above remain unresolved.
ManyPrimates seeks to overcome these challenges by developing an infrastructure for large-scale research collaboration among researchers who have access to different primate populations available for cognitive testing. In this paper, we review the current state of the field of primate cognition research, including the species studied, sample sizes, and study sites (for a similar recent survey of field primatology research see Bezanson & McNamara, 2019). Based on this analysis, we outline key limitations of the field, highlight the importance of large-scale collaboration in primate psychological science, summarize the goals of ManyPrimates, report on the current state of the project, and suggest directions for the future.

State of the Field
A widely held view within the field of primate cognition is that research is dominated by work with a few species (Beach, 1950; Shettleworth, 2010). Another common conjecture is that primate cognition studies are characterized by notoriously small sample sizes.
Small samples are, perhaps, less problematic for "proof of principle studies" seeking to identify whether a single individual has a certain ability, like being able to perceive a stimulus or use a tool. However, small sample sizes present a much larger obstacle to obtaining precise, reliable quantitative comparisons of differences in ability between species (but see Smith & Little, 2018). Additionally, some research sites might be particularly productive, leading to overrepresentation in the literature of individuals and species with idiosyncratic environments (e.g., rearing history, amount of cognitive testing, reliance on food provisioning by humans, or size of enclosure or home range) that could affect the generalizability of findings to more diverse populations and species.
[Figure caption, beginning lost: ... at least one study in black, and all others in grey.]
Less than 15% of the more than 500 commonly recognized primate species were represented in studies from our review period. There is great variability in taxon-specific research effort, and most primate radiations received only marginal attention. Of 16 primate families, 13 were included in at least one study. No cognitive research was reported on bushbabies and galagos (Galagonidae), sportive lemurs (Lepilemuridae), or tarsiers (Tarsiidae). Lorises (Lorisidae), the aye-aye (Daubentoniidae), and owl monkeys (Aotidae) each appeared in only a single study, and for gibbons (Hylobatidae) there were just two. Together, these four families featured in less than 1% of publications. We found similarly low numbers for Indriidae (3 studies) and Atelidae (4 studies). By far the most intensively studied groups were the great apes (Hominidae) and Old World monkeys (Cercopithecidae), appearing in 38% and 40% of studies, respectively. Within these highly studied taxa, chimpanzees (184 studies) and rhesus macaques (152 studies) dominated. Among Old World monkeys, research was almost exclusively focused on the subfamily Cercopithecinae, which most prominently includes macaques, baboons, and vervet monkeys. The second subfamily of the group, the folivorous Colobinae, featured in only 3 studies (0.5%).
Thus, the vast majority of studies focus on great apes and cercopithecine Old World monkeys, which comprise just a small fraction of primates' phylogenetic, ecological, and behavioural diversity. In particular, folivorous and nocturnal primate taxa are systematically underrepresented. The intuition that primate cognition research is dominated by only a few species is therefore supported. This sampling bias is problematic for evolutionary inferences, because under- or unrepresented species might have psychological characteristics that differ from those of even closely related species. For example, rhesus macaques are socially less tolerant than other macaque species, which has been suggested to affect their social cognitive skills (Joly et al., 2017). The overrepresentation of rhesus macaques in the literature could therefore lead to a biased impression of macaque social cognitive skills in general.
To test these intuitions, we conducted a systematic review of recently published primate cognition research. We surveyed all articles from 22 relevant journals that published work on primate cognition between January 2014 and October 2019. We included all articles with original data from at least one primate species (excluding humans) that studied some kind of psychological process (judging from the title, abstract, and/or keywords) and involved at least one experimental manipulation. We included studies with any kind of behavioral measure and excluded studies focusing exclusively on other processes (e.g., genetics, neurophysiology). In addition to surveying the literature, we also solicited articles from the members of the ManyPrimates mailing list. For each article, we coded 1) the primate species involved, 2) the sample size per species and site, 3) the data collection site, 4) whether or not the study included a replication (coded as such if the same species was studied with comparable methodology but at a different site), and 5) whether or not species were compared to one another statistically. All data and analysis scripts associated with this review are available in a public repository at: https://github.com/ManyPrimates/japanese_review. We encourage the reader to consult the original data file for information beyond the summaries presented here.
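The coding scheme described above can be pictured as one record per species and site within an article. The following is a minimal sketch of such a record and of two summaries reported in this review (median sample size and the share of articles coming from the most productive sites). The class name, field names, and toy entries are hypothetical and chosen for illustration only; the actual data file in the repository may be organized differently.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CodedEntry:
    """One species-by-site entry of a reviewed article (hypothetical schema)."""
    article_id: str
    species: str
    sample_size: int
    site: str
    is_replication: bool
    compares_species: bool


# Toy entries, invented for illustration; these are not data from the review.
entries = [
    CodedEntry("a1", "Pan troglodytes", 12, "Site A", False, False),
    CodedEntry("a2", "Macaca mulatta", 4, "Site B", False, False),
    CodedEntry("a3", "Pan troglodytes", 7, "Site A", False, True),
    CodedEntry("a3", "Pongo spp.", 5, "Site A", False, True),
    CodedEntry("a4", "Sapajus spp.", 9, "Site C", True, False),
]


def median_sample_size(rows):
    """Median sample size across all species-by-site entries."""
    return median(r.sample_size for r in rows)


def site_share(rows, top_k):
    """Fraction of distinct articles that involve one of the top_k most productive sites."""
    articles_by_site = {}
    for r in rows:
        articles_by_site.setdefault(r.site, set()).add(r.article_id)
    counts = Counter({site: len(ids) for site, ids in articles_by_site.items()})
    top_sites = {site for site, _ in counts.most_common(top_k)}
    all_articles = {r.article_id for r in rows}
    covered = {r.article_id for r in rows if r.site in top_sites}
    return len(covered) / len(all_articles)


def multi_species_articles(rows):
    """Articles that report data from more than one species."""
    species_by_article = {}
    for r in rows:
        species_by_article.setdefault(r.article_id, set()).add(r.species)
    return {a for a, species in species_by_article.items() if len(species) > 1}


print(median_sample_size(entries))
print(site_share(entries, top_k=1))
print(multi_species_articles(entries))
```

Keeping one entry per species and site, rather than one per article, makes it possible to summarize sample sizes per species while still counting each article only once when computing site shares.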
3) The taxonomic identities of subjects were reported in varying detail within the papers (i.e., regarding subspecific status). To be consistent, we chose to base our reporting and analysis on the species level and excluded publications that failed to provide a species assignment for their subjects. In the case of orangutans (Pongo spp.) and robust capuchins (Sapajus spp.), interspecific hybrids frequently feature in cognitive studies. To account for this issue, data on these groups are presented here on the genus instead of the species level. When species-level assignments for Pongo and Sapajus subjects were provided, we included them in the raw data (see: https://github.com/ManyPrimates/japanese_review).
[...] rhesus macaques with some precision, for most species we are left with too little information for accurate quantitative comparisons. Other research questions, such as the structure of individual differences in cognitive abilities, are also hampered by small sample sizes (for reviews, see Shaw & Schmelz, 2017; Völter et al., 2018).
Sample sizes in these studies ranged from 1 to 481 individuals, but varied widely by species (Figure 2). The median sample size across all species and studies was seven individuals.
[...] in the data to increase the likelihood of publication. Combined with a low rate of replication studies (only 2% of studies in our sample attempted to replicate [...]
[Figure caption, beginning lost: ...] Groves (2006) for guenons, Groves and Shekelle (2010) for tarsiers, Mittermeier et al. (2010) for lemurs, Mootnick (2006) for gibbons (except that Nomascus siki is here regarded as a full species, as in the study concerned), Rylands et al. (2016) for tamarins and marmosets, and Groves (2001) for all other groups. *The following species appeared in published studies but were not included in the 10kTrees data set. Here, they therefore take the place of closely related species: Hoolock leuconedys (H. hoolock), Plecturocebus cupreus (P. moloch), Callicebus nigrifrons (C. personatus), and Eulemur rufifrons (E. rufus).
[Figure caption: Phylogenetic data were obtained from 10kTrees (Arnold et al., 2010). Branch lengths are proportional to absolute time. Nomenclature corresponds to that in Fig. 1.]
Additionally, a few sites contributed most studies in the field. The 5 most productive sites featured in 38% of studies. Figure 3 shows a map of all 183 data collection sites that we identified in our review. The size of each dot corresponds to the number of studies from [...] shows the number of studies for a species in relation to the number of sites contributing data for that species. [...] chimpanzees, but coming from just 29 sites. Because primates are long-lived and often spend most of their life at one site, the same subjects are tested over and over again. As a consequence, we may end up knowing a lot about a few individuals, but less about the variation within the species. In addition, repeated testing of few individuals may result in better performance due to experience with cognitive testing, hampering comparisons both within and across species.
From a comparative perspective, we found that while 19% of studies involved more than one species, 20% of these (22/111) did not compare species quantitatively, though of those that did, almost half (43/89) compared more than two species. Taken together, evolutionary inferences based on comparing multiple species are the exception rather than the norm in primate cognition research 4).
These issues likely arise not because researchers do not want to study a broad range of species, each with a [...]
4) It is worth noting, however, that we did not include in this review studies that compared humans to other species, which make up a large proportion of comparative cognition studies. While research comparing humans to other species is scientifically valuable, it is also critically important to explore cognition beyond humans to gain fundamental insights about the nature and evolution of cognitive diversity (Burghardt, 2013; Byrne, 2000 [...]
[...] (Farrar & Clayton, 2019).
In the following, we present the ManyPrimates project as an attempt to overcome some of these issues.
ManyPrimates cannot solve all issues related to funding and the culture of academic publishing, but it can provide researchers with an opportunity to contribute their limited resources to a larger project. This pooling of resources allows them to tackle important evolutionary questions in a systematic and meaningful way. The project was inspired by other large-scale collaboration projects within psychology, to which we now turn.

Large Scale Collaboration in Psychology
Many [...] in how to conduct a study, analyze the data, and interpret results, and this flexibility can be used to favor positive results (Gelman & Loken, 2013). For instance, researchers might: recruit participants in several stages and end the experiment once desired results show up; HARK (Hypothesize After the Results are Known; Kerr, 1998); test several parameters at the same time to compare multiple results and choose those that work (John et al., 2012); analyze only a subgroup of participants; or fail to adjust for inflated Type I error rates 5). Put another way, these practices involve 'torturing' the data until the desired results appear (p-hacking; Simonsohn, Nelson, & Simmons, 2014).
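To illustrate why recruiting in stages and stopping once the desired result shows up inflates the Type I error rate, the following is a minimal simulation sketch. It assumes a deliberately simple setting that is not taken from any study discussed here: no true effect, standard-normal observations with known standard deviation, a two-sided one-sample z-test, and interim looks after every 10 observations. All function names and parameter values are illustrative.

```python
import math
import random


def false_positive_rates(n_sims=2000, looks=(10, 20, 30, 40, 50), z_crit=1.96, seed=1):
    """Simulate experiments with NO true effect and compare two stopping rules.

    Each simulated data set holds looks[-1] standard-normal observations
    (true mean 0, known SD 1), analysed with a two-sided one-sample z-test.
    """
    rng = random.Random(seed)
    fixed_hits = 0      # significant when tested once, at the planned final n
    peeking_hits = 0    # significant at ANY interim look (optional stopping)
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(looks[-1])]

        def z_at(n):
            # z = mean * sqrt(n) / SD, which simplifies to sum / sqrt(n) for SD = 1
            return sum(data[:n]) / math.sqrt(n)

        if abs(z_at(looks[-1])) > z_crit:
            fixed_hits += 1
        if any(abs(z_at(n)) > z_crit for n in looks):
            peeking_hits += 1
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = false_positive_rates()
print(f"fixed n:           {fixed_rate:.3f}")
print(f"optional stopping: {peeking_rate:.3f}")
```

Because the final look is shared by both rules, the optional-stopping rate can never fall below the fixed-n rate in this sketch; the simulation shows how far above the nominal 5% level it rises when every interim look is treated as a chance to stop.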
Abusing these degrees of freedom undermines trust in [...]
5) It should be pointed out that p-hacking is not necessarily a conscious choice of a researcher, but can also be due to unconscious bias (Gelman & Loken, 2013).
[...] if a large number of researchers in the field join forces.
ManyPrimates aims to include a wide variety of institutions, with university research labs, zoos, and sanctuaries participating. This diversity improves the representativeness of the sample and the study results.
Systematically accounting for variation in housing and rearing backgrounds allows researchers to examine whether environmental variation predicts differences in [...]tions. This distribution and diversification of data collection thus helps to address many of the problems we identified above in our review of primate cognition studies (e.g., the problem of a few very productive sites being responsible for most studies). In addition to [...] @ManyPrimates) and conference presentations (10 conference presentations since the official launch of the project). In July 2020, we will host a ManyPrimates symposium at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany.
We established and tested this infrastructure while running our first pilot study on short-term memory.
The topic was selected, through voting, from a [...]
In light of the literature review presented above, the sample size and diversity of species in the pilot paper are already extraordinary. For example, none of the studies reviewed included more species or more sites.
However, from a phylogenetic perspective, the number of species represented in the sample is still relatively small (Freckleton, Harvey, & Pagel, 2002; Freckleton & Rees, 2019). We therefore decided to continue data collection to add more data from more species. The pilot project has thus become ManyPrimates1 (MP1; data collection planned to end in May 2020). We also break new ground in the way we organize the data analysis. For MP1, we announced a modelling challenge to solicit phylogenetic models from the community to find the best predictors of short-term memory abilities on a species level. Researchers can submit models specifying the external variables (social and ecological) they think best predict short-term memory abilities across species. All submitted models will then enter into a model comparison. Given a nearly endless number of plausible models, deciding which to favor strongly depends on one's theoretical views. As a project, ManyPrimates aims to be theoretically neutral.
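The idea of entering several candidate models into one comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in MP1: it fits ordinary (non-phylogenetic) one-predictor regressions and ranks them by AIC, whereas the challenge solicits phylogenetic models, which would additionally account for shared ancestry. The predictor names (`group_size`, `diet_breadth`) and all numbers are invented.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns the residual sum of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, n_params):
    """AIC for a Gaussian model, up to an additive constant shared by all models."""
    return n * math.log(rss / n) + 2 * n_params


# Toy species-level data (invented for illustration, not ManyPrimates results).
memory_score = [2.1, 2.9, 4.1, 4.9, 6.1, 6.9, 8.1, 8.9]
candidates = {
    "group_size": [2, 4, 6, 8, 10, 12, 14, 16],
    "diet_breadth": [3, 1, 4, 1, 5, 9, 2, 6],
}

n = len(memory_score)
mean_score = sum(memory_score) / n
# Null model: intercept only (2 parameters: intercept and residual variance).
scores = {"null": gaussian_aic(sum((y - mean_score) ** 2 for y in memory_score), n, 2)}
for name, predictor in candidates.items():
    # One-predictor model: intercept, slope, residual variance (3 parameters).
    scores[name] = gaussian_aic(fit_line(predictor, memory_score), n, 3)

ranking = sorted(scores, key=scores.get)  # lowest AIC first
for name in ranking:
    print(f"{name:>12}: AIC = {scores[name]:.2f}")
```

Scoring every submitted model with the same criterion on the same data is what allows the comparison to remain neutral with respect to the theories that motivated the individual models.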
This position is best maintained not by making theory-[...]
Our website is also the main way to disseminate [...]
[...] backgrounds to tackle the challenge of developing unbiased tests.
ManyPrimates also offers the opportunity to reflect upon and exchange ethical considerations. We need to compare country-specific or site-specific laws/regulations, and carefully consider definitions of invasive and noninvasive cognitive testing procedures.

Future Directions
ManyPrimates seeks to address the limitations mentioned [...]
[...]ing an online platform for large-scale collaboration, future projects can benefit from collective data sets not just from that project, but from all preceding projects.
The replicability and generalizability of primate cognition studies can be tested through a large and robust framework. Because sample sizes are often a concern in this field, we can study these subjects intensively, and take advantage of their detailed developmental histories. As an example, if a future project assesses the serial ordering abilities of many of the primate species in the first ManyPrimates project, we can examine serial ordering and its relation to short-term memory. If, later, we assess prospective memory in these individuals, then we also have short-term memory and serial learning profiles for these same individuals. This example can be expanded outside a primary research topic (such as memory). We could examine relations between sensorimotor skills and susceptibility to perceptual illusions, or choice biases and impulsivity, or any number of other combinations.
That nonhuman primates are so long-lived is a major [...]

Many zoo and sanctuary animals are research-naive and some may come from suboptimal backgrounds, such as having been pets or in the entertainment industry. However, these limitations are also an opportunity to compare results across sites and individuals, allowing us to investigate the effects of these differences, avoid sampling biases, and identify the robustness of reported effects (Baribault et al., 2018; Fiedler, 2011).
Another primary challenge is balancing the need to keep methods the same across species with the flexibility required to adjust methods for species differences (Boesch, 2007 [...]

Conclusion
To understand the evolutionary mechanisms underlying primate cognition, large and diverse samples are [...] within the last five years shows that during this period only about 15% of primate species have been studied. The median sample size was just seven individuals. Great apes and Old World monkeys account for the vast majority of studies, with chimpanzees and rhesus macaques being especially overrepresented in research. Taken together, these two species are the focus of study in more than half of all publications. Furthermore, only five data collection sites produced 38% of all studies. These imbalances in sampling introduce bias to primate cognition research and pose a serious problem for reconstructing the evolutionary history of cognition.
ManyPrimates offers the infrastructure for large-[...]
[...] was housed at BBB. We attempted to collect data from XX additional subjects (XX species1, X female, mean age = X; XX species1, X female, mean age = X) but did not include them because CCC."