A Bibliometric Analysis of the 35th anniversary of the paper “The Statistical Analysis of Compositional Data” by John Aitchison (1982)

This study presents a comprehensive bibliometric analysis of the paper published by John Aitchison in the Journal of the Royal Statistical Society. Series B (Methodological) in 1982. Having recently reached the milestone of 35 years since its publication, this pioneering paper was the first to illustrate the use of the methodology “Compositional Data Analysis” or “CoDA”. By October 2019, this paper had received over 780 citations, making it the most widely cited and influential article among those using said methodology. The bibliometric approach used in this study encompasses a wide range of techniques, including a specific analysis of the main authors and institutions to have cited Aitchison’s paper. The VOSviewer software was also used to develop network maps for said publication. Specifically, the techniques used were co-citation and bibliographic coupling. The results clearly show the significant impact the paper has had on scientific research, having been cited by authors and institutions that publish all around the world.


Introduction
Nowadays, compositional data are defined as arrays of strictly positive numbers for which the ratios between them are considered to be relevant (Egozcue and Pawlowsky-Glahn (2019)). Despite warnings about the problems involved in not using specific methods for such data (Pearson (1897), Chayes (1948) and Vistelius and Sarmanov (1961)), it was not until the 1980s that the first general methods appropriate for their analysis were proposed (Aitchison (1982) and Aitchison (1986)). This methodology received the name of compositional data analysis, CoDa analysis or simply CoDA. It is usually written CoDa when it refers to "compositional data" and CoDA when it refers to "Compositional Data Analysis". That same terminology also encompassed methods that allow the analysis of data with positive values, whereby although the data do not have to fulfill the characteristic of constant sum, they do need to meet the requirement that the study of certain ratios of this study is considered
To meet our aim, the Web of Science (WoS) database and the VOSviewer software (Van Eck and Waltman (2010)) were used. The VOSviewer software was employed to graphically map the bibliographic material used. Specifically, the following techniques were considered in this paper: bibliographic coupling and co-citation. The reason for using the WoS database is that it is considered the most influential in the world (Merigó, Gil-Lafuente, and Yager (2015)). The rest of the document is divided into the following sections: the second section presents the bibliometric methods used in this paper; the third section provides a complete bibliometric study of Aitchison's work "The Statistical Analysis of Compositional Data" (Aitchison (1982)); and the fourth section summarizes the main conclusions, limitations and future lines of research.

Methodology
The term bibliometrics was introduced by Pritchard (1969) as "the application of mathematical and statistical methods to books and other means of communication". Currently, although many other definitions exist (see Yuan, Gretzel, and Tseng (2015) and Köseoglu, Sehitoglu, Ross, and Parnell (2016)), they all describe it as an instrument for analyzing the evolution of scientific disciplines based on intellectual, social and conceptual structures (Zupic and Čater (2015)) in order to identify trends and patterns in scientific research (Merigó, Blanco-Mesa, Gil-Lafuente, and Yager (2017)). Therefore, bibliometrics is one of the most widely used approaches for analyzing how a scientific field develops (Bar-Ilan (2008)). For this bibliometric study, data were gathered from the WoS database in October 2019 using "The Statistical Analysis of Compositional Data" as a keyword in the field "title" and "Aitchison, J." as a keyword in the field "author". These searches returned Aitchison's 1982 paper as the only result. Subsequently, the information was refined to the documents citing the paper, which resulted in 784 publications for analysis. Given that no consensus exists in the literature on which methods are best or most appropriate, we used several bibliometric indicators to present the data. Firstly, we considered the number of publications and the number of citations, which are the most popular indicators according to Ding, Rousseau, and Wolfram (2016). The former indicates productivity, while the latter quantifies the influence of these publications (Svensson (2010)). Other common indicators include the most productive authors, institutions and countries, and the number of publications and citations per person (Mulet-Forteza, Salvá, Monserrat, and Amores (2020)). For the analysis of institutions, we also included general university rankings. The results in the tables are sorted by total number of publications (TP).
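As an illustration of how such indicators can be tallied, the following minimal Python sketch ranks authors by total number of publications (TP) and also reports the citations accumulated by their citing documents. The record layout, the author names, the citation figures and the label TC (total citations) are assumptions made for this example only; they are not taken from the WoS export used in this study.

```python
from collections import defaultdict

# Hypothetical citing records (invented for illustration): each record lists
# the authors of a citing document and the citations that document received.
records = [
    {"authors": ["Author A", "Author B"], "citations": 12},
    {"authors": ["Author A"], "citations": 3},
    {"authors": ["Author B", "Author C"], "citations": 7},
    {"authors": ["Author A", "Author C"], "citations": 0},
]


def rank_by_tp(records, field="authors"):
    """Return (name, TP, TC) tuples sorted by TP, then TC, both descending."""
    tp = defaultdict(int)  # total publications per name
    tc = defaultdict(int)  # total citations per name (assumed label)
    for rec in records:
        for name in set(rec[field]):  # count each name once per record
            tp[name] += 1
            tc[name] += rec["citations"]
    return sorted(
        ((name, tp[name], tc[name]) for name in tp),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


for name, n_pubs, n_cites in rank_by_tp(records):
    print(f"{name}: TP={n_pubs}, TC={n_cites}")
```

The same function could be applied to institutions or countries by passing a different `field`, provided the records carry such a list.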
In addition, we used the VOSviewer software (Van Eck and Waltman (2010)) to graphically map the bibliographic data (Sinkovics (2016)) for co-citations (Small (1973)) and bibliographic coupling (Kessler (1963)). Co-citation assumes that there is some kind of relationship between two documents cited jointly by a third document (McCain (1990), Ramos-Rodríguez and Ruíz-Navarro (2004) and Small (1973)). According to McCain (1986) and McCain (1991), these documents allow the academic structure of a scientific discipline to be determined. Bibliographic coupling measures the similarity of subject matter among the documents considered, based on the frequency with which they share references. A bibliographic coupling occurs when two documents include a reference to the same third document (Young (1983)), so there is a possibility that these documents are linked (Martyn (1964)). Bibliographic coupling is usually applied to the graphic mapping of institutions and countries (Small (1999) and Boyack and Klavans (2014)), while co-citation is usually used for the graphic mapping of authors (Glänzel and Thijs (2012) and Zupic and Čater (2015)). The combination of methods used to collect data from the WoS database, along with the use of the VOSviewer software, allowed us to incorporate both the "full counting" and "fractional counting" techniques. The difference between these methods is that "full counting" assigns one point to each participant of a paper, whereas "fractional counting" takes into account co-authorship of the paper, so that the weight of a paper is shared among its co-authors (Mulet-Forteza, Genovart-Balaguer, Merigó, and Mauleon-Mendez (2019b)).

Bibliometric study of Aitchison's paper (1982)
In this section, we will address the different research questions posed.
3.1. Evolution of number of citations received by Aitchison's paper (1982)
Regarding the first question (RQ1), Figure 1 presents the evolution of the citations received by Aitchison's 1982 paper. Figure 1 shows that the paper has received uninterrupted citations since its publication in 1982. It also indicates how the number of citations received has evolved over different periods. With few exceptions, the number of citations received by the paper between 1983 and 2007 did not exceed 10 per year. Since 2008, on the other hand, annual citations have exceeded this value every year, following an expected exponential growth (Price (1986)). Likewise, a very significant increase in the number of citations received can be observed since 2015, and this increased still further in the years 2018 and 2019. We have analyzed some of the reasons why Aitchison's 1982 paper has received a significant number of citations, especially since 2011. To this effect, in Table 1 we examine the evolution over time of the main research areas from which the citations of this work have come.
Table 1 clearly shows that the majority of citations received by Aitchison's 1982 paper come from three research areas, i.e. Mathematics, Geology and Environmental Sciences Ecology. Nevertheless, the interest of these research areas in Aitchison's 1982 paper has been aroused only in the last decade. Almost 60% of the citations from the area of Mathematics belong to this period, while the percentage goes up to over two-thirds in the areas of Geology and Environmental Sciences Ecology. Other areas that have also provided a great number of citations of Aitchison's 1982 paper are Geochemistry Geophysics, Mathematical Computational Biology, Engineering, Biochemistry Molecular Biology and Agriculture. On the other hand, Table 1 also illustrates a wide range of research areas that provided the largest number of citations of Aitchison's 1982 paper when it was first published but are not so relevant today, including Zoology, Chemistry, Nutrition Dietetics, Plant Sciences, Behavioral Sciences, Paleontology, Endocrinology Metabolism, Physiology, Physical Sciences Other Topics and Reproductive Biology, among others. It can therefore be seen that there has been a shift in interest in the research carried out by Aitchison in 1982, and that areas related to Statistics, Geosciences, Mathematics, Computer Science, Biochemistry and Economics, among others, have replaced those that initially used CoDA. Consequently, the journals that have cited Aitchison's 1982 paper the most are those indexed in these research areas. As an example, the Journal of Geochemical Exploration, indexed in the Geochemistry Geophysics research area, is the journal that has cited Aitchison's 1982 paper most often, with a total of 23 papers citing it during the last decade.
Other journals indexed in these research areas that very often cited Aitchison's 1982 paper during the last 10 years include:
• In the Mathematics research area: Bioinformatics, Mathematical Geosciences, Environmental and Ecological Statistics and Environmetrics, among others.
• In the Environmental Sciences Ecology research area: Environmental Earth Sciences and International Journal of Environmental Research and Public Health, among others.
• In the Statistics research area: Journal of the American Statistical Association, Annals of Applied Statistics, Stochastic Environmental Research and Risk Assessment, Biometrics and Austrian Journal of Statistics, among others.
• In the Geosciences research area: Quaternary International and Geoderma, among others.
Therefore, it can be stated that the interest aroused in these research areas by Aitchison's 1982 paper has led to genuinely growing interest in this publication, especially during the last decade. It has also been possible to confirm that the authors who have most often cited Aitchison's 1982 paper during the last decade match those at the top of Table 2. In fact, only some positions would be exchanged. For example, Antonella Buccianti and Vera Pawlowsky-Glahn would exchange their positions, while Andrea Bloise, who occupies position 11 in Table 2, would occupy the ninth position if only the citations made during the last decade were considered. This would push John Aitchison out of the top 10 authors who have cited Aitchison's 1982 paper the most, which is not surprising, considering that John Aitchison died in 2016 at the age of 90. Finally, we also analyzed the document type of the 784 citations received by Aitchison's paper. In this regard, 90.7% of citations were from documents published as papers, 4.6% from proceedings papers, 2.2% from books, 2% from reviews, and the remaining 0.5% from notes and letters. Thus, 93.2% of citations (90.7% + 2% + 0.5%) came from documents that had passed a strict peer-review process; in other words, articles, reviews, letters and academic notes.
3.2. Most productive authors citing Aitchison's paper (1982)
In this section, we address the second question (RQ2) posed in our paper. Firstly, Table 2 lists those authors who have cited Aitchison's paper (1982) the most.
Vera Pawlowsky-Glahn (University of Girona, Spain) is the author who has cited Aitchison's paper (1982) the most, followed by Antonella Buccianti (Università degli Studi di Firenze, Italy) and Juan José Egozcue (Polytechnic University of Catalonia, Spain). As Table 2 shows, the three main authors in this ranking have cited Aitchison's paper (1982) a total of 86 times, although it should be noted that this value, being obtained by means of a full counting method, does not take into account co-authorship among these authors. This bias will be eliminated later when performing the graphic analysis of the main authors through a fractional counting method. It is also interesting to observe the decreasing number of authors producing an increasing number of citing papers, as predicted by the bibliometric law of authors' productivity (Lotka (1926)). The University of Calabria (Italy), with four authors, is the most repeated institution among the authors who lead the ranking in Table 2, followed by the Australian National University, the University of Girona and the University of South Australia, with three authors each. With two authors, we find the Helmholtz Zentrum für Umweltforschung (Germany), the HZDR - Helmholtz-Zentrum Dresden-Rossendorf (Germany) and the Polytechnic University of Catalonia (Spain). The rest of the institutions (23) have only one author represented in Table 2. Finally, the authors in Table 2 work in 13 different countries. Australia (with seven authors) leads this ranking, followed by Germany and Italy, with six authors each. Next, we find Spain (five authors), Canada (four authors), China and the US (with three authors each), France and the UK (with two authors each), while Belgium, Czech Republic, Estonia and Greece only have one author each in Table 2. Figure 2 shows a graphic map of the co-citations among the most influential authors to have cited Aitchison's paper.
This is the group with the largest network of connections, both with one another and with authors in the other nodes. The third group, consisting of four authors, is led by Javier Palarea-Albaladejo (Biomathematics & Statistics Scotland), while the fourth group, also with four authors, is led by Domenico Miriello and Andrea Bloise, both from the University of Calabria. With some exceptions, most authors in Figure 2 also appear in Table 2, which indicates that there are no significant differences between the analysis performed with the WoS database using the "full counting" method and that performed with the VOSviewer software using the "fractional counting" method.
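The difference between the two counting methods can be illustrated with a minimal Python sketch. It assumes the generic definition of fractional counting, in which each of the n co-authors of a document receives a credit of 1/n, and it uses invented author lists. It is not the implementation used by VOSviewer, which applies fractional weights to the links of a network rather than to a simple author tally.

```python
from collections import defaultdict


def author_credit(author_lists):
    """Return (full, fractional) credit per author.

    full:       every author of a document receives 1 point.
    fractional: every author of a document with n authors receives 1/n.
    """
    full = defaultdict(float)
    fractional = defaultdict(float)
    for authors in author_lists:
        unique = list(dict.fromkeys(authors))  # drop duplicates, keep order
        if not unique:
            continue  # skip documents without author information
        share = 1.0 / len(unique)
        for author in unique:
            full[author] += 1.0
            fractional[author] += share
    return dict(full), dict(fractional)


# Hypothetical author lists of three citing documents (illustration only).
papers = [["A", "B", "C"], ["A"], ["B", "C"]]
full, fractional = author_credit(papers)
print(full)
print(fractional)
```

Under full counting the three hypothetical authors are tied, whereas fractional counting favors the author with a single-authored document. The fractional credits always add up to the number of documents, which is why this method removes the extra weight that full counting gives to documents with many co-authors.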

3.3. Most productive institutions citing Aitchison's paper (1982)
Regarding the third question (RQ3) posed in our paper, Table 3 shows the institutions to have most frequently cited Aitchison's paper (1982), together with the position that these universities occupy in the Academic Ranking of World Universities (ARWU) (Consultancy (2019)) and the Quacquarelli Symonds World University Ranking (Symonds (2019)).
The University of Girona is the institution whose researchers have most frequently cited Aitchison's paper (1982), followed by the Polytechnic University of Catalonia and the University of Florence. The countries with the largest number of institutions in Table 3 are France (11), the UK (8) and the US (8). In addition, 12 institutions in Table 3 appear in the ARWU top 100, with Harvard University ranking the highest, in first position. Similarly, 13 universities appear in the top 100 of the QS ranking, with Harvard University again the highest placed, in third position on that list. Table 3, which was compiled using the full counting method, may be biased toward institutions whose culture encourages several authors to work together on a single paper. For this reason, Figure 3 shows the results of the previous analysis using the fractional counting method, eliminating the aforementioned bias. Figure 3 shows a bibliographic coupling of the institutions that cite Aitchison's paper (1982). Figure 3 shows four main node groups and five secondary node groups. The largest group, with 14 institutions, is centered on English-speaking institutions, including Harvard University, Duke University and the University of California San Diego. The second group contains 13 institutions, including the University of South Australia, Victoria University and the University of Zurich. The third is composed of 11 institutions, among which the University of Sao Paulo, the University of Turin and the University of Bremen stand out, and the fourth main group, with nine institutions, is led by the University of Girona, the Polytechnic University of Catalonia and the University of Florence. These three institutions occupy the first three positions in Table 3. In turn, this group of institutions is the one with the largest network of connections, both with one another and with institutions in the other nodes.
Therefore, with some exceptions, it can be observed that most of the institutions present in Figure 3 also occupy relevant positions in Table 3, which indicates that there are no significant differences between the analysis performed by the WoS database using the "full counting" method and the VOSviewer Software using "fractional counting".
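The two mapping techniques described in the Methodology can be sketched at document level as follows. This is a minimal Python illustration on invented reference lists with hypothetical labels; the additional step of aggregating documents into institutions or countries, and any normalization applied by VOSviewer, are not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing document -> set of cited references.
references = {
    "doc1": {"ref_a", "ref_b", "ref_c"},
    "doc2": {"ref_a", "ref_b"},
    "doc3": {"ref_a", "ref_d"},
}


def bibliographic_coupling(references):
    """Coupling strength of two citing documents = number of shared references."""
    strength = {}
    for doc_x, doc_y in combinations(sorted(references), 2):
        shared = len(references[doc_x] & references[doc_y])
        if shared:
            strength[(doc_x, doc_y)] = shared
    return strength


def co_citation(references):
    """Co-citation count of two references = number of documents citing both."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


print(bibliographic_coupling(references))
print(co_citation(references))
```

In this toy example every document cites `ref_a`, so every pair of documents is coupled with a strength of at least one. The same holds for the data set analyzed here, since all 784 documents cite Aitchison's 1982 paper; the structure in a coupling map therefore comes from the references shared in addition to it.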
3.4. Most productive countries citing Aitchison's paper (1982)
Regarding the fourth question (RQ4), Table 4 shows the countries that have most frequently cited Aitchison's 1982 paper. Table 4 shows that the countries with the highest populations are not those that have cited Aitchison's 1982 paper most, with the exception of the US. As a matter of fact, only three countries in the top 10 (US, China and Brazil) have over 100 million inhabitants. In contrast, Table 4 shows that countries where English is widely spoken, especially among academics, are those where Aitchison's paper (1982) tends to be cited the most. This trend is especially present in countries such as Australia, Norway and New Zealand. These countries would lead the ranking in Table 4 if it were ordered by number of citing papers relative to population. Like the previous tables, Table 4 uses the full counting method, which skews the results toward countries where several authors write articles together rather than working independently. We therefore implemented a fractional counting method in Figure 4, which shows a bibliographic coupling of the countries that have cited Aitchison's 1982 paper most. Figure 4 shows eight clusters. The first cluster by number of countries (10) is led by Germany and is composed of European countries, with the exception of Taiwan. The second cluster (seven countries) is led by Australia and is basically made up of non-European countries (6). The third cluster by number of countries (6) is led by the United Kingdom. The United States leads the fourth cluster and is the most productive country, with the broadest network of connections on the map. The fifth cluster is led by Spain, while the last is led by Italy. In general, it can be observed that the results obtained under both the full counting system (WoS database) and the fractional counting system (VOSviewer software) are very similar.
Figure 4 shows a very diverse network of connections, where we find cultural connections between different countries such as Colombia and Ireland, or Australia and Iran.

Conclusions
Adopting a bibliometric approach and based on data obtained from the WoS database, in this paper we have carried out an analysis of all the publications that have cited the paper entitled "The Statistical Analysis of Compositional Data" published by John Aitchison in the Journal of the Royal Statistical Society. Series B (Methodological) in 1982. Having recently reached the milestone of 35 years since its publication, the paper is considered to be the seminal article on the CoDa analysis.
In this paper, we have met all of our established aims. Specifically, we have answered the four research questions we asked at the beginning. As for the first (RQ1), we have analyzed how the number of citations of this paper has evolved, showing how the paper has received uninterrupted citations since its publication and that over the past four years the number of  Scientifique CNRS are the ones who have cited the paper (RQ3) the most, while by country, the authors from the United States, the United Kingdom, Spain, Italy, Australia and Germany are the ones who have cited the paper (RQ4) the most. Our analysis indicates that there are no significant differences between the analysis of the WoS database using the "full counting" method and the "fractional counting" method used with the VOSviewer Software.
Although this document provides a description of the structure of citations, leading authors, institutions and countries that have cited Aitchison's 1982 paper, it does have some limitations. For example, since data were collected from the WoS database, the limitations of this database also apply to this analysis. As we have indicated previously, the WoS database collects information under a "full counting" method, meaning that documents with many co-authors generally have more weight than documents produced by a single author (Mulet-Forteza, Genovart-Balaguer, Mauleon-Mendez, and Merigó (2019a)). To resolve this limitation, we also employed the "fractional counting" method, using the VOSviewer software to identify co-citations and bibliographic coupling. A further limitation is that the results are dynamic and will inevitably change over time.
Despite the above limitations, this paper represents a starting point for future bibliometric studies in this field. In this respect, future lines of research should aim to carry out a bibliometric analysis focusing on all publications that have included the methodology of "Compositional Data Analysis", firstly in the field of the social sciences, and then increasing the number of publications by also covering papers indexed in the WoS under "Science Citation Index Expanded". Although we recognize the limitations of our analysis, the main aim of this paper was to analyze the academic structure of the papers, authors, institutions and countries that have cited Aitchison's 1982 paper. We believe it does this in a sufficiently rigorous and complete manner, while also presenting an overview of the most important data related to Aitchison's 1982 paper, which has recently celebrated the 35th anniversary since its publication.