1. Introduction
This document presents analyses of the documents published by the DYNA (Colombia) journal between January 2010 and September 2019, and of the documents citing DYNA that were available in April 2020. The article combines techniques commonly used in technology mining, bibliometrics, systematic literature mapping, business intelligence, and scientometrics with the aim of building a methodological framework for analyzing peer-reviewed and indexed scholarly journals. The aim of this work is to lay the foundations of the term author analytics.
Although the main functions and responsibilities of the scientific editor relate to operational decisions such as the assignment of papers to reviewers, manuscript evaluation, and editorial decisions [1], other editorial decisions involve tactical and strategic aspects that affect the indicators of the journal. In this sense, a deep knowledge of all aspects of the journal is mandatory. However, in practice it is difficult to characterize the authors who submit their contributions to the journal, and it is even more difficult to characterize the authors and journals that cite it. Furthermore, in the search for a greater journal reputation, it is vital to understand who the citing authors and journals are, in order to establish policies and strategies to increase the impact factor or any other measure of scientific reputation.
There are several approaches that the editor can use to gain insight into, and a deep knowledge of, the authors and journals, including systematic literature reviews, bibliometrics, science mapping analysis, and tech-mining. The objective of systematic mapping studies is to understand, organize, and summarize the main data and findings from a wide field of knowledge, with the aim of extracting key information such as the main journals, authors, or topics in the knowledge area. The starting point of a systematic mapping study is to define the research questions that will be answered; in [2], a list of the most frequently asked research questions in systematic mapping studies is presented. These questions include:
What are the most cited documents in the area of research?
What is the most cited recent research?
Who are the most important authors?
What are the most important journals and conferences in the area of research?
What are the main topics in the area of research?
Ultimately, the results of systematic mapping studies are used to formulate systematic review projects in order to study in depth the main topics of interest selected by the researcher. As can be inferred, this methodology is commonly used in the academic sector.
Bibliometrics provides the editor with systematic and reproducible statistical methods to analyze and review a large body of information [3], composed of the articles published by the journal itself and the external citing documents. Bibliometrics shares many points with the metric analysis of information [4], so many statistical metrics are common to both approaches.
In a broad definition, tech-mining refers to exploiting information about emerging technologies with the aim of improving decision-making using text mining and summarization techniques [5]. Whereas the systematic mapping methodology focuses on results, leaving aside the use of technology, tech-mining emphasizes the use of computational tools to analyze and derive conclusions. The main sources of information in tech-mining include, but are not limited to, research documents and patents. In [5], the analyses are divided into two categories: basic and advanced. Basic analysis groups all the analyses of individual columns in a bibliographic dataset; therefore, the answers to the most common research questions in systematic literature reviews are covered by the basic analysis proposed by the tech-mining methodology. This kind of analysis takes into account the occurrence and relationships of different terms in the bibliographic dataset, such as authors, journals, and citations, and their occurrence by year. Advanced analysis covers all analyses of relationships between terms that appear in two columns of a bibliographic dataset, for example, author-journal relationships. Thus, the systematic mapping of literature and tech-mining can be used to provide a complementary and unified view of a body of scientific documents.
On the other hand, if it is accepted that customer analytics refers to the processes and technologies that make massive use of data to provide information about customers in order to improve key decision-making, then the process proposed in this paper can be defined, by extrapolation, as author analytics. Thus, the aim of this work is to define the term author analytics and to propose its methodological aspects; furthermore, the DYNA journal is analyzed with the proposed methodology.
The rest of this paper is organized as follows: Section 2 defines the concept of author analytics and describes its methodological aspects. Section 3 presents the analysis of the papers published in the DYNA journal. Section 4 presents the analysis of papers citing the DYNA journal. Finally, the main conclusions are described in Section 5.
2. Towards a definition of author analytics
2.1. Definition
In this article, author analytics is defined as the use of data to provide information about authors, journals, institutions, countries, and concepts to improve strategic and tactical editorial decision-making related to improving the academic standing of a journal. Here, the term author refers to authors of the papers published by the journal and the authors citing these papers. The use of bibliographic data implies that author analytics is carried out through the use of text mining and statistical summarization techniques.
2.2. Objective of research and study design
The objective of the research is to obtain an overview and in-depth knowledge of the journal in its current state by analyzing the authors who publish in the journal and the authors who cite the journal's publications. The study design aims to answer research questions related to:
Characterizing the authors, journals, institutions, and countries, and their relationships.
Identifying the knowledge base of the authors.
Thus, the main research questions proposed to analyze the journal are the following:
What is the number of papers published?
What are the most cited articles?
How many different authors does the journal have?
What is the number of authors per country of affiliation?
Who are the authors with the most publications?
Who are the most cited authors?
Which authors usually publish together?
What is the publication pattern of the main authors per year?
What institutions have the most publications?
Which institutions have the most citations?
What are the most used keywords in the journal?
What keywords are most used by the most cited authors?
What are the keywords that are frequently used together?
What are the keywords with the most citations?
To analyze the body of literature that cites the journal under review, many of the above questions can be used. In addition, the following questions can be added:
What are the sources that most frequently cite the journal under analysis?
Who are the authors that most frequently cite the journal under analysis?
Are the authors who cite the journal also authors of the analyzed journal?
How many authors who cite the journal have never published in it?
What is the citation pattern for the journals citing the journal under study?
What is the citation pattern for the authors citing the journal under study?
Additionally, the study can be enriched by adding descriptive statistics of the different terms in the bibliographic dataset.
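The questions above about whether citing authors have also published in the journal reduce to set operations on two author lists. The following is a minimal sketch, assuming that author names have already been normalized to a common form; the function name and the returned keys are illustrative and do not come from the original study.

```python
def citing_author_overlap(journal_authors, citing_authors):
    """Compare authors who publish in a journal with authors who cite it.

    Both arguments are iterables of author names that are assumed to be
    normalized already (same spelling for the same person).
    """
    journal = set(journal_authors)
    citing = set(citing_authors)
    both = journal & citing          # cite the journal and also publish in it
    only_citing = citing - journal   # cite the journal but never published in it
    share = len(only_citing) / len(citing) if citing else 0.0
    return {
        "both": both,
        "only_citing": only_citing,
        "share_only_citing": share,
    }


# Toy data for illustration only (not taken from the DYNA dataset).
result = citing_author_overlap(["A", "B", "C"], ["B", "D", "E", "B"])
print(sorted(result["both"]))
print(sorted(result["only_citing"]))
```

In practice the reliability of this comparison depends on the name normalization step, since the same author may appear under several spellings or profiles.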
2.3. Workflow
The following workflow is adapted from the general science mapping workflow based on [3,5-8]:
Data collection refers to the process of building a database with the core documents published by the journal, and another database with the documents citing the analyzed journal. This process is common to all methodologies used to analyze bibliographic data [9]. This step includes tasks related to data preparation and cleaning. Text mining techniques are usually used for manipulating strings; they are mainly needed to:
Detect and delete duplicate entries in the bibliographical dataset.
Unify keywords, for example, plural and singular.
Detect different keywords referring to the same concept and unify the terms, for example, forecast and predict.
Data analysis involves the use of software tools for analyzing the bibliographic dataset using descriptive statistical techniques with the aim of gaining knowledge about the data. Advanced analyses include co-word analysis to produce semantic maps [10], co-author analysis to analyze social structure and collaboration networks [11], and citation analysis.
Visualization refers to the selection of the most appropriate graphs for revealing insights and facts about the journal and the citing papers. Many techniques can be used to present the results, but statistical graphs and tables are especially useful. In this paper, heatmaps and bar plots are used to summarize and highlight patterns and relationships, but chord diagrams, pie plots, and many other graphs can also be used.
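The three cleaning tasks listed above can be sketched as follows. This is an illustrative sketch only: the record layout (a dictionary with a `title` field), the `THESAURUS` mapping, and the naive singularization rule are assumptions made for the example and are not the procedure used by the authors.

```python
import re

# Hypothetical thesaurus mapping variant keywords to one preferred term.
THESAURUS = {
    "predict": "forecast",
    "prediction": "forecast",
    "forecasting": "forecast",
}


def normalize_title(title):
    """Lowercase a title and collapse punctuation and spacing."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(records):
    """Keep the first record found for each normalized title."""
    seen, unique = set(), []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def singularize(word):
    """Very naive English rule; a real pipeline needs a curated list."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def unify_keyword(keyword):
    """Lowercase, singularize the last word, then apply the thesaurus."""
    words = keyword.lower().strip().split()
    if words:
        words[-1] = singularize(words[-1])
    term = " ".join(words)
    return THESAURUS.get(term, term)


print(unify_keyword("Neural Networks"))
print(unify_keyword("predict"))
```

A rule-based singularizer like this one will mishandle irregular words, which is one reason a manual review of the unified keyword list is usually still required.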
Finally, insights and facts are reviewed and interpreted.
2.4. Relationship of author analytics to other approaches
Author analytics shares with the systematic mapping of literature the methodological aspects related to formulating and answering the research questions used to analyze a body of literature; and with business intelligence and tech mining, the use of different strategies and tools to summarize key insights from the data.
In other words, author analytics can be understood as the union of systematic mapping of literature, scientometrics, business intelligence, descriptive analytics and tech mining techniques to make data-driven editorial decisions and to formulate policies.
Author analytics shares with the previous methodologies the need to define research questions that guide the analysis, the use of specialized software to clean and manipulate text and numeric data, and the use of statistical graphs and tables to provide insights.
3. Analysis of the publications of the DYNA Journal
3.1. Information used
We apply the proposed methodology to the DYNA (Colombia) journal, edited by Facultad de Minas, Universidad Nacional de Colombia, Medellín Campus. The information was downloaded from Scopus in August 2020. The information available for the journal covers the period from 2010 to December 2019. Citing information is updated to August 2020.
In the data preparation phase, records corresponding to documents summarizing conferences, proceedings, workshops, and congresses were deleted. In addition, documents duplicating the title of another document in the bibliographic dataset were deleted. Author keywords differing only in plural and singular form were unified to the same string.
Extracting the affiliations of the authors was a challenge because of the many problems found in the text strings. Many strings were badly formatted, with missing or inconsistent separator characters. A detailed manual revision was carried out with the aim of correcting these errors.
3.2. Descriptive statistics of the journal
In this section, we analyze a total of 1,702 research papers written by 4,156 authors for the period from January 2010 to December 2019, with an average of 2.44 authors per document and 0.41 documents per author. Editorials, Letters to the Editor, and Notes were discarded in this analysis. There are 1,615 multi-authored documents and 87 single-authored documents; thus, for multi-authored documents, there is an average of 3.31 co-authors per document. Authors are associated with 2,602 affiliations in 44 countries. There is an average of 26 references per article.
There are 6,033 different author keywords, 22,809 abstract words, and 4,682 title words. Words were extracted from abstracts and titles using text-processing techniques: first, we split the texts into minimal context units by dividing paragraphs at commas, semicolons, and periods; second, we create a list of all author keywords present in the database; third, we search for author keywords in the minimal context units and mark them for extraction; fourth, text-mining techniques are used to form 2-grams and eliminate stop-words.
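The descriptive indicators in this paragraph can be derived from a list of records with one author list per document. The sketch below assumes a hypothetical record layout (a dictionary with an `authors` list) and uses toy data; only the two final checks use totals reported in the text (1,702 documents and 4,156 authors).

```python
def summarize(records):
    """Compute basic authorship indicators for a bibliographic dataset.

    Each record is assumed to be a dict with an 'authors' list of
    already-normalized author names.
    """
    if not records:
        raise ValueError("at least one record is required")
    authors = set()
    single = multi = 0
    for record in records:
        authors.update(record["authors"])
        if len(record["authors"]) == 1:
            single += 1
        else:
            multi += 1
    n_docs = len(records)
    return {
        "documents": n_docs,
        "authors": len(authors),
        "authors_per_document": len(authors) / n_docs,
        "documents_per_author": n_docs / len(authors),
        "single_authored": single,
        "multi_authored": multi,
    }


# Toy records for illustration only.
toy = [{"authors": ["A", "B"]}, {"authors": ["A"]}, {"authors": ["C", "D", "E"]}]
print(summarize(toy))

# Consistency check using only the totals reported in the text.
print(round(4156 / 1702, 2))  # 2.44 authors per document
print(round(1702 / 4156, 2))  # 0.41 documents per author
```

Note that, as in the text, authors per document is computed here as the number of distinct authors divided by the number of documents, not as the mean length of the author lists.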
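The first three steps of the word extraction procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, matching is done on whole words with a regular expression, and the fourth step (2-gram formation and stop-word removal) is not shown.

```python
import re


def split_context_units(text):
    """Split a text into minimal context units at commas, semicolons, and periods."""
    units = re.split(r"[,;.]", text)
    return [unit.strip().lower() for unit in units if unit.strip()]


def extract_keywords(text, author_keywords):
    """Return the author keywords found in the text, in order of first appearance."""
    found = []
    for unit in split_context_units(text):
        for keyword in author_keywords:
            term = keyword.lower()
            pattern = r"\b" + re.escape(term) + r"\b"
            if term not in found and re.search(pattern, unit):
                found.append(term)
    return found


abstract = (
    "We propose a fuzzy logic controller; results show improved "
    "power quality, compared with neural networks."
)
keywords = ["fuzzy logic", "power quality", "supply chain"]
print(extract_keywords(abstract, keywords))
```

Splitting at every period also breaks decimal numbers and abbreviations into separate units; this is harmless when the goal is only to locate keywords, but it would matter if the units themselves were analyzed.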
3.3. Analysis by year
Fig. 1 presents the number of papers published by year; this figure shows that the number of documents by year is approximately constant, with a maximum of 189 published documents in 2014. In Fig. 1, the darkness of the bars is proportional to the number of citations for the year.
Fig. 2 presents the average number of citations by year; this indicator is calculated as the ratio between the number of citations and the number of documents published in each year. The maximum value is reached in 2012 with 3.85 citations per document, and a total of 597 citations. From 2014 onwards, an annual decrease in citations is observed, reaching only 37 citations for documents published in 2019. This is normal behavior, as older documents have had more time to accumulate citations than newer documents.
Table 1 summarizes the main impact indicators for the journal, as published on the Scimago website. These indicators are calculated over all published documents. The table shows that from 2014 onwards, the number of cites per document (on a 4-year basis) is approximately constant; however, during the same period, the number of self-cites per document has dropped to practically zero. This is an important fact because it shows an increase in external citations. However, the SJR indicator, a prestige indicator that accounts for the importance of the citing journals, has shown a decreasing trend since 2014.
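The indicator shown in Fig. 2 is a simple ratio computed per publication year. A minimal sketch follows, assuming hypothetical record fields `year` and `cited_by`; the toy values are illustrative and are not taken from the DYNA dataset.

```python
from collections import defaultdict


def citations_per_document_by_year(records):
    """Return {year: total citations / number of documents} for each year."""
    documents = defaultdict(int)
    citations = defaultdict(int)
    for record in records:
        documents[record["year"]] += 1
        citations[record["year"]] += record["cited_by"]
    return {year: citations[year] / documents[year] for year in sorted(documents)}


# Toy records for illustration only.
toy = [
    {"year": 2012, "cited_by": 4},
    {"year": 2012, "cited_by": 2},
    {"year": 2019, "cited_by": 0},
]
print(citations_per_document_by_year(toy))
```

The same grouping applies to any other column of the dataset, for example citations per document by institution or by country.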
3.4. Term analysis
In this section, different fields of the bibliographic dataset are analyzed. The 4,156 authors belong to 2,602 institutions in 44 countries. Table 2 shows that the journal mainly published articles originating in Colombia (69.5%), Spain (14.5%), Brazil (9.5%), and Mexico (8.1%). In terms of citations, articles with authors affiliated with institutions in Portugal and Chile have a higher average number of citations.
Table 3 presents the ten institutions with the highest number of published documents and the total number of citations for the analyzed time period; thus, DYNA-Colombia mainly published documents from Universidad Nacional de Colombia (33.4%), Universidad del Valle (8.1%), Universidad de Antioquia (6.3%), and Universidad Industrial de Santander (4.6%). To analyze the quality of the publications, we compute the ratio between the total number of citations and the total number of documents published; we found that, among the ten institutions with the highest number of documents published, the institution with the highest impact is the Universitat Politecnica de Valencia with a ratio of 3.84, followed by Universidad del Norte (3.13) in second place, and Universidad Pedagógica y Tecnológica de Colombia (2.77) in third place.
Table 4 presents the authors with ten or more published documents, the total citations for each author in the journal, and the Scopus H index as a measure of author quality; however, we found that several authors have more than one profile and, thus, it is not possible to use this indicator for comparisons and quality assessment. This table shows that no author accounts for more than 1% of the total articles of the journal for the analyzed time period. It should be noted that the authors with the most documents published in the journal are not necessarily the authors with the most total citations, as is the case, for example, for Arango-Serna MD, Correa R, or Restrepo OJ. Several authors in the table belong to the group of ten most cited authors of the journal; they are marked with an asterisk in Table 4. The most cited author is Tinoco IF (cited by 62), followed by Garcia-Alcaraz JL (cited by 52) and Osorio JA (cited by 50). The ten most cited papers appear in Table 5; none of them was written by the most important authors of the journal. As previously indicated, the journal has 4,156 authors, of which 3,367 (81%) have written a single article.
Also, the authors in Table 4 wrote 5.93% of the documents published by the journal.
To analyze author keywords, we use text-processing techniques to unify words and reduce the dataset; most of the process was devoted to unifying plural and singular forms. There are 35 keywords used in nine or more documents, which are presented in Fig. 3. The most cited keyword in the figure is supply chain, followed by model, mathematical model, and power quality. Based on the frequency of occurrence, the journal regularly publishes documents related to computational intelligence (artificial neural networks, fuzzy logic, and genetic algorithms), management (management, supply chain, logistic, sustainability), modeling (modeling and mathematical model), mechanical engineering (wear, corrosion, characterization, mechanical properties), and energy (power quality, renewable energies, biofuel, hydrogen, energy efficiency).
3.5. Analysis of terms by year
Fig. 4 presents the number of documents published by year for the authors with the most documents published in the journal. The production of articles by these authors is approximately constant over time; however, Arezes PMFM and Adarme-Jaimes W are exceptional cases, with ten and seven articles, respectively, published in a single year.
The number of documents published by institution for the ten institutions with most documents published is presented in Fig. 5. This figure allows us to conclude that the mean participation of the institutions is approximately constant over time.
In Fig. 6, we present the occurrence by year of the most frequent author keywords in the journal. Most of the keywords appear in most of the years. However, the number of appearances seems to be low in comparison with the number of articles published yearly by the journal. This can be explained by the wide spectrum of engineering topics that is addressed by the journal, preventing specialization.
It is also plausible that the most frequent keywords per year are directly related to the number of articles published per year by the most frequent authors, as in the case of Adarme-Jaimes W, who published seven articles on supply chain in 2014.
The participation per year of the countries listed in Table 2 is approximately constant, except for Ecuador, which only has publications since 2015.
3.6. Bigraph analysis
In this section, we analyze the co-occurrence between the terms in two different columns of the bibliographic dataset. Fig. 8 shows the association between the most frequent authors and the most frequent countries in the affiliations of articles. The matrix shows that authors publish most frequently with authors from institutions of the same country, with the exception of Restrepo OJ, Tinoco IF, and Osorio JA. Moreover, Fig. 8 presents the co-occurrence matrix between authors and institutions, and it shows that the most frequent authors publish articles with co-authors who belong to the same institution. In terms of international collaboration, Universidad Nacional de Colombia is the institution with the largest and most varied participation in publishing with international institutions, as presented in Fig. 9.
Fig. 10 and Fig. 11 present the ratio between single-authored (SD) and multi-authored (MD) articles for institutions and countries, respectively. Fig. 10 shows that it is more common for articles to be written by authors from different institutions than by authors from the same institution. However, it is more common for articles to be written by authors from institutions in the same country (Fig. 11).
We also analyze the co-occurrence between author keywords and institutions, and author keywords and countries; no important relationships were found.
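The co-occurrence matrices used in this section count, for every pair formed by one term from each of two columns, the number of documents in which both appear. A minimal sketch is shown below, assuming a hypothetical record layout in which each column holds a list of terms; the column names and toy data are illustrative.

```python
from collections import Counter
from itertools import product


def cross_cooccurrence(records, column_a, column_b):
    """Count documents in which a term of column_a appears with a term of column_b."""
    counts = Counter()
    for record in records:
        # Sets avoid counting a pair twice within the same document.
        for pair in product(set(record[column_a]), set(record[column_b])):
            counts[pair] += 1
    return counts


# Toy records for illustration only.
toy = [
    {"authors": ["X", "Y"], "countries": ["Colombia"]},
    {"authors": ["X"], "countries": ["Colombia", "Spain"]},
]
matrix = cross_cooccurrence(toy, "authors", "countries")
print(matrix[("X", "Colombia")])
print(matrix[("Y", "Spain")])
```

Storing the matrix as a sparse counter keeps only the pairs that actually occur, which is convenient because most author-country or keyword-institution pairs never appear together.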
3.7. Term by term analysis
In this section, the relationships between terms in the same column of the bibliographic dataset are analyzed. Fig. 12 shows the co-authorship relationships; the radius of each circle is proportional to the number of documents, and the darkness is proportional to the number of citations. This plot shows that Arango-Serna MD publishes with Adarme-Jaimes W and Branch Bedoya W, and Tinoco IF with Osorio JA.
Fig. 13 presents the matrix of co-occurrence between institutions. The matrix was analyzed using graph theory with the aim of finding clusters of collaboration; by using the association index and the Louvain clustering algorithm, two clusters of collaboration were found: the first corresponds to Universidad Tecnológica de Pereira and Universidad Pontificia Bolivariana; the second is formed by Universidad Pedagógica y Tecnológica de Colombia and Universidad Militar Nueva Granada. In addition, Fig. 13 shows that Universidad Nacional de Colombia writes articles with the rest of the most frequent universities.
The previous analysis was also carried out for the most frequent countries, and four clusters were found: the first cluster groups Brazil, Chile, Cuba, and Ecuador; the second, Mexico, United States, Argentina, and Portugal; the third and fourth clusters correspond to Colombia and Spain, respectively.
The co-occurrence of the keywords presented in Fig. 6 was also analyzed. The clustering analysis of the co-occurrence matrix of author keywords using the Louvain algorithm allows us to find the nine clusters presented in Table 6. Practical experience with this methodology shows that the Louvain algorithm tends to join clusters of different themes that share common terms. Thus, cluster 1 mixes supply chain related problems with energy. Cluster 2 is related to materials engineering; cluster 3 to energy efficiency and renewable energies; cluster 4 to structural engineering; cluster 5 to management; cluster 6 to mathematical models and optimization. Clusters 7 and 8 are related to the application of computational intelligence to solve engineering problems; finally, cluster 9 is related to geostatistics. These topics are not explored further here, and the characterization of the thematic areas of the journal is considered future work.
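The analyses in this section start from a co-occurrence matrix of terms in a single column, which is then normalized before clustering. The sketch below assumes that the association index takes the common association-strength form, c_ij / (c_i * c_j), where c_ij is the number of documents containing both terms and c_i, c_j are the numbers of documents containing each term; the text does not state the exact formula, so this form is an assumption. The Louvain clustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records, column):
    """Return (occurrences per term, co-occurrences per unordered term pair)."""
    occurrences = Counter()
    pairs = Counter()
    for record in records:
        terms = sorted(set(record[column]))  # sorted so each pair has one key
        occurrences.update(terms)
        pairs.update(combinations(terms, 2))
    return occurrences, pairs


def association_strength(occurrences, pairs):
    """Normalize co-occurrence counts as c_ij / (c_i * c_j) (assumed form)."""
    return {
        (a, b): count / (occurrences[a] * occurrences[b])
        for (a, b), count in pairs.items()
    }


# Toy records for illustration only.
toy = [
    {"keywords": ["a", "b"]},
    {"keywords": ["a", "b", "c"]},
    {"keywords": ["a"]},
]
occ, co = cooccurrence(toy, "keywords")
print(association_strength(occ, co))
```

The normalized values can be used as edge weights of a graph whose nodes are the terms, which is the usual input for a community detection algorithm such as Louvain.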
4. Analysis of documents citing the DYNA journal
4.1. Descriptive statistics
In this section, a bibliographic analysis of the documents citing the DYNA (Colombia) journal is carried out. Only documents belonging to the categories of Article, Article in Press, Book, Book Chapter, Review, and Short Survey were considered. According to Scopus information, there are 3,683 different documents citing DYNA (Colombia), written by 10,974 authors, with an average of 2.98 authors per document. Authors are affiliated with 2,853 institutions in 97 countries. There are 3,525 multi-authored documents and 158 single-authored documents. Citing documents have an average of 43 references per document. Fig. 14 shows the number of documents citing DYNA by type of document. Most bibliographic citations come from articles (~91.5%).
4.2. Analysis by term
Fig. 15 presents the number of documents citing DYNA by source title, for source titles with fifteen or more documents in the bibliographic dataset. The journal presents an important percentage of self-citing, as discussed previously. DYNA is cited by 1,560 different source titles.
The most frequent authors citing DYNA (Colombia) are presented in Table 7. Table 8 presents the ten most frequent countries citing DYNA (Colombia); the first four places are occupied by the most frequent countries in documents published by DYNA (Colombia).
* Authors with most published papers in the DYNA journal.
** Authors with most citations in the DYNA journal.
4.3. Analysis of term by year
Fig. 16 presents the number of documents by year per source title citing the DYNA journal. Note that a strong self-citation of the analyzed journal occurs from 2012 to 2016. Also, most of the citations (23) in the Journal of Physics: Conference Series appear in 2019; finally, there are 15 citations from the Advances in Intelligent Systems and Computing journal in a period of three months (January to April of 2020), which is a very high number for a quarter in comparison with an average of three citations per year for the most citing journals.
Fig. 17 presents the number of documents by author keyword per year in the citing journals. Note that the group of most frequent author keywords is very similar to the group of author keywords in Fig. 6.
4.4. Bigraph analysis
Fig. 18 presents the number of documents citing DYNA (Colombia) by source title per author; the most frequent authors citing DYNA (Colombia) are authors publishing in DYNA (a self-citing phenomenon).
5. Conclusions
In this article, a methodology called author analytics is proposed to generate insights for making data-driven editorial decisions and formulating editorial policies. The methodology is based on well-known methodologies and techniques used in systematic mapping of literature, business intelligence, descriptive analytics, and tech mining. Insights are obtained by the integral analysis of the authors and topics of the journal and of the authors, topics, documents, and journals citing the journal under review. The proposed methodology is applied to the DYNA (Colombia) journal with the aim of obtaining an integral view of the scientific production published by the journal.