OpenCitations Meta: Data and services

3 Jun 2024

Authors:

(1) Arcangelo Massari, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy {arcangelo.massari@unibo.it};

(2) Fabio Mariani, Institute of Philosophy and Sciences of Art, Leuphana University, Lüneburg, Germany {fabio.mariani@leuphana.de};

(3) Ivan Heibi, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy {ivan.heibi2@unibo.it};

(4) Silvio Peroni, Research Centre for Open Scholarly Metadata, Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy and Digital Humanities Advanced Research Centre (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy {silvio.peroni@unibo.it};

(5) David Shotton, Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom {david.shotton@opencitations.net}.

4. Data and services

At the time of its initial release in December 2022, OpenCitations Meta included Crossref (Hendricks et al., 2020), DataCite (Brase, 2010), and the NIH Open Citation Collection (ICite et al., 2022) as its primary sources for the bibliographic metadata describing the publications involved in citations within the following OpenCitations Indexes: COCI (https://opencitations.net/index/coci) (OpenCitations, 2022), DOCI (https://opencitations.net/index/doci), and POCI (https://opencitations.net/index/poci). In quantitative terms, this initial release of OpenCitations Meta contains 98,243,101 bibliographic entities (fabio:Expression), 309,881,223 authors (pro:author), 2,406,510 editors (pro:editor), 19,076 publishers (pro:publisher), and 659,214 venues (e.g. resources of type fabio:AcademicProceedings, fabio:ExpressionCollection, fabio:Book, fabio:BookSeries, fabio:Journal, fabio:ReferenceBook, or fabio:Series). Thus, on average, each bibliographic resource has three authors, and typically no editor is recorded, since editor metadata are rarely provided by our sources. In total, the triplestore consists of 3,749,729,755 triples (excluding provenance).
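The averages quoted above follow directly from the reported counts. The short sketch below reproduces the arithmetic; the variable names are illustrative only, and the figures are the ones given in the text.

```python
# Counts reported for the initial release of OpenCitations Meta.
bibliographic_entities = 98_243_101
author_roles = 309_881_223   # counted as roles, not as distinct people
editor_roles = 2_406_510     # counted as roles, not as distinct people

# Average number of author and editor roles per bibliographic entity.
authors_per_entity = author_roles / bibliographic_entities
editors_per_entity = editor_roles / bibliographic_entities

print(f"authors per entity: {authors_per_entity:.2f}")
print(f"editors per entity: {editors_per_entity:.3f}")
```

The first ratio is roughly 3.15, which rounds to three authors per resource, while the second is far below one, consistent with most resources having no recorded editor.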

Editors and authors have been counted as roles, without disambiguating the individuals holding these roles. Conversely, bibliographic entities, publishers, and venues were counted by OMID. However, for venues (e.g. journals), we have taken an extra precaution: many are duplicated in OpenCitations Meta because they have no identifiers other than the OMID. Therefore, in the figures shown above, we found it reasonable to disambiguate the venues by title in the absence of other identifiers.

As shown in Table 2, Springer Science is the publishing entity with the highest number of venues (2097), followed by Elsevier BV (1961) and IEEE (1775). By number of publications (Table 3), Elsevier is in the lead (16,933,610), followed by Springer Science (11,507,498) and Wiley (7,262,893).

Considering the venues in Table 4, Wiley’s ChemInform has the most publications (421,735), followed by Elsevier’s SSRN Electronic Journal (337,223) and Springer’s Journal On Data Semantics (330,093).

Table 5 lists all the types of bibliographic resources in OpenCitations Meta. The current dataset consists mostly of journal articles (67,904,323), about ten times as many as book chapters, in second place (6,476,623), and about thirteen times as many as proceedings articles, in third place (5,046,165).

Table 6, which lists the number of publications per year, shows that the number of publications increases from year to year.

Table 2: The top ten publishers by number of venues

Table 3: The top ten publishers by number of publications

Table 4: The top ten venues by number of publications

Table 5: All the bibliographic resource types involved in OpenCitations Meta, sorted by the number of publications of that type. The reference ontologies are FaBiO (http://purl.org/spar/fabio), DOCO (http://purl.org/spar/doco), and FAIR reviews (http://purl.org/spar/fr)

Table 6: Top ten years of publication by the number of publications in that year

OpenCitations Meta allows users to explore these data either via SPARQL (https://opencitations.net/meta/sparql) or via an API (https://opencitations.net/meta/api/v1). The OpenCitations Meta API retrieves a list of bibliographic resources and related metadata starting from one or more publication identifiers, an author’s ORCID, or an editor’s ORCID. Textual searches are currently under testing and will be released in the future as a further operation of the OpenCitations Meta API. They will support text searches on titles, authors, editors, publishers, IDs, and venues, as well as on volume and issue numbers, provided the venue is specified first. Searches on multiple fields can be combined using the Boolean conjunction and disjunction operators. For example, once the operation is released, the user will be able to search for all bibliographic resources whose title contains the word “micro-chaos” and that were published in either Philosophical Studies or the Journal of Nonlinear Science: title=micro-chaos&&venue=philosophical%20studies||title=micro-chaos&&venue=journal%20of%20nonlinear%20science, where “&&” is the conjunction operator and “||” is the disjunction operator.
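To make the combination of operators concrete, the following minimal sketch composes a query string of the kind shown above. It only builds the string: the helper name build_text_query is hypothetical, the lower-casing of values is an assumption taken from the look of the example, and the way the string would be passed to the unreleased search operation is not specified here.

```python
from urllib.parse import quote

AND = "&&"  # conjunction operator, as described in the text
OR = "||"   # disjunction operator, as described in the text


def build_text_query(alternatives):
    """Compose a textual search string (hypothetical helper).

    `alternatives` is a list of groups; each group is a list of
    (field, value) pairs. Pairs within a group are joined with "&&",
    and the groups themselves are joined with "||".
    """
    groups = []
    for group in alternatives:
        # Assumption: values are lower-cased and percent-encoded,
        # mirroring the example given in the text.
        terms = [f"{field}={quote(value.lower())}" for field, value in group]
        groups.append(AND.join(terms))
    return OR.join(groups)


query = build_text_query([
    [("title", "micro-chaos"), ("venue", "Philosophical Studies")],
    [("title", "micro-chaos"), ("venue", "Journal of Nonlinear Science")],
])
print(query)
```

Representing the query as a list of groups keeps the precedence explicit: each inner list is a conjunction, and the outer list is a disjunction of those conjunctions.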

Finally, all data and provenance are available as dumps in RDF (JSON-LD) (OpenCitations, 2023b) or CSV format (OpenCitations, 2023a) under a CC0 licence.

This paper is available on arXiv under a CC 4.0 DEED license.