Cobertura lingüística en las nuevas bases de datos (Linguistic Coverage in the New Databases)

Rodrigo Sánchez-Jiménez

doi:10.47251/clip.n92.186

PDF (Spanish (Spain))

Published: Dec 17, 2025

Keywords:

Bibliographic databases, indexing biases, metadata quality, linguistic coverage, research visibility, open access

Rodrigo Sánchez-Jiménez

Departamento de Biblioteconomía y Documentación. Universidad Complutense de Madrid

Abstract

This study examines the linguistic coverage of five bibliographic databases, comparing the traditional subscription-based databases, Web of Science (WoS) and Scopus, with three emerging open infrastructures and aggregators, namely OpenAlex, OpenAIRE, and SciLit. Starting from the well-documented geographic and linguistic biases of the classical sources, it assesses whether these new platforms provide a more diverse representation of global science. To this end, the complete body of indexed output available in each source up to 2025 was retrieved and harmonized, and the distribution of the twenty most prevalent languages was analyzed to quantify differences in visibility across indexing models.
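The harmonize-then-rank procedure described above can be illustrated with a minimal Python sketch. It is not the author's pipeline: the alias table `LANG_ALIASES`, the functions `harmonize`, `language_profile` and `compare_sources`, and the use of ISO 639-1 codes as the common vocabulary are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

# Hypothetical alias table: raw labels as different sources might emit them -> ISO 639-1 code.
# Illustrative only; a real harmonization table would be far larger.
LANG_ALIASES: Dict[str, str] = {
    "english": "en", "eng": "en", "en": "en",
    "spanish": "es", "spa": "es", "es": "es",
    "japanese": "ja", "jpn": "ja", "ja": "ja",
    "indonesian": "id", "ind": "id", "id": "id",
    "korean": "ko", "kor": "ko", "ko": "ko",
}


def harmonize(label: Optional[str]) -> Optional[str]:
    """Map a raw language label to a common code.

    Empty or missing labels return None; labels not in the table are kept
    (lower-cased) so they are still counted rather than silently dropped.
    """
    if label is None or not label.strip():
        return None
    key = label.strip().lower()
    return LANG_ALIASES.get(key, key)


def language_profile(labels: Iterable[Optional[str]], top_n: int = 20) -> dict:
    """Rank the top_n languages of one source and compute its non-English share.

    The share is computed over records that carry a usable language label.
    """
    counts = Counter(code for code in map(harmonize, labels) if code is not None)
    total = sum(counts.values())
    top: List[Tuple[str, int]] = counts.most_common(top_n)
    non_english = total - counts.get("en", 0)
    return {
        "records_with_language": total,
        "top_languages": top,
        "non_english_share": non_english / total if total else 0.0,
    }


def compare_sources(labels_by_source: Mapping[str, Iterable[Optional[str]]],
                    top_n: int = 20) -> Dict[str, dict]:
    """Apply the same harmonization and ranking to every source."""
    return {source: language_profile(labels, top_n)
            for source, labels in labels_by_source.items()}
```

Computing the non-English share over records with a usable language label, rather than over all records, keeps missing metadata from being counted as a language of its own; the fraction of records without a label is better treated as a separate quality signal.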

The results reveal stark disparities in the volume of non-English documents: platforms such as OpenAlex surface millions of records in Asian languages (Japanese, Indonesian, Korean) and Middle Eastern languages that remain largely invisible in the commercial databases. The analysis also highlights a trade-off between quantity and metadata quality. WoS and Scopus rely on editorial selection, and OpenAIRE on a more “notarial” harvesting approach (with lower completeness), whereas OpenAlex achieves massive coverage through algorithmic inference, which introduces a certain margin of error. The study concludes that professionals and researchers now navigate two complementary ecosystems and must choose between the curated selectivity of the scientific elite and the more inclusive, though noisier, panorama of global science.
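One simple way to make the completeness side of this trade-off measurable is the share of records in which a given field (for example, language) is filled in. The sketch below assumes each source has already been reduced to the raw values of that field; the function names are hypothetical, and the measure says nothing about whether a filled value is correct, which is the separate concern raised for algorithmically inferred metadata.

```python
from typing import Dict, Iterable, Mapping, Optional


def field_completeness(values: Iterable[Optional[str]]) -> float:
    """Fraction of records whose field is present and not blank (0.0 if there are no records)."""
    total = filled = 0
    for value in values:
        total += 1
        if value is not None and value.strip():
            filled += 1
    return filled / total if total else 0.0


def completeness_by_source(
        values_by_source: Mapping[str, Iterable[Optional[str]]]) -> Dict[str, float]:
    """Compute the same completeness rate for each source, keyed by source name."""
    return {source: field_completeness(values)
            for source, values in values_by_source.items()}
```

For instance, `completeness_by_source({"source_a": ["en", None, "ja"]})` would report the share of that hypothetical source's records carrying a non-blank language value.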

How to Cite

Sánchez-Jiménez, R. (2025). Cobertura lingüística en las nuevas bases de datos. CLIP De SEDIC: Revista De La Sociedad Española De Documentación E Información Científica, (92), 45–54. https://doi.org/10.47251/clip.n92.186

Issue

No. 92 (2025): Clip de SEDIC, Revista de la Sociedad Española de Documentación e Información Científica, nº 92

Section

Panorama

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Author Biography

Rodrigo Sánchez-Jiménez, Departamento de Biblioteconomía y Documentación. Universidad Complutense de Madrid

Associate Professor (Profesor Titular) at the Universidad Complutense de Madrid, where he has taught for more than twenty years. He received his PhD in Documentation from the UCM in 2006 and has specialized in scientometrics. His research focuses on the quantitative analysis of scientific activity, covering topics such as publication costs (APCs), gender gaps, and trends in doctoral theses. He also has extensive experience in editorial management and collaborates actively with the SCImago group on developing new indicators and studying open data sources.