https://spdx.org/licenses/CC0-1.0.html
Local ecological evidence is key to informing conservation. However, many global biodiversity indicators often neglect local ecological evidence published in languages other than English, potentially biassing our understanding of biodiversity trends in areas where English is not the dominant language. Brazil is a megadiverse country with a thriving national scientific publishing landscape. Here, using Brazil and a species abundance indicator as examples, we assess how well bilingual literature searches can both improve data coverage for a country where English is not the primary language and help tackle biases in biodiversity datasets. We conducted a comprehensive screening of articles containing abundance data for vertebrates published in 59 Brazilian journals (articles in Portuguese or English) and 79 international English-only journals. These were grouped into three datasets according to journal origin and article language (Brazilian-Portuguese, Brazilian-English and International). We analysed the taxonomic, spatial and temporal coverage of the datasets, compared their average abundance trends and investigated predictors of such trends with a modelling approach. Our results showed that including data published in Brazilian journals, especially those in Portuguese, strongly increased representation of Brazilian vertebrate species (by 10.1 times) and populations (by 7.6 times) in the dataset. Meanwhile, international journals featured a higher proportion of threatened species. There were no marked differences in spatial or temporal coverage between datasets, in spite of different bias towards infrastructures. Overall, while country-level trends in relative abundance did not substantially change with the addition of data from Brazilian journals, uncertainty considerably decreased. 
We found that population trends in international journals showed stronger and more frequent decreases in average abundance than those in national journals, regardless of whether the latter were published in Portuguese or English. Policy implications. Collecting data from local sources further strengthens global biodiversity databases by adding species not previously included in international datasets. Furthermore, the addition of these data helps to understand spatial and temporal biases that potentially influence abundance trends at both national and global levels. We show how incorporating non-English-language studies in global databases and indicators could provide a more complete understanding of biodiversity trends and therefore better inform global conservation policy.
Methods
Data collection
We collected time-series of vertebrate population abundance suitable for entry into the Living Planet Database (LPD, livingplanetindex.org), which provides the repository for one of the indicators in the Global Biodiversity Framework (GBF), the Living Planet Index (LPI, Ledger et al., 2023). Despite the continuous addition of new data, LPI coverage remains incomplete for some regions (Living Planet Report 2024 – A System in Peril, 2024). We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservation published by the translatE project (www.translatesciences.com) as a starting point. The International dataset was obtained from the LPD team and sourced from the 78 journals they routinely monitor as part of their ongoing data searches.
We excluded journals whose scope was not relevant to our work (e.g. those focusing on agroforestry or crop science), and taxon-specific journals (e.g. South American Journal of Herpetology), since they could introduce taxonomic bias to the data collection process. We considered only articles published between 1990 and 2015, and thus further excluded journals that published articles exclusively outside of this timeframe. We chose this period because of higher data availability (Deinet et al., 2024): less monitoring took place in earlier decades, and data availability for the last decade is also lower because there is a lag between data being collected and trends becoming available in the literature. Finally, we excluded any journals that had inactive links or that were no longer available online. While we acknowledge that biodiversity data are available from a wider range of sources (grey literature, online databases, university theses etc.), here we limited our searches to peer-reviewed journals and articles published within a specific timeframe to standardise data collection and allow for comparison between datasets.
We screened a total of 59 Brazilian journals; of these, nine accept articles only in English, 13 only in Portuguese and 37 in both languages. We systematically checked all articles of all issues published between 1990 and 2015. Articles that appeared to contain abundance data for vertebrate species based on title and/or abstract were further evaluated by reading the material and methods section. For an article to be included in our dataset, we followed the criteria applied for inclusion into the LPD (livingplanetindex.org/about_index#data): a) data must have been collected using comparable methods for at least two years for the same population, and b) units must be of population size, either a direct measure such as population counts or densities, or indices, or a reliable proxy such as breeding pairs, capture per unit effort or measures of biomass for a single species (e.g. fish data are often available in one of the latter two formats).
Assessing search effectiveness and dataset representation
We calculated the encounter rate of relevant articles (i.e. those that satisfied the criteria for inclusion in our datasets) for each journal as the proportion of such articles relative to the total number of articles screened for that journal. We assessed the taxonomic representation of each dataset by calculating the percentage of species of each vertebrate group (all fishes combined, amphibians, reptiles, birds and mammals) with relevant abundance data in relation to the number of species of these groups known to occur in Brazil. The total number of known species for each taxon was compiled from national-level sources (amphibians, Segalla et al., 2021; birds, Pacheco et al., 2021; mammals, Abreu et al., 2022; reptiles, Costa, Guedes and Bérnils, 2022) or through online databases (Fishbase, Froese and Pauly, 2024). We calculated accumulation curves using 1,000 permutations and the rarefaction method in the vegan package (Oksanen et al., 2024). These represent the cumulative number of new species added with each article containing relevant data, allowing us to assess how additional data collection could increase coverage of abundance data across datasets. To compare species threat status among datasets, we used the category for each species available in the Brazilian (‘Sistema de Avaliação do Risco de Extinção da Biodiversidade – SALVE’, 2024) and IUCN (IUCN, 2024) Red Lists, and calculated the percentage of species in each category per dataset. To assess and compare the temporal coverage of the different datasets, we calculated the number of populations and species across time. To assess geographic gaps, we mapped the locations of each population using QGIS version 3.6 (QGIS Development Team, 2019).
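The encounter rate and the taxonomic representation described above are simple proportions. The following is a minimal sketch of both calculations; the function names and the counts are made up for illustration and are not taken from the study.

```python
def encounter_rate(relevant_articles: int, screened_articles: int) -> float:
    """Proportion of screened articles that met the inclusion criteria."""
    if screened_articles <= 0:
        raise ValueError("screened_articles must be positive")
    return relevant_articles / screened_articles


def taxonomic_representation(species_with_data: int, known_species: int) -> float:
    """Percentage of a group's known species that have relevant abundance data."""
    if known_species <= 0:
        raise ValueError("known_species must be positive")
    return 100.0 * species_with_data / known_species


# Made-up counts, for illustration only (not values from the study).
rate = encounter_rate(relevant_articles=12, screened_articles=400)
coverage = taxonomic_representation(species_with_data=50, known_species=2000)
print(f"encounter rate: {rate:.3f}, representation: {coverage:.1f}%")
```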
We then quantified the bias of terrestrial records towards proximity to infrastructures (airports, cities, roads and waterbodies) at a 0.5° resolution (circa 55.5 km x 55.5 km at the equator) and a 2° buffer using posterior weights from the R package sampbias (Zizka, Antonelli and Silvestro, 2021). Higher posterior weights indicate a stronger bias effect.
Generalised linear mixed models and population abundance trends
We used the rlpi R package (Freeman et al., 2017) to calculate trends in relative abundance. We calculated the average lambda (logged annual rate of change) for each time-series by averaging the lambda values across all years between the start and the end year of the time-series. We then built generalised linear mixed models (GLMM) to test how average lambdas changed across language (Portuguese vs English), journal origin (national vs international), and taxonomic group, using location, journal name, and species as random intercepts (Table 1). We offset these by the number of sampled years to adjust summed lambda to a standardised measure, allowing comparison across time-series of different lengths, and plotted the beta coefficients (effect sizes) of all factors. Finally, we performed a post-hoc test to check pairwise differences between taxonomic groups (Table S2). To assess the influence of national-level data on global trends in relative abundance, we calculated the trends for both the International dataset and the two combined Brazilian datasets (Brazilian-Portuguese and Brazilian-English), using only years for which data were available for more than one species, so that trend variation could be estimated. We also plotted the trends for the Brazilian datasets separately. All analyses were performed in R 4.4.1 (R Core Team, 2024).
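As an illustration of the average lambda described above, here is a minimal Python sketch. It assumes base-10 logarithms and a complete series of strictly positive annual values. The rlpi package itself handles gaps and smoothing, which this sketch does not attempt to reproduce; the function names and the example series are hypothetical.

```python
import math


def annual_lambdas(abundance):
    """Return log10(N[t+1] / N[t]) for each pair of consecutive years."""
    if len(abundance) < 2:
        raise ValueError("need at least two annual values")
    if any(n <= 0 for n in abundance):
        raise ValueError("abundance values must be strictly positive")
    return [math.log10(b / a) for a, b in zip(abundance, abundance[1:])]


def average_lambda(abundance):
    """Mean logged annual rate of change over the whole time-series."""
    lambdas = annual_lambdas(abundance)
    return sum(lambdas) / len(lambdas)


# Made-up series: the population falls to 80% of its size each year.
series = [100.0, 80.0, 64.0]
avg = average_lambda(series)
print(f"average lambda: {avg:.4f}; mean annual multiplier: {10 ** avg:.2f}")
```

A negative average lambda indicates a declining time-series, and raising 10 to that value recovers the mean annual multiplier.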
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unemployment Rate in Brazil decreased to 5.80 percent in June from 6.20 percent in May of 2025. This dataset provides the latest reported value for - Brazil Unemployment Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Welcome! This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information on 100k orders made from 2016 to 2018 at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment and freight performance to customer location, product attributes and finally reviews written by customers. We also released a geolocation dataset that relates Brazilian zip codes to lat/lng coordinates.
This is real commercial data; it has been anonymised, and references to the companies and partners in the review text have been replaced with the names of Game of Thrones great houses.
We have also released a Marketing Funnel Dataset. You may join both datasets and see an order from a marketing perspective.
Instructions on joining are available on this Kernel.
This dataset was generously provided by Olist, the largest department store in Brazilian marketplaces. Olist connects small businesses from all over Brazil to channels without hassle and with a single contract. Those merchants are able to sell their products through the Olist Store and ship them directly to the customers using Olist logistics partners. See more on our website: www.olist.com
After a customer purchases a product from Olist Store, a seller is notified to fulfill that order. Once the customer receives the product, or the estimated delivery date has passed, the customer receives a satisfaction survey by email where they can rate the purchase experience and write some comments.
Example of a product listing on a marketplace: https://i.imgur.com/JuJMns1.png
The data is divided in multiple datasets for better understanding and organization. Please refer to the following data schema when working with it:
Data Schema: https://i.imgur.com/HRhd2Y0.png
We had previously released a classified dataset, but we removed it in Version 6. We intend to release it again as a new dataset with a new data schema. Until that is finished, you may use the classified dataset available in Version 5 or earlier.
Here is some inspiration for possible outcomes from this dataset.
NLP:
This dataset offers a rich environment for parsing the review text through its multiple dimensions.
Clustering:
Some customers didn't write a review. But why are they happy or mad?
Sales Prediction:
With purchase date information you'll be able to predict future sales.
Delivery Performance:
You will also be able to work through delivery performance and find ways to optimize delivery times.
Product Quality:
Enjoy discovering the product categories that are more prone to customer dissatisfaction.
Feature Engineering:
Create features from this rich dataset or attach some external public information to it.
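As a starting point for the delivery performance idea above, the sketch below compares actual and estimated delivery dates. The column names `order_delivered_customer_date` and `order_estimated_delivery_date`, the timestamp format and the three sample rows are assumptions made for illustration; check them against the data schema before use.

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the files use a different one.
FMT = "%Y-%m-%d %H:%M:%S"

# Made-up rows for illustration only (column names are assumptions).
orders = [
    {"order_id": "a1",
     "order_delivered_customer_date": "2018-01-10 12:00:00",
     "order_estimated_delivery_date": "2018-01-12 00:00:00"},
    {"order_id": "a2",
     "order_delivered_customer_date": "2018-02-05 00:00:00",
     "order_estimated_delivery_date": "2018-02-03 00:00:00"},
    {"order_id": "a3",  # not delivered yet
     "order_delivered_customer_date": "",
     "order_estimated_delivery_date": "2018-03-01 00:00:00"},
]


def days_late(order):
    """Days between actual and estimated delivery (negative means early)."""
    delivered = order["order_delivered_customer_date"]
    if not delivered:
        return None  # undelivered orders carry no delay information
    actual = datetime.strptime(delivered, FMT)
    estimated = datetime.strptime(order["order_estimated_delivery_date"], FMT)
    return (actual - estimated).total_seconds() / 86400


delays = [d for d in map(days_late, orders) if d is not None]
late_share = sum(d > 0 for d in delays) / len(delays)
print(delays, late_share)
```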
Thanks to Olist for releasing this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The archaeological database is composed mainly of radiocarbon ages, and they were subject to calibration using the CalPal program (Weninger and Jöris 2008), version 2020.11.
In some portions of Brazil, OSL and TL ages were widely used, and we had to cope with this issue by means of a “reverse calibration”, i.e., entering ages into the CalPal program and running the calibration until a given (fictitious) radiocarbon age, once calibrated, matched the luminescence age as closely as possible. The TL/OSL ages are marked in gray, with blue numbers standing for the “fictitious” age, and the actual age appearing in the “Cal age” column.
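The “reverse calibration” idea can be pictured as a search over candidate radiocarbon ages for the one whose calibrated value lies closest to the luminescence age. The sketch below is only an illustration of that search: `toy_calibrate` is a hypothetical stand-in and not a real calibration curve, and it does not call CalPal or reproduce its output.

```python
def reverse_calibrate(target_cal_age, calibrate, candidates):
    """Return the candidate radiocarbon age whose calibrated age is closest
    to target_cal_age (for example, a TL/OSL age)."""
    return min(candidates, key=lambda c14: abs(calibrate(c14) - target_cal_age))


def toy_calibrate(c14_age):
    # Hypothetical stand-in for a calibration program; NOT a real curve.
    return 1.1 * c14_age + 100.0


# Scan candidate radiocarbon ages in 10-year steps (an arbitrary choice).
fictitious_age = reverse_calibrate(5600.0, toy_calibrate, range(0, 10001, 10))
print(fictitious_age)
```

A scan over candidates is used rather than a bisection because real calibration curves are not strictly monotonic.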
We also decided to be inclusive in our database, and by this we mean that we are making available all radiocarbon or luminescence ages that were considered bona fide by the researchers who published them, regardless of the fact that other researchers consider these ages inconsistent with their own models or beliefs. The same goes for papers that select ages based on the standard deviations. A large standard deviation means low precision, not necessarily low accuracy. We take for granted that judgements about the appropriateness of the ages can be made individually by the reader, since we provided the full references. We prefer to publish an age with a large associated error than to ignore it. Once again, since we are providing the tables as supplementary material, it is up to the reader to disregard specific ages and run his/her own analysis.
In terms of the geographic location of the sites/ages, we chose to provide UTM coordinates of the nearest municipality, instead of providing “exact” locations. This decision was made on three grounds: first, at the scale of analysis we are presenting, the location of the nearest municipality is more than sufficient to provide an adequate overview of the spatial distribution of the ages; second, the majority of the sites published before the advent and popularization of handheld GPS devices do not have an accurate location and therefore, to provide an “exact” location would be meaningless; third, when trying to plot sites using available databases, be they compilations of data or first publications of a given site, it is common to observe that the apparently “exact” geographic coordinates were plainly wrong, falling outside a given region or even the state. This is something that plagues large databases, generally compiled by several researchers and their students, so we argue that it is much easier to detect errors and convey the right location of a given site, at least approximately, if the municipality is taken into account. Hence, our database has a redundant location scheme: state, municipality, and UTM coordinates. If for some reason the UTM is wrong, the reader at least knows in which state and municipality the site is located. The only exception to this procedure was made in the Amazon region (states of Amazonas, Pará, Maranhão, Rondônia). Municipalities in the region are fairly large, and we chose to plot the location of the site when it was considered to be too far away from the nearest urbanized area.
This is a harmonized set of soil parameter estimates for Brazil. The 1:5M scale Soil and Terrain Database for Latin America and the Caribbean (FAO et al. 1998) provided the basis for the current study. The data set has been prepared for the project on Assessment of soil organic carbon stocks and change at national scale (GEF-SOC), which has the Brazilian Amazon as one of its four case ... study areas.
The land surface of Brazil has been characterized using 299 unique SOTER units, corresponding with 839 polygons. The major soils have been described using 584 profiles, selected by national soil experts as being representative for these units. The associated soil analytical data have been derived from soil survey reports. Gaps in the measured soil profile data have been filled using a step-wise procedure which includes three main stages: (1) collating additional measured soil analytical data where available; (2) filling gaps using expert knowledge and common sense; (3) filling the remaining gaps using a scheme of taxotransfer rules. Parameter estimates are presented by soil unit for fixed depth intervals of 0.2 m to 1 m depth for: organic carbon, total nitrogen, pH(H2O), CECsoil, CECclay, base saturation, effective CEC, aluminum saturation, CaCO3 content, gypsum content, exchangeable sodium percentage (ESP), electrical conductivity of saturated paste (ECe), bulk density, content of sand, silt and clay, content of coarse fragments (greater than 2 mm), and available water capacity (-33 to -1500 kPa). These attributes have been identified as being useful for agro-ecological zoning, land evaluation, crop growth simulation, modelling of soil carbon stocks and change, and analyses of global environmental change.
The current parameter estimates should be seen as best estimates based on the current selection of soil profiles and data clustering procedure. Taxotransfer rules have been flagged to provide an indication of the possible confidence in the derived data. Results are presented as summary files and can be linked to the 1:5M scale SOTER map in a GIS, through the unique SOTER-unit code. The subset for the Amazon region, the Brazilian GEF-SOC case study area, has been clipped out of the national set using GIS. It includes 193 unique SOTER units, corresponding with 571 mapped polygons. The secondary data set is considered appropriate for studies at the national scale and regional scale (greater than 1:5M). Correlation of soil analytical data, however, should be done more rigorously when more detailed scientific work is considered.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USD/BRL exchange rate fell to 5.5393 on August 1, 2025, down 1.10% from the previous session. Over the past month, the Brazilian Real has weakened 2.11%, but it's up by 3.32% over the last 12 months. Brazilian Real - values, historical data, forecasts and news - updated on August of 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Temperature Time-Series for some Brazilian cities’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/volpatto/temperature-timeseries-for-some-brazilian-cities on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Do you ever wonder what temperatures are like in Brazilian cities? Too hot? Cold weather sometimes? And what about climate change? Is Brazil getting hotter?
This is your chance to check it out!
These datasets were collected to provide some answers to the above questions through data analysis. You may also want to try a machine learning model to practice predicting the evolution of temperature in some Brazilian cities.
The content is provided by NOAA GHCN v4 and post-processed by NASA's GISTEMP v4.
In summary, each data file contains a temperature time series for a station named after its city. The time series provides monthly temperature records for each year. Some mean values are also calculated, such as metANN and D-J-F. I can't give details about these quantities or how they are calculated; please refer to the NASA GISTEMP website for that. The most important seems to be metANN, which is an annual temperature mean.
These datasets are provided through NASA's GISTEMP v4 and recorded by NOAA GHCN v4. Thanks to the researchers and staff for the really nice work!
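A minimal sketch of reading the annual mean from one station file might look like the following. The CSV layout, the `YEAR` column name and the `999.9` missing-value sentinel are assumptions made for illustration and should be checked against the actual files; the sample values are made up.

```python
import csv
import io

MISSING = 999.9  # assumed sentinel for missing values; verify in the real files

# Made-up sample in an assumed layout (the real files have more columns).
sample = (
    "YEAR,D-J-F,metANN\n"
    "2000,25.1,23.0\n"
    "2001,999.9,999.9\n"
    "2002,25.5,24.0\n"
)


def read_annual_means(text, column="metANN"):
    """Map each year to its annual mean, skipping missing values."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        value = float(row[column])
        if abs(value - MISSING) > 1e-6:
            result[int(row["YEAR"])] = value
    return result


annual = read_annual_means(sample)
mean_temp = sum(annual.values()) / len(annual)
print(annual, mean_temp)
```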
--- Original source retains full ownership of the source dataset ---
This data set contains actual sales data for a chain of Brazilian stores. I modified the names of products, customers, and employees to protect their identities. I am making this data available so that others can help me get the most out of it, with analyses such as:
Sales forecast
Customer segmentation
Employee productivity
Profitable products
And everything else that can be extracted from it.
Columns description
Company Code - Affiliate code that sold
Order Number - Unique code to identify the sale
Employee - Employee who made the sale
Product - Name of product sold
Product Category - Category the product belongs to
Client - Name of the customer who made the purchase
Client City - City name of the customer who made the purchase
Sale Date Time - Date and time the sale was made
Product Cost - Cost per unit sold
Discount Amount - Total sale discount
Amount - Item quantity
Total - Total item value
Form of payment - Form of payment
The values in the Client, Client City and Employee columns were replaced with fictitious names.
The product categories were kept, but translated into English. Each product name consists of the name of its category concatenated with a random number. This rule does not apply to products in the Fuel category, for which fictitious names were invented.
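To hint at the "profitable products" style of analysis, the sketch below sums the Total column per Product Category. The column names come from the description above; the three rows and the "Food" category are made up for illustration.

```python
from collections import defaultdict

# Made-up rows for illustration only; real rows carry all listed columns.
sales = [
    {"Order Number": "1", "Product Category": "Fuel", "Total": 100.0},
    {"Order Number": "1", "Product Category": "Food", "Total": 20.0},
    {"Order Number": "2", "Product Category": "Fuel", "Total": 50.0},
]


def total_by_category(rows):
    """Sum the Total column for each Product Category."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Product Category"]] += float(row["Total"])
    return dict(totals)


totals = total_by_category(sales)
# Rank categories from highest to lowest total sales value.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```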
Landscapes in the Anthropocene are dominated by agroecosystems and, although they can provide resources for biodiversity, most landscape ecology studies overlook the importance of agroecosystems for conservation. However, understanding the influence of agroecosystems on biodiversity requires measuring specific variables and performing complex analyses, which demand a planning system that is still lacking. In this project, we combine concepts focused on agricultural landscapes, providing a framework to deal with these elements in landscape ecology and to analyse biodiversity and ecosystem services dynamics through time. We consider the spatial and temporal characteristics of agroecosystems in a multi-taxa perspective, pointing out the role of structural, functional and temporal heterogeneity, functional connectivity, habitat quality, and habitat/matrix edge influence in biodiversity. Additionally, we account for the effects of farmers’ choices and of external drivers, such as technical constraints, bank loan requirements, public policies, and the market, as important regulators of the dynamics of agroecosystems.
The Census of Agriculture investigates information on agricultural establishments and the agricultural activities carried out in them, including characteristics of the producers and establishments, economy and employment in the rural area, livestock, cropping and agribusiness. Its data collection unit is every production unit dedicated, either entirely or partially, to agricultural, forest or aquaculture activities, subordinated to a single administration (producer or administrator), regardless of its size, legal nature or location, and producing either for subsistence or for sale.
The first Census of Agriculture dates back to 1920, when it was conducted as part of the General Census. It did not take place in the 1930s for political and institutional reasons. From 1940 onward, the survey was decennial up to 1970 and quinquennial thereafter, taking place at the beginning of the years ending in 1 and 6 and relating to the years ending in 0 and 5. In the 1995-1996 Census of Agriculture, the information related to the crop year (August 1995 to July 1996). In the 2006 Census of Agriculture, the reference period returned to the calendar year. The 2006 edition was characterized both by technological innovation in the field operation, in which the paper questionnaire was replaced by an electronic questionnaire running on Personal Digital Assistants (PDAs), and by methodological refinement, particularly the redesign of its contents and the incorporation of new concepts. That edition also implemented the National Address List for Statistical Purposes (Cnefe), which gathers detailed descriptions of the addresses of housing units and agricultural establishments and the geographic coordinates of every housing unit and establishment (agricultural, religious, education, health and other) in the rural area, supporting the planning of future IBGE surveys. The 2017 Census of Agriculture again used the crop year as its reference (October 2016 to September 2017), though over a different period from that adopted in the 1995-1996 Census of Agriculture. New technologies were introduced in the 2017 survey to control data collection, such as a previously compiled address list, satellite images on the PDAs to better locate the enumerator in relation to the terrain, and the coordinates of the address and of the location where the questionnaire was opened, which allowed better coverage and assessment of the work.
The survey provides information on the total agricultural establishments; total area of those establishments; characteristics of the producers; characteristics of the establishments (use of electricity, agricultural practices, use of fertilization, use of pesticides, use of organic farming, land use, existence of water resources, existence of warehouses and silos, existence of tractors, machinery and agricultural implements, and vehicles, among other aspects); employed personnel; financial transactions; livestock (inventories and animal production); aquaculture and forestry (silviculture, forestry, floriculture, horticulture, permanent crops, temporary crops and rural agribusiness).
The survey is nominally quinquennial, though the editions scheduled for 1990, 1995, 2000, 2005, 2010 and 2015 were not carried out as planned due to government budget restrictions: the 1990 Census of Agriculture did not take place; the 1995 survey was carried out in 1996 together with the Population Count; the 2000 survey did not take place; that of 2005 was carried out in 2007, again together with the Population Count; that of 2010 did not take place; and that of 2015 was carried out in 2017. Its geographic coverage is national, with results disclosed for Brazil, Major Regions, Federation Units, Mesoregions, Microregions and Municipalities. The results of the 2006 Census of Agriculture, which has the calendar year as the reference period, are not strictly comparable with those from the 1995-1996 Census of Agriculture and the 2017 Census of Agriculture, whose reference period is the crop year in both cases.
National coverage
Households
The statistical unit was the agricultural holding, defined as any production unit dedicated wholly or partially to agricultural, forestry and aquaculture activities, subject to a single management, with the objective of producing for sale or subsistence, regardless of size, legal form (own, partnership, lease, etc.) or location (rural or urban). The agricultural holdings were classified according to the legal status of the producer as: individual holder, condominium, consortium or partnership; cooperative; incorporated or limited liability company; public utility institutions (church, NGO, hospital), or government.
Census/enumeration data [cen]
(a) Frame The 2000 Population and Housing Census and the cartographic documentation constituted the source of the AC 2006 frame. No list frames were available in digital media with georeferenced addresses of the holdings. Census coverage was ensured on the basis of the canvassing of the EAs by enumerators.
(b) Complete and/or sample enumeration methods The AC 2006 was a complete enumeration operation of all agricultural holdings in the country.
Face-to-face [f2f]
An electronic questionnaire was used for data collection on:
Total agricultural establishments
Total area of agricultural establishments
Total area of crops
Area of pastures
Area of woodlands
Total tractors
Implements
Machinery and vehicles
Characteristics of the establishment and of the producer
Total staff employed
Total cattle, buffalo, goats, sheep, pigs, poultry (chickens, fowls, chickens and chicks)
Other birds (ducks, geese, teals, turkeys, quails, ostriches, partridges, pheasants and others)
Plant production
The AC 2006 covered all 16 items recommended by FAO under the WCA 2010.
(a) DATA PROCESSING AND ARCHIVING The entire data collection and supervision software was developed in house by IBGE, using the Visual Studio platform in the Microsoft Operations Manager 2005 environment and Microsoft SQL Server 2000, with the assistance of Microsoft Brazil consulting. In addition, the GEOPAD application was installed to view and navigate maps and to use GPS guidance. Updated versions of the software were installed automatically as soon as census enumerators connected the PDAs to the central server to transmit the data collected. Once internally validated by the device, the data were immediately transmitted to the database at the IBGE state unit. The previous AC (1996) served as the basis for defining the parameter values for the electronic editing process.
(b) CENSUS DATA QUALITY Automatic validation was incorporated into the PDAs. Previously programmed skip patterns and real-time edits, performed during enumeration, ensured faster and more reliable interviews. In addition, the Bluetooth® technology incorporated into the PDAs allowed each enumerator to transmit data directly to IBGE's central mainframe on a weekly basis.
The preliminary census results were published in 2007. The final results were released in 2009 through a printed volume and CD-ROMs. The census results were disseminated at the national and subnational scope (country, state and municipality) and are available online at IBGE's website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: Introduction: Clinical reasoning is considered one of the main skills that must be developed by medical students, as it allows the establishment of diagnostic hypotheses and directs investigative and diagnostic strategies using a rational approach. Although educators have traditionally focused the teaching method on the analytical model, many medical professors face the challenge in their daily lives of finding new strategies to help their students develop clinical reasoning. Objective: To carry out an integrative literature review to identify the strategies used in the teaching-learning process of clinical reasoning in Brazilian medical schools. Method: The methodology used consists of six steps: 1. creation of the research question; 2. definition of inclusion and exclusion criteria; 3. list of information to be extracted; 4. evaluation of included studies; 5. interpretation of results and 6. presentation of the review. Results: Most studies indicate that the teaching of clinical reasoning is carried out through discussions of clinical cases, incidentally, in different disciplines or through the use of active methodologies such as PBL, TBL and CBL. Only three studies presented at conferences disclosed experiences related to the implementation of a mandatory curricular discipline specifically aimed at teaching clinical reasoning. The teaching of clinical reasoning is prioritized in internships in relation to the clinical and pre-clinical phases. Final considerations: There are few studies that analyze how clinical reasoning is taught to medical students in Brazilian medical schools. Although more studies are needed, we can observe the lack of theoretical knowledge about clinical reasoning as one of the main causes of the students’ difficulty in developing clinical reasoning.
https://spdx.org/licenses/CC0-1.0.html
Local ecological evidence is key to informing conservation. However, many global biodiversity indicators neglect local ecological evidence published in languages other than English, potentially biassing our understanding of biodiversity trends in areas where English is not the dominant language. Brazil is a megadiverse country with a thriving national scientific publishing landscape. Here, using Brazil and a species abundance indicator as examples, we assess how well bilingual literature searches can both improve data coverage for a country where English is not the primary language and help tackle biases in biodiversity datasets. We conducted a comprehensive screening of articles containing abundance data for vertebrates published in 59 Brazilian journals (articles in Portuguese or English) and 79 international English-only journals. These were grouped into three datasets according to journal origin and article language (Brazilian-Portuguese, Brazilian-English and International). We analysed the taxonomic, spatial and temporal coverage of the datasets, compared their average abundance trends and investigated predictors of such trends with a modelling approach. Our results showed that including data published in Brazilian journals, especially those in Portuguese, strongly increased representation of Brazilian vertebrate species (by 10.1 times) and populations (by 7.6 times) in the dataset. Meanwhile, international journals featured a higher proportion of threatened species. There were no marked differences in spatial or temporal coverage between datasets, despite differing biases towards infrastructure. Overall, while country-level trends in relative abundance did not substantially change with the addition of data from Brazilian journals, uncertainty considerably decreased.
We found that population trends in international journals showed stronger and more frequent decreases in average abundance than those in national journals, regardless of whether the latter were published in Portuguese or English. Policy implications. Collecting data from local sources further strengthens global biodiversity databases by adding species not previously included in international datasets. Furthermore, the addition of these data helps to understand spatial and temporal biases that potentially influence abundance trends at both national and global levels. We show how incorporating non-English-language studies in global databases and indicators could provide a more complete understanding of biodiversity trends and therefore better inform global conservation policy.
Methods
Data collection
We collected time-series of vertebrate population abundance suitable for entry into the Living Planet Database (LPD; livingplanetindex.org), which provides the repository for one of the indicators in the Global Biodiversity Framework (GBF), the Living Planet Index (LPI, Ledger et al., 2023). Despite the continuous addition of new data, LPI coverage remains incomplete for some regions (Living Planet Report 2024 – A System in Peril, 2024). We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservation published by the translatE project (www.translatesciences.com) as a starting point. The International dataset was obtained from the LPD team and sourced from the 78 journals they routinely monitor as part of their ongoing data searches. We excluded journals whose scope was not relevant to our work (e.g.
those focusing on agroforestry or crop science), and taxon-specific journals (e.g. South American Journal of Herpetology) since they could introduce taxonomic bias to the data collection process. We considered only articles published between 1990 and 2015, and thus further excluded journals that published articles exclusively outside of this timeframe. We chose this period because of higher data availability (Deinet et al., 2024): less monitoring took place in earlier decades, and data availability for the last decade is also lower because of the lag between data being collected and trends becoming available in the literature. Finally, we excluded any journals that had inactive links or that were no longer available online. While we acknowledge that biodiversity data are available from a wider range of sources (grey literature, online databases, university theses etc.), here we limited our searches to peer-reviewed journals and articles published within a specific timeframe to standardise data collection and allow for comparison between datasets. We screened a total of 59 Brazilian journals; of these, nine accept articles only in English, 13 only in Portuguese and 37 in both languages. We systematically checked all articles of all issues published between 1990 and 2015. Articles that appeared to contain abundance data for vertebrate species based on title and/or abstract were further evaluated by reading the material and methods section. For an article to be included in our dataset, we followed the criteria applied for inclusion into the LPD (livingplanetindex.org/about_index#data): a) data must have been collected using comparable methods for at least two years for the same population, and b) units must be of population size, either a direct measure such as population counts or densities, or indices, or a reliable proxy such as breeding pairs, capture per unit effort or measures of biomass for a single species (e.g.
fish data are often available in one of the latter two formats).
Assessing search effectiveness and dataset representation
We calculated the encounter rate of relevant articles (i.e. those that satisfied the criteria for inclusion in our datasets) for each journal as the proportion of such articles relative to the total number of articles screened for that journal. We assessed the taxonomic representation of each dataset by calculating the percentage of species of each vertebrate group (all fishes combined, amphibians, reptiles, birds and mammals) with relevant abundance data in relation to the number of species of these groups known to occur in Brazil. The total number of known species for each taxon was compiled from national-level sources (amphibians, Segalla et al., 2021; birds, Pacheco et al., 2021; mammals, Abreu et al., 2022; reptiles, Costa, Guedes and Bérnils, 2022) or from online databases (FishBase, Froese and Pauly, 2024). We calculated accumulation curves using 1,000 permutations and applying the rarefaction method, using the vegan package (Oksanen et al., 2024). These represent the cumulative number of new species added with each article containing relevant data, allowing us to assess how additional data collection could increase coverage of abundance data across datasets. To compare species threat status among datasets, we used the category for each species available in the Brazilian (‘Sistema de Avaliação do Risco de Extinção da Biodiversidade – SALVE’, 2024) and IUCN (IUCN, 2024) Red Lists, and calculated the percentage of species in each category per dataset. To assess and compare the temporal coverage of the different datasets, we calculated the number of populations and species across time. To assess geographic gaps, we mapped the locations of each population using QGIS version 3.6 (QGIS Development Team, 2019).
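The encounter rate and taxonomic representation described above are simple proportions, and can be sketched in Python as follows. This is a minimal illustration rather than the authors' code (their analyses were run in R); the function names, group labels and all numbers are hypothetical.

```python
from typing import Dict, Set


def encounter_rate(relevant_articles: int, screened_articles: int) -> float:
    """Proportion of screened articles that met the inclusion criteria."""
    if screened_articles <= 0:
        raise ValueError("screened_articles must be positive")
    return relevant_articles / screened_articles


def taxonomic_coverage(
    species_with_data: Dict[str, Set[str]],
    known_species: Dict[str, int],
) -> Dict[str, float]:
    """Percentage of known species per group that have abundance data."""
    return {
        group: 100.0 * len(species_with_data.get(group, set())) / total
        for group, total in known_species.items()
    }


# Hypothetical inputs, for illustration only (not values from the study).
rate = encounter_rate(relevant_articles=5, screened_articles=200)
coverage = taxonomic_coverage(
    {"birds": {"species A", "species B"}, "mammals": {"species C"}},
    {"birds": 8, "mammals": 4, "reptiles": 10},
)
print(rate, coverage)
```

Storing species as sets per group means a species reported in several articles is counted once, which matches the idea of coverage as distinct species with data.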
We then quantified the bias of terrestrial records towards proximity to infrastructure (airports, cities, roads and waterbodies) at a 0.5° resolution (circa 55.5 km × 55.5 km at the equator) and a 2° buffer using posterior weights from the R package sampbias (Zizka, Antonelli and Silvestro, 2021). Higher posterior weights indicate a stronger bias effect.
Generalised linear mixed models and population abundance trends
We used the rlpi R package (Freeman et al., 2017) to calculate trends in relative abundance. We calculated the average lambda (logged annual rate of change) for each time-series by averaging the lambda values across all years between the start and the end year of the time-series. We then built generalised linear mixed models (GLMM) to test how average lambdas changed across language (Portuguese vs English), journal origin (national vs international), and taxonomic group, using location, journal name, and species as random intercepts (Table 1). We offset these by the number of sampled years to adjust summed lambda to a standardised measure, allowing comparison across time-series of different lengths, and plotted the beta coefficients (effect sizes) of all factors. Finally, we performed a post-hoc test to check pairwise differences between taxonomic groups (Table S2). To assess the influence of national-level data on global trends in relative abundance, we calculated the trends for both the International dataset and the two combined Brazilian datasets (Brazilian-Portuguese and Brazilian-English), using only years for which data were available for more than one species, so that trend variation could be estimated. We also plotted the trends for the Brazilian datasets separately. All analyses were performed in R 4.4.1 (R Core Team, 2024).
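As a rough illustration of the average lambda described above, the Python sketch below computes annual lambdas for a single time-series and averages them. It rests on several assumptions not stated in the text: lambda is taken as the base-10 log of the ratio between consecutive yearly abundances, and the series is complete and strictly positive. The rlpi package handles missing years and zeros in its own way, so this is not a reproduction of that package, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def annual_lambdas(abundance: Sequence[float]) -> List[float]:
    """Assumed definition: log10 of the ratio between consecutive years."""
    if len(abundance) < 2:
        raise ValueError("need at least two yearly values")
    if any(n <= 0 for n in abundance):
        raise ValueError("abundances must be strictly positive")
    return [
        math.log10(later / earlier)
        for earlier, later in zip(abundance, abundance[1:])
    ]


def average_lambda(abundance: Sequence[float]) -> float:
    """Mean annual lambda between the first and last year of the series."""
    lambdas = annual_lambdas(abundance)
    return sum(lambdas) / len(lambdas)


def relative_index(abundance: Sequence[float]) -> List[float]:
    """Relative abundance index, starting at 1.0 in the first year."""
    index = [1.0]
    for lam in annual_lambdas(abundance):
        index.append(index[-1] * 10 ** lam)
    return index


# Hypothetical series that halves every year (illustration only).
series = [100.0, 50.0, 25.0]
print(average_lambda(series))
print(relative_index(series))
```

Because the lambdas are averaged over the number of yearly steps, series of different lengths yield values on the same per-year scale, which is the property the offset in the models is meant to secure.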