The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
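As a minimal sketch of that workflow, assuming the `google-cloud-bigquery` package is installed and credentials are already available in the environment (as is typically the case in a Kernel), a query against one of these tables might look like the following. The helper name `run_query` and the sample query are illustrative and are not part of the dataset documentation; the column names are the ones used in the sample queries of this description.

```python
# Fully qualified table name, taken from the dataset location given above.
TABLE = "bigquery-public-data.census_bureau_international.midyear_population"

# Illustrative query: the ten largest midyear populations in 2017.
QUERY = f"""
SELECT country_code, midyear_population
FROM `{TABLE}`
WHERE year = 2017
ORDER BY midyear_population DESC
LIMIT 10
"""


def run_query(sql):
    """Run a read-only query and return the result rows as a list of dicts."""
    # Imported lazily so this module loads even where the client is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # project and credentials come from the environment
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]
```

Calling `run_query(QUERY)` returns one dictionary per result row. Only querying is shown here, in line with the note above that Kernels are limited to querying data.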
Which countries have the longest life expectancy? This query retrieves 2016 census information by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km². Without the size constraint, Monaco is the top result, with an average life expectancy of over 89 years!
SELECT
age.country_name,
age.life_expectancy,
size.country_area
FROM (
SELECT
country_name,
life_expectancy
FROM
bigquery-public-data.census_bureau_international.mortality_life_expectancy
WHERE
year = 2016) age
INNER JOIN (
SELECT
country_name,
country_area
FROM
bigquery-public-data.census_bureau_international.country_names_area
WHERE
country_area > 25000) size
ON
age.country_name = size.country_name
ORDER BY
2 DESC
/* Remove limit for Data Studio visualization */
LIMIT
10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25, and more than 50% is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
age.country_name,
SUM(age.population) AS under_25,
pop.midyear_population AS total,
ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25
FROM (
SELECT
country_name,
population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population_agespecific
WHERE
year = 2017
AND age < 25) age
INNER JOIN (
SELECT
midyear_population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population
WHERE
year = 2017) pop
ON
age.country_code = pop.country_code
GROUP BY
1,
3
ORDER BY
4 DESC /* Remove limit for visualization*/
LIMIT
10
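The arithmetic behind pct_under_25 can be sketched outside SQL. The example below is an illustration only: the rows and totals are invented stand-ins for the two tables, and the function name `pct_under` is hypothetical. It mirrors the filter (age below the cutoff), the inner join on country_code, the grouped sum, and the rounding to two decimals.

```python
from collections import defaultdict

# Invented sample rows standing in for midyear_population_agespecific:
# (country_code, age, population)
age_rows = [
    ("AA", 10, 300),
    ("AA", 20, 200),
    ("AA", 40, 500),
    ("BB", 5, 100),
    ("BB", 30, 900),
]

# Invented totals standing in for midyear_population, keyed by country_code.
totals = {"AA": 1000, "BB": 1000}


def pct_under(age_rows, totals, cutoff=25):
    """Return (country_code, percent under cutoff) pairs, largest share first."""
    under = defaultdict(int)
    for code, age, population in age_rows:
        if age < cutoff:  # same filter as `age < 25` in the query
            under[code] += population
    # Keep only codes present on both sides, like the INNER JOIN.
    shares = {
        code: round(under[code] / totals[code] * 100, 2)
        for code in under
        if code in totals
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


print(pct_under(age_rows, totals))  # [('AA', 50.0), ('BB', 10.0)]
```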
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries larger than 500 km².
SELECT
growth.country_name,
growth.net_migration,
CAST(area.country_area AS INT64) AS country_area
FROM (
SELECT
country_name,
net_migration,
country_code
FROM
bigquery-public-data.census_bureau_international.birth_death_growth_rates
WHERE
year = 2017) growth
INNER JOIN (
SELECT
country_area,
country_code
FROM
bigquery-public-data.census_bureau_international.country_names_area
Historic (none)
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains carbon fluxes for the 10 largest countries in the world (here EU27 is treated as a country) using GOSAT and OCO-2 observational constraints for 2017-2019.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries by incidence of COVID-19 in the world as of October 22, 2020 (on the eve of the second wave of the pandemic) were considered, and those represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages and the change (increase) were calculated for indicators such as profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries of the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This project analyzes the 2020 World Happiness Report to draw conclusions about the general well-being of Africa. It uses several CSV files: survey responses collected through a Google Form, data from the 2020 World Happiness Report, and a subset of that report covering only countries in Africa. The main data set includes over 150 countries and their happiness scores, freedom to make life choices, social support, healthy life expectancy, regional indicator, perceptions of corruption and generosity. The analysis was done to answer the following data-driven questions: 'Which African country ranked the happiest in 2020?' and 'Which variable predicts or explains Africa's happiness score?'
This project includes several programs created in R and Python.
The Gallup World Poll (GWP) is conducted annually to measure and track public attitudes concerning political, social and economic issues, including controversial and sensitive subjects. Annually, this poll tracks attitudes toward law and order, institutions and infrastructure, jobs, well-being and other topics for approximately 150 countries worldwide. The data gathered from the GWP is used to create an annual World Happiness Report (WHR). The World Happiness Report is conducted to review the science of understanding and measuring the subjective well-being and to use survey measures of life satisfaction to track the quality of lives in over 150 countries.
At first glance, world happiness may seem unimportant, or merely an emotional matter. However, several governments have started to look at happiness as a metric of success. Happiness Scores, or Subjective Well-being (SWB), are national average responses to questions of life evaluation. They are important because they remind policy makers and people in power that happiness is based on social capital, not just financial capital. Happiness is often considered an essential and useful way to guide public policies and measure their effectiveness. Happiness scores also highlight the importance of qualitative rather than quantitative measures. At times, quality is better than quantity.
Africa is the world's second largest and second most populous continent. It consists of 54 countries, more than any other continent. Africa has approximately 30% of the earth's mineral resources and has the largest reserves of precious metals. Africa holds over 40% of the world's gold reserves, 60% of its cobalt and 90% of its platinum. However, Africa unfortunately has the most developmental challenges. It is the world's poorest and most underdeveloped continent. Almost all of Africa was also colonized, with the exceptions of Ethiopia and Liberia. Given this information, one may wonder what the SWB, or state of happiness, is in Africa.
This site analyzes the 2020 World Happiness Report to draw conclusions about the data-driven questions listed later on this page. The focus is specifically on countries in Africa. Even though there are 54 countries in Africa, only 43 participated in the 2020 WHR.
The dataset used is generated from the 'World Happiness Report 2020'. This dataset contains the Happiness Score for over 150 countries for the year 2020. The data gathered from the Gallup World Poll gives a national average of Happiness scores for countries all over the world. It is an annual landmark survey of the state of global happiness.
This dataset is from the data repository "Kaggle". On Kaggle's dataset page, I searched for Africa Happiness after filtering the search to the CSV file type. I wasn't able to find any datasets that could answer my questions without also including countries from other continents, so I decided to use a Global Happiness Report to answer my questions. The dataset I am using was published by Micheal Londeen and was created on March 24, 2020. His main source is the World Happiness Report for 2020.
Happiness score or subjective well-being (variable name ladder): The survey measure of SWB is from the Feb 28, 2020 release of the Gallup World Poll (GWP) covering years from 2005 to 2019. Unless stated otherwise, it is the national average response to the question of life evaluations. The English wording of the question is “Please imagine a ladder, with steps numbered from 0 at the bottom to 10 at the top. The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. On which step of the ladder would you say you personally feel you stand at this time?” This measure is also referred to as the Cantril life ladder, or just life ladder in our analysis.
Healthy Life Expectancy (HLE). Healthy life expectancies at birth are based on the data extracted from the World Health Organization’s (WHO) Global Health Observatory dat...
Even though Canada is the second largest country in the world in terms of land area, it ranks 33rd in terms of population. Almost all of Canada’s population is concentrated in a narrow band along the country’s southern edge. Nearly 80% of the total population lives within the 25 major metropolitan areas, which represent only 0.79% of the total area of the country.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GOLD RESERVES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Supply Chain Shipment Pricing Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/f130af56-ebf3-447f-b426-7d3b6f204c4d on 12 February 2022.
--- Dataset description provided by original source is as follows ---
This data set provides supply chain health commodity shipment and pricing data. Specifically, the data set identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the data set provides the commodity pricing and associated supply chain expenses necessary to move the commodities to countries for use. The dataset has similar fields to the Global Fund's Price, Quality and Reporting (PQR) data. PEPFAR and the Global Fund represent the two largest procurers of HIV health commodities. This dataset, when analyzed in conjunction with the PQR data, provides a more complete picture of global spending on specific health commodities. The data are particularly valuable for understanding ranges and trends in pricing as well as volumes delivered by country. The US Government believes this data will help stakeholders make better, data-driven decisions. Care should be taken to consider contextual factors when using the database. Conclusions related to costs associated with moving specific line items or products to specific countries and lead times by product/country will not be accurate.
--- Original source retains full ownership of the source dataset ---
http://www.gnu.org/licenses/agpl-3.0.html
The Counter-Trafficking Data Collaborative is the first global data hub on human trafficking, publishing harmonized data from counter-trafficking organizations around the world. Launched in November 2017, the goal of CTDC is to break down information-sharing barriers and equip the counter-trafficking community with up to date, reliable data on human trafficking.
The CTDC global victim of trafficking dataset is the largest of its kind in the world, and currently exists in two forms. The data are based on case management data, gathered from identified cases of human trafficking, disaggregated at the level of the individual. The cases are recorded in a case management system during the provision of protection and assistance services, or are logged when individuals contact a counter-trafficking hotline. The number of observations in the dataset increases as new records are added by the contributing organizations. The global victim of trafficking dataset that is available to download from the website in csv format has been mathematically anonymized, and the complete, non k-anonymized version of the dataset is displayed throughout the website through visualizations and charts showing detailed analysis.
The data come from a variety of sources. The data featured in the global victim of trafficking dataset come from the assistance activities of the contributing organizations, including from case management services and from counter-trafficking hotline logs.
Each dataset has been created through a process of comparing and harmonizing existing data models of contributing partners and data classification systems. Initial areas of compatibility were identified to create a unified system for organizing and mapping data to a single standard. Each contributing organization transforms its data to this shared standard and any identifying information is removed before the datasets are made available.
Counter-trafficking case data contain highly sensitive information, and maintaining privacy and confidentiality is of paramount importance for CTDC. For example, all explicit identifiers, such as names, were removed from the global victim dataset, and some data, such as age, have been transformed into ranges. No personally identifying information is transferred to or hosted by CTDC, and organizations that want to contribute are asked to anonymize their data in accordance with the standards set by CTDC.
In addition to the safeguard measures outlined in step 1, the global victim dataset has been anonymized to a higher level through a mathematical approach called k-anonymization. For a full description of k-anonymization, please refer to the definitions page.
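For orientation, the general idea of k-anonymity can be sketched as follows: a table is k-anonymous with respect to a set of quasi-identifying columns if every combination of values in those columns occurs at least k times. This is a generic illustration of the textbook definition, not CTDC's actual procedure (which is described on its definitions page); the field names, the sample records, and the choice to suppress small groups are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical records; the field names and values are invented for illustration.
records = [
    {"age_range": "18-20", "gender": "F"},
    {"age_range": "18-20", "gender": "F"},
    {"age_range": "18-20", "gender": "F"},
    {"age_range": "21-23", "gender": "M"},
    {"age_range": "21-23", "gender": "M"},
    {"age_range": "30-38", "gender": "F"},
]
QUASI_IDENTIFIERS = ("age_range", "gender")


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(rows, quasi_identifiers).values())


def suppress_small_groups(rows, quasi_identifiers, k):
    """Drop rows whose quasi-identifier combination occurs fewer than k times."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] >= k
    ]


print(is_k_anonymous(records, QUASI_IDENTIFIERS, 2))  # False: one group has a single row
kept = suppress_small_groups(records, QUASI_IDENTIFIERS, 2)
print(len(kept), is_k_anonymous(kept, QUASI_IDENTIFIERS, 2))  # 5 True
```

Suppression is only one way to reach k-anonymity; generalizing values further (for example, widening age ranges) is another, and real pipelines often combine both.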
IOM collects and processes data in accordance with its own Data Protection Policy. The other contributors adhere to relevant national and international standards through their policies for collecting and processing personal data.
These data reflect the victims assisted/identified/referred/reported to the contributing organizations, which may not represent all victims identified within a country. Nevertheless, the larger the sample size for a given country (or, the more victims displayed on the map for a given country), the more representative the data are likely to be of the identified victim of trafficking population.
A larger number of identified victims of trafficking does not imply that there is a larger number of undetected victims of trafficking (i.e. a higher prevalence of trafficking).
In addition, samples of identified victims of trafficking cannot be considered random samples of the wider population of victims of trafficking (which includes unidentified victims), since counter-trafficking agencies may be more likely to identify some trafficking cases rather than others. However, with this caveat in mind, the profile of identified victims of trafficking tends to be considered as indicative of the profile of the wider population, given that the availability of other data sources is close to zero.
There are currently no global or regional estimates of the prevalence of human trafficking. National estimates have been conducted in a few countries but they are also based on modelling of existing administrative data from identified cases and should therefore only be considered as basic baseline estimates. Historically, producing estimates of the prevalence of trafficking based on the collection of new primary data through surveys, for example, has been difficult. This is due to trafficking’s complicated legal definition and the challenges of a...
Business-critical Data Types
We offer access to robust datasets sourced from over 13M job ads daily. Track companies’ growth, market focus, technological shifts, planned geographic expansion, and more:
- Identify new business opportunities
- Identify and forecast industry & technological trends
- Help identify the jobs, teams, and business units that have the highest impact on corporate goals
- Identify most in-demand skills and qualifications for key positions.
Fresh Datasets
We regularly update our datasets, assuring you access to the latest data and allowing for timely analysis of rapidly evolving markets & dynamic businesses.
Historical Datasets
We maintain at your disposal historical datasets, allowing for comprehensive, reliable, and statistically sound historical analysis, trend identification, and forecasting.
Easy Access and Retrieval
Our job listing datasets are available in industry-standard, convenient JSON and CSV formats. These structured formats make our datasets compatible with machine learning, artificial intelligence training, and similar applications. The historical data retrieval process is quick and reliable thanks to our robust, easy-to-implement API integration.
Datasets for investors
Investment firms and hedge funds use our datasets to better inform their investment decisions by gaining up-to-date, reliable insights into workforce growth, geographic expansion, market focus, technology shifts, and other factors of start-ups and established companies.
Datasets for businesses
Our datasets are used by retailers, manufacturers, real estate agents, and many other types of B2B & B2C businesses to stay ahead of the curve. They can gain insights into the competitive landscape, technology, and product adoption trends as well as power their lead generation processes with data-driven decision-making.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the GLOBAL ROADKILL DATA, the largest worldwide compilation of roadkill data on terrestrial vertebrates. We outline the workflow (Fig. 1) to illustrate the sequential steps of the study, in which we merged local-scale survey datasets and opportunistic records into a unified large roadkill dataset comprising 208,570 roadkill records. These records include 2283 species and subspecies from 54 countries across six continents, ranging from 1971 to 2024.
Large roadkill datasets offer the advantage of preventing the collection of redundant data and are valuable resources for both local and macro-scale analyses regarding roadkill rates, road and landscape features associated with roadkill risk, species more vulnerable to road traffic, and populations at risk due to additional mortality. The standardization of data - such as scientific names, projection coordinates, and units - in a user-friendly format makes them readily accessible to a broader scientific and non-scientific community, including NGOs, consultants, public administration officials, and road managers. The open-access approach promotes collaboration among researchers and road practitioners, facilitating the replication of studies, validation of findings, and expansion of previous work. Moreover, researchers can utilize such datasets to develop new hypotheses, conduct meta-analyses, address pressing challenges more efficiently and strengthen the robustness of road ecology research. Ensuring widespread access to roadkill data fosters a more diverse and inclusive research community. This not only grants researchers in emerging economies more data for analysis, but also cultivates a diverse array of perspectives and insights, promoting the advance of infrastructure ecology.
Methods
Information sources: A core team from different continents performed a systematic literature search in Web of Science and Google Scholar for published peer-reviewed papers and dissertations.
The following terms were searched: “roadkill*” OR “road-kill” OR “road mortality” AND (country), in English, Portuguese, Spanish, French and/or Mandarin. This initiative was also disseminated to the mailing lists associated with transport infrastructure: The CCSG Transport Working Group (WTG), Infrastructure & Ecology Network Europe (IENE) and Latin American & Caribbean Transport Working Group (LACTWG) (Fig. 1). The core team identified 750 scientific papers and dissertations with information on roadkill and contacted the first authors of the publications to request georeferenced locations of roadkill and offer co-authorship to this data paper. Of the 824 authors contacted, 145 agreed to share georeferenced roadkill locations, often involving additional colleagues who contributed to data collection. Since our main goal was to provide open access to data that had never been shared in this format before, data from citizen science projects (e.g., globalroakill.net) that are already available were not included.
Data compilation: A total of 423 co-authors compiled the following information: continent, country, latitude and longitude in WGS 84 decimal degrees of the roadkill, coordinates uncertainty, class, order, family, scientific name of the roadkill, vernacular name, IUCN status, number of roadkill, year, month, and day of the record, identification of the road, type of road, survey type, references, and observers that recorded the roadkill (Supplementary Information Table S1 - description of the fields and Table S2 - reference list). When roadkill data were derived from systematic surveys, the dataset included additional information on road length that was surveyed, latitude and longitude of the road (initial and final part of the road segment), survey period, start year of the survey, final year of the survey, 1st month of the year surveyed, last month of the year surveyed, and frequency of the survey. We consolidated 142 valid datasets into a single dataset.
We complemented these data with OccurenceID (a UUID generated using Java code), basisOfRecord, countryCode, locality using OpenStreetMap’s API (https://www.openstreetmap.org), geodeticDatum, verbatimScientificName, Kingdom, phylum, genus, specificEpithet, infraspecificEpithet, acceptedNameUsage, scientific name authorship, matchType, taxonRank using Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters) and link of the associatedReference (URL).
Data standardization - We conducted a clustering analysis on all text fields to identify similar entries with minor variations, such as typos, and corrected them using OpenRefine (http://openrefine.org). We also standardized all date values using OpenRefine. Coordinate uncertainties listed as 0 m were adjusted to either 30 m or 100 m, depending on whether they were recorded after or before 2000, respectively, following the recommendation in the Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters).
Taxonomy - We cross-referenced all species names with the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy using Java and GBIF’s API (https://doi.org/10.15468/39omei). This process aimed to rectify classification errors, include additional fields such as Kingdom, Phylum, and scientific authorship, and gather comprehensive taxonomic information to address any gap within the datasets. For species not automatically matched (matchType - Table S1), we manually searched for correct synonyms when available.
Species conservation status - Using the species names, we retrieved their conservation status and also vernacular names by cross-referencing with the database downloaded from the IUCN Red List of Threatened Species (https://www.iucnredlist.org). Species without a match were categorized as "Not Evaluated".
Data Records
GLOBAL ROADKILL DATA is available at Figshare [27]: https://doi.org/10.6084/m9.figshare.25714233.
The dataset incorporates opportunistic (collected incidentally without data collection efforts) and systematic data (collected through planned, structured, and controlled methods designed to ensure consistency and reliability). In total, it comprises 208,570 roadkill records across 177,428 different locations (Fig. 2). Data were collected from the road network of 54 countries from 6 continents: Europe (n = 19), Asia (n = 16), South America (n = 7), North America (n = 4), Africa (n = 6) and Oceania (n = 2).
(Figure 2 goes here)
All data are georeferenced in WGS84 decimals with maximum uncertainty of 5000 m. Approximately 92% of records have a location uncertainty of 30 m or less, with only 1138 records having location uncertainties ranging from 1000 to 5000 m. Mammals have the highest number of roadkill records (61%), followed by amphibians (21%), reptiles (10%) and birds (8%). The species with the highest number of records were roe deer (Capreolus capreolus, n = 44,268), pool frog (Pelophylax lessonae, n = 11,999) and European fallow deer (Dama dama, n = 7,426).
We collected information on 126 threatened species with a total of 4570 records. Among the threatened species, the giant anteater (Myrmecophaga tridactyla, VULNERABLE) has the highest number of records (n = 1199), followed by the common fire salamander (Salamandra salamandra, VULNERABLE, n = 1043), and European rabbit (Oryctolagus cuniculus, ENDANGERED, n = 440). Records ranged from 1971 to 2024, with 72% of the roadkill recorded since 2013. Over 46% of the records were obtained from systematic surveys, with road length and survey period averaging, respectively, 66 km (min-max: 0.09-855 km) and 780 days (1-25,720 days).
Technical Validation
We employed the OpenStreetMap API through Java to detect location inaccuracies and validate whether the geographic coordinates aligned with the specified country.
We calculated the distance of each occurrence to the nearest road using the GRIP global roads database [28], ensuring that all records were within the defined coordinate uncertainty. We verified if the survey duration matched the provided initial and final survey dates. We calculated the distance between the provided initial and final road coordinates and cross-checked it with the given road length. We identified and merged duplicate entries within the same dataset (same location, species, and date), aggregating the number of roadkills for each occurrence.
Usage Notes
The GLOBAL ROADKILL DATA is a compilation of roadkill records and was designed to serve as a valuable resource for a wide range of analyses. Nevertheless, to prevent the generation of meaningless results, users should be aware of the following limitations:
- Geographic representation – There is an evident bias in the distribution of records. Data originated predominantly from Europe (60% of records), South America (22%), and North America (12%). Conversely, there is a notable lack of records from Asia (5%), Oceania (1%) and Africa (0.3%). This dataset represents 36% of the initial contacts that provided geo-referenced records, which may not necessarily correspond to locations where high-impact roads are present.
- Location accuracy - Insufficient location accuracy was observed for 1% of the data (ranging from 1000 to 5000 m), which was associated with various factors, such as survey methods, recording practices, or timing of the survey.
- Sampling effort - This dataset comprised both opportunistic data and records from systematic surveys, with a high variability in survey duration and frequency.
As a result, the use of both opportunistic and systematic surveys may affect the relative abundance of roadkill, making it hard to make sound comparisons among species or areas.
- Detectability and carcass removal bias - Although several studies had a high frequency of road surveys, the duration of carcass persistence on roads may vary with species size and environmental conditions, affecting detectability. Accordingly, several approaches account for survey frequency and target species to estimate more
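Two of the processing steps described for the roadkill compilation, the adjustment of zero coordinate uncertainties and the merging of duplicate entries, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code (which, per the description, used Java and OpenRefine): the record field names are hypothetical, and treating the year 2000 itself as "after 2000" is an assumption, since the description does not say which side of the cutoff it falls on.

```python
def adjust_uncertainty(uncertainty_m, year):
    """Replace a 0 m coordinate uncertainty with 30 m or 100 m based on the record year.

    Assumption: records from 2000 onward get 30 m and earlier records get 100 m;
    the source text only says "after or before 2000".
    """
    if uncertainty_m != 0:
        return uncertainty_m
    return 30 if year >= 2000 else 100


def merge_duplicates(records):
    """Merge records sharing location, species, and date, summing their counts.

    Field names (lat, lon, species, date, count) are hypothetical.
    """
    merged = {}
    for record in records:
        key = (record["lat"], record["lon"], record["species"], record["date"])
        if key in merged:
            merged[key]["count"] += record["count"]
        else:
            merged[key] = dict(record)  # copy so the input rows are not modified
    return list(merged.values())


# Invented sample rows for illustration only.
sample = [
    {"lat": 1.0, "lon": 2.0, "species": "Capreolus capreolus", "date": "2015-05-01", "count": 1},
    {"lat": 1.0, "lon": 2.0, "species": "Capreolus capreolus", "date": "2015-05-01", "count": 2},
    {"lat": 1.0, "lon": 2.0, "species": "Dama dama", "date": "2015-05-01", "count": 1},
]
print(merge_duplicates(sample))  # two rows; the roe deer row has count 3
print(adjust_uncertainty(0, 1995), adjust_uncertainty(0, 2015))  # 100 30
```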
Svarah: An Indic Accented English Speech Dataset
Overview
India is the second largest English-speaking country in the world, with a speaker base of roughly 130 million. Unfortunately, Indian speakers are underrepresented in many existing English ASR benchmarks such as LibriSpeech, Switchboard, and the Speech Accent Archive. To address this gap, we introduce Svarah—a benchmark that comprises 9.6 hours of transcribed English audio from 117 speakers across 65… See the full description on the dataset page: https://huggingface.co/datasets/ai4bharat/Svarah.
The majority of the Canadian population, about 60%, is concentrated within a thin belt of land representing 2.2% of the land between Windsor, Ontario and Quebec City. Even though Canada is the second largest country in the world in terms of land area, it only ranks 33rd in terms of population. The agricultural areas in the Prairies and eastern Canada have higher population densities than the sparsely populated North, but not as high as southern Ontario or southern Quebec.
The global gender gap index benchmarks national gender gaps on economic, political, education, and health-based criteria. In 2025, the country offering the most gender equal conditions was Iceland, with a score of 0.93. Overall, the Nordic countries make up 3 of the 5 most gender equal countries worldwide. The Nordic countries are known for their high levels of gender equality, including high female employment rates and evenly divided parental leave.
Sudan is the second-least gender equal country
Pakistan is found on the other end of the scale, ranked as the least gender equal country in the world. Conditions for civilians in Sudan, the second-least gender equal country, have worsened significantly since a civil war broke out in April 2023. Girls and women in particular are suffering and have become victims of sexual violence. Moreover, nearly 9 million people are estimated to be at acute risk of famine.
The Middle East and North Africa have the largest gender gap
Looking at the different world regions, the Middle East and North Africa have the largest gender gap as of 2023, just ahead of South Asia. Moreover, it is estimated that it will take another 152 years before the gender gap in the Middle East and North Africa is closed. On the other hand, Europe has the lowest gender gap in the world.
Street Noise-Level — Statistically Interpolated + Processed Measurements
Connect with our experts for the world’s most comprehensive Street Noise-Level Dataset. Access hyper-local and global average noise levels (dBA) from public streets across over 200 countries. This dataset, built using over 35 billion datapoints and developed in collaboration with leading acoustics professionals, provides unparalleled insight into real-world urban soundscapes. Unlike conventional noise models, which rely solely on simulations, our dataset combines real measurements with AI-powered interpolation to deliver statistically robust, highly accurate, and spatially complete noise-level data.
Power Your AI & Urban Analytics with Real-World Noise Insights
What makes this dataset unique? Silencio’s processed and interpolated Street Noise-Level Dataset is the largest and most precise global collection of acoustic data available. It integrates real user-collected measurements with AI-driven modeling, ensuring unmatched ground truth for AI training, urban intelligence, and noise-impact assessments.
Optimized for AI, Urban Planning & Research:
Empower your AI models and spatial analyses with rich, diverse, and realistic noise data. Ideal for sound recognition, smart cities, mobility modeling, noise mapping, real estate analysis, and sustainable urban planning.
Trusted & Compliant:
All data is collected via our mobile app, strictly anonymized, fully consented, and 100% GDPR-compliant — ensuring privacy and ethical integrity.
Historical & Up-to-Date:
Leverage both historical and continuously updated noise data to uncover trends, detect change, and power predictive models.
Hyper-Local & Global Coverage:
With coverage of over 200 countries and high spatial granularity, the dataset provides insights from the city level down to street segments.
Seamless Integration:
Delivered via CSV exports or S3 bucket delivery (APIs coming soon) for easy integration into AI training pipelines, geospatial tools, or analytics platforms.
Techsalerator’s Import/Export Trade Data for North America
Techsalerator’s Import/Export Trade Data for North America delivers an exhaustive and nuanced analysis of trade activities across the North American continent. This extensive dataset provides detailed insights into import and export transactions involving companies across various sectors within North America.
Coverage Across All North American Countries
The dataset encompasses all key countries within North America, including:
1. United States
The dataset provides detailed trade information for the United States, the largest economy in the region. It includes extensive data on trade volumes, product categories, and the key trading partners of the U.S.
2. Canada
Data for Canada covers a wide range of trade activities, including import and export transactions, product classifications, and trade relationships with major global and regional partners.
3. Mexico
Comprehensive data for Mexico includes detailed records on its trade activities, including exports and imports, key sectors, and trade agreements affecting its trade dynamics.
4. Central American Countries:
Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama
The dataset covers these countries with information on their trade flows, key products, and trade relations with North American and international partners.
5. Caribbean Countries:
Bahamas, Barbados, Cuba, Dominica, Dominican Republic, Grenada, Haiti, Jamaica, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Trinidad and Tobago
Trade data for these Caribbean nations includes detailed transaction records, sector-specific trade information, and their interactions with North American trade partners.
Comprehensive Data Features
Transaction Details: The dataset includes precise details on each trade transaction, such as product descriptions, quantities, values, and dates. This allows for an accurate understanding of trade flows and patterns across North America.
Company Information: It provides data on companies involved in trade, including names, locations, and industry sectors, enabling targeted business analysis and competitive intelligence.
Categorization: Transactions are categorized by industry sectors, product types, and trade partners, offering insights into market dynamics and sector-specific trends within North America.
Trade Trends: Historical data helps users analyze trends over time, identify emerging markets, and assess the impact of economic or political events on trade flows in the region.
Geographical Insights: The data offers insights into regional trade flows and cross-border dynamics between North American countries and their global trade partners, including significant international trade relationships.
Regulatory and Compliance Data: Information on trade regulations, tariffs, and compliance requirements is included, helping businesses navigate the complex regulatory environments within North America.
Applications and Benefits
Market Research: Companies can leverage the data to discover new market opportunities, analyze competitive landscapes, and understand demand for specific products across North American countries.
Strategic Planning: Insights from the data enable companies to refine trade strategies, optimize supply chains, and manage risks associated with international trade in North America.
Economic Analysis: Analysts and policymakers can monitor economic performance, evaluate trade balances, and make informed decisions on trade policies and economic development strategies.
Investment Decisions: Investors can assess trade trends and market potentials to make informed decisions about investments in North America's diverse economies.
Techsalerator’s Import/Export Trade Data for North America offers a vital resource for organizations involved in international trade, providing a thorough, reliable, and detailed view of trade activities across the continent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study of the patterns and evolution of international migration often requires high-frequency data on migration flows on a global scale. However, the presently existing databases force a researcher to choose between the frequency of the data and its geographical scale. Yearly data exist but only for a small subset of countries, while most others are only covered every 5 to 10 years. To fill in the gaps in the coverage, the vast majority of databases use some imputation method. Gaps in the stock of migrants are often filled by combining information on migrants based on their country of birth with data based on nationality or using ‘model’ countries and propensity methods. Gaps in the data on the flow of migrants, on the other hand, are often filled by taking the difference in the stock, which the ‘demographic accounting’ methods then adjust for demographic evolutions.
This database aims to fill this gap by providing a global, yearly, bilateral database on the stock of migrants according to their country of birth. This database contains close to 2.9 million observations on over 56,000 country pairs from 1960 to 2022, a tenfold increase relative to the second-largest database. In addition, it also produces an estimate of the net flow of migrants. For a subset of countries –over 8,000 country pairs and half a million observations– we also have lower-bound estimates of the gross in- and outflow.
This database was constructed using a novel approach to estimating the most likely values of missing migration stocks and flows. Specifically, we use a Bayesian state-space model to combine the information from multiple datasets on both stocks and flows into a single estimate. Like the demographic accounting technique, the state-space model is built on the demographic relationship between migrant stocks, flows, births and deaths. The most crucial difference is that the state-space model combines the information from multiple databases, including those covering migrant stocks, net flows, and gross flows.
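The demographic accounting idea mentioned above can be illustrated with a minimal sketch. It is not the Bayesian state-space model used to build the database; it only shows the simple identity that the net flow in a year equals the change in the migrant stock plus the deaths among migrants in that year. The function name and all numbers are hypothetical, and the sketch assumes stocks are counted by country of birth, so births in the destination country do not add to the stock.

```python
def net_flows_from_stocks(stocks, deaths):
    """Estimate yearly net migration flows from a series of migrant stocks.

    stocks[t] is the migrant stock at the end of year t.
    deaths[t] is the number of deaths among migrants during year t
    (deaths[0] is unused because there is no earlier stock to compare with).
    Assumption: stocks are by country of birth, so births are ignored.
    """
    flows = []
    for t in range(1, len(stocks)):
        # Change in stock, corrected for migrants who died during the year.
        flows.append(stocks[t] - stocks[t - 1] + deaths[t])
    return flows


# Hypothetical figures for one country pair over three years.
example_stocks = [1000, 1040, 1030]
example_deaths = [0, 10, 15]
example_flows = net_flows_from_stocks(example_stocks, example_deaths)
```

A negative value would indicate that more migrants left than arrived in that year. Gross in- and outflows cannot be recovered from this identity alone, which is consistent with the database reporting them only as lower-bound estimates for a subset of country pairs.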
More details on the construction can currently be found in the UNU-CRIS working paper: Standaert, Samuel and Rayp, Glenn (2022) "Where Did They Come From, Where Did They Go? Bridging the Gaps in Migration Data" UNU-CRIS working paper 22.04. Bruges.
https://cris.unu.edu/where-did-they-come-where-did-they-go-bridging-gaps-migration-data
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT
    country_name,
    life_expectancy
  FROM
    `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE
    year = 2016) age
INNER JOIN (
  SELECT
    country_name,
    country_area
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 25000) size
ON
  age.country_name = size.country_name
ORDER BY
  2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT
  10
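The life expectancy query could be run from Python with the BigQuery client library roughly as follows. This is a sketch under assumptions: the `google-cloud-bigquery` package is installed, credentials are already configured (as in a Kernel with BigQuery enabled), and `country_area` is a floating-point column. The function name `longest_life_expectancy` and the parameter names are hypothetical, and the two subqueries are flattened into a single join with query parameters for the year and the minimum area.

```python
# Parameterized variant of the life expectancy query (hypothetical names).
LIFE_EXPECTANCY_SQL = """
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM
  `bigquery-public-data.census_bureau_international.mortality_life_expectancy` AS age
INNER JOIN
  `bigquery-public-data.census_bureau_international.country_names_area` AS size
ON
  age.country_name = size.country_name
WHERE
  age.year = @year
  AND size.country_area > @min_area
ORDER BY
  age.life_expectancy DESC
LIMIT
  10
"""


def longest_life_expectancy(year=2016, min_area=25000.0):
    """Run the query and return the result rows as a list of plain dicts."""
    # Imported lazily so this module loads even where the library is absent.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("year", "INT64", year),
            bigquery.ScalarQueryParameter("min_area", "FLOAT64", float(min_area)),
        ]
    )
    rows = client.query(LIFE_EXPECTANCY_SQL, job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling `longest_life_expectancy(min_area=0)` would drop the size constraint described in the text. Query parameters are used instead of string formatting so that the year and area threshold can be changed without editing the SQL.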
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT
    country_name,
    population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE
    year = 2017
    AND age < 25) age
INNER JOIN (
  SELECT
    midyear_population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE
    year = 2017) pop
ON
  age.country_code = pop.country_code
GROUP BY
  1,
  3
ORDER BY
  4 DESC /* Remove limit for visualization */
LIMIT
  10
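The arithmetic behind `pct_under_25` can be checked on a small in-memory example. The sketch below mirrors the query's steps (filter ages under 25, sum per country, inner join with the total, divide and round). All rows are hypothetical values invented for illustration, not figures from the dataset, and the function name is an assumption.

```python
from collections import defaultdict

# Hypothetical age-specific rows: (country_code, age, population).
age_rows = [
    ("AA", 0, 300), ("AA", 24, 200), ("AA", 25, 500),
    ("BB", 10, 100), ("BB", 40, 900),
]
# Hypothetical midyear totals keyed by country_code.
totals = {"AA": 1000, "BB": 1000}


def pct_under_25(age_rows, totals, cutoff=25):
    """Return (country_code, percentage) pairs, largest percentage first."""
    under = defaultdict(int)
    for code, age, population in age_rows:
        if age < cutoff:  # same filter as "age < 25" in the query
            under[code] += population
    # Inner join: keep only countries present in both inputs.
    result = {
        code: round(under[code] / total * 100, 2)
        for code, total in totals.items()
        if code in under
    }
    return sorted(result.items(), key=lambda item: item[1], reverse=True)


ranking = pct_under_25(age_rows, totals)
# ranking == [("AA", 50.0), ("BB", 10.0)]
```

Note that Python's `round` and BigQuery's `ROUND` can differ on exact ties, so results at a rounding boundary may not match to the last digit.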
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries larger than 500 km².
SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT
    country_name,
    net_migration,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE
    year = 2017) growth
INNER JOIN (
  SELECT
    country_area,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
Update frequency: Historic (none)
Dataset source: United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data