Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (approx. 80,000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.
The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.
This dataset can be used for:
- Tracking the spread and trends of COVID-19 globally and by country
- Modeling and forecasting pandemic progression
- Comparative analysis of the pandemic’s impact across countries and regions
- Visualization and reporting
The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.
Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of first and last name frequencies of academic authors by country.
Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.
Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.
From the paper: Can national researcher mobility be tracked by first or last name uniqueness?
For example, the distribution for the UK shows a single peak for international names, with no national names; Belgium has a national peak and an international peak; and China has mainly a national peak. The 50 countries are:
No  Code  Country
1   SB    Serbia
2   IE    Ireland
3   HU    Hungary
4   CL    Chile
5   CO    Colombia
6   NG    Nigeria
7   HK    Hong Kong
8   AR    Argentina
9   SG    Singapore
10  NZ    New Zealand
11  PK    Pakistan
12  TH    Thailand
13  UA    Ukraine
14  SA    Saudi Arabia
15  RO    Romania
16  ID    Indonesia
17  IL    Israel
18  MY    Malaysia
19  DK    Denmark
20  CZ    Czech Republic
21  ZA    South Africa
22  AT    Austria
23  FI    Finland
24  PT    Portugal
25  GR    Greece
26  NO    Norway
27  EG    Egypt
28  MX    Mexico
29  BE    Belgium
30  CH    Switzerland
31  SW    Sweden
32  PL    Poland
33  TW    Taiwan
34  NL    Netherlands
35  TK    Turkey
36  IR    Iran
37  RU    Russia
38  AU    Australia
39  BR    Brazil
40  KR    South Korea
41  ES    Spain
42  CA    Canada
43  IT    Italy
44  FR    France
45  IN    India
46  DE    Germany
47  US    USA
48  UK    UK
49  JP    Japan
50  CN    China
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
Key Features
- Country: Name of the country.
- Density (P/Km2): Population density measured in persons per square kilometer.
- Abbreviation: Abbreviation or code representing the country.
- Agricultural Land (%): Percentage of land area used for agricultural purposes.
- Land Area (Km2): Total land area of the country in square kilometers.
- Armed Forces Size: Size of the armed forces in the country.
- Birth Rate: Number of births per 1,000 population per year.
- Calling Code: International calling code for the country.
- Capital/Major City: Name of the capital or major city.
- CO2 Emissions: Carbon dioxide emissions in tons.
- CPI: Consumer Price Index, a measure of inflation and purchasing power.
- CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
- Currency_Code: Currency code used in the country.
- Fertility Rate: Average number of children born to a woman during her lifetime.
- Forested Area (%): Percentage of land area covered by forests.
- Gasoline_Price: Price of gasoline per liter in local currency.
- GDP: Gross Domestic Product, the total value of goods and services produced in the country.
- Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
- Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
- Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
- Largest City: Name of the country's largest city.
- Life Expectancy: Average number of years a newborn is expected to live.
- Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
- Minimum Wage: Minimum wage level in local currency.
- Official Language: Official language(s) spoken in the country.
- Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
- Physicians per Thousand: Number of physicians per thousand people.
- Population: Total population of the country.
- Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
- Tax Revenue (%): Tax revenue as a percentage of GDP.
- Total Tax Rate: Overall tax burden as a percentage of commercial profits.
- Unemployment Rate: Percentage of the labor force that is unemployed.
- Urban Population: Percentage of the population living in urban areas.
- Latitude: Latitude coordinate of the country's location.
- Longitude: Longitude coordinate of the country's location.
Potential Use Cases
- Analyze population density and land area to study spatial distribution patterns.
- Investigate the relationship between agricultural land and food security.
- Examine carbon dioxide emissions and their impact on climate change.
- Explore correlations between economic indicators such as GDP and various socio-economic factors.
- Investigate educational enrollment rates and their implications for human capital development.
- Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
- Study labor market dynamics through indicators such as labor force participation and unemployment rates.
- Investigate the role of taxation and its impact on economic development.
- Explore urbanization trends and their social and environmental consequences.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Copy of https://www.kaggle.com/datasets/kisoibo/countries-databasesqlite
Updated the name of the table from 'countries of the world' to 'countries', for ease of writing queries.
Info about the dataset:
Table: countries of the world
Total Rows: 0
Total Columns: 20
Columns: Country, Region, Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Infant mortality (per 1000 births), GDP ($ per capita), Literacy (%), Phones (per 1000), Arable (%), Crops (%), Other (%), Climate, Birthrate, Deathrate, Agriculture, Industry, Service
Acknowledgements Source: All these data sets are made up of data from the US government. Generally they are free to use if you use the data in the US. If you are outside of the US, you may need to contact the US Govt to ask. Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission." https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html
When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.
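A grouping of this kind might be sketched as follows with Python's standard sqlite3 module. Only the table name countries and the column names Region and Population come from the description above; the in-memory database and its three rows are invented placeholders, not values from the dataset.

```python
import sqlite3

# In-memory stand-in for the real SQLite file; the rows are invented placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (Country TEXT, Region TEXT, Population INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [("A-land", "Region 1", 100), ("B-land", "Region 1", 300), ("C-land", "Region 2", 50)],
)

# Group countries by region, counting them and totalling their populations.
rows = conn.execute(
    "SELECT Region, COUNT(*), SUM(Population) FROM countries "
    "GROUP BY Region ORDER BY Region"
).fetchall()

for region, n_countries, total_population in rows:
    print(region, n_countries, total_population)
```

With the real file, the connection would point at the downloaded database instead of ":memory:", and the totals could then be used as weights in a visualisation.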
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.
The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.
The audio recording criteria justifying inclusion into the meta-database are:
The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.
datasets
datasets-sites
sites
deployments
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
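For readers who prefer to compute such percentages themselves, a weighted share can be sketched in a few lines of Python. This is only an illustration: the answer labels and the weights below are invented, and the actual weight variable name should be taken from the survey codebook.

```python
def weighted_share(responses, weights, target):
    """Return the weighted fraction of respondents whose answer equals target."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    matching = sum(w for r, w in zip(responses, weights) if r == target)
    return matching / total


# Invented answers to soc5c and invented survey weights (not real data).
answers = ["none", "some", "none", "often"]
weights = [1.5, 0.5, 1.0, 1.0]

share = weighted_share(answers, weights, "none")
print(f"{share:.1%} report feeling lonely on none of the last seven days")
```

An unweighted count would treat every respondent equally, which is why the weights need to enter both the numerator and the denominator.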
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit, non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
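The general idea of iterative raking (also known as iterative proportional fitting) can be sketched as below. This is a generic illustration, not NORC's implementation: it uses only two of the raking variables named above, and the respondents, categories and target shares are invented.

```python
def rake(records, targets, max_iter=100, tol=1e-10):
    """Adjust weights until the weighted category shares match the targets.

    records: list of dicts mapping variable name to category.
    targets: {variable: {category: target share}}, shares summing to 1 per variable.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {}
            for rec, w in zip(records, weights):
                current[rec[var]] = current.get(rec[var], 0.0) + w
            # Rescale every record so this variable's margins hit the targets.
            for i, rec in enumerate(records):
                factor = shares[rec[var]] * total / current[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights


# Invented respondents and invented population targets.
respondents = [
    {"age": "18-44", "gender": "female"},
    {"age": "18-44", "gender": "male"},
    {"age": "45+", "gender": "female"},
    {"age": "45+", "gender": "male"},
    {"age": "45+", "gender": "male"},
]
targets = {
    "age": {"18-44": 0.4, "45+": 0.6},
    "gender": {"female": 0.5, "male": 0.5},
}
raked = rake(respondents, targets)
```

Each pass fixes one variable's margins and slightly disturbs the others, which is why the adjustment is repeated until the weights stop changing.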
Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides an extensive view of global population statistics and health metrics across various countries from 2014 to 2024. It combines population data with vital health-related indicators, making it a valuable resource for understanding trends in population growth and health outcomes worldwide. Researchers, data scientists, and policymakers can utilize this dataset to analyze correlations between population dynamics and health performance at a global scale.
Key Features:
- Country: Name of the country.
- Year: Year of the data (2014–2024).
- Population: Total population for the respective year and country.
- Country Code: ISO 3-letter country codes for easy identification.
- Health Expenditure (health_exp): Percentage of GDP spent on healthcare.
- Life Expectancy (life_expect): Average life expectancy at birth in years.
- Maternal Mortality (maternal_mortality): Maternal deaths per 100,000 live births.
- Infant Mortality (infant_mortality): Deaths of infants under 1 year per 1,000 live births.
- Neonatal Mortality (neonatal_mortality): Deaths of newborns (0–28 days) per 1,000 live births.
- Under-5 Mortality (under_5_mortality): Deaths of children under 5 years per 1,000 live births.
- HIV Prevalence (prev_hiv): Percentage of the population living with HIV.
- Tuberculosis Incidence (inci_tuberc): Estimated new and relapse TB cases per 100,000 people.
- Undernourishment Prevalence (prev_undernourishment): Percentage of the population that is undernourished.
Use Cases:
- Health Policy Analysis: Understand trends in healthcare expenditure and its relationship to health outcomes.
- Global Health Research: Investigate global or regional disparities in health and nutrition.
- Population Studies: Analyze population growth trends alongside health indicators.
- Data Visualization: Build visual dashboards for storytelling and impactful data representation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.
It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.
International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
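The consensus rule for BINs can be illustrated with a small sketch. It is not GBIF's code: the list of ranks, the choice of denominator (all records in the BIN) and the example records are assumptions made for illustration only.

```python
from collections import Counter

# Major Linnaean ranks, from most to least specific (assumed ordering for this sketch).
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(records, threshold=0.8):
    """Return (rank, name) at the lowest rank where one name reaches the threshold.

    records: list of dicts mapping rank to the name applied to the BIN.
    Returns None if no rank reaches the threshold.
    """
    for rank in RANKS:
        names = [rec[rank] for rec in records if rec.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        # Assumption: the share is computed over all records in the BIN.
        if count / len(records) >= threshold:
            return rank, name
    return None


# Invented BIN contents: the species names disagree, the genus does not.
bin_records = [
    {"species": "Parus major", "genus": "Parus", "family": "Paridae"},
    {"species": "Parus major", "genus": "Parus", "family": "Paridae"},
    {"species": "Parus major", "genus": "Parus", "family": "Paridae"},
    {"species": "Parus minor", "genus": "Parus", "family": "Paridae"},
    {"species": "Parus minor", "genus": "Parus", "family": "Paridae"},
]
result = consensus_name(bin_records)
print(result)
```

Here the most common species name covers only 60% of the records, so the selection moves up to the genus, where all records agree.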
UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.
The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.
The following 105 sources have been used to assemble the GBIF backbone with number of names given in brackets:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].
The overview of all sampling sites and timelines can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. The recordings of this collection were annotated and analysed to explore macro-ecological trends.
The audio recording criteria justifying inclusion into the meta-database are:
Stationary (no transects, towed sensors or microphones mounted on cars)
Passive (unattended, no human disturbance by the recordist)
Ambient (no directional microphone or triggered recordings, non-experimental conditions)
Spatially and/or temporally replicated (i.e. multiple sites sampled at the same time and/or multiple days - covering the same daytime - sampled at the same site)
The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database. The data shared here only includes validated collections.
Changes from version 3.0.1
Added files needed to reproduce the metadata and the acoustic analyses found in the publication.
Dropped underused fields: spatial_selection, temporal_exclusion, freshwater_recordist_position from collections table; secondary realm, biome, and functional group from sites table.
Meta-database CSV files
collections
collection_id: unique integer, primary key
name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.
ecoSound-web_link: link of validated meta-collection on ecoSound-web
primary_contributors: full names of people deemed corresponding contributors who are responsible for the dataset
secondary_contributors: full names of people who are not primary contributors but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses
date_added: when the dataset was added (YYYY-MM-DD)
URL_open_recordings: internet link for openly-available recordings from this collection
URL_project: internet link for further information about the corresponding project
DOI_publication: Digital Object Identifiers of corresponding publications
core_realm_IUCN: The main, core realm of the dataset according to IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/
medium: the physical medium the microphone is situated in
locality: optional free text about the locality
contributor_comments: free-text field for comments by the primary contributors
collections-sites
dataset_ID: primary key of collections table
site_ID: primary key of sites table
sites
site_ID: unique integer, primary key
site_name: internal name or code of sampling site as used in respective projects
latitude_numeric: site's numeric degrees of latitude
longitude_numeric: site's numeric degrees of longitude
blurred_coordinates: whether latitude and longitude coordinates are inaccurate, boolean. Coordinates may be blurred with random offsets, rounding, snapping, etc. Indicate the blurring method inside the comments field
topography_m: vertical position of the microphone relative to sea level, in meters. For sites on land: elevation. For marine sites: depth (negative). Only indicate if the values were measured by the collaborator.
freshwater_depth_m: microphone depth, only used for sites inside freshwater bodies that also have an elevation value above the sea level
realm: Ecosystem type: main realm according to IUCN GET https://global-ecosystems.org/
biome: Ecosystem type: main biome according to IUCN GET https://global-ecosystems.org/
functional_group: Ecosystem type: main functional group according to IUCN GET https://global-ecosystems.org/
contributor_comments: free text field for contributor comments
GADM_0: Global ADMinistrative Database level 0 classification of terrestrial site or marine site that is within territorial waters. Source: https://gadm.org/download_world.html
IHO: International Hydrographic Organization classification of marine site. Source: https://marineregions.org/downloads.php
WDPA: World Database on Protected Areas classification of the site. Source: https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA
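As a rough illustration of how the tables link through their keys, the sketch below joins collections to sites via the collections-sites table using Python's standard csv module. The column names follow the descriptions above; the CSV contents are invented placeholders standing in for the real files.

```python
import csv
import io

# Invented stand-ins for the collections, collections-sites and sites CSV files.
collections_csv = "collection_id,name\n1,Example collection\n"
links_csv = "dataset_ID,site_ID\n1,10\n1,11\n"
sites_csv = "site_ID,site_name\n10,Site A\n11,Site B\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Index both tables by their primary key for direct lookup.
collections = {row["collection_id"]: row for row in read_rows(collections_csv)}
sites = {row["site_ID"]: row for row in read_rows(sites_csv)}

# Each link row pairs one collection with one site.
joined = [
    (collections[link["dataset_ID"]]["name"], sites[link["site_ID"]]["site_name"])
    for link in read_rows(links_csv)
]
print(joined)
```

With the real files, the strings would be replaced by opened CSV files; the keys are compared as text here because csv.DictReader does not convert types.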
deployments
dataset_ID: primary key of datasets table
deployment: identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.
subset_site_ID: If the deployment was not done in all the sites of the corresponding collection, site IDs where the deployment was conducted
start_date: date of deployment start
start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
permanent: whether the deployment is permanent, boolean
end_date: date of deployment end (date when last scheduled operation starts)
end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.
operation_mode: one of the following modes
- continuous: recording takes place from the deployment start date-time to deployment end date-time.
- periodical: recording takes place periodically (i.e., with duty cycle) from the deployment start date-time to deployment end date-time.
- scheduled: recording takes place during scheduled daily time intervals (optionally with duty cycle)
duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". empty if no duty cycle is used. For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes
operation_start_time_mixed: only for scheduled recordings: start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
operation_duration_minutes: only for scheduled recordings: duration of operation in minutes, if constant
operation_end_time_mixed: only for scheduled recordings: end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Only required if durations are variable. Do not use when end times are ambiguous (for instance, if a recording could be 1 hour or 25 hours long because the end is on the next day). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
high_pass_filter_Hz: frequency of the high-pass filter of the recorder if applied, in Hz. Otherwise, write "none". This may be called a "low-cut" filter too.
bit_depth: sampling bit depth of the recordings. Often constant for a particular recorder
channels: number of recorded audio channels
sampling_frequency_kHz: frequency at which the microphone signal was sampled by the recorder (sounds up to half that frequency can be recorded)
recorder: recorder used for deployment
microphone: microphone used for deployment
target_taxa: main IUCN animal taxa that were studied with this deployment, using the exact IUCN Red list names (http://www.iucnredlist.org/), separated by commas. Only genera, families, orders, and classes are accepted. Empty if there was no taxonomic focus (i.e., general soundscapes were the study focus).
contributor_comments: free text field for contributor comments
exact_recordings: whether the deployment data here have been superseded by inserting more exact recording date-time ranges into the meta-collection on ecoSound-web
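The mixed time fields and the duty cycle notation described above could be parsed along the following lines. This is a sketch under assumptions: the function names and return shapes are hypothetical, and all four solar daytimes are accepted for every field even though start_time_mixed lists only sunrise and sunset.

```python
import re

SOLAR_DAYTIMES = {"sunrise", "sunset", "noon", "midnight"}


def parse_mixed_time(value):
    """Parse 'HH:MM' or a solar daytime with an optional offset in minutes.

    Returns ("clock", minutes_after_midnight) or ("solar", daytime, offset_minutes).
    """
    value = value.strip()
    clock = re.fullmatch(r"(\d{1,2}):(\d{2})", value)
    if clock:
        hours, minutes = int(clock.group(1)), int(clock.group(2))
        if hours > 23 or minutes > 59:
            raise ValueError(f"invalid clock time: {value!r}")
        return ("clock", hours * 60 + minutes)
    solar = re.fullmatch(r"([a-z]+)([+-]\d+)?", value)
    if solar and solar.group(1) in SOLAR_DAYTIMES:
        return ("solar", solar.group(1), int(solar.group(2) or 0))
    raise ValueError(f"unrecognised time value: {value!r}")


def parse_duty_cycle(value):
    """Parse 'recording(minutes)/period(minutes)' into the recorded fraction.

    Returns None for an empty field (no duty cycle).
    """
    if not value.strip():
        return None
    recording, period = (float(part) for part in value.split("/"))
    return recording / period


print(parse_mixed_time("sunrise-60"))
print(parse_mixed_time("06:30"))
print(parse_duty_cycle("1/6"))
```

Turning a solar daytime into a clock time would additionally need the site coordinates and date, which is outside the scope of this sketch.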
recordings (partial download from ecoSound-web)
recording_id: primary key of the recordings table
collection_id: ID of the collection the recording belongs to
name: name of the recording
site_id: site ID the recording belongs to
recorder_id: ID of the recorder used for the recording (internal ecoSound-web code)
microphone_id: ID of the microphone used for the recording (internal ecoSound-web code)
recording_gain: recording gain applied for amplifying the audio signal, in decibels
duty_cycle_recording: fraction of the recording period when the recorder is actively recording audio
duty_cycle_period: period of the duty cycle, i.e., time between the starts of two subsequent recordings
note: comments (contains the target taxon)
file_date: date of the recording start
file_time: local time of the recording start
sampling_rate: audio sampling rate in Hz
bitdepth: depth in bits for each audio sample
channel_num: number of channels
duration: duration of the recording in seconds. Note: duty-cycled recordings cover only a proportion of this duration
affiliations
affiliation_id: primary key of affiliations table
lab_research_group: Laboratory or research group name
department_school_institute: department, school, or institute name
university_institution: University or institution name
street_address: street address
region_state_province_city: region, state, province, or city name
postal_code: postal code
country: country
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Vehicle Miles Traveled During Covid-19 Lock-Downs’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vehicle-miles-travelede on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020.
Overview
Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.
This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.
Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.
This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.
Findings
- Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.
- Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.
- New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.
About This Data
The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S. The data is from their Vehicle Miles Traveled (VMT) Monitor, which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles traveled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.
Included Data
01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
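The two derived measures mentioned in these file descriptions, percent change against the January baseline and a seven-day rolling average, can be sketched as follows. This is a minimal illustration in plain Python, not AP's or StreetLight's actual calculation: the function names are hypothetical, and leaving the rolling average empty until a full seven-day window is available is an assumed convention.

```python
from collections import deque


def percent_change(vmt, baseline):
    """Percent change of a daily VMT estimate relative to a baseline value."""
    return (vmt - baseline) * 100.0 / baseline


def rolling_mean(values, window=7):
    """Trailing rolling mean; days without a full window yield None (assumed convention)."""
    recent = deque(maxlen=window)
    means = []
    for value in values:
        recent.append(value)
        means.append(sum(recent) / window if len(recent) == window else None)
    return means


if __name__ == "__main__":
    daily_vmt = [60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0, 74.0]  # made-up values
    print(percent_change(62.0, 100.0))
    print(rolling_mean(daily_vmt))
```

A trailing window smooths out weekday and weekend swings, at the cost of lagging the most recent change by a few days.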
Additional Data Queries
* Filter for specific state - filters 02_vmt_state.csv daily data for specific state.
* Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.
* Filter for specific county - filters 03_vmt_county.csv daily data for specific county.
Interactive
The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic:
This dataset was created by Angeliki Kastanis and contains around 0 samples along with Date At Low, Mean7 County Vmt At Low, technical information and other features such as: - County Name - County Fips - and more.
- Analyze State Name in relation to Baseline Jan Vmt
- Study the influence of Date At Low on Mean7 County Vmt At Low
If you use this dataset in your research, please credit Angeliki Kastanis
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic populations for regions of the World (SPW) | Alabama
Dataset information
A synthetic population of a region as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, the locations where the activities take place and thus where interactions among population members happen (e.g., spread of epidemics).
License
Acknowledgment
This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).
Contact information
Henning.Mortveit@virginia.edu
Identifiers
Region name | Alabama |
Region ID | usa_140002904 |
Model | coarse |
Version | 0_9_0 |
Statistics
Name | Value |
---|---|
Population | 4768478 |
Average age | 37.8 |
Households | 1933164 |
Average household size | 2.5 |
Residence locations | 1933164 |
Activity locations | 398709 |
Average number of activities | 5.7 |
Average travel distance | 65.0 |
Sources
Description | Name | Version | Url |
---|---|---|---|
Activity template data | World Bank | 2021 | https://data.worldbank.org |
Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551 https://www.openstreetmap.org/ |
Household data | IPUMS | | https://international.ipums.org/international |
Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |
Files description
Base data files (usa_140002904_data_v_0_9.zip)
Filename | Description |
---|---|
usa_140002904_person_v_0_9.csv | Data for each person including attributes such as age, gender, and household ID. |
usa_140002904_household_v_0_9.csv | Data at household level. |
usa_140002904_residence_locations_v_0_9.csv | Data about residence locations |
usa_140002904_activity_locations_v_0_9.csv | Data about activity locations, including what activity types are supported at these locations |
usa_140002904_activity_location_assignment_v_0_9.csv | For each person and for each of their activities, this file specifies the location where the activity takes place |
Derived data files
Filename | Description |
---|---|
usa_140002904_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model. |
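To make the idea of a contact matrix derived from location assignments more concrete, here is a minimal sketch. It is not the project's actual method: the within-location contact model is assumed to be all-to-all (everyone assigned to the same location is in contact), time of day is ignored, and the function name, age-group labels and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def contact_matrix(age_group, visits, groups):
    """Mean number of contacts a person in group a has with people in group b.

    age_group: dict mapping person ID to an age-group label
    visits:    iterable of (person ID, location ID) assignments
    groups:    list of all age-group labels
    """
    by_location = defaultdict(set)
    for person, location in visits:
        by_location[location].add(person)

    totals = {(a, b): 0 for a in groups for b in groups}
    for people in by_location.values():
        # Assumed all-to-all contact model within a location.
        for p, q in combinations(sorted(people), 2):
            totals[(age_group[p], age_group[q])] += 1
            totals[(age_group[q], age_group[p])] += 1

    size = {g: sum(1 for a in age_group.values() if a == g) for g in groups}
    return {
        (a, b): (count / size[a] if size[a] else 0.0)
        for (a, b), count in totals.items()
    }


if __name__ == "__main__":
    ages = {1: "child", 2: "adult", 3: "adult"}
    visits = [(1, "home"), (2, "home"), (2, "work"), (3, "work")]
    print(contact_matrix(ages, visits, ["child", "adult"]))
```

Dividing by the size of the row group makes each entry a per-person average, so the matrix is generally not symmetric when the groups differ in size.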
Validation and measures files
Filename | Description |
---|---|
usa_140002904_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
usa_140002904_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
usa_140002904_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by the time of day with survey data |
usa_140002904_location_construction_0_9.pdf | Validation plots for location construction |
usa_140002904_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
usa_140002904_usa_140002904_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
usa_140002904_usa_140002904_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
usa_140002904_usa_140002904_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
usa_140002904_usa_140002904_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
usa_140002904_usa_140002904_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
usa_140002904_usa_140002904_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset captures the popularity of news articles across various social media platforms, providing valuable insights into how news content performs online [1, 2]. It is a subset of a larger dataset, specifically designed for analysing engagement and reach of news items [1, 2]. The data includes key details about news articles and their final popularity scores on Facebook, Google+, and LinkedIn [1-3]. It serves as an excellent resource for understanding social media trends and the dissemination of news [2].
The dataset features the following columns:
* IDLink: A unique identifier for each news item [1, 2].
* Title: The title of the news item as it appeared from the official media sources [1, 2].
* Headline: The headline of the news item, also from official media sources [1, 2].
* Source: The original news outlet that published the news item [1, 2].
* Topic: The query topic used to obtain the news items from official media sources [1, 2].
* PublishDate: The date and time when the news item was published [1, 2].
* Facebook: The final popularity score of the news item on Facebook [2, 3].
* GooglePlus: The final popularity score of the news item on Google+ [2, 3].
* LinkedIn: The final popularity score of the news item on LinkedIn [2, 3].
This subset of the data is specifically noted to be missing the 'SentimentTitle' and 'SentimentHeadline' columns that are present in the full dataset [1].
This dataset comprises approximately 37,000 news articles [1]. While exact row counts for files are not specified beyond this total, the dataset format is typically CSV [4].
* Unique Values:
  * IDLink: 37,288 unique values [3].
  * Title: 32,366 unique values [3].
  * Headline: 34,634 unique values [3].
* Source Distribution:
  * Bloomberg: 2% [3].
  * Reuters: 1% [3].
  * Other: 97% (from 35,990 sources) [3].
* Topic Distribution:
  * Economy: 36% [3].
  * Obama: 31% [3].
  * Other: 33% (from 12,165 topics) [3].
* Time Range Sample (2016):
  * 03/29 - 04/03: 2,239 items [5].
  * 04/03 - 04/08: 2,020 items [5].
  * 06/17 - 06/22: 1,650 items [5].
  * 06/27 - 07/02: 2,024 items [5].
The data spans from 2016-03-29 to 2016-07-07 [6].
This dataset is ideal for: * Analysing news popularity trends across different social media platforms [2]. * Studying the impact of news content on online engagement [2]. * Exploratory data analysis of news consumption patterns [7]. * Understanding the spread of information in digital environments. * Developing models to predict social media reach for news articles. * Insights into media outlets' influence and topic relevance [1, 3].
The dataset covers an approximate 8-month period, between November 2015 and July 2016 [2]. The specific subset provided covers 29 March 2016 to 07 July 2016 [6]. It includes news items on four primary topics: economy, Microsoft, Obama, and Palestine [2], with distribution details for 'economy' and 'obama' [3]. The region of coverage is global [8].
CC0
Original Data Source: News Popularity in Multiple Social Media Platforms
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".
Abstract:
Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.
Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch
Dataset structure
This dataset is the one used in the experiments of the above research paper. Only the extracted features for the museum tweet threads (and not the full message text) are provided, as they are all that the analyses need.
We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, ranging from ca. 5k to ca. 11k per museum group, depending on the number of museums in each group.
Content features: these are the features that can be drawn from the content of the tweet itself. We further divide such features into the following two categories:
– Countable: these features take values over different ranges. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., Twitter accounts preceded by @), the length of the tweet;
– On-Off: these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as topic-dense if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe whether a tweet sentiment is present (positive or negative) or not (neutral).
Context features: these features are not drawn from the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11:00pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.
User features: these features are specific to the user that sent the tweet, and are the same for all the tweets of this user. Namely, we consider the name of the museum and the number of followers of the user.
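A few of the content and context features described above can be approximated with simple string matching, as in the sketch below. It is only an illustration under stated assumptions, not the authors' feature extractor: the regular expressions, function names and dictionary keys are hypothetical, and the features that need extra tooling (image counts, named entities, sentiment) are left out. The part-of-day boundaries and the hashtag threshold of 5 follow the description above.

```python
import re
from datetime import datetime


def part_of_day(sent_at):
    """Map a timestamp to morning, afternoon, evening or night."""
    hour = sent_at.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"  # 11:00pm to 4:59am


def tweet_features(text, sent_at, is_retweet=False, density_threshold=5):
    """Extract a subset of the countable, on-off and context features."""
    hashtags = re.findall(r"#\w+", text)
    return {
        "n_hashtags": len(hashtags),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "length": len(text),
        "has_exclamation": int("!" in text),
        "has_question": int("?" in text),  # simplification: also fires on "?" inside a URL
        "topic_dense": int(len(hashtags) > density_threshold),
        "part_of_day": part_of_day(sent_at),
        "is_retweet": int(is_retweet),
    }


if __name__ == "__main__":
    example = "Visit us! #art #museum @Tate https://example.org"
    print(tweet_features(example, datetime(2020, 1, 1, 9, 30)))
```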
🌍 Global B2B Person Dataset | 755M+ LinkedIn Profiles | Verified & Bi-Weekly Updated
Access the world’s most comprehensive professional dataset, enriched with over 755 million LinkedIn profiles. The Forager.ai Global B2B Person Dataset delivers work-verified professional contacts with 95%+ accuracy, refreshed every two weeks. Ideal for recruitment, sales, research, and talent mapping, it provides direct access to decision-makers, specialists, and executives across industries and geographies.
Dataset Features
Full Name & Job Title: Up-to-date first/last name with current professional role.
Emails & Phone Numbers: AI-validated work and personal email addresses, plus mobile numbers.
Company Info: Current employer name, industry, and company size (employee count).
Career History: Detailed work history with job titles, durations, and role progressions.
Skills & Endorsements: Extracted from public LinkedIn profiles.
Education & Certifications: Universities, degrees, and professional certifications.
Location & LinkedIn URL: City, country, and direct link to public LinkedIn profile.
Distribution
Data Volume: 755M+ total profiles, with 270M+ containing full contact information.
Formats Available: CSV, JSON via S3 or Snowflake; API for real-time access.
Access Methods: REST API, Enrichment API (lookup), full dataset delivery, or custom solutions.
Usage
This dataset is ideal for a variety of applications:
Executive Recruitment: Source passive talent, build role-based maps, and assess mobility.
Sales Intelligence: Find decision-makers, personalize outreach, and trigger campaigns on job changes.
Market Research: Understand talent concentration by company, geography, and skill set.
Partnership Development: Identify key stakeholders in target firms for business development.
Talent Mapping & Strategic Hiring: Build full organizational charts and skill distribution heatmaps.
Coverage
Geographic Coverage: Global, including North America, EMEA, LATAM, and APAC.
Time Range: Continuously updated; profiles refreshed bi-weekly.
Demographics: Cross-industry coverage of seniority levels from entry-level to C-suite, across all sectors.
License
CUSTOM
Who Can Use It
Recruiters & Staffing Firms: For building target lists and sourcing niche talent.
Sales & RevOps Teams: For targeting by department, title, or decision-making authority.
VCs & PE Firms: To assess leadership teams and monitor executive movement.
Data Scientists & Analysts: To train models for job mobility, hiring trends, or org structure prediction.
B2B Platforms: For enriching internal databases and powering account-based marketing (ABM).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction
This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, given as UNIX timestamps. Its construction is based on the blockchain covering the period from 3 January 2009 to 25 January 2021. The blockchain was extracted using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/
[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
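The common-input heuristic treats all addresses that appear together as inputs of one transaction as belonging to the same entity. A minimal sketch of that idea, using a union-find structure, is shown below. It illustrates the general heuristic only and is not the pipeline used to build this dataset; the function name and the input layout (one list of input addresses per transaction) are assumptions.

```python
def cluster_addresses(transactions):
    """Group addresses into entities using the common-input heuristic.

    transactions: iterable of lists, each holding the input addresses
    of one transaction. Returns a sorted list of sorted address lists.
    """
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        inputs = list(inputs)
        for address in inputs:
            find(address)  # register addresses that never co-occur too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    txs = [["A", "B"], ["B", "C"], ["D"]]  # made-up addresses
    print(cluster_addresses(txs))
```

Because clusters merge transitively, A and C end up in the same entity even though they never appear in the same transaction.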
Dataset Description
Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021
Overview:
This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a long time span, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs.
All dates are derived from block UNIX timestamps and expressed in the GMT timezone.
Contents:
The dataset is distributed across three compressed archives:
All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. The files can be read with, for example, the pyspark Python package.
orbitaal-stream_graph.tar.gz:
The root directory is STREAM_GRAPH/
Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes).
The stream graph is divided into 13 files, one for each year
File format is Parquet
Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory STREAM_GRAPH/EDGES/
orbitaal-snapshot-all.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).
File format is Parquet
Name format is orbitaal-snapshot-all.snappy.parquet.
These files are in the subdirectory SNAPSHOT/EDGES/ALL/
orbitaal-snapshot-year.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at yearly resolution
File format is Parquet
Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory SNAPSHOT/EDGES/year/
orbitaal-snapshot-month.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at monthly resolution
File format is Parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month
These files are in the subdirectory SNAPSHOT/EDGES/month/
orbitaal-snapshot-day.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at daily resolution
File format is Parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day
These files are in the subdirectory SNAPSHOT/EDGES/day/
orbitaal-snapshot-hour.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at hourly resolution
File format is Parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour
These files are in the subdirectory SNAPSHOT/EDGES/hour/
orbitaal-nodetable.tar.gz:
The root directory is NODE_TABLE/
Contains two files in parquet format, the first one gives information related to nodes present in stream graphs and snapshots such as period of activity and associated global Bitcoin balance, and the other one contains the list of all associated Bitcoin addresses.
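The file naming scheme used by the stream graph and the yearly, monthly, daily and hourly snapshot archives can be parsed with a single regular expression, as sketched below. This is a hypothetical helper, not part of the dataset's tooling: the function names are made up, the file IDs in the example are invented, and the single file orbitaal-snapshot-all.snappy.parquet does not follow this pattern and is not handled.

```python
import re

# Year is always present; month, day and hour appear only at finer resolutions.
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2}))?(?:-(?P<day>\d{2}))?(?:-(?P<hour>\d{2}))?"
    r"-file-id-(?P<id>\d+)\.snappy\.parquet$"
)


def parse_name(filename):
    """Split a file name into kind, date parts (None when absent) and numeric ID."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    parts = match.groupdict()
    return {
        key: value if key == "kind" else (int(value) if value is not None else None)
        for key, value in parts.items()
    }


def sort_by_file_id(filenames):
    """Sort numerically by [ID], which the description says follows time order."""
    return sorted(filenames, key=lambda name: parse_name(name)["id"])


if __name__ == "__main__":
    names = [
        "orbitaal-snapshot-date-2016-07-09-file-id-10.snappy.parquet",  # invented IDs
        "orbitaal-snapshot-date-2016-07-08-file-id-9.snappy.parquet",
    ]
    print(sort_by_file_id(names))
```

Sorting on the parsed integer avoids the pitfall of plain string sorting, where an ID of 10 would come before an ID of 9.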
Small samples in CSV format
orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv
These two CSV files are related to stream graph representations of a halving event that occurred in 2016.
orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv
These two CSV files are related to daily snapshot representations of a halving event that occurred in 2016.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This is a dataset containing all the major data breaches in the world from 2004 to 2021
As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.
This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: In what year did the data breach take place
3. Records: How many records were compromised (can include information like email, passwords etc.)
4. Organization type: Which sector does the organization belong to
5. Method: Was it hacked? Were the files lost? Was it an inside job?
Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches
Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies. It has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real-world. But they also pose nontrivial challenges in research of embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.
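To illustrate two of the idiosyncrasies named above, the sketch below shows one simple way reverse property pairs and mediator objects could be handled in a list of triples. It is an illustration only and not the preparation procedure used for this dataset: the function names, the relation names in the example and the choice to concatenate the two edges around a mediator are all assumptions.

```python
from collections import defaultdict


def canonicalize_reverse(triples, reverse_of):
    """Rewrite reverse-direction triples to their forward form and drop duplicates.

    reverse_of maps a reverse relation name to its forward counterpart.
    """
    seen, kept = set(), []
    for s, r, o in triples:
        if r in reverse_of:
            s, r, o = o, reverse_of[r], s
        if (s, r, o) not in seen:
            seen.add((s, r, o))
            kept.append((s, r, o))
    return kept


def collapse_mediators(triples, mediators):
    """Replace each path through a mediator node with one concatenated edge.

    Note: this flattens a multiary relationship into binary edges, so the
    link between the co-occurring values is lost.
    """
    incoming, outgoing, kept = defaultdict(list), defaultdict(list), []
    for s, r, o in triples:
        if o in mediators:
            incoming[o].append((s, r))
        elif s in mediators:
            outgoing[s].append((r, o))
        else:
            kept.append((s, r, o))
    for m in sorted(mediators):
        for s, r1 in incoming[m]:
            for r2, o in outgoing[m]:
                kept.append((s, f"{r1}.{r2}", o))
    return kept


if __name__ == "__main__":
    # Hypothetical relation and entity names.
    pair = [("film1", "directed_by", "dir1"), ("dir1", "director_of", "film1")]
    print(canonicalize_reverse(pair, {"director_of": "directed_by"}))
    cast = [("actor1", "performance", "m1"), ("m1", "film", "film1"), ("m1", "character", "char1")]
    print(collapse_mediators(cast, {"m1"}))
```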
Dataset Details
The dataset consists of the four variants of Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This codebook relates to the study conducted on the Batwa’s Rights to Recognition as a Minority and Indigenous People in Rwanda through the Lens of a Human Rights Based Approach. The dataset displays information in 7 columns. The first column, Code level 1, consists of the main codes extracted from the findings; the second column, Code level 2, consists of sub-codes extracted from Code level 1; and the third column, Code level 3, is extracted from Code level 2. The 4th column provides a short definition of the content of the codes. The 5th column concerns what the codes should include and the 6th column concerns what the codes should not include. The 7th column concerns the types of questions asked to respondents, based on which the codes were generated. These codes were generated from data extracted from the questionnaire summarized in the 7th column. For example, the first column (Code level 1) is made of 4 rows. The first two rows concern findings from the literature review and the last two rows concern empirical data from the fieldwork. Data from the literature review and empirical data from the fieldwork were combined to come up with findings, based on which an interpretation was made. These codes allowed the researchers to give meaningful findings, which in turn helped the researchers to provide a consolidated interpretation. The data generated align with epistemological interpretivism and concern views from respondents on socio-cultural narratives and emotional experiences that they have endured in their lives. The data collection was conducted in three rural districts, Nyaruguru (Southern Province), Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). The justification for the three rural and three urban districts was to find out if there were divergent socio-cultural realities within each and across the diverse settings. 
The selected rural sites were those near protected areas from which the Batwa were evicted following the legislation of protected areas in 1930 by colonial authorities. The urban districts were the sites in which some Batwa had lived after the imposition of a new lifestyle, which differs from their hunting and gathering tradition, following their eviction from forests. The study sites were purposively selected through the facilitation of gatekeepers, namely local entities. Authorization was sent to the district level, which subsequently allowed a team of researchers to approach the sector, the cell and the village levels of administration. At the village level, which is the lowest entity where households of HMP live, respondents were again identified through the help of the Chief of the Village (umudugudu), who served as a gatekeeper.
Focus Group Discussions (FGDs) along with direct observation were administered to the members of HMP (formerly referred to as Batwa). The groups comprised individuals who were above the age of 18 years and were deemed to have experienced hardship as a result of socio-economic vulnerability resulting from forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informant interviews (KIIs) were administered to leaders from NGOs and cooperative societies working towards the promotion of the rights of HMPs. These included one top manager and another who used to be among the top managers of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of Batwa in Rwanda, as well as one person who used to be among the leaders of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda). 
The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.
All respondents were purposively selected due to their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate and authorities from the government were to provide information on convergences or divergences on the phenomenon under investigation.
In total, 226 respondents divided into four categories were approached for feedback. These were 220 heads of households from HMP for FGDs and direct observation; 3 leaders from COPORWA for in-depth interviews; 1 ex-Senator representing HMP in the Rwandan Senate for an in-depth interview; and 2 authorities from governmental institutions. The aim of using different tools for different respondents was not only to get a wide range of perceptions on the subject matter of self-identity and non-discrimination under investigation, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and the observation of the behaviour and body language of the respondents when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of administering inquiry until respondents start to repeat themselves.
To ensure the reliability and validity of the data, some measures were taken. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field. 
For each day of data collection, the data collectors gave a daily report highlighting the progress made and any special information relating to the subject matter under investigation that was observed in the field. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.
In short, findings around the Batwa’s rights to self-identity and to non-discrimination differed across the two variables. On self-identity, findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they find themselves and live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicits divergent views among respondents. For ordinary respondents from the Batwa, the name carries a negative profile, while for the elites from the Batwa the name obscures their problems since it disconnects them from other indigenous people across Africa and the world. For respondents from the GoR, the name means upholding unity and reconciliation. Findings from the data also indicated that the Batwa identity has been characterised by the negative profile of someone who is the poorest, dirty and indigent, because of their lowest social status resulting from a non-dominant context. This reality corroborates other recent studies showing that the identity of the Batwa does not have a fixed boundary.
On the variable of non-discrimination, findings from the data indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes. For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of adm div (ca 80.000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name