64 datasets found
  1. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +1more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

    Sources and Contributions

    • Sources: GeoNames aggregates over a hundred different data sources.
    • Ambassadors: GeoNames Ambassadors help in many countries.
    • Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
    • Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name

  2. Worldwide COVID-19 Data from WHO (2025 Edition)

    • kaggle.com
    Updated Jul 3, 2025
    Cite
    Adil Shamim (2025). Worldwide COVID-19 Data from WHO (2025 Edition) [Dataset]. https://www.kaggle.com/datasets/adilshamim8/worldwide-covid-19-data-from-who
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    Kaggle
    Authors
    Adil Shamim
    Description

    Dataset Overview

    This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.

    Source Information

    • Website: WHO COVID-19 Dashboard
    • Organization: World Health Organization (WHO)
    • Data Coverage: Global (by country/territory)
    • Time Period: Up to 2025

    The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.

    Dataset Contents

    • Country/Region: The name of the country or territory.
    • Date: Reporting date.
    • New Cases: Number of new confirmed COVID-19 cases.
    • Cumulative Cases: Total confirmed COVID-19 cases to date.
    • New Deaths: Number of new confirmed deaths due to COVID-19.
    • Cumulative Deaths: Total deaths reported to date.
    • Additional fields may include population, rates per 100,000, and more (see data files for details).
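
    The fields above suggest a simple consistency check before analysis. The following is a minimal sketch, assuming the column names match the list above and that Cumulative Cases is meant to be the running total of New Cases per country; both are assumptions, and the sample rows are hypothetical. Because countries revise their reports, mismatches in the real file would not necessarily indicate an error.

```python
from collections import defaultdict

# Hypothetical sample rows; the real file's column names and values may differ.
rows = [
    {"Country/Region": "A", "Date": "2020-03-01", "New Cases": 5, "Cumulative Cases": 5},
    {"Country/Region": "A", "Date": "2020-03-02", "New Cases": 7, "Cumulative Cases": 12},
    {"Country/Region": "B", "Date": "2020-03-01", "New Cases": 2, "Cumulative Cases": 2},
    {"Country/Region": "B", "Date": "2020-03-02", "New Cases": 3, "Cumulative Cases": 6},
]


def find_cumulative_mismatches(rows):
    """Return (country, date) pairs where the running sum of New Cases
    differs from the reported Cumulative Cases."""
    running = defaultdict(int)
    mismatches = []
    # ISO-formatted dates (YYYY-MM-DD) sort chronologically as plain strings.
    for row in sorted(rows, key=lambda r: (r["Country/Region"], r["Date"])):
        country = row["Country/Region"]
        running[country] += row["New Cases"]
        if running[country] != row["Cumulative Cases"]:
            mismatches.append((country, row["Date"]))
    return mismatches


mismatches = find_cumulative_mismatches(rows)
print(mismatches)
```

    Rows flagged this way are candidates for a closer look rather than confirmed errors.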

    How to Use

    This dataset can be used for:

    • Tracking the spread and trends of COVID-19 globally and by country
    • Modeling and forecasting pandemic progression
    • Comparative analysis of the pandemic’s impact across countries and regions
    • Visualization and reporting

    Data Reliability

    The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.

    Acknowledgements

    Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.

  3. Distribution of first name and last name frequencies by country

    • figshare.com
    xlsx
    Updated Feb 2, 2023
    Cite
    Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    figshare
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of first and last name frequencies of academic authors by country.

    Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

    Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.

    From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

    For example the distribution for the UK shows a single peak for international names, with no national names, Belgium has a national peak and an international peak, and China has mainly a national peak. The 50 countries are:

    No  Code  Country
    1   SB    Serbia
    2   IE    Ireland
    3   HU    Hungary
    4   CL    Chile
    5   CO    Colombia
    6   NG    Nigeria
    7   HK    Hong Kong
    8   AR    Argentina
    9   SG    Singapore
    10  NZ    New Zealand
    11  PK    Pakistan
    12  TH    Thailand
    13  UA    Ukraine
    14  SA    Saudi Arabia
    15  RO    Romania
    16  ID    Indonesia
    17  IL    Israel
    18  MY    Malaysia
    19  DK    Denmark
    20  CZ    Czech Republic
    21  ZA    South Africa
    22  AT    Austria
    23  FI    Finland
    24  PT    Portugal
    25  GR    Greece
    26  NO    Norway
    27  EG    Egypt
    28  MX    Mexico
    29  BE    Belgium
    30  CH    Switzerland
    31  SW    Sweden
    32  PL    Poland
    33  TW    Taiwan
    34  NL    Netherlands
    35  TK    Turkey
    36  IR    Iran
    37  RU    Russia
    38  AU    Australia
    39  BR    Brazil
    40  KR    South Korea
    41  ES    Spain
    42  CA    Canada
    43  IT    Italy
    44  FR    France
    45  IN    India
    46  DE    Germany
    47  US    USA
    48  UK    UK
    49  JP    Japan
    50  CN    China

  4. Global Country Information 2023

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 15, 2024
    Cite
    Nidula Elgiriyewithana (2024). Global Country Information 2023 [Dataset]. http://doi.org/10.5281/zenodo.8165229
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nidula Elgiriyewithana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.
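
    Some of the fields listed above are derivable from others, which allows a quick sanity check on a row. The following is a minimal sketch, assuming Density (P/Km2) equals Population divided by Land Area (Km2); the record is hypothetical and the 5% tolerance is an arbitrary choice, not something the dataset specifies.

```python
import math


def population_density(population: int, land_area_km2: float) -> float:
    """Persons per square kilometer."""
    if land_area_km2 <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_km2


# Hypothetical record using the field names from the list above.
record = {
    "Country": "Exampleland",
    "Population": 5_000_000,
    "Land Area (Km2)": 100_000,
    "Density (P/Km2)": 50,
}

computed = population_density(record["Population"], record["Land Area (Km2)"])
# Allow a small relative tolerance, since published densities are often rounded.
consistent = math.isclose(computed, record["Density (P/Km2)"], rel_tol=0.05)
print(computed, consistent)
```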

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
  5. World Population Statistics - 2023

    • kaggle.com
    Updated Jan 9, 2024
    Cite
    Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bhavik Jikadara
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description
    • The US Census Bureau's world population estimate for June 2019 put the global population at 7,577,130,400 people, well above the world population of 7.2 billion in 2015. Our estimate based on UN data shows the world's population surpassing 7.7 billion.
    • China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.
    • The next 11 most populous countries each have populations exceeding 100 million: the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. Of these nations, all are expected to continue to grow except Russia and Japan, which will see their populations drop by 2030 before falling again significantly by 2050.
    • Many other nations have populations of at least one million, while there are also countries that have just thousands. The smallest population in the world can be found in Vatican City, where only 801 people reside.
    • In 2018, the world’s population growth rate was 1.12%. Every five years since the 1970s, the population growth rate has continued to fall. The world’s population is expected to continue to grow larger but at a much slower pace. By 2030, the population will exceed 8 billion. In 2040, this number will grow to more than 9 billion. In 2055, the number will rise to over 10 billion, and another billion people won’t be added until near the end of the century. The current annual population growth estimates from the United Nations are in the millions - estimating that over 80 million new lives are added yearly.
    • This population growth will be significantly impacted by nine specific countries which are situated to contribute to the population growth more quickly than other nations. These nations include the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Particularly of interest, India is on track to overtake China's position as the most populous country by 2030. Additionally, multiple nations within Africa are expected to double their populations before fertility rates begin to slow entirely.

    Content

    • In this dataset, we have historical population data for every Country/Territory in the world by different parameters like Area Size of the Country/Territory, Name of the Continent, Name of the Capital, Density, Population Growth Rate, Ranking based on Population, World Population Percentage, etc.

    Dataset Glossary (Column-Wise):

    • Rank: Rank by Population.
    • CCA3: 3-letter Country/Territories Code.
    • Country/Territories: Name of the Country/Territories.
    • Capital: Name of the Capital.
    • Continent: Name of the Continent.
    • 2022 Population: Population of the Country/Territories in the year 2022.
    • 2020 Population: Population of the Country/Territories in the year 2020.
    • 2015 Population: Population of the Country/Territories in the year 2015.
    • 2010 Population: Population of the Country/Territories in the year 2010.
    • 2000 Population: Population of the Country/Territories in the year 2000.
    • 1990 Population: Population of the Country/Territories in the year 1990.
    • 1980 Population: Population of the Country/Territories in the year 1980.
    • 1970 Population: Population of the Country/Territories in the year 1970.
    • Area (km²): Area size of the Country/Territories in square kilometers.
    • Density (per km²): Population Density per square kilometer.
    • Growth Rate: Population Growth Rate by Country/Territories.
    • World Population Percentage: The population percentage by each Country/Territories.
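
    Two of the glossary columns can be approximated from the population columns. The following is a minimal sketch with hypothetical rows: it computes each territory's share of the summed 2022 population and an average annual growth rate between the 2020 and 2022 columns. The compound-growth formula is an assumption for illustration; the glossary does not say how the dataset's own Growth Rate column is defined.

```python
# Hypothetical rows using column names from the glossary above.
rows = [
    {"CCA3": "AAA", "2022 Population": 600, "2020 Population": 576},
    {"CCA3": "BBB", "2022 Population": 300, "2020 Population": 300},
    {"CCA3": "CCC", "2022 Population": 100, "2020 Population": 100},
]

total_2022 = sum(r["2022 Population"] for r in rows)

for r in rows:
    # Share of the total population across all listed rows, in percent.
    r["share_pct"] = 100 * r["2022 Population"] / total_2022
    # Average annual (compound) growth over the two years 2020 -> 2022.
    r["annual_growth"] = (r["2022 Population"] / r["2020 Population"]) ** (1 / 2) - 1

for r in rows:
    print(r["CCA3"], round(r["share_pct"], 2), round(r["annual_growth"], 4))
```
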
  6. countries of the world

    • kaggle.com
    Updated Jan 24, 2023
    Cite
    Rob Cobb (2023). countries of the world [Dataset]. https://www.kaggle.com/datasets/robbcobb/countries
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rob Cobb
    Area covered
    World
    Description

    Copy of https://www.kaggle.com/datasets/kisoibo/countries-databasesqlite

    Updated the name of the table from 'countries of the world' to 'countries', for ease of writing queries.

    Info about the dataset:

    Content

    Table: countries of the world
    Total Rows: 0
    Total Columns: 20
    Columns: Country, Region, Population, Area (sq. mi.), Pop. Density (per sq. mi.), Coastline (coast/area ratio), Net migration, Infant mortality (per 1000 births), GDP ($ per capita), Literacy (%), Phones (per 1000), Arable (%), Crops (%), Other (%), Climate, Birthrate, Deathrate, Agriculture, Industry, Service
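
    A query against the renamed countries table might look like the following minimal sketch. It uses an in-memory SQLite database with three hypothetical rows as a stand-in for the dataset's actual file (whose name is not given here) and only a subset of the columns listed above. The query groups countries by region and weights GDP per capita by population.

```python
import sqlite3

# In-memory stand-in for the dataset's SQLite file; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE countries ('
    '"Country" TEXT, "Region" TEXT, "Population" INTEGER, "GDP ($ per capita)" REAL)'
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?, ?)",
    [
        ("Aland", "NORTH", 1000, 10000.0),
        ("Bland", "NORTH", 3000, 20000.0),
        ("Cland", "SOUTH", 2000, 5000.0),
    ],
)

# Column names containing spaces or symbols need double-quoted identifiers.
query = """
SELECT "Region",
       SUM("Population") AS total_population,
       SUM("GDP ($ per capita)" * "Population") / SUM("Population")
           AS weighted_gdp_per_capita
FROM countries
GROUP BY "Region"
ORDER BY "Region"
"""
results = conn.execute(query).fetchall()
for region, population, gdp in results:
    print(region, population, gdp)
conn.close()
```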

    Acknowledgements

    Source: All these data sets are made up of data from the US government. Generally they are free to use if you use the data in the US. If you are outside of the US, you may need to contact the US Govt to ask. Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission." https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html

    Inspiration

    When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.

  7. Worldwide Soundscapes project meta-data

    • zenodo.org
    Updated Dec 9, 2022
    Cite
    Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海 李; 黎君 董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
    Explore at:
    Dataset updated
    Dec 9, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海 李; 黎君 董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

    The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

    The audio recording criteria justifying inclusion into the meta-database are:

    • Stationary (no transects, towed sensors or microphones mounted on cars)
    • Passive (unattended, no human disturbance by the recordist)
    • Ambient (no spatial or temporal focus on a particular species or direction)
    • Spatially and/or temporally replicated (multiple sites sampled at least at one common daytime or multiple days sampled at least in one common site)

    The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database.

    datasets

    • dataset_id: incremental integer, primary key
    • name: name of the dataset. if it is repeated, incremental integers should be used in the "subset" column to differentiate them.
    • subset: incremental integer that can be used to distinguish datasets with identical names
    • collaborators: full names of people deemed responsible for the dataset, separated by commas
    • contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.
    • date_added: when the dataset was added (DD/MM/YYYY)
    • URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.
    • URL_project: internet link for further information about the corresponding project
    • DOI_publication: DOI of corresponding publications, separated by comma
    • core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/
    • medium: the physical medium the microphone is situated in
    • protected_area: Whether the sampling sites were situated in protected areas or not, or only some.
    • GADM0: For datasets on land or in territorial waters, Global Administrative Database level0
      https://gadm.org/
    • GADM1: For datasets on land or in territorial waters, Global Administrative Database level1
      https://gadm.org/
    • GADM2: For datasets on land or in territorial waters, Global Administrative Database level2
      https://gadm.org/
    • IHO: For marine locations, the sea area that encompasses all the sampling locations according to the International Hydrographic Organisation. Map here: https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2
    • locality: optional free text about the locality
    • latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees
    • longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees
    • sites_number: number of sites sampled
    • year_start: starting year of the sampling
    • year_end: ending year of the sampling
    • deployment_schedule: description of the sampling schedule, provisional
    • temporal_recording_selection: list environmental exclusion criteria that were used to determine which recording days or times to discard
    • high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz
    • variable_sampling_frequency: Does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column and indicate it in the sampling_frequency_kHz column inside the deployments sheet
    • sampling_frequency_kHz: frequency the microphone was sampled at (sounds of up to half that frequency will be recorded)
    • variable_recorder:
    • recorder: recorder model used
    • microphone: microphone used
    • freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)
    • collaborator_comments: free-text field for comments by the collaborators
    • validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.
    • validator_name: name of person doing the validation
    • validation_comments: validators: please insert the date when someone was contacted
    • cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link

    datasets-sites

    • dataset_ID: primary key of datasets table
    • dataset_name: lookup field
    • site_ID: primary key of sites table
    • site_name: lookup field

    sites

    • site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web
    • site_name: name or code of sampling site as used in respective projects
    • latitude_numeric: exact numeric degrees coordinates of latitude
    • longitude_numeric: exact numeric degrees coordinates of longitude
    • topography_m: for sites on land: elevation. For marine sites: depth (negative). in meters
    • freshwater_depth_m
    • realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/
    • comments
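
    The three tables described so far (datasets, datasets-sites, sites) can be joined through their keys. The following is a minimal sketch with hypothetical rows, assuming the key names match the column lists above (dataset_id in datasets, dataset_ID and site_ID in the linking table); it does not reflect the project's own tooling.

```python
# Hypothetical rows; only a few of the documented columns are shown.
datasets = [
    {"dataset_id": 1, "name": "Example forest survey"},
    {"dataset_id": 2, "name": "Example reef survey"},
]
datasets_sites = [
    {"dataset_ID": 1, "site_ID": 1001},
    {"dataset_ID": 1, "site_ID": 1002},
    {"dataset_ID": 2, "site_ID": 1003},
]
sites = [
    {"site_ID": 1001, "site_name": "F-01", "latitude_numeric": 1.50, "longitude_numeric": 103.80},
    {"site_ID": 1002, "site_name": "F-02", "latitude_numeric": 1.52, "longitude_numeric": 103.82},
    {"site_ID": 1003, "site_name": "R-01", "latitude_numeric": -18.30, "longitude_numeric": 147.70},
]

# Index each table by its primary key for constant-time lookups.
datasets_by_id = {d["dataset_id"]: d for d in datasets}
sites_by_id = {s["site_ID"]: s for s in sites}

# Walk the linking table and pull the matching row from each side.
joined = [
    {
        "dataset_name": datasets_by_id[link["dataset_ID"]]["name"],
        "site_name": sites_by_id[link["site_ID"]]["site_name"],
        "latitude": sites_by_id[link["site_ID"]]["latitude_numeric"],
        "longitude": sites_by_id[link["site_ID"]]["longitude_numeric"],
    }
    for link in datasets_sites
]

for row in joined:
    print(row)
```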

    deployments

    • dataset_ID: primary key of datasets table
    • dataset_name: lookup field
    • deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.
    • start_date_min: earliest date of deployment start, double-click cell to get date-picker
    • start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker
    • start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?
    • variable_duration_days: is the duration of the deployment variable? in days
    • duration_days: deployment duration per recorder (use the minimum if variable)
    • end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker
    • end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker
    • end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.
    • recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.
    • operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • operation_duration_minutes: duration of operation in minutes, if constant
    • operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")
    • duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.
    • sampling_frequency_kHz: only indicate the sampling frequency if it is variable within a particular dataset so that we need to code different frequencies for different deployments
    • recorder
    • subset_sites: If the deployment was not done in all the sites of the
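
    Two of the deployment fields above use small text formats: duty_cycle_minutes ("1/6") and the *_time_mixed fields ("HH:MM" or a solar daytime with an optional offset such as "sunrise-60"). The following is a minimal parsing sketch based only on those descriptions; the function names and the returned dictionary layout are hypothetical, and the project may validate these fields differently.

```python
import re
from fractions import Fraction

_TIME_RE = re.compile(
    r"^(?:(?P<hh>\d{1,2}):(?P<mm>\d{2})"
    r"|(?P<solar>sunrise|sunset|noon|midnight)(?P<offset>[+-]\d+)?)$"
)


def parse_duty_cycle(text: str) -> Fraction:
    """Parse 'recording(minutes)/period(minutes)', e.g. '1/6' for 1 minute
    recording followed by 5 minutes of standby."""
    recording, period = (int(part) for part in text.split("/"))
    if not 0 < recording <= period:
        raise ValueError(f"invalid duty cycle: {text!r}")
    return Fraction(recording, period)


def parse_mixed_time(text: str) -> dict:
    """Parse either a clock time ('06:30') or a solar daytime with an
    optional offset in minutes ('sunrise-60')."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    if match.group("solar"):
        return {
            "kind": "solar",
            "event": match.group("solar"),
            "offset_minutes": int(match.group("offset") or 0),
        }
    hours, minutes = int(match.group("hh")), int(match.group("mm"))
    if hours > 23 or minutes > 59:
        raise ValueError(f"invalid clock time: {text!r}")
    return {"kind": "clock", "minutes_after_midnight": hours * 60 + minutes}


print(parse_duty_cycle("1/6"))
print(parse_mixed_time("sunrise-60"))
print(parse_mixed_time("06:30"))
```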

  8. COVID Impact Survey - Public Data

    • data.world
    csv, zip
    Updated Oct 16, 2024
    Cite
    The Associated Press (2024). COVID Impact Survey - Public Data [Dataset]. https://data.world/associatedpress/covid-impact-survey-public-data
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Oct 16, 2024
    Authors
    The Associated Press
    Description

    Overview

    The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.

    Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).

    The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.

    The survey is focused on three core areas of research:

    • Physical Health: Symptoms related to COVID-19, relevant existing conditions and health insurance coverage.
    • Economic and Financial Health: Employment, food security, and government cash assistance.
    • Social and Mental Health: Communication with friends and family, anxiety and volunteerism. (Questions based on those used on the U.S. Census Bureau’s Current Population Survey.)

    Using this Data - IMPORTANT

    This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!

    Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
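
    To illustrate why weighting matters, the following is a minimal sketch of a weighted share calculation. The respondent rows are hypothetical, the "weight" column name is an assumption (check the codebook for the actual weight variable), and "soc5c" is borrowed from the survey's loneliness question purely as an example.

```python
from collections import defaultdict


def weighted_shares(responses, answer_key, weight_key):
    """Return each answer's share of the total survey weight."""
    totals = defaultdict(float)
    for response in responses:
        totals[response[answer_key]] += response[weight_key]
    grand_total = sum(totals.values())
    return {answer: weight / grand_total for answer, weight in totals.items()}


# Hypothetical respondents: raw counts are split 2 vs 2, but the
# weights say the "none" respondents represent more of the population.
responses = [
    {"soc5c": "none", "weight": 2.0},
    {"soc5c": "none", "weight": 1.0},
    {"soc5c": "some", "weight": 0.5},
    {"soc5c": "some", "weight": 0.5},
]

shares = weighted_shares(responses, "soc5c", "weight")
print(shares)
```

    Here the raw split is 50/50 while the weighted split is 75/25, which is the kind of gap the warning above is about.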

    Queries

    If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".

    Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.

    Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.

    The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."

    Margin of Error

    The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:

    • At least twice the margin of error, you can report there is a clear difference.
    • At least as large as the margin of error, you can report there is a slight or apparent difference.
    • Less than or equal to the margin of error, you can report that the respondents are divided or there is no difference.

    A Note on Timing

    Survey results will generally be posted under embargo on Tuesday evenings. The data is available for release at 1 p.m. ET Thursdays.
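
    The margin-of-error rule of thumb above can be sketched as a small function. This is an illustration under assumptions: the 5-point margin of error is hypothetical (the real value is in the methods statement), and a difference exactly equal to the margin of error, which the list assigns to two categories, is treated here as "no difference" as the more cautious reading.

```python
def describe_difference(pct_a: float, pct_b: float, margin_of_error: float) -> str:
    """Classify the gap between two survey percentages."""
    if margin_of_error <= 0:
        raise ValueError("margin of error must be positive")
    difference = abs(pct_a - pct_b)
    if difference >= 2 * margin_of_error:
        return "clear difference"
    if difference > margin_of_error:
        return "slight or apparent difference"
    # Assumption: a gap exactly equal to the margin of error lands here.
    return "divided or no difference"


# Percentages echo the loneliness example above; the margin is hypothetical.
print(describe_difference(66, 52, 5.0))
print(describe_difference(60, 52, 5.0))
print(describe_difference(60, 56, 5.0))
```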

    About the Data

    The survey data will be provided under embargo in both comma-delimited and statistical formats.

    Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation, it is part of the organization's formal name.)

    Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.

    Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
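
    For readers unfamiliar with raking, the sketch below shows the general idea (iterative proportional fitting) on two variables. It is not NORC's implementation: the category labels, target shares and six respondents are hypothetical, and the real process rakes on more variables, as listed above.

```python
def rake(rows, targets, max_iterations=100, tolerance=1e-9):
    """Adjust weights until the weighted margins match the target shares.

    rows: one dict per respondent holding a category for each variable.
    targets: {variable: {category: target population share}}.
    Returns one weight per respondent; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iterations):
        largest_change = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each category of this variable.
            current = {category: 0.0 for category in shares}
            for row, weight in zip(rows, weights):
                current[row[variable]] += weight
            # Rescale respondents so this variable's margin hits its target.
            for i, row in enumerate(rows):
                category = row[variable]
                factor = shares[category] * total / current[category]
                weights[i] *= factor
                largest_change = max(largest_change, abs(factor - 1.0))
        if largest_change < tolerance:
            break
    return weights


# Hypothetical respondents and targets, for illustration only.
rows = [
    {"age": "18-44", "gender": "female"},
    {"age": "18-44", "gender": "female"},
    {"age": "18-44", "gender": "male"},
    {"age": "45+", "gender": "female"},
    {"age": "45+", "gender": "male"},
    {"age": "45+", "gender": "male"},
]
targets = {
    "age": {"18-44": 0.4, "45+": 0.6},
    "gender": {"female": 0.5, "male": 0.5},
}
weights = rake(rows, targets)
```

    Each pass fixes one margin and slightly disturbs the others, which is why the adjustment is repeated until the factors settle near 1.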

    Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.

    Attribution

    Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.

    AP Data Distributions

    ​To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

  9. World Population & Health Data 2014 - 2024

    • kaggle.com
    Updated Jan 21, 2025
    Faizal Rosyid (2025). World Population & Health Data 2014 - 2024 [Dataset]. https://www.kaggle.com/datasets/faizalrosyid/world-population-and-health-data-2014-2024
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Faizal Rosyid
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    This dataset provides an extensive view of global population statistics and health metrics across various countries from 2014 to 2024. It combines population data with vital health-related indicators, making it a valuable resource for understanding trends in population growth and health outcomes worldwide. Researchers, data scientists, and policymakers can utilize this dataset to analyze correlations between population dynamics and health performance at a global scale.

    Key Features:

    • Country: Name of the country.
    • Year: Year of the data (2014–2024).
    • Population: Total population for the respective year and country.
    • Country Code: ISO 3-letter country codes for easy identification.
    • Health Expenditure (health_exp): Percentage of GDP spent on healthcare.
    • Life Expectancy (life_expect): Average life expectancy at birth in years.
    • Maternal Mortality (maternal_mortality): Maternal deaths per 100,000 live births.
    • Infant Mortality (infant_mortality): Deaths of infants under 1 year per 1,000 live births.
    • Neonatal Mortality (neonatal_mortality): Deaths of newborns (0–28 days) per 1,000 live births.
    • Under-5 Mortality (under_5_mortality): Deaths of children under 5 years per 1,000 live births.
    • HIV Prevalence (prev_hiv): Percentage of the population living with HIV.
    • Tuberculosis Incidence (inci_tuberc): Estimated new and relapse TB cases per 100,000 people.
    • Undernourishment Prevalence (prev_undernourishment): Percentage of the population that is undernourished.

    Use Cases:

    • Health Policy Analysis: Understand trends in healthcare expenditure and its relationship to health outcomes.
    • Global Health Research: Investigate global or regional disparities in health and nutrition.
    • Population Studies: Analyze population growth trends alongside health indicators.
    • Data Visualization: Build visual dashboards for storytelling and impactful data representation.
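
    As one possible starting point for the health policy use case, the sketch below computes a Pearson correlation between two indicator columns while skipping rows with blank values. The column names health_exp and life_expect come from the feature list; the rows are hypothetical placeholders, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


def indicator_correlation(rows, x_field, y_field):
    """Correlate two columns, ignoring rows where either value is blank."""
    pairs = [
        (float(row[x_field]), float(row[y_field]))
        for row in rows
        if row.get(x_field) not in (None, "") and row.get(y_field) not in (None, "")
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical rows for illustration; real rows would come from the CSV file.
rows = [
    {"health_exp": "2.0", "life_expect": "60"},
    {"health_exp": "4.0", "life_expect": "70"},
    {"health_exp": "6.0", "life_expect": "80"},
    {"health_exp": "", "life_expect": "75"},  # blank value, skipped
]
r = indicator_correlation(rows, "health_exp", "life_expect")
```

    A correlation across countries and years describes association only; it does not show that spending causes the outcome.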

  10. GBIF Backbone Taxonomy

    • gbif.org
    • smng.net
    Updated Nov 17, 2023
    GBIF Secretariat (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Global Biodiversity Information Facility https://www.gbif.org/
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.

    It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbone's name coverage. The GBIF Backbone Taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

    International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.
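
    The BIN consensus rule described above can be sketched roughly as follows. This is not GBIF's code: the list of ranks, the choice of "all names applied to the BIN" as the denominator, and the example records are assumptions made for illustration.

```python
from collections import Counter

# Assumed ordering of the major Linnaean ranks, lowest first.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def consensus_name(classifications, threshold=0.8):
    """Return (rank, name) for the lowest rank where one name reaches the threshold.

    classifications: one dict per name applied to the BIN, mapping rank -> name.
    Returns None if no rank reaches consensus.
    """
    for rank in RANKS:
        names = [c[rank] for c in classifications if c.get(rank)]
        if not names:
            continue
        name, count = Counter(names).most_common(1)[0]
        # Assumption: the share is measured against all names applied to the BIN.
        if count / len(classifications) >= threshold:
            return rank, name
    return None


# Hypothetical BIN with five name records, for illustration only.
records = [
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis mellifera", "genus": "Apis"},
    {"species": "Apis cerana", "genus": "Apis"},
    {"species": "Apis cerana", "genus": "Apis"},
]
result = consensus_name(records)
```

    Here the species names split 3 to 2, which is below 80%, so the sketch moves up one rank and settles on the genus.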

    UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.

    The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

    The following 105 sources have been used to assemble the GBIF backbone, with the number of names from each source given after its title:

    • Catalogue of Life Checklist - 4766428 names
    • International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
    • UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
    • The Paleobiology Database - 212054 names
    • World Register of Marine Species - 188857 names
    • The Interim Register of Marine and Nonmarine Genera - 183894 names
    • The World Checklist of Vascular Plants (WCVP) - 131891 names
    • GBIF Backbone Taxonomy - 114350 names
    • TAXREF - 109374 names
    • The Leipzig catalogue of vascular plants - 75380 names
    • ZooBank - 73549 names
    • Integrated Taxonomic Information System (ITIS) - 68377 names
    • Plazi.org taxonomic treatments database - 61346 names
    • Genome Taxonomy Database r207 - 60545 names
    • International Plant Names Index - 52329 names
    • Fauna Europaea - 45077 names
    • The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
    • Dyntaxa. Svensk taxonomisk databas - 35892 names
    • The Plant List with literature - 32692 names
    • United Kingdom Species Inventory (UKSI) - 29643 names
    • Artsnavnebasen - 29208 names
    • The IUCN Red List of Threatened Species - 21221 names
    • Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
    • Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
    • Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
    • Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
    • ICTV Master Species List (MSL) - 7852 names
    • Cockroach Species File - 6020 names
    • GRIN Taxonomy - 5882 names
    • Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
    • Catalogue of Afrotropical Bees - 3623 names
    • Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
    • Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
    • Systema Dipterorum - 2850 names
    • Catalogue of the Pterophoroidea of the World - 2807 names
    • The Clements Checklist - 2675 names
    • Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
    • IOC World Bird List, v13.2 - 2366 names
    • Official Lists and Indexes of Names in Zoology - 2310 names
    • National checklist of all species occurring in Denmark - 1922 names
    • Myriatrix - 1876 names
    • Database of Vascular Plants of Canada (VASCAN) - 1822 names
    • Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
    • Orthoptera Species File - 1742 names
    • A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
    • Aphid Species File - 1565 names
    • World Spider Catalog - 1561 names
    • Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
    • Backbone Family Classification Patch - 1143 names
    • GBIF Algae Classification - 1100 names
    • International Cichorieae Network (ICN): Cichorieae Portal - 975 names
    • Psocodea Species File - 803 names
    • New Zealand Marine Macroalgae Species Checklist - 787 names
    • Annotated checklist of endemic species from the Western Balkans - 754 names
    • Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
    • Catalogue of the Alucitoidea of the World - 472 names
    • Lygaeoidea Species File - 462 names
    • Catálogo de Plantas y Líquenes de Colombia - 422 names
    • GBIF Backbone Patch - 317 names
    • Phasmida Species File - 259 names
    • Cortinariaceae fetched from the Index Fungorum API - 234 names
    • Coreoidea Species File - 233 names
    • GTDB supplement - 139 names
    • Mantodea Species File - 119 names
    • Endemic species in Taiwan - 93 names
    • Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
    • Species of Hominidae - 78 names
    • Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
    • Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
    • Mammal Species of the World - 73 names
    • Plecoptera Species File - 71 names
    • Species Fungorum Plus - 64 names
    • Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
    • Species named after famous people - 41 names
    • Dermaptera Species File - 36 names
    • Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
    • True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
    • Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
    • Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
    • Lista de referencia de especies de aves de Colombia - 2022 - 24 names
    • Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
    • Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
    • Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
    • Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
    • Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
    • Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
    • Grylloblattodea Species File - 11 names
    • Coleorrhyncha Species File - 9 names
    • Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
    • Embioptera Species File - 7 names
    • Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
    • Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
    • Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
    • The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
    • Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
    • Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
    • Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
    • Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
    • Chrysididae Species File - 1 name
    • Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 name
    • Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 name
    • Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 name
    • Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 name
    • Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 name
    • Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 name

  11. Worldwide Soundscapes project metadata and analysis scripts

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 25, 2025
    Amandine Gasc (2025). Worldwide Soundscapes project metadata and analysis scripts [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6486835
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Thomas Cherico Wanger
    Youfang Chen
    Amandine Gasc
    Dong, Lijun
    Li, Songhai
    Steven Van Wilgenburg
    Rodney Rountree
    Kevin F.A. Darras
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated passive acoustic monitoring meta-datasets (i.e. meta-data collections). This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description. Additionally, R scripts are provided to replicate the analysis published in [placeholder].

    The overview of all sampling sites and timelines can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. The recordings of this collection were annotated and analysed to explore macro-ecological trends.

    The audio recording criteria justifying inclusion into the meta-database are:

    Stationary (no transects, towed sensors or microphones mounted on cars)

    Passive (unattended, no human disturbance by the recordist)

    Ambient (no directional microphone or triggered recordings, non-experimental conditions)

    Spatially and/or temporally replicated (i.e. multiple sites sampled at the same time and/or multiple days - covering the same daytime - sampled at the same site)

    The individual columns of the provided data tables are described in the following. Data tables are linked through primary keys; joining them will result in a database. The data shared here only includes validated collections.

    Changes from version 3.0.1

    Added files needed to reproduce the metadata and the acoustic analyses found in the publication.

    Dropped underused fields: spatial_selection, temporal_exclusion, freshwater_recordist_position from collections table; secondary realm, biome, and functional group from sites table.

    Meta-database CSV files

    collections

    collection_id: unique integer, primary key

    name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate them.

    ecoSound-web_link: link of validated meta-collection on ecoSound-web

    primary_contributors: full names of people deemed corresponding contributors who are responsible for the dataset

    secondary_contributors: full names of people who are not primary contributors but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses

    date_added: when the dataset was added (YYYY-MM-DD)

    URL_open_recordings: internet link for openly-available recordings from this collection

    URL_project: internet link for further information about the corresponding project

    DOI_publication: Digital Object Identifiers of corresponding publications

    core_realm_IUCN: The main, core realm of the dataset according to IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

    medium: the physical medium the microphone is situated in

    locality: optional free text about the locality

    contributor_comments: free-text field for comments by the primary contributors

    collections-sites

    dataset_ID: primary key of collections table

    site_ID: primary key of sites table

    sites

    site_ID: unique integer, primary key

    site_name: internal name or code of sampling site as used in respective projects

    latitude_numeric: site's numeric degrees of latitude

    longitude_numeric: site's numeric degrees of longitude

    blurred_coordinates: whether latitude and longitude coordinates are inaccurate, boolean. Coordinates may be blurred with random offsets, rounding, snapping, etc. Indicate the blurring method inside the comments field

    topography_m: vertical position of the microphone relative to sea level, in meters. For sites on land: elevation. For marine sites: depth (negative). Only indicate if the values were measured by the collaborator.

    freshwater_depth_m: microphone depth, only used for sites inside freshwater bodies that also have an elevation value above the sea level

    realm: Ecosystem type: main realm according to IUCN GET https://global-ecosystems.org/

    biome: Ecosystem type: main biome according to IUCN GET https://global-ecosystems.org/

    functional_group: Ecosystem type: main functional group according to IUCN GET https://global-ecosystems.org/

    contributor_comments: free text field for contributor comments

    GADM_0: Global ADMinistrative Database level 0 classification of terrestrial site or marine site that is within territorial waters. Source: https://gadm.org/download_world.html

    IHO: International Hydrographic Organization classification of marine site. Source: https://marineregions.org/downloads.php

    WDPA: World Database on Protected Areas classification of the site. Source: https://www.protectedplanet.net/en/thematic-areas/wdpa?tab=WDPA
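
    Because the tables are linked through primary keys, the collections and sites tables can be combined through the collections-sites link table. The sketch below is a minimal illustration with in-memory rows; it assumes that dataset_ID in the link table holds collection_id values, and the example rows are hypothetical.

```python
def join_collections_to_sites(collections, collections_sites, sites):
    """Return (collection, site) pairs for every row of the link table."""
    collections_by_id = {row["collection_id"]: row for row in collections}
    sites_by_id = {row["site_ID"]: row for row in sites}
    pairs = []
    for link in collections_sites:
        # Assumption: dataset_ID refers to collection_id in the collections table.
        collection = collections_by_id.get(link["dataset_ID"])
        site = sites_by_id.get(link["site_ID"])
        if collection is None or site is None:
            continue  # skip links that point at a missing row
        pairs.append((collection, site))
    return pairs


# Hypothetical rows for illustration; real rows would be read from the CSV files.
collections = [{"collection_id": 1, "name": "Example collection"}]
sites = [
    {"site_ID": 10, "site_name": "A1", "realm": "Terrestrial"},
    {"site_ID": 11, "site_name": "A2", "realm": "Terrestrial"},
]
collections_sites = [
    {"dataset_ID": 1, "site_ID": 10},
    {"dataset_ID": 1, "site_ID": 11},
    {"dataset_ID": 2, "site_ID": 10},  # no matching collection, skipped
]
pairs = join_collections_to_sites(collections, collections_sites, sites)
```

    The rows are kept as pairs instead of being merged into one dict because both tables carry a contributor_comments column.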

    deployments

    dataset_ID: primary key of datasets table

    deployment: identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

    subset_site_ID: If the deployment was not done in all the sites of the corresponding collection, site IDs where the deployment was conducted

    start_date: date of deployment start

    start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    permanent: whether the deployment is permanent, boolean

    end_date: date of deployment end (date when last scheduled operation starts)

    end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

    operation_mode: one of three modes. continuous: recording takes place from the deployment start date-time to deployment end date-time; periodical: recording takes place periodically (i.e., with duty cycle) from the deployment start date-time to deployment end date-time; scheduled: recording takes place during scheduled daily time intervals (optionally with duty cycle).

    duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". Empty if no duty cycle is used. For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes

    operation_start_time_mixed: only for scheduled recordings: start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    operation_duration_minutes: only for scheduled recordings: duration of operation in minutes, if constant

    operation_end_time_mixed: only for scheduled recordings: end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Only required if durations are variable. Do not use when end times are ambiguous (for instance, if a recording could be 1 hour or 25 hours long because the end is on the next day). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

    high_pass_filter_Hz: frequency of the high-pass filter of the recorder if applied, in Hz. Otherwise, write "none". This may be called a "low-cut" filter too.

    bit_depth: sampling bit depth of the recordings. Often constant for a particular recorder

    channels: number of recorded audio channels

    sampling_frequency_kHz: frequency at which the microphone signal was sampled by the recorder (sounds up to half that frequency can be recorded)

    recorder: recorder used for deployment

    microphone: microphone used for deployment

    target_taxa: main IUCN animal taxa that were studied with this deployment, using the exact IUCN Red list names (http://www.iucnredlist.org/), separated by commas. Only genera, families, orders, and classes are accepted. Empty if there was no taxonomic focus (i.e., general soundscapes were the study focus).

    contributor_comments: free text field for contributor comments

    exact_recordings: whether the deployment data here have been superseded by inserting more exact recording date-time ranges into the meta-collection on ecoSound-web
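
    The mixed time fields and the duty cycle field described above follow small text formats that can be parsed roughly as below. This is a sketch based only on the examples given ("sunrise-60", "1/6"); the function names and return shapes are hypothetical, and real files may contain variants this does not handle.

```python
import re

SOLAR_EVENTS = {"sunrise", "sunset", "noon", "midnight"}


def parse_mixed_time(value):
    """Parse 'HH:MM' or a solar daytime with an optional offset in minutes.

    Returns ("clock", minutes_after_midnight) or ("solar", event, offset_minutes).
    """
    text = value.strip().lower()
    clock = re.fullmatch(r"(\d{1,2}):(\d{2})", text)
    if clock:
        hours, minutes = int(clock.group(1)), int(clock.group(2))
        if hours > 23 or minutes > 59:
            raise ValueError(f"invalid clock time: {value!r}")
        return ("clock", hours * 60 + minutes)
    solar = re.fullmatch(r"([a-z]+)([+-]\d+)?", text)
    if solar and solar.group(1) in SOLAR_EVENTS:
        return ("solar", solar.group(1), int(solar.group(2) or 0))
    raise ValueError(f"unrecognised time value: {value!r}")


def parse_duty_cycle(value):
    """Parse 'recording/period' in minutes into a fraction; empty means none."""
    if not value.strip():
        return None
    recording, period = (float(part) for part in value.split("/"))
    return recording / period


before_sunrise = parse_mixed_time("sunrise-60")
morning = parse_mixed_time("06:30")
duty = parse_duty_cycle("1/6")
```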

    recordings (partial download from ecoSound-web)

    recording_id: primary key of the recordings table

    collection_id: ID of the collection the recording belongs to

    name: name of the recording

    site_id: ID of the site the recording belongs to

    recorder_id: ID of the recorder used for the recording (internal ecoSound-web code)

    microphone_id: ID of the microphone used for the recording (internal ecoSound-web code)

    recording_gain: recording gain applied for amplifying the audio signal, in decibels

    duty_cycle_recording: fraction of the recording period when the recorder is actively recording audio

    duty_cycle_period: period of the duty cycle, i.e., time between the starts of two subsequent recordings

    note: comments (contains the target taxon)

    file_date: date of the recording start

    file_time: local time of the recording start

    sampling_rate: audio sampling rate in Hz

    bitdepth: depth in bits for each audio sample

    channel_num: number of channels

    duration: duration of the recording in seconds. Note: duty-cycled recordings cover only a proportion of this duration

    affiliations

    affiliation_id: primary key of affiliations table

    lab_research_group: Laboratory or research group name

    department_school_institute: department, school, or institute name

    university_institution: University or institution name

    street_address: street address

    region_state_province_city: region, state, province, or city name

    postal_code: postal code

    country: country

  12. ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 4, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-vehicle-miles-traveled-during-covid-19-lock-downs-636d/latest
    Dataset updated
    Jan 4, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vehicle-miles-travelede on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020.

    Overview

    Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.

    This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.

    Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.

    This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.

    Findings

    • Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.
    • Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.
    • New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.

    About This Data

    The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S. The data is from their Vehicle Miles Traveled (VMT) Monitor which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles travelled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.

    Included Data

    01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
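
    The two derived measures mentioned for these files, percent change against the January baseline and a seven-day rolling average, can be sketched as below. This is an illustration, not AP's calculation: it assumes a trailing window (the source does not say whether the window is trailing or centered), and the daily values are hypothetical.

```python
def percent_change(value, baseline):
    """Percent change of a daily value relative to the baseline average."""
    return (value - baseline) / baseline * 100.0


def rolling_mean(values, window=7):
    """Trailing mean over `window` days; None until a full window is available."""
    result = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            running_total -= values[i - window]  # drop the day leaving the window
        result.append(running_total / window if i >= window - 1 else None)
    return result


# Hypothetical daily VMT values and baseline, for illustration only.
daily_vmt = [1, 2, 3, 4, 5, 6, 7, 8]
smoothed = rolling_mean(daily_vmt)
change = percent_change(62.0, 100.0)
```

    With these toy inputs the first six smoothed entries are None because a full week of data is not yet available.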

    Additional Data Queries

    * Filter for specific state - filters 02_vmt_state.csv daily data for specific state.

    * Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.

    * Filter for specific county - filters 03_vmt_county.csv daily data for specific county.

    Interactive

    The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic.

    This dataset was created by Angeliki Kastanis and contains around 0 samples along with Date At Low, Mean7 County Vmt At Low, technical information and other features such as: - County Name - County Fips - and more.

    How to use this dataset

    • Analyze State Name in relation to Baseline Jan Vmt
    • Study the influence of Date At Low on Mean7 County Vmt At Low

    Acknowledgements

    If you use this dataset in your research, please credit Angeliki Kastanis


    --- Original source retains full ownership of the source dataset ---

  13. Synthetic population for USA_ALABAMA

    • zenodo.org
    • explore.openaire.eu
    bin, pdf, zip
    Updated Jul 16, 2024
    Cite
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie (2024). Synthetic population for USA_ALABAMA [Dataset]. http://doi.org/10.5281/zenodo.6505866
    Explore at:
    Available download formats: pdf, zip, bin
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, Alabama
    Description

    Synthetic populations for regions of the World (SPW) | Alabama

    Dataset information

    A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place and thus where interactions among population members happen (e.g., spread of epidemics).

    License

    CC-BY-4.0

    Acknowledgment

    This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).

    Contact information

    Henning.Mortveit@virginia.edu

    Identifiers

    Region name: Alabama
    Region ID: usa_140002904
    Model: coarse
    Version: 0_9_0

    Statistics

    Name: Value
    Population: 4768478
    Average age: 37.8
    Households: 1933164
    Average household size: 2.5
    Residence locations: 1933164
    Activity locations: 398709
    Average number of activities: 5.7
    Average travel distance: 65.0
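
    As a quick consistency check on the figures above, the average household size can be recomputed from the population and household counts. This is only an arithmetic sketch; the variable names are ours, and rounding to one decimal place is an assumption about how the published value was produced.

```python
# Figures copied from the Statistics table above.
population = 4768478
households = 1933164
residence_locations = 1933164

# Average household size = people per household.
avg_household_size = population / households
rounded_size = round(avg_household_size, 1)

# In this table the number of residence locations equals the number of households.
one_residence_per_household = residence_locations == households
```

    The unrounded ratio is about 2.47, which rounds to the 2.5 reported in the table.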

    Sources

    Description | Name | Version | Url
    Activity template data | World Bank | 2021 | https://data.worldbank.org
    Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap
    Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551 https://www.openstreetmap.org/
    Household data | IPUMS | | https://international.ipums.org/international
    Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11

    Files description

    Base data files (usa_140002904_data_v_0_9.zip)

    Filename - Description
    usa_140002904_person_v_0_9.csv - Data for each person including attributes such as age, gender, and household ID.
    usa_140002904_household_v_0_9.csv - Data at household level.
    usa_140002904_residence_locations_v_0_9.csv - Data about residence locations
    usa_140002904_activity_locations_v_0_9.csv - Data about activity locations, including what activity types are supported at these locations
    usa_140002904_activity_location_assignment_v_0_9.csv - For each person and for each of their activities, this file specifies the location where the activity takes place

    Derived data files

    Filename - Description
    usa_140002904_contact_matrix_v_0_9.csv - A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model.

    Validation and measures files

    Filename - Description
    usa_140002904_household_grouping_validation_v_0_9.pdf - Validation plots for household construction
    usa_140002904_activity_durations_{adult,child}_v_0_9.pdf - Comparison of time spent on generated activities with survey data
    usa_140002904_activity_patterns_{adult,child}_v_0_9.pdf - Comparison of generated activity patterns by the time of day with survey data
    usa_140002904_location_construction_0_9.pdf - Validation plots for location construction
    usa_140002904_location_assignement_0_9.pdf - Validation plots for location assignment, including travel distribution plots
    usa_140002904_usa_140002904_ver_0_9_0_avg_travel_distance.pdf - Choropleth map visualizing average travel distance
    usa_140002904_usa_140002904_ver_0_9_0_travel_distr_combined.pdf - Travel distance distribution
    usa_140002904_usa_140002904_ver_0_9_0_num_activity_loc.pdf - Choropleth map visualizing number of activity locations
    usa_140002904_usa_140002904_ver_0_9_0_avg_age.pdf - Choropleth map visualizing average age
    usa_140002904_usa_140002904_ver_0_9_0_pop_density_per_sqkm.pdf - Choropleth map visualizing population density
    usa_140002904_usa_140002904_ver_0_9_0_pop_size.pdf - Choropleth map visualizing population size

  14. Global News Popularity Insights Datset

    • opendatabay.com
    Updated Jul 4, 2025
    Cite
    Datasimple (2025). Global News Popularity Insights Datset [Dataset]. https://www.opendatabay.com/data/ai-ml/b036c2ea-2b40-4afe-8dc2-1c56302ffdbc
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    This dataset captures the popularity of news articles across various social media platforms, providing valuable insights into how news content performs online [1, 2]. It is a subset of a larger dataset, specifically designed for analysing engagement and reach of news items [1, 2]. The data includes key details about news articles and their final popularity scores on Facebook, Google+, and LinkedIn [1-3]. It serves as an excellent resource for understanding social media trends and the dissemination of news [2].

    Columns

    The dataset features the following columns:

    • IDLink: A unique identifier for each news item [1, 2].
    • Title: The title of the news item as it appeared from the official media sources [1, 2].
    • Headline: The headline of the news item, also from official media sources [1, 2].
    • Source: The original news outlet that published the news item [1, 2].
    • Topic: The query topic used to obtain the news items from official media sources [1, 2].
    • PublishDate: The date and time when the news item was published [1, 2].
    • Facebook: The final popularity score of the news item on Facebook [2, 3].
    • GooglePlus: The final popularity score of the news item on Google+ [2, 3].
    • LinkedIn: The final popularity score of the news item on LinkedIn [2, 3].

    This subset of the data is specifically noted to be missing the 'SentimentTitle' and 'SentimentHeadline' columns that are present in the full dataset [1].

    Distribution

    This dataset comprises approximately 37,000 news articles [1]. While exact row counts for files are not specified beyond this total, the dataset format is typically CSV [4].

    • Unique Values:
      • IDLink: 37,288 unique values [3].
      • Title: 32,366 unique values [3].
      • Headline: 34,634 unique values [3].
    • Source Distribution:
      • Bloomberg: 2% [3].
      • Reuters: 1% [3].
      • Other: 97% (from 35,990 sources) [3].
    • Topic Distribution:
      • Economy: 36% [3].
      • Obama: 31% [3].
      • Other: 33% (from 12,165 topics) [3].
    • Time Range Sample (2016):
      • 03/29 - 04/03: 2,239 items [5].
      • 04/03 - 04/08: 2,020 items [5].
      • 06/17 - 06/22: 1,650 items [5].
      • 06/27 - 07/02: 2,024 items [5].

    The data spans from 2016-03-29 to 2016-07-07 [6].

    Usage

    This dataset is ideal for:

    • Analysing news popularity trends across different social media platforms [2].
    • Studying the impact of news content on online engagement [2].
    • Exploratory data analysis of news consumption patterns [7].
    • Understanding the spread of information in digital environments.
    • Developing models to predict social media reach for news articles.
    • Insights into media outlets' influence and topic relevance [1, 3].

    Coverage

    The dataset covers an approximate 8-month period, between November 2015 and July 2016 [2]. The specific subset provided covers 29 March 2016 to 07 July 2016 [6]. It includes news items on four primary topics: economy, Microsoft, Obama, and Palestine [2], with distribution details for 'economy' and 'obama' [3]. The region of coverage is global [8].

    License

    CC0

    Who Can Use It

    • Data Scientists and Analysts: For exploratory data analysis, feature engineering, and model building related to news popularity and social media engagement [7].
    • Researchers: Studying media studies, social network analysis, and public opinion.
    • Marketing Professionals: To understand content virality and optimise news dissemination strategies.
    • Journalists and Media Organisations: For insights into their content performance and audience engagement on social platforms.

    Dataset Name Suggestions

    • Social Media News Popularity
    • Online News Engagement Metrics
    • Digital News Dissemination Data
    • News Virality on Social Platforms
    • Global News Popularity Insights

    Attributes

    Original Data Source: News Popularity in Multiple Social Media Platforms

  15. Dataset for the Article "A Predictive Method to Improve the Effectiveness of...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 24, 2021
    Cite
    Riccardo Martoglia (2021). Dataset for the Article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4782983
    Explore at:
    Dataset updated
    May 24, 2021
    Dataset provided by
    Federica Mandreoli
    Marco Furini
    Riccardo Martoglia
    Manuela Montangero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".

    Abstract:

    Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.

    Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch

    Dataset structure

    This archive contains the dataset used in the experiments of the above research paper. Only the extracted features for the museum tweet threads (and not the message full text) are provided and needed for the analyses.

    We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, with a number varying from ca. 5k to ca. 11k for each museum group, depending on the number of museums in each group.

    Content features: these are the features that can be drawn from the content of the tweet itself. We further divide such features into the following two categories:

    – Countable: these features have a value ranging into different intervals. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., twitter accounts preceded by @), the length of the tweet;

    – On-Off : these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as dense of topics if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe the tweet sentiment that might be present (positive or negative) or not (neutral).
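
    A few of the content features above can be sketched directly from the tweet text. This is a minimal illustration, not the authors' extraction code: the regular expressions, the `ContentFeatures` class and the function name are hypothetical, and features that need external tools (images, named entities, sentiment) are left out. Only the topic-density threshold of 5 comes from the description.

```python
import re
from dataclasses import dataclass

TOPIC_DENSITY_THRESHOLD = 5  # from the description: dense if hashtags > 5

# Simplified patterns chosen for this sketch (assumptions, not from the paper).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


@dataclass
class ContentFeatures:
    n_hashtags: int
    n_urls: int
    n_mentions: int
    length: int
    has_exclamation: bool
    has_question: bool
    topic_dense: bool


def extract_content_features(text: str) -> ContentFeatures:
    urls = URL_RE.findall(text)
    # Drop URLs before counting so '#', '@' or '?' inside a link is ignored.
    stripped = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(stripped)
    mentions = MENTION_RE.findall(stripped)
    return ContentFeatures(
        n_hashtags=len(hashtags),
        n_urls=len(urls),
        n_mentions=len(mentions),
        length=len(text),
        has_exclamation="!" in stripped,
        has_question="?" in stripped,
        topic_dense=len(hashtags) > TOPIC_DENSITY_THRESHOLD,
    )


example = extract_content_features(
    "New exhibition opens today! #art #museum @Tate https://example.org/show?id=1"
)
```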

    Context features: these features are not drawn from the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11:00pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.
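
    The part-of-day boundaries given above translate into a small lookup. The sketch below assumes the timestamp is already expressed in the time zone of interest (the description does not say which one is used), and the function name is hypothetical.

```python
from datetime import datetime


def part_of_day(ts: datetime) -> str:
    """Map a timestamp to the four parts of the day listed in the description."""
    hour = ts.hour
    if 5 <= hour < 12:    # 5:00am to 11:59am
        return "morning"
    if 12 <= hour < 18:   # 12:00pm to 5:59pm
        return "afternoon"
    if 18 <= hour < 23:   # 6:00pm to 10:59pm
        return "evening"
    return "night"        # 11:00pm to 4:59am


label = part_of_day(datetime(2021, 5, 24, 17, 59))
```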

    User features: these features are specific to the user that sent the tweet, and are the same for all the tweets of this user. Namely, we consider the name of the museum and the number of followers of the user.

  16. Global B2B people Data | 720M+ LinkedIn Profiles | Verified & Bi-Weekly...

    • opendatabay.com
    Updated Jun 5, 2025
    Cite
    Forager (2025). Global B2B people Data | 720M+ LinkedIn Profiles | Verified & Bi-Weekly Updates [Dataset]. https://www.opendatabay.com/data/premium/5ff38f72-201c-469b-aa7c-5cba9ddb2ac3
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset authored and provided by
    Forager
    Area covered
    Synthetic Data Generation
    Description

    🌍 Global B2B Person Dataset | 755M+ LinkedIn Profiles | Verified & Bi-Weekly Updated

    Access the world’s most comprehensive professional dataset, enriched with over 755 million LinkedIn profiles. The Forager.ai Global B2B Person Dataset delivers work-verified professional contacts with 95%+ accuracy, refreshed every two weeks. Ideal for recruitment, sales, research, and talent mapping, it provides direct access to decision-makers, specialists, and executives across industries and geographies.

    Dataset Features

    Full Name & Job Title: Up-to-date first/last name with current professional role.

    Emails & Phone Numbers: AI-validated work and personal email addresses, plus mobile numbers.

    Company Info: Current employer name, industry, and company size (employee count).

    Career History: Detailed work history with job titles, durations, and role progressions.

    Skills & Endorsements: Extracted from public LinkedIn profiles.

    Education & Certifications: Universities, degrees, and professional certifications.

    Location & LinkedIn URL: City, country, and direct link to public LinkedIn profile.

    Distribution

    Data Volume: 755M+ total profiles, with 270M+ containing full contact information.

    Formats Available: CSV, JSON via S3 or Snowflake; API for real-time access.

    Access Methods: REST API, Enrichment API (lookup), full dataset delivery, or custom solutions.

    Usage

    This dataset is ideal for a variety of applications:

    Executive Recruitment: Source passive talent, build role-based maps, and assess mobility.

    Sales Intelligence: Find decision-makers, personalize outreach, and trigger campaigns on job changes.

    Market Research: Understand talent concentration by company, geography, and skill set.

    Partnership Development: Identify key stakeholders in target firms for business development.

    Talent Mapping & Strategic Hiring: Build full organizational charts and skill distribution heatmaps.

    Coverage

    Geographic Coverage: Global, including North America, EMEA, LATAM, and APAC.

    Time Range: Continuously updated; profiles refreshed bi-weekly.

    Demographics: Cross-industry coverage of seniority levels from entry-level to C-suite, across all sectors.

    License

    CUSTOM

    Who Can Use It

    Recruiters & Staffing Firms: For building target lists and sourcing niche talent.

    Sales & RevOps Teams: For targeting by department, title, or decision-making authority.

    VCs & PE Firms: To assess leadership teams and monitor executive movement.

    Data Scientists & Analysts: To train models for job mobility, hiring trends, or org structure prediction.

    B2B Platforms: For enriching internal databases and powering account-based marketing (ABM).

  17. ORBITAAL: cOmpRehensive BItcoin daTaset for temporAl grAph anaLysis

    • data.niaid.nih.gov
    Updated Nov 27, 2024
    Cite
    Cazabet, Remy (2024). ORBITAAL: cOmpRehensive BItcoin daTaset for temporAl grAph anaLysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10844224
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    Coquidé, Célestin
    Cazabet, Remy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Construction

    This dataset captures the temporal network of Bitcoin (BTC) flow exchanged between entities at the finest time resolution, expressed as UNIX timestamps. Its construction is based on the blockchain covering the period from 3 January 2009 to 25 January 2021. The blockchain extraction was performed using the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/

    [1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
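
    The common-input heuristic treats all input addresses of a single transaction as controlled by the same entity. A minimal union-find sketch of that idea follows. It is not the ORBITAAL authors' pipeline (which also uses walletexplorer.com labels); the class and function names and the toy transactions are hypothetical.

```python
from typing import Dict, Iterable, List


class UnionFind:
    """Disjoint-set structure over address strings."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions: Iterable[List[str]]) -> Dict[str, str]:
    """Map each input address to a representative address of its entity.

    Each transaction is given as the list of its input addresses.
    """
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # register single-input transactions too
        for addr in inputs[1:]:
            uf.union(first, addr)
    return {addr: uf.find(addr) for addr in list(uf.parent)}


# Toy example: A and B are co-spent, then B and C, so A, B, C form one entity.
entities = cluster_addresses([["A", "B"], ["B", "C"], ["D"]])
```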

    Dataset Description

    Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021

    Overview:

    This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a long period, from the inception of Bitcoin to recent years. It encompasses various temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs.

    All dates are derived from block UNIX timestamps in the GMT timezone.

    Contents:

    The dataset is distributed across the compressed archives listed below.

    All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. It can be read with the pyspark Python package.

    orbitaal-stream_graph.tar.gz:

    The root directory is STREAM_GRAPH/

    Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes).

    The stream graph is divided into 13 files, one for each year

    Files format is parquet

    Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year

    These files are in the subdirectory STREAM_GRAPH/EDGES/

    orbitaal-snapshot-all.tar.gz:

    The root directory is SNAPSHOT/

    Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).

    Files format is parquet

    Name format is orbitaal-snapshot-all.snappy.parquet.

    These files are in the subdirectory SNAPSHOT/EDGES/ALL/

    orbitaal-snapshot-year.tar.gz:

    The root directory is SNAPSHOT/

    Contains the yearly resolution of snapshot networks

    Files format is parquet

    Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] stands for the corresponding year and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year

    These files are in the subdirectory SNAPSHOT/EDGES/year/

    orbitaal-snapshot-month.tar.gz:

    The root directory is SNAPSHOT/

    Contains the monthly resolution of snapshot networks

    Files format is parquet

    Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month

    These files are in the subdirectory SNAPSHOT/EDGES/month/

    orbitaal-snapshot-day.tar.gz:

    The root directory is SNAPSHOT/

    Contains the daily resolution of snapshot networks

    Files format is parquet

    Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day

    These files are in the subdirectory SNAPSHOT/EDGES/day/

    orbitaal-snapshot-hour.tar.gz:

    The root directory is SNAPSHOT/

    Contains the hourly resolution of snapshot networks

    Files format is parquet

    Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files here), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day and hour

    These files are in the subdirectory SNAPSHOT/EDGES/hour/
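
    The naming scheme described for the stream graph and the yearly, monthly, daily and hourly snapshots can be parsed with one regular expression. The sketch below is an assumption-laden illustration: it accepts one- or two-digit month, day and hour fields because the exact padding is not stated, the function name is hypothetical, and the example file names are constructed from the pattern rather than taken from the archives.

```python
import re
from typing import Dict, Optional, Union

# Date fields after the year are optional, matching the coarser resolutions.
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-"
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{1,2}))?"
    r"(?:-(?P<day>\d{1,2}))?"
    r"(?:-(?P<hour>\d{1,2}))?"
    r"-file-id-(?P<file_id>\d+)\.snappy\.parquet$"
)

Parsed = Dict[str, Union[str, int, None]]


def parse_orbitaal_name(filename: str) -> Optional[Parsed]:
    """Return the fields encoded in a file name, or None if it does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    result: Parsed = {"kind": parts["kind"]}
    for key in ("year", "month", "day", "hour", "file_id"):
        result[key] = int(parts[key]) if parts[key] is not None else None
    return result


daily = parse_orbitaal_name(
    "orbitaal-snapshot-date-2016-07-08-file-id-12.snappy.parquet"
)
yearly = parse_orbitaal_name(
    "orbitaal-stream_graph-date-2016-file-id-8.snappy.parquet"
)
```

    Since the description states that [ID] increases with the date, sorting parsed files by `file_id` should give chronological order within one resolution.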

    orbitaal-nodetable.tar.gz:

    The root directory is NODE_TABLE/

    Contains two files in parquet format, the first one gives information related to nodes present in stream graphs and snapshots such as period of activity and associated global Bitcoin balance, and the other one contains the list of all associated Bitcoin addresses.

    Small samples in CSV format

    orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv

    These two CSV files contain stream graph representations around the halving event that occurred in 2016.

    orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv

    These two CSV files contain daily snapshot representations around the halving event that occurred in 2016.

  18. ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/746cf4e2/?iid=002-608&v=presentation
    Explore at:
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    This is a dataset containing all the major data breaches in the world from 2004 to 2021

    As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

    This data contains 5 columns:

    1. Entity: The name of the company, organization or institute
    2. Year: The year in which the data breach took place
    3. Records: How many records were compromised (can include information like email, passwords etc.)
    4. Organization type: Which sector the organization belongs to
    5. Method: Was it hacked? Were the files lost? Was it an inside job?

    Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

    Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

    --- Original source retains full ownership of the source dataset ---

  19. Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 29, 2023
    Cite
    Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li (2023). Freebase Datasets for Robust Evaluation of Knowledge Graph Link Prediction Models [Dataset]. http://doi.org/10.5281/zenodo.7909511
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nasim Shirvani Mahdavi; Farahnaz Akrami; Mohammed Samiul Saeef; Xiao Shi; Chengkai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Freebase is amongst the largest public cross-domain knowledge graphs. It possesses three main data modeling idiosyncrasies. It has a strong type system; its properties are purposefully represented in reverse pairs; and it uses mediator objects to represent multiary relationships. These design choices are important in modeling the real-world. But they also pose nontrivial challenges in research of embedding models for knowledge graph completion, especially when models are developed and evaluated agnostically of these idiosyncrasies. We make available several variants of the Freebase dataset by inclusion and exclusion of these data modeling idiosyncrasies. This is the first-ever publicly available full-scale Freebase dataset that has gone through proper preparation.

    Dataset Details

    The dataset consists of the four variants of Freebase dataset as well as related mapping/support files. For each variant, we made three kinds of files available:

    • Subject matter triples file
      • fb+/-CVT+/-REV: One folder for each variant. In each folder there are 5 files: train.txt, valid.txt, test.txt, entity2id.txt, relation2id.txt. Subject matter triples are the triples that belong to subject matter domains, i.e. domains describing real-world facts.
        • Example of a row in train.txt, valid.txt, and test.txt:
          • 2, 192, 0
        • Example of a row in entity2id.txt:
          • /g/112yfy2xr, 2
        • Example of a row in relation2id.txt:
          • /music/album/release_type, 192
        • Explanation
          • "/g/112yfy2xr" and "/m/02lx2r" are the MIDs of the subject entity and object entity, respectively. "/music/album/release_type" is the relationship between the two entities. 2, 192, and 0 are the IDs assigned by the authors to the objects.
    • Type system file
      • freebase_endtypes: Each row maps an edge type to its required subject type and object type.
        • Example
          • 92, 47178872, 90
        • Explanation
          • "92" and "90" are the type id of the subject and object which has the relationship id "47178872".
    • Metadata files
      • object_types: Each row maps the MID of a Freebase object to a type it belongs to.
        • Example
          • /g/11b41c22g, /type/object/type, /people/person
        • Explanation
          • The entity with MID "/g/11b41c22g" has a type "/people/person"
      • object_names: Each row maps the MID of a Freebase object to its textual label.
        • Example
          • /g/11b78qtr5m, /type/object/name, "Viroliano Tries Jazz"@en
        • Explanation
          • The entity with MID "/g/11b78qtr5m" has name "Viroliano Tries Jazz" in English.
      • object_ids: Each row maps the MID of a Freebase object to its user-friendly identifier.
        • Example
          • /m/05v3y9r, /type/object/id, "/music/live_album/concert"
        • Explanation
          • The entity with MID "/m/05v3y9r" can be interpreted by a human as a music concert live album.
      • domains_id_label: Each row maps the MID of a Freebase domain to its label.
        • Example
          • /m/05v4pmy, geology, 77
        • Explanation
          • The object with MID "/m/05v4pmy" in Freebase is the domain "geology", and has id "77" in our dataset.
      • types_id_label: Each row maps the MID of a Freebase type to its label.
        • Example
          • /m/01xljxh, /government/political_party, 147
        • Explanation
          • The object with MID "/m/01xljxh" in Freebase is the type "/government/political_party", and has id "147" in our dataset.
      • entities_id_label: Each row maps the MID of a Freebase entity to its label.
        • Example
          • /g/11b78qtr5m, Viroliano Tries Jazz, 2234
        • Explanation
          • The entity with MID "/g/11b78qtr5m" in Freebase is "Viroliano Tries Jazz", and has id "2234" in our dataset.
      • properties_id_label: Each row maps the MID of a Freebase property to its label.
        • Example
          • /m/010h8tp2, /comedy/comedy_group/members, 47178867
        • Explanation
          • The object with MID "/m/010h8tp2" in Freebase is a property (relation/edge); it has label "/comedy/comedy_group/members" and id "47178867" in our dataset.
      • uri_original2simplified and uri_simplified2original: The mappings from original URI to simplified URI and from simplified URI to original URI, respectively.
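
The row layouts described above can be read with a few lines of Python. The following is a minimal sketch, not the dataset's own tooling: it assumes rows are comma-separated exactly as in the examples (the real files may use a different delimiter), and the function names are hypothetical. It covers the `*_id_label` files (MID, label, dataset id) and `freebase_endtypes` (subject type id, edge type id, object type id).

```python
# Sample rows copied from the examples in the description above.
TYPES_ID_LABEL = ["/m/01xljxh, /government/political_party, 147"]
ENTITIES_ID_LABEL = ["/g/11b78qtr5m, Viroliano Tries Jazz, 2234"]
ENDTYPES = ["92, 47178872, 90"]


def split_id_label_row(line):
    """Split one '<MID>, <label>, <id>' row.

    The MID is taken before the first comma and the id after the last one,
    so a label that itself contains commas stays in one piece.
    """
    mid, rest = line.split(",", 1)
    label, dataset_id = rest.rsplit(",", 1)
    return mid.strip(), label.strip(), int(dataset_id)


def load_id_label(lines):
    """Build {dataset id: (MID, label)} from the rows of a *_id_label file."""
    index = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        mid, label, dataset_id = split_id_label_row(line)
        index[dataset_id] = (mid, label)
    return index


def load_endtypes(lines):
    """Build {edge type id: (subject type id, object type id)}."""
    index = {}
    for line in lines:
        if not line.strip():
            continue
        subject_type, edge_type, object_type = (int(p) for p in line.split(","))
        index[edge_type] = (subject_type, object_type)
    return index


types = load_id_label(TYPES_ID_LABEL)
entities = load_id_label(ENTITIES_ID_LABEL)
endtypes = load_endtypes(ENDTYPES)
```

Keying each index by the dataset id mirrors how the graph files refer to types and edges by number; the type ids returned by `load_endtypes` can then be resolved to labels through the index built from `types_id_label`.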

  20. A dataset related to the Batwa’s Right to Recognition as a Minority and...

    • figshare.com
    xlsx
    Updated Nov 22, 2023
    Cite
    Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles (2023). A dataset related to the Batwa’s Right to Recognition as a Minority and Indigenous People in Rwanda [Dataset]. http://doi.org/10.6084/m9.figshare.24612147.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ndikubwimana Jean Baptiste; Anangwe Kathleen A; Nyarwath Oriare; Mwimali Jack; Kabwete Charles
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    This codebook relates to a study of the Batwa’s Rights to Recognition as a Minority and Indigenous People in Rwanda through the Lens of a Human Rights Based Approach. The dataset presents information in 7 columns. The first column, Code level 1, contains the main codes extracted from the findings; the second column, Code level 2, contains sub-codes extracted from Code level 1; and the third column, Code level 3, contains codes extracted from Code level 2. The 4th column gives a brief definition of the content of the codes. The 5th column states what the codes should include, and the 6th column what they should not include. The 7th column lists the types of questions asked to respondents, on the basis of which the codes were generated. These codes were generated from data extracted from the questionnaire summarized in the 7th column. For example, the first column (Code level 1) consists of 4 rows. The first two rows concern findings from the literature review and the last two rows concern empirical data from the fieldwork. Data from the literature review and empirical data from the fieldwork were combined to produce the findings on which the interpretation was based. These codes allowed the researchers to derive meaningful findings, which in turn enabled them to provide a consolidated interpretation. The data generated align with epistemological interpretivism and concern respondents’ views on the socio-cultural narratives and emotional experiences they have endured in their lives. Data collection was conducted in three rural districts, Nyaruguru (Southern Province) and Rubavu and Rutsiro (Western Province), and in three urban districts, Nyarugenge, Kicukiro and Gasabo (Kigali City). Three rural and three urban districts were chosen to find out whether there were divergent socio-cultural realities within each and across the diverse settings.
The selected rural sites were those near protected areas from which the Batwa were evicted following the legislation of protected areas in 1930 by colonial authorities. The urban districts were sites in which some Batwa had lived after the imposition of a new lifestyle, different from their hunting and gathering tradition, following their eviction from the forests. The study sites were purposively selected with the facilitation of gatekeepers, namely local entities. Authorization was sent to the district level, which subsequently allowed a team of researchers to approach the sector, cell and village levels of administration. At the village level, the lowest entity where households of HMP live, respondents were again identified with the help of the Chief of the Village (umudugudu), who served as a gatekeeper.
Focus Group Discussions (FGDs) along with direct observation were administered to members of HMP (formerly referred to as Batwa). The groups comprised individuals above the age of 18 who were deemed to have experienced hardship as a result of socio-economic vulnerability caused by forest eviction. In-depth interviews were also carried out with officials of selected public institutions, including officials from the National Commission of Unity and Reconciliation and the National Commission of Human Rights. Key informant interviews (KIIs) were administered to leaders from NGOs and cooperative societies working towards the promotion of the rights of HMPs. These included one top manager and another who used to be among the top managers of Cooperative des Potiers au Rwanda (COPORWA), a local NGO advocating for the rights of Batwa in Rwanda, as well as one person who used to be among the leaders of CAURWA (Communauté des Autochtones au Rwanda, translated as Community of Autochthonies in Rwanda).
The latter was also one of the founding pioneers of a local NGO advocating for the rights of the Batwa in Rwanda. A former representative of HMP in Rwanda’s Senate was also contacted for an in-depth interview.
All respondents were purposively selected for their expertise or lived experience on the subject of self-identity and non-discrimination. Key informants from COPORWA, a representative of the HMP in the Rwandan Senate and authorities from the government were to provide information on convergences or divergences on the phenomenon under investigation.
In total, 226 respondents divided into four categories were approached for feedback: 220 heads of households from HMP for FGDs and direct observation; 3 leaders from COPORWA for in-depth interviews; 1 former Senator representing HMP in Rwanda’s Senate for an in-depth interview; and 2 authorities from governmental institutions. The aim of using different tools for different respondents was not only to obtain a wide range of perceptions on self-identity and non-discrimination, the subject matter under investigation, but also to enable the triangulation of information. FGDs along with direct observation facilitated the exploration of opinions and the observation of respondents’ behaviour and body language when a sensitive issue, such as discrimination, was mentioned. As an ethical consideration, all respondents were asked for their consent prior to data collection. All interviews were guided by the principle of ‘theoretical saturation’, which consists of continuing the inquiry until respondents start to repeat themselves.
To ensure the reliability and validity of the data, several measures were taken. Meetings were held every morning to plan for the day and every evening to evaluate the day spent in the field.
For each day of data collection, the data collectors gave a daily report highlighting the progress made and any special information relating to the subject matter under investigation that was observed in the field. The study used thematic analysis embedded in a deductive approach guided by the human rights-based approach, in which the two variables of self-identity and non-discrimination were the focus of study. The human rights-based approach facilitated generating data around themes related to self-identity and non-discrimination.
In short, the findings on the Batwa’s rights to self-identity and to non-discrimination differed across the two variables. On self-identity, the findings indicated that the identity of the Batwa has been shifting because of socio-cultural dynamics affecting the contexts in which they find themselves and live. For example, the name “HMP”, which conflates all vulnerable groups in Rwanda, elicits divergent views among respondents. For ordinary respondents from the Batwa, the name conveys a negative profile, while for the elites from the Batwa the name obscures their problems, since it disconnects them from other indigenous peoples across Africa and the world. For respondents from the GoR, the name means upholding unity and reconciliation. The findings also indicated that the Batwa identity has been characterised by a negative profile, that of someone who is the poorest, dirty and indigent, because of their lowest social status resulting from a non-dominant context. This reality corroborates other recent studies showing that the identity of the Batwa does not have a fixed boundary.
On the variable of non-discrimination, the findings indicated that the negative profiles mentioned above are forms of indirect discrimination resulting from microaggressions and stereotypes. For further information on how to use the dataset, kindly contact the corresponding author at: ndikubwimana.genbattista@gmail.com, tel: (+250)788 751 225

Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/

Geonames - All Cities with a population > 1000

Explore at:
15 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

All cities with a population > 1000 or seats of adm div (ca 80.000)
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
