56 datasets found
  1. World Bank: GHNP Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: GHNP Data [Dataset]. https://www.kaggle.com/theworldbank/world-bank-health-population
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

    Update Frequency: Biannual

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

    https://cloud.google.com/bigquery/public-data/world-bank-hnp

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Citation: The World Bank: Health Nutrition and Population Statistics

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    What’s the average age of first marriages for females around the world?

  2. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  3. China CN: Internet Service: No of Domain: ORG

    • ceicdata.com
    Updated Feb 15, 2025
    Cite
    CEICdata.com (2025). China CN: Internet Service: No of Domain: ORG [Dataset]. https://www.ceicdata.com/en/china/internet-number-of-domain-and-website/cn-internet-service-no-of-domain-org
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2017 - Dec 1, 2024
    Area covered
    China
    Variables measured
    Internet Statistics
    Description

    China Internet Service: Number of Domain: ORG data was reported at 0.023 Unit mn in Dec 2024. This records a decrease from the previous number of 0.026 Unit mn for Jun 2024. The data is updated semiannually, with a median of 0.128 Unit mn from Dec 2005 to Dec 2024 across 35 observations. It reached an all-time high of 0.398 Unit mn in Dec 2015 and a record low of 0.023 Unit mn in Dec 2024. The series remains in active status in CEIC and is reported by China Internet Network Information Center. The data is categorized under China Premium Database’s Information and Communication Sector – Table CN.ICE: Internet: Number of Domain and Website.

  4. World Sites (TimeMap Sample Dataset)

    • ecaidata.org
    Updated Oct 4, 2014
    Cite
    ECAI Clearinghouse (2014). World Sites (TimeMap Sample Dataset) [Dataset]. https://ecaidata.org/dataset/ecaiclearinghouse-id-12
    Dataset updated
    Oct 4, 2014
    Dataset provided by
    ECAI Clearinghouse
    Area covered
    World
    Description

    A database of cultural heritage sites assembled by volunteers at the Archaeological Computing Laboratory, University of Sydney. The initial data source was the UNESCO web site, supplemented by individual work on different countries/regions.

  5. Data from: Exploring the Dominance of the English Language on the Websites...

    • zenodo.org
    • data.niaid.nih.gov
    bin, xls
    Updated Mar 5, 2020
    Cite
    Giannakoulopoulos Andreas; Pergantis Minas; Konstantinou Nikos; Lamprogeorgos Aristeidis; Limniati Laida; Varlamis Iraklis (2020). Exploring the Dominance of the English Language on the Websites of EU Countries [Dataset]. http://doi.org/10.5281/zenodo.3698008
    Available download formats: xls, bin
    Dataset updated
    Mar 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giannakoulopoulos Andreas; Pergantis Minas; Konstantinou Nikos; Lamprogeorgos Aristeidis; Limniati Laida; Varlamis Iraklis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    European Union
    Description

    This dataset, in 29 xlsx files, contains the data for all metrics and accumulated information described in the methodology, results, and discussion sections of the research article "Exploring the Dominance of the English Language on the Websites of EU Countries".

  6. A global database for the distributions of crop wild relatives

    • gbif.org
    • researchdata.edu.au
    • +1 more
    Updated Feb 9, 2024
    Cite
    Crop Wild Relatives Occurrence data consortia (2024). A global database for the distributions of crop wild relatives [Dataset]. http://doi.org/10.15468/jyrthk
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    International Center for Tropical Agriculture (https://alliancebioversityciat.org/)
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Authors
    Crop Wild Relatives Occurrence data consortia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset originally held 5 647 442 total records, where 34% of the records corresponded to germplasm accessions and 66% to herbarium samples. A total of 3 231 286 records had cross-checked coordinates (see Figure 2). 322 735 records were newly georeferenced using The Google Geocoding API and 15 713 new records were obtained after digitizing the information contained in herbaria specimens. Data was gathered from more than 100 data providers, including GBIF (a comprehensive list of institutions and individuals is available here: http://www.cwrdiversity.org/data-sources/ ).
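
    The record counts quoted above imply a rough breakdown, sketched below. The variable names are illustrative, and the germplasm and herbarium counts are only estimates because the description gives rounded percentages rather than exact numbers.

```python
# Figures quoted in the description above.
total_records = 5_647_442
cross_checked = 3_231_286
newly_georeferenced = 322_735
newly_digitized = 15_713

# The source gives only rounded shares (34% / 66%), so these are estimates.
germplasm_estimate = round(total_records * 0.34)
herbarium_estimate = round(total_records * 0.66)

# Share of all records whose coordinates were cross-checked.
cross_checked_share = cross_checked / total_records

print(f"germplasm accessions (approx.): {germplasm_estimate:,}")
print(f"herbarium samples (approx.):    {herbarium_estimate:,}")
print(f"cross-checked coordinates:      {cross_checked_share:.1%}")
```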

    The geographic coverage of the dataset includes 96% of the world's countries and also includes records of cultivated plants (1/3 of the dataset). Records of the crop wild relatives of 80 crop gene pools can be queried and visualized in this interactive map: http://www.cwrdiversity.org/distribution-map/

    This dataset was assembled as part of the project ‘Adapting Agriculture to Climate Change: Collecting, Protecting and Preparing Crop Wild Relatives’, which is supported by the Government of Norway. The project is managed by the Global Crop Diversity Trust and the Millennium Seed Bank of the Royal Botanic Gardens, Kew, and implemented in partnership with national and international genebanks and plant breeding institutes around the world. For further information, please refer to the project website: http://www.cwrdiversity.org/

    For publication to GBIF, all records originally gathered from GBIF have been removed to avoid data duplication.

    Citation: Crop Wild Relatives Occurrence data consortia ([year]). A global database for the distributions of crop wild relatives. Centro Internacional de Agricultura Tropical (CIAT). Occurrence dataset https://doi.org/10.15468/jyrthk accessed via GBIF.org on [date].

  7. Data from: World Database on Protected Areas

    • fsm-data.sprep.org
    • pacificdata.org
    • +13 more
    geojson, html, jpeg +3
    Updated Feb 15, 2022
    Cite
    UN Environment World Conservation Monitoring Centre (UNEP-WCMC) (2022). World Database on Protected Areas [Dataset]. https://fsm-data.sprep.org/dataset/world-database-protected-areas
    Available download formats: html, jpeg, pdf, zip, geojson, website
    Dataset updated
    Feb 15, 2022
    Dataset provided by
    The Nature Conservancy
    Authors
    UN Environment World Conservation Monitoring Centre (UNEP-WCMC)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Federated States of Micronesia
    Description

    The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas, updated on a monthly basis, and is one of the key global biodiversity data sets being widely used by scientists, businesses, governments, International secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website where the data is both viewable and downloadable. Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets. Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. 
    For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.

  8. Data from: Ramsar Sites

    • pacific-data.sprep.org
    • pacificdata.org
    • +1 more
    pdf
    Updated Apr 8, 2025
    Cite
    PNG Conservation and Environment Protection Authority (2025). Ramsar Sites [Dataset]. https://pacific-data.sprep.org/dataset/ramsar-sites
    Available download formats: pdf (115614), pdf (15018951)
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    PNG Conservation and Environment Protection Authority
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Papua New Guinea
    Description

    Ramsar and wetlands

  9. Building a DGA Classifier: Part 1, Data Preparation

    • impactcybertrust.org
    • search.datacite.org
    Updated Jan 28, 2019
    Cite
    External Data Source (2019). Building a DGA Classifier: Part 1, Data Preparation [Dataset]. http://doi.org/10.23721/100/1478811
    Dataset updated
    Jan 28, 2019
    Authors
    External Data Source
    Description

    The purpose of building a DGA classifier isn't specifically for takedowns of botnets, but to discover and detect the use on our network or services. If you have a list of domains resolved and accessed at your organization, it is now possible to see which of those are potentially generated and used by malware.

    The dataset consists of three sources (as described in the Data-Driven Security blog):

    Alexa: For samples of legitimate domains, an obvious choice is to go to the Alexa list of top web sites. But it's not ready for our use as is. If you grab the top 1 Million Alexa domains and parse it, you'll find just over 11 thousand are full URLs and not just domains, and there are thousands of domains with subdomains that don't help us (we are only classifying on domains here). So after I removed the URLs, de-duplicated the domains, and cleaned it up, I ended up with the Alexa top 965,843.

    "Real World" Data from OpenDNS: After reading the post from Frank Denis at OpenDNS titled "Why Using Real World Data Matters For Building Effective Security Models", I grabbed their 10,000 Top Domains and their 10,000 Random samples. If we compare that to the top Alexa domains, 6,901 of the top ten thousand are in the Alexa data and 893 of the random domains are in the Alexa data. I will clean that up as I make the final training dataset.

    DGAdo: The Click Security version wasn't very clear in where they got their bad domains so I decided to collect my own and this was rather fun. Because I work with some interesting characters (who know interesting characters), I was able to collect several data sets from recent botnets: "Cryptolocker", two separate "Game-Over Zeus" algorithms, and an anonymous collection of malicious (and algorithmically generated) domains. In the end, I was able to collect 73,598 algorithmically generated domains.
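
    The cleanup and comparison steps described for the Alexa and OpenDNS lists can be sketched roughly as follows. This is an illustration under assumptions, not the blog author's code: the function names and the tiny input lists are hypothetical, "full URL" is approximated as any entry containing a slash, and subdomains are left untouched because collapsing them properly would need a public suffix list.

```python
from typing import Iterable, List


def clean_domains(entries: Iterable[str]) -> List[str]:
    """Drop full URLs, normalize case, and de-duplicate while keeping order."""
    seen = set()
    cleaned = []
    for entry in entries:
        candidate = entry.strip().lower()
        # Entries with a scheme or a path contain "/" and are treated as URLs.
        if not candidate or "/" in candidate:
            continue
        if candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


def overlap_count(sample: Iterable[str], reference: Iterable[str]) -> int:
    """Count distinct sample domains that also appear in the reference list."""
    return len(set(clean_domains(sample)) & set(clean_domains(reference)))


# Hypothetical miniature inputs standing in for the Alexa and OpenDNS lists.
alexa_raw = ["google.com", "Example.com", "example.com", "example.org/some/page"]
opendns_raw = ["example.com", "unknown-domain.net"]

alexa = clean_domains(alexa_raw)
print(alexa)
print(overlap_count(opendns_raw, alexa))
```

    Keeping the first occurrence of each domain preserves the popularity ordering of the input list, which matters if a later step takes only the top entries.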

  10. China CN: Internet Service: No of Website: ORG

    • ceicdata.com
    Updated Dec 15, 2024
    Cite
    CEICdata.com (2024). China CN: Internet Service: No of Website: ORG [Dataset]. https://www.ceicdata.com/en/china/internet-number-of-domain-and-website/cn-internet-service-no-of-website-org
    Dataset updated
    Dec 15, 2024
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2005 - Dec 1, 2008
    Area covered
    China
    Variables measured
    Internet Statistics
    Description

    China Internet Service: Number of Website: ORG data was reported at 0.021 Unit mn in Dec 2008. This records an increase from the previous number of 0.017 Unit mn for Jun 2008. The data is updated semiannually, with a median of 0.017 Unit mn from Dec 2005 to Dec 2008 across 7 observations. It reached an all-time high of 0.021 Unit mn in Dec 2008 and a record low of 0.009 Unit mn in Dec 2007. The series remains in active status in CEIC and is reported by China Internet Network Information Center. The data is categorized under China Premium Database’s Information and Communication Sector – Table CN.ICE: Internet: Number of Domain and Website.

  11. Share of global mobile website traffic 2015-2025

    • statista.com
    • tokrwards.com
    • +1 more
    Updated Sep 11, 2025
    Cite
    Statista (2025). Share of global mobile website traffic 2015-2025 [Dataset]. https://www.statista.com/statistics/277125/share-of-website-traffic-coming-from-mobile-devices/
    Dataset updated
    Sep 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the second quarter of 2025, mobile devices (excluding tablets) accounted for 62.54 percent of global website traffic. Since consistently maintaining a share of around 50 percent beginning in 2017, mobile usage surpassed this threshold in 2020 and has demonstrated steady growth in its dominance of global web access.

    Mobile traffic

    Due to low infrastructure and financial restraints, many emerging digital markets skipped the desktop internet phase entirely and moved straight onto mobile internet via smartphone and tablet devices. India is a prime example of a market with a significant mobile-first online population. Other countries with a significant share of mobile internet traffic include Nigeria, Ghana and Kenya. In most African markets, mobile accounts for more than half of the web traffic. By contrast, mobile only makes up around 45.49 percent of online traffic in the United States.

    Mobile usage

    The most popular mobile internet activities worldwide include watching movies or videos online, e-mail usage and accessing social media. Apps are a very popular way to watch video on the go and the most-downloaded entertainment apps in the Apple App Store are Netflix, Tencent Video and Amazon Prime Video.

  12. Internet users for the United States

    • fred.stlouisfed.org
    json
    Updated Oct 8, 2025
    Cite
    (2025). Internet users for the United States [Dataset]. https://fred.stlouisfed.org/series/ITNETUSERP2USA
    Available download formats: json
    Dataset updated
    Oct 8, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Area covered
    United States
    Description

    Graph and download economic data for Internet users for the United States (ITNETUSERP2USA) from 1990 to 2023 about internet, persons, and USA.

  13. DCASE 2023 Challenge Task 2 Development Dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 3, 2023
    Cite
    Kota Dohi; Keisuke Imoto; Noboru Harada; Daisuke Niizumi; Yuma Koizumi; Tomoya Nishida; Harsh Purohit; Takashi Endo; Yohei Kawaguchi (2023). DCASE 2023 Challenge Task 2 Development Dataset [Dataset]. http://doi.org/10.5281/zenodo.7882613
    Available download formats: zip
    Dataset updated
    May 3, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kota Dohi; Keisuke Imoto; Noboru Harada; Daisuke Niizumi; Yuma Koizumi; Tomoya Nishida; Harsh Purohit; Takashi Endo; Yohei Kawaguchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the "development dataset" for the DCASE 2023 Challenge Task 2 "First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring".

    The data consists of the normal/anomalous operating sounds of seven types of real/toy machines. Each recording is a single-channel 10-second audio that includes both a machine's operating sound and environmental noise. The following seven types of real/toy machines are used in this task:

    • ToyCar
    • ToyTrain
    • Fan
    • Gearbox
    • Bearing
    • Slide rail
    • Valve

    Overview of the task

    Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.

    This task is the follow-up from DCASE 2020 Task 2 to DCASE 2022 Task 2. The task this year is to develop an ASD system that meets the following four requirements.

    1. Train a model using only normal sound (unsupervised learning scenario)

    Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.

    2. Detect anomalies regardless of domain shifts (domain generalization task)

    In real-world cases, the operational states of a machine or the environmental noise can change to cause domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques for handling these domain shifts. This requirement is the same as in DCASE 2022 Task 2.

    3. Train a model for a completely new machine type

    For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning.

    4. Train a model using only one machine from its machine type

    While sounds from multiple machines of the same machine type can be used to enhance detection performance, it is often the case that sound data from only one machine are available for a machine type. In such a case, the system should be able to train models using only one machine from a machine type.

    The last two requirements are newly introduced in DCASE 2023 Task 2 as the "first-shot problem".
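
    To make the first requirement concrete, here is a minimal sketch of training on normal data only and scoring unseen clips. It is not the challenge baseline: it swaps in a simple per-feature z-score model, and the feature vectors, function names, and example values are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_normal_profile(
    normal_features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Estimate per-dimension mean and standard deviation from normal clips only."""
    columns = list(zip(*normal_features))
    means = [mean(col) for col in columns]
    # Guard against zero variance so the score below never divides by zero.
    stds = [max(pstdev(col), 1e-8) for col in columns]
    return means, stds


def anomaly_score(
    features: Sequence[float], means: List[float], stds: List[float]
) -> float:
    """Mean squared z-score: larger values look less like the normal training data."""
    return mean([((x - m) / s) ** 2 for x, m, s in zip(features, means, stds)])


# Hypothetical two-dimensional features extracted from normal training clips.
normal_clips = [[1.0, 10.0], [1.2, 9.8], [0.8, 10.2], [1.0, 10.0]]
means, stds = fit_normal_profile(normal_clips)

score_normal_like = anomaly_score([1.1, 9.9], means, stds)
score_unusual = anomaly_score([3.0, 7.0], means, stds)
print(score_normal_like, score_unusual)
```

    Because no anomalous examples are used during fitting, a decision threshold on the score would have to be chosen from the distribution of scores on normal data.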

    Definition

    We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

    • "Machine type" indicates the type of machine, which in the development dataset is one of seven: fan, gearbox, bearing, slide rail, valve, ToyCar, and ToyTrain.
    • A section is defined as a subset of the dataset for calculating performance metrics.
    • The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
    • Attributes are parameters that define states of machines or types of noise.

    Dataset

    This dataset consists of seven machine types. For each machine type, one section is provided, and the section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
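
    The per-section counts above add up as follows; a small sketch with illustrative variable names, assuming exactly one section for each of the seven machine types as stated.

```python
machine_types = ["fan", "gearbox", "bearing", "slide rail", "valve", "ToyCar", "ToyTrain"]

source_train_clips = 990    # normal clips, source domain
target_train_clips = 10     # normal clips, target domain
test_normal_clips = 100
test_anomalous_clips = 100

train_per_section = source_train_clips + target_train_clips
test_per_section = test_normal_clips + test_anomalous_clips

# One section per machine type in the development dataset.
total_train = train_per_section * len(machine_types)
total_test = test_per_section * len(machine_types)

print(train_per_section, test_per_section)
print(total_train, total_test)
```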

    File names and attribute csv files

    File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file name. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given by the respective file names. Attribute csv files are for easy access to attributes that cause domain shifts. In these files, the file names, the names of parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:

    [filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
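
    A row in this format could be read along the following lines. This is only a sketch: the sample row, file name, and parameter names are made up, and it assumes the files have no header row and that cells after the file name always come in (parameter, value) pairs.

```python
import csv
import io
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str]


def coerce(raw: str) -> Value:
    """Interpret a domain shift value as int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def parse_attribute_row(row: List[str]) -> Tuple[str, Dict[str, Value]]:
    """Split one row into the file name and a {parameter: value} mapping."""
    filename = row[0].strip()
    rest = [cell.strip() for cell in row[1:]]
    # Cells after the file name come in (parameter, value) pairs.
    pairs = zip(rest[0::2], rest[1::2])
    return filename, {param: coerce(value) for param, value in pairs if param}


# Hypothetical row; the file name and parameter names are invented for illustration.
sample = "example_clip.wav,speed,28,noise,factory_a\n"
for row in csv.reader(io.StringIO(sample)):
    print(parse_attribute_row(row))
```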

    Recording procedure

    Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.

    Directory structure

    - /dev_data
        - /raw
            - /fan
                - /train (only normal clips)
                    - /section_00_source_train_normal_0000_
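
    The leading fields of a clip file name can be pulled apart with a regular expression, as sketched below. The pattern is inferred from the single truncated example in the listing above; the "target", "test", and "anomaly" alternatives are assumptions, and whatever follows the clip index is kept as an opaque string because the listing cuts off there.

```python
import re
from typing import Dict, Optional

# Pattern inferred from one truncated example; alternatives other than
# "source", "train", and "normal" are assumptions.
CLIP_NAME = re.compile(
    r"^section_(?P<section>\d+)_(?P<domain>source|target)_"
    r"(?P<split>train|test)_(?P<label>normal|anomaly)_(?P<index>\d+)"
    r"(?:_(?P<rest>.*))?$"
)


def parse_clip_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the leading fields of a clip file name, or None if it does not match."""
    match = CLIP_NAME.match(name)
    return match.groupdict() if match else None


print(parse_clip_name("section_00_source_train_normal_0000_"))
```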

    Baseline system

    The baseline system is available on the GitHub repository dcase2023_task2_baseline_ae. The baseline system provides a simple entry-level approach that gives a reasonable performance in the dataset of Task 2. It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

    Condition of use

    This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Citation

    If you use this dataset, please cite all the following papers. We will publish a paper on the description of the DCASE 2023 Task 2, so please make sure to cite that paper, too.

    • Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: A domain generalization baseline. In arXiv e-prints: 2303.00455, 2023. [URL]
    • Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), 31-35. Nancy, France, November 2022. [URL]
    • Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for

  14. World Heritage Site List

    • ihp-wins.unesco.org
    csv
    Updated Jul 24, 2025
    Cite
    (2025). World Heritage Site List [Dataset]. https://ihp-wins.unesco.org/dataset/unesco-world-heritage-sites
    Available download formats: csv
    Dataset updated
    Jul 24, 2025
    License

    http://www.opendefinition.org/licenses/cc-by-sa

    Area covered
    World
    Description

    The World Heritage List includes 1248 properties forming part of the cultural and natural heritage which the World Heritage Committee considers as having outstanding universal value.

    These include 972 cultural, 235 natural and 41 mixed properties in 170 States Parties. As of October 2024, 196 States Parties have ratified the World Heritage Convention.
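
    As a quick consistency check on the figures above, the three categories can be summed and expressed as shares of the list; the variable names are illustrative.

```python
cultural, natural, mixed = 972, 235, 41
listed_total = 1248

# The three categories should add up to the quoted total.
category_sum = cultural + natural + mixed
print(category_sum == listed_total)

for label, count in [("cultural", cultural), ("natural", natural), ("mixed", mixed)]:
    print(f"{label}: {count / listed_total:.1%}")
```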

  15. Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in...

    • ieee-dataport.org
    Updated Sep 9, 2025
    Cite
    Leon Gorissen (2025). Demonstrating Data-to-Knowledge Pipelines for Connecting Production Sites in the World Wide Lab: Trajectory Data and Benchmark Models [Dataset]. https://ieee-dataport.org/documents/demonstrating-data-knowledge-pipelines-connecting-production-sites-world-wide-lab
    Dataset updated
    Sep 9, 2025
    Authors
    Leon Gorissen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Check the project website for the code repository link. The folder test contains the data used for model evaluation or testing. Metadata must be derived from metadata_dump_test.json. The folder train contains the data used for model training and cross-validation.

  16. World Bank: International Debt Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: International Debt Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-debt
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset contains both national and regional debt statistics captured by over 200 economic indicators. Time series data is available for those indicators from 1970 to 2015 for reporting countries.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_intl_debt

    https://cloud.google.com/bigquery/public-data/world-bank-international-debt

    Citation: The World Bank: International Debt Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    What countries have the largest outstanding debt?

    https://cloud.google.com/bigquery/images/outstanding-debt.png
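    One way to approach the question above, once the debt figures have been exported as rows, is to sum the values per country and sort the totals. The sketch below is only an illustration: the field names country_name and value, and the sample rows, are hypothetical and not taken from the dataset's documented schema.

```python
from collections import defaultdict


def rank_by_total_debt(rows, top_n=3):
    """Sum debt values per country and return the top_n largest totals.

    Each row is a dict with the hypothetical keys 'country_name' and 'value'.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["country_name"]] += row["value"]
    # Sort by total debt, largest first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample_rows = [
        {"country_name": "Country A", "value": 120.0},
        {"country_name": "Country B", "value": 300.0},
        {"country_name": "Country A", "value": 80.0},
        {"country_name": "Country C", "value": 50.0},
    ]
    for country, total in rank_by_total_debt(sample_rows, top_n=2):
        print(country, total)
```

    Restricting the rows to a single indicator before ranking would avoid adding up figures that measure different things.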

  17. Data from: Harmonized chronologies of a global late Quaternary pollen...

    • doi.pangaea.de
    • service.tib.eu
    html, tsv
    Updated Jun 28, 2021
    + more versions
    Cite
    Chenzhi Li; Alexander Postl; Thomas Böhmer; Andrew M Dolman; Ulrike Herzschuh (2021). Harmonized chronologies of a global late Quaternary pollen dataset (LegacyAge 1.0) [Dataset]. http://doi.org/10.1594/PANGAEA.933132
    Available download formats: tsv, html
    Dataset updated
    Jun 28, 2021
    Dataset provided by
    PANGAEA
    Authors
    Chenzhi Li; Alexander Postl; Thomas Böhmer; Andrew M Dolman; Ulrike Herzschuh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 26, 1938 - Mar 18, 2014
    Area covered
    Variables measured
    Site, Type, LATITUDE, Continent, ELEVATION, LONGITUDE, Replicates, Description, Event label, Location type, and 6 more
    Description

    This dataset presents global revised age models for taxonomically harmonized fossil pollen records. The age-depth models were established from mostly IntCal20-calibrated radiocarbon datings with a predefined parameter setting. 1032 sites are located in North America, 1075 sites in Europe, and 488 sites in Asia. In the Southern Hemisphere, there are 150 sites in South America, 54 in Africa, and 32 in the Indopacific region. Datings, mostly C14, were retrieved from the Neotoma Paleoecology Database (https://www.neotomadb.org/), with additional data from Cao et al. (2020; https://doi.org/10.5194/essd-12-119-2020), Cao et al. (2013; https://doi.org/10.1016/j.revpalbo.2013.02.003) and our own collection. The related age records were revised by applying a similar approach, i.e., using the Bayesian age-depth modeling routine in R-BACON software. […]

  18. Data cleaning using unstructured data

    • zenodo.org
    zip
    Updated Jul 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rihem Nasfi; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
    Available download formats: zip
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rihem Nasfi; Antoon Bronselaer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this project, we work on repairing three datasets:

    • Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register, and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.
    • Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. It includes structured attributes indicating whether the trial pertains to a specific gender, age group, or healthy volunteers. Each of these categories is labeled '1' or '0', denoting whether or not it is included in the trial. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial, identified by a eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants, such as inclusion.
    • Allergens: This dataset contains information about products and their allergens. The data was collected from the German version of 'Alnatura' (access date: 24 November 2020), from 'Open Food Facts', a free database of food products from around the world, and from the websites 'Migipedia', 'Piccantino', and 'Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. An allergen is marked '2' if it is present, '1' if there are traces of it, and '0' if it is absent from a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.

    N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

    • "{dataset_name}_train.csv": samples used for training the ML model (e.g. "allergens_train.csv").
    • "{dataset_name}_test.csv": samples used to test the ML model's performance (e.g. "allergens_test.csv").
    • "{dataset_name}_golden_standard.csv": samples representing the ground truth of the test samples (e.g. "allergens_golden_standard.csv").
    • "{dataset_name}_parker_train.csv": samples repaired using Parker Engine, used for training the ML model (e.g. "allergens_parker_train.csv").
    • "{dataset_name}_parker_test.csv": samples repaired using Parker Engine, used to test the ML model's performance (e.g. "allergens_parker_test.csv").
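
    The naming scheme and the allergen coding described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset: the helper names are hypothetical, and the last suffix (parker_test) is taken from the example file name "allergens_parker_test.csv".

```python
# Suffixes of the five CSV files listed in the description.
# "parker_test" follows the example file name "allergens_parker_test.csv".
FILE_SUFFIXES = ("train", "test", "golden_standard", "parker_train", "parker_test")

# Allergen codes as stated in the Allergens description.
ALLERGEN_CODES = {2: "present", 1: "traces", 0: "absent"}


def expected_csv_names(dataset_name: str) -> list[str]:
    """Return the five CSV file names expected for one dataset."""
    return [f"{dataset_name}_{suffix}.csv" for suffix in FILE_SUFFIXES]


def decode_allergen(code: int) -> str:
    """Translate a numeric allergen code into a readable label."""
    return ALLERGEN_CODES[code]


if __name__ == "__main__":
    for name in expected_csv_names("allergens"):
        print(name)
    print(decode_allergen(1))
```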

  19. civil_comments

    • tensorflow.org
    • huggingface.co
    Updated Feb 28, 2023
    Cite
    (2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
    Dataset updated
    Feb 28, 2023
    Description

    This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
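
    As a rough illustration of what such a label value means, the sketch below computes the fraction of annotators who assigned an attribute and turns it into a binary label. The function names, the sample votes, and the 0.5 threshold are assumptions made for this example; they are not part of the dataset or of tensorflow_datasets.

```python
def label_fraction(votes: list[bool]) -> float:
    """Fraction of annotators who assigned the attribute (a value in [0, 1])."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def is_positive(fraction: float, threshold: float = 0.5) -> bool:
    """Binarize a label fraction; the 0.5 threshold is an assumption."""
    return fraction >= threshold


if __name__ == "__main__":
    votes = [True, False, False, True]  # hypothetical votes from four annotators
    toxicity = label_fraction(votes)
    print(toxicity, is_positive(toxicity))
```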

    The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

    The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

    For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('civil_comments', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  20. NFA 2018 Edition

    • data.world
    csv, zip
    Updated Feb 25, 2025
    Cite
    Global Footprint Network (2025). NFA 2018 Edition [Dataset]. https://data.world/footprint/nfa-2018-edition
    Available download formats: zip, csv
    Dataset updated
    Feb 25, 2025
    Authors
    Global Footprint Network
    Time period covered
    1961 - 2014
    Description

    Our National Footprint Accounts (NFAs) measure the ecological resource use and resource capacity of nations from 1961 to 2014. The calculations in the National Footprint Accounts are primarily based on United Nations data sets, including those published by the Food and Agriculture Organization, United Nations Commodity Trade Statistics Database, and the UN Statistics Division, as well as the International Energy Agency. The 2018 edition of the NFA features some exciting updates from last year’s 2017 edition, including data for more countries and improved data sources and methodology. Methodology changes:

    1. Our conversion of carbon to CO2 increased in precision, which increased the world’s carbon footprint by approximately 1%.
    2. We implemented a new data quality scoring system. This allowed us to publish data for more countries by omitting unreliable data for some years rather than the entire country’s Ecological Footprint timeline.
    3. We used more precise data from the Global Carbon Project to calculate ocean carbon sequestration rates for 2014.

    National Footprint Accounts 2018 Edition

    To visualize our data in our data explorer, click here. The dataset provides Ecological Footprint per capita data for the years 1961-2014 in global hectares (gha). Ecological Footprint is a measure of how much area of biologically productive land and water an individual, population, or activity requires to produce all the resources it consumes and to absorb the waste it generates, using prevailing technology and resource management practices. The Ecological Footprint is measured in global hectares. Since trade is global, an individual or country's Footprint tracks area from all over the world. Without further specification, Ecological Footprint generally refers to the Ecological Footprint of consumption (rather than only production or export). Ecological Footprint is often referred to in short form as Footprint.

    About this Dataset

    This data includes total and per capita national biocapacity, ecological footprint of consumption, ecological footprint of production and total area in hectares. This dataset, however, does not include any of our yield factors (national or world) nor any equivalence factors. To view these click here.

    Objectives

    Revealing links between human consumption and other human behaviors, geographic characteristics, political landscapes,

    Get involved

    How can others contribute?

    • Join this table on other data.world datasets (preferably country-level data)
    • Write queries
    • Create graphics
    • Post and share discoveries

    External resources

Cite
World Bank (2019). World Bank: GHNP Data [Dataset]. https://www.kaggle.com/theworldbank/world-bank-health-population

World Bank: GHNP Data

World Bank: Global Health, Nutrition, and Population Data (BigQuery Dataset)

Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset provided by
World Bank Group (http://www.worldbank.org/)
Authors
World Bank
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.

Update Frequency: Biannual

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics

https://cloud.google.com/bigquery/public-data/world-bank-hnp

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Citation: The World Bank: Health Nutrition and Population Statistics

Banner Photo by @till_indeman from Unsplash.

Inspiration

What’s the average age of first marriages for females around the world?
