100+ datasets found
  1. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?
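    One way to start on this question from the BigQuery copy of the data (a rough sketch; the `bigquery-public-data.world_bank_intl_education.international_education` table and the indicator code below are assumptions based on the links above, not confirmed by this page):

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT country_name, year, value
        FROM `bigquery-public-data.world_bank_intl_education.international_education`
        -- Assumed indicator: government expenditure on education as % of total government expenditure
        WHERE indicator_code = 'SE.XPD.TOTL.GB.ZS'
        ORDER BY country_name, year
    """
    for row in client.query(query).result():
        print(row.country_name, row.year, row.value)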

  2. MovieLens full 25-million recommendation data 🎬

    • kaggle.com
    Updated Apr 15, 2023
    + more versions
    Cite
    iulia (2023). MovieLens full 25-million recommendation data 🎬 [Dataset]. https://www.kaggle.com/datasets/patriciabrezeanu/movielens-full-25-million-recommendation-data
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    iulia
    Description

    Summary

    This dataset (ml-25m) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies. These data were created by 162,541 users between January 09, 1995, and November 21, 2019. This dataset was generated on November 21, 2019.

    Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

    The data are contained in the files genome-scores.csv, genome-tags.csv, links.csv, movies.csv, ratings.csv, and tags.csv. More details about the contents and use of all these files follow. This and other GroupLens data sets are publicly available for download at
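    A minimal sketch for a first look at the ratings file, assuming the ml-25m archive has been downloaded and unzipped locally (the local path is an assumption; column names follow the standard MovieLens file layout):

    import pandas as pd

    ratings = pd.read_csv("ml-25m/ratings.csv")   # userId, movieId, rating, timestamp
    movies = pd.read_csv("ml-25m/movies.csv")     # movieId, title, genres

    # Average rating and rating count per movie, joined back to titles.
    stats = (ratings.groupby("movieId")["rating"]
             .agg(["mean", "count"])
             .reset_index()
             .merge(movies, on="movieId"))
    print(stats.sort_values("count", ascending=False).head())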

  3. Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Jan 27, 2022
    Cite
    Hossein Keshavarz; Meiyappan Nagappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. http://doi.org/10.5281/zenodo.5907002
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hossein Keshavarz; Meiyappan Nagappan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
    
    This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to the MSR 2022 Data Showcase Track.
    
    The datasets are available under directory dataset. There are 4 datasets in this directory. 
    
    1. apachejit_total.csv: This file contains the entire dataset. Commits are specified by their identifier and a set of commit metrics that are explained in the paper are provided as features. Column buggy specifies whether or not the commit introduced any bug into the system. 
    2. apachejit_train.csv: This file is a subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
    3. apachejit_test_large.csv: This file is a subset of the entire dataset. The commits in this file are the commits from the last 3 years of data. This set is not balanced to represent a real-life scenario in a JIT model evaluation where the model is trained on historical data to be applied on future data without any modification.
    4. apachejit_test_small.csv: This file is a subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set which is still unbalanced and from the last 3 years of data.
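    As a minimal illustration (not part of the replication package) of how the train and test splits listed above might be used: only the buggy label column is named in this description, so the feature columns are selected generically and the exact metric column names are assumptions about the released CSVs.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    train = pd.read_csv("dataset/apachejit_train.csv")
    test = pd.read_csv("dataset/apachejit_test_large.csv")

    # Use every numeric column except the label as a feature.
    features = [c for c in train.select_dtypes("number").columns if c != "buggy"]

    model = LogisticRegression(max_iter=1000)
    model.fit(train[features], train["buggy"])
    pred = model.predict_proba(test[features])[:, 1]
    print("AUC on the large test set:", roc_auc_score(test["buggy"], pred))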
    
    In addition to the dataset, we also provide the scripts using which we built the dataset. These scripts are written in Python 3.8. Therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. Installation guide and more details can be found here.
    
    The scripts consist of Python scripts under directory src and Python notebooks under directory notebooks. The Python scripts are mainly responsible for conducting GitHub searches via the GitHub search API and for collecting commits through the PyDriller package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates are then filtered again using the gumtree.py script, which utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.
    
    More specifically, git_token.py handles the GitHub API token that is necessary for requests to the GitHub API. Script collector.py performs the GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).
    
    References:
    
    1. GumTree
    
    * https://github.com/GumTreeDiff/gumtree
    
    Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, Västerås, Sweden, September 15–19, 2014, 313–324.
    
    2. PyDriller
    
    * https://pydriller.readthedocs.io/en/latest/
    
    * Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908–911.
    
    
  4. National Hydrography Data - NHD and 3DHP

    • data.cnra.ca.gov
    • data.ca.gov
    • +3more
    Updated Oct 15, 2024
    + more versions
    Cite
    California Department of Water Resources (2024). National Hydrography Data - NHD and 3DHP [Dataset]. https://data.cnra.ca.gov/dataset/national-hydrography-dataset-nhd
    Explore at:
    Available download formats: pdf(1634485), pdf(9867020), pdf(182651), pdf(3684753), website, pdf(4856863), zip(578260992), pdf, zip(15824984), csv(12977), arcgis geoservices rest api, zip(10029073), zip(1647291), zip(972664), zip(128966494), pdf(1175775), zip(13901824), zip(73817620), zip(4657694), pdf(1436424), zip(39288832)
    Dataset updated
    Oct 15, 2024
    Dataset authored and provided by
    California Department of Water Resources
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes information about naturally occurring and constructed bodies of surface water (lakes, ponds, and reservoirs), paths through which water flows (canals, ditches, streams, and rivers), and related entities such as point features (springs, wells, stream gages, and dams). The information encoded about these features includes classification and other characteristics, delineation, geographic name, position and related measures, a "reach code" through which other information can be related to the NHD, and the direction of water flow. The network of reach codes delineating water and transported material flow allows users to trace movement in upstream and downstream directions. In addition to this geographic information, the dataset contains metadata that supports the exchange of future updates and improvements to the data. The NHD supports many applications, such as making maps, geocoding observations, flow modeling, data maintenance, and stewardship. For additional information on NHD, go to https://www.usgs.gov/core-science-systems/ngp/national-hydrography.

    DWR was the steward for NHD and Watershed Boundary Dataset (WBD) in California. We worked with other organizations to edit and improve NHD and WBD, using the business rules for California. California's NHD improvements were sent to USGS for incorporation into the national database. The most up-to-date products are accessible from the USGS website. Please note that the California portion of the National Hydrography Dataset is appropriate for use at the 1:24,000 scale.

    For additional derivative products and resources, including the major features in geopackage format, please go to this page: https://data.cnra.ca.gov/dataset/nhd-major-features Archives of previous statewide extracts of the NHD going back to 2018 may be found at https://data.cnra.ca.gov/dataset/nhd-archive.

    In September 2022, USGS officially notified DWR that the NHD would become static as USGS resources will be devoted to the transition to the new 3D Hydrography Program (3DHP). 3DHP will consist of LiDAR-derived hydrography at a higher resolution than NHD. Upon completion, 3DHP data will be easier to maintain, based on a modern data model and architecture, and better meet the requirements of users that were documented in the Hydrography Requirements and Benefits Study (2016). The initial releases of 3DHP will be the NHD data cross-walked into the 3DHP data model. It will take several years for the 3DHP to be built out for California. Please refer to the resources on this page for more information.

    The FINAL, STATIC version of the National Hydrography Dataset for California was published for download by USGS on December 27, 2023. This dataset can no longer be edited by the state stewards.

    The first public release of the 3D Hydrography Program map service may be accessed at https://hydro.nationalmap.gov/arcgis/rest/services/3DHP_all/MapServer.

    Questions about the California stewardship of these datasets may be directed to nhd_stewardship@water.ca.gov.

  5. criteo

    • tensorflow.org
    Updated Dec 22, 2022
    + more versions
    Cite
    (2022). criteo [Dataset]. https://www.tensorflow.org/datasets/catalog/criteo
    Explore at:
    Dataset updated
    Dec 22, 2022
    Description

    Criteo Uplift Modeling Dataset

    This dataset is released along with the paper: “A Large Scale Benchmark for Uplift Modeling” Eustache Diemert, Artem Betlei, Christophe Renaudin; (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP)

    This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.

    Data description

    This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 12 features, a treatment indicator and 2 labels (visits and conversions).

    Fields

    Here is a detailed description of the fields (they are comma-separated in the file):

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)

    Key figures

    • Format: CSV
    • Size: 459MB (compressed)
    • Rows: 25,309,483
    • Average Visit Rate: .04132
    • Average Conversion Rate: .00229
    • Treatment Ratio: .846

    Tasks

    The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we can foresee related usages, such as (but not limited to):

    • benchmark for causal inference
    • uplift modeling
    • interactions between features and treatment
    • heterogeneity of treatment
    • benchmark for observational causality methods
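    As a rough sketch of the core uplift quantity on this data (assuming a local CSV export with the fields listed above; the filename here is an assumption):

    import pandas as pd

    df = pd.read_csv("criteo-uplift.csv")

    treated = df[df["treatment"] == 1]
    control = df[df["treatment"] == 0]

    # Naive uplift estimate: difference in conversion rate between treated and control users.
    uplift = treated["conversion"].mean() - control["conversion"].mean()
    print(f"naive conversion uplift: {uplift:.5f}")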

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print a few example records.
    ds = tfds.load('criteo', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  6. An Open-Source Iterative python Module for the Automated Identification of...

    • data.bris.ac.uk
    Updated Apr 22, 2022
    + more versions
    Cite
    (2022). An Open-Source Iterative python Module for the Automated Identification of Photopeaks in Photon Spectra v2.0 - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/n3cm8fnce5ri2k55dlipee3st
    Explore at:
    Dataset updated
    Apr 22, 2022
    Description

    Code and isotopic libraries for the Python identification and peak-fitting module. See also: An Open-Source Iterative python Module for the Automated Identification of Photopeaks in Photon Spectra v1.0, https://data.bris.ac.uk/data/dataset/28ssj76dp5rx02tf8edowbfe3n. Complete download: zip, 170.1 KiB.

  7. Data from: Signature Detection Dataset

    • universe.roboflow.com
    zip
    Updated Jun 12, 2024
    + more versions
    Cite
    minaehyeon (2024). Signature Detection Dataset [Dataset]. https://universe.roboflow.com/minaehyeon/signature-detection-a75nf/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2024
    Dataset authored and provided by
    minaehyeon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Signature Mh7y Bounding Boxes
    Description

    Signature Detection

    ## Overview
    
    Signature Detection is a dataset for object detection tasks - it contains Signature Mh7y annotations for 206 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. Simple download service (Atom) of the dataset: Small agricultural regions in...

    • data.europa.eu
    Updated Apr 21, 2022
    + more versions
    Cite
    (2022). Simple download service (Atom) of the dataset: Small agricultural regions in the region of New Aquitaine [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-fc630f8a-203c-42c2-bdf7-c68f0c06feab
    Explore at:
    Available download formats: INSPIRE download service
    Dataset updated
    Apr 21, 2022
    Description

    The Small Agricultural Regions (PRA) are the intersections of the Agricultural Regions with the departments. The Agricultural Regions (RA) are regions with the same dominant agricultural vocation. They cover an entire number of municipalities forming a homogeneous agricultural area. They were delimited by INSEE in 1946. The last update was in 1981.

    WMS and WFS addresses: Warnings — position the cursor on the address, right-click, copy the link address, and paste it into the WMS or WFS server connection dialog box; using any other method introduces stray ("parasitic") spaces. — Problems displaying multipolygons via WFS (being resolved); prefer downloading the data if multipolygons are present. — Displaying more than 500 objects via WFS is not possible at the moment.

    WMS address for integration into a GIS from Geoide_Carto (add layers after another): http://data.geo-ide.application.developpement-durable.gouv.fr/WMS/228/PRA_R75?

    WFS address for GIS integration: http://ogc.geo-ide.developpement-durable.gouv.fr/cartes/mapserv?map=/opt/data/carto/geoide-catalogue/REG072A/JDD.www.map
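    A minimal sketch of connecting to the WMS endpoint above from Python with OWSLib, as an alternative to a desktop GIS (the service version and available layer names are assumptions; list wms.contents to see what is actually served):

    from owslib.wms import WebMapService

    wms = WebMapService(
        "http://data.geo-ide.application.developpement-durable.gouv.fr/WMS/228/PRA_R75?",
        version="1.3.0",
    )
    print(list(wms.contents))  # available layer names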

  9. Dataset metadata of known Dataverse installations

    • search.datacite.org
    • dataverse.harvard.edu
    • +1more
    Updated 2019
    + more versions
    Cite
    Julian Gautier (2019). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/dvn/dcdkzq
    Explore at:
    Dataset updated
    2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Julian Gautier
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation).csv
    │   ├── basic.csv
    │   ├── contributor(citation).csv
    │   ├── ...
    │   └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2022.10.02_17.11.19.zip
    │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │   ├── ...
    │   ├── metadatablocks_v5.6
    │   ├── astrophysics_v5.6.json
    │   ├── biomedical_v5.6.json
    │   ├── citation_v5.6.json
    │   ├── ...
    │   ├── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │   ├── Arca_Dados_2022.10.02_17.44.35.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    ├── dataset_pids_from_most_known_dataverse_installations.csv
    ├── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

    Known errors

    The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

    About metadata blocks

    Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
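    A small sketch for a first pass over the combined PID list described above (only the filename comes from the description; the exact column names are assumptions, so inspect them first):

    import pandas as pd

    pids = pd.read_csv("dataset_pids_from_most_known_dataverse_installations.csv")
    print(pids.columns.tolist())       # check the actual column names
    print(len(pids), "dataset PIDs across the 77 installations")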

  10. Raccoon Range - CWHR M153 [ds1936]

    • data.ca.gov
    • data.cnra.ca.gov
    • +5more
    Updated Mar 17, 2020
    Cite
    California Department of Fish and Wildlife (2020). Raccoon Range - CWHR M153 [ds1936] [Dataset]. https://data.ca.gov/dataset/raccoon-range-cwhr-m153-ds1936
    Explore at:
    Available download formats: arcgis geoservices rest api, kml, csv, geojson, zip, html
    Dataset updated
    Mar 17, 2020
    Dataset authored and provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  11. Dataset Direct Download Service (WFS): N_PERIMETRE_PPRN_20140454_S_032

    • data.europa.eu
    Updated Feb 18, 2022
    + more versions
    Cite
    (2022). Dataset Direct Download Service (WFS): N_PERIMETRE_PPRN_20140454_S_032 [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-d2d85ece-e42e-45d5-9cc4-30ffc18a45c7
    Explore at:
    Dataset updated
    Feb 18, 2022
    Description

    Perimeter of the Natural Risk Prevention Plan (PPR) for clay shrinkage ("retrait des argiles") in the commune of Salles-d’Armagnac in the department of Gers. This dataset contains the boundaries at the different stages of the development of the PPR. The characteristic of these perimeters is that they result from an official act and produce their effects from a specified date. These are: the prescribed perimeter set out in a PPR’s prescription order; the risk exposure perimeter, which corresponds to the perimeter regulated by the approved PPR (this approved perimeter is a utility easement); and the study perimeter, which corresponds to the envelope in which the hazards were studied.

  12. Fixed Broadband Deployment Data: December 2020

    • catalog.data.gov
    • opendata.fcc.gov
    • +1more
    Updated Sep 14, 2023
    + more versions
    Cite
    opendata.fcc.gov (2023). Fixed Broadband Deployment Data: December 2020 [Dataset]. https://catalog.data.gov/dataset/fixed-broadband-deployment-data-december-2020
    Explore at:
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    Federal Communications Commission (http://fcc.gov/)
    Description

    The data collection used to create this dataset was in place through June 2021. For more recent broadband availability data, please see https://broadbandmap.fcc.gov; for more information about the related data collection, please see https://www.fcc.gov/BroadbandData. All facilities-based broadband providers are required to file data with the FCC twice a year (Form 477) on where they offer Internet access service at speeds exceeding 200 kbps in at least one direction. Fixed providers file lists of census blocks in which they can or do offer service to at least one location, with additional information about the service. Data Download Page: https://www.fcc.gov/general/broadband-deployment-data-fcc-form-477. Resources page: https://www.fcc.gov/general/form-477-resources-filers

  13. Sarnet Search And Rescue Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2022
    Cite
    Roboflow Public (2022). Sarnet Search And Rescue Dataset [Dataset]. https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Roboflow
    Authors
    Roboflow Public
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SaR Bounding Boxes
    Description

    Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)

    Satellite Imagery for Search And Rescue Dataset - ArXiv

    This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspect may be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.

    Image: https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg (the anomaly)

    The dataset contains the following:

    Set       Images   Annotations
    Train     1808     3048
    Validate  490      747
    Test      254      411
    Total     2552     4206

    The data is in the COCO format, and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
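    A minimal sketch for registering the unzipped COCO-format data with Detectron2 (the annotation-file and image-directory paths inside sarnet.zip are assumptions; adjust them to the actual layout after extraction):

    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "sarnet_train", {},
        "sarnet/train/annotations.json",   # assumed path to the COCO annotation file
        "sarnet/train",                    # assumed image directory
    )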

    Getting hold of the Data

    Download the data here: sarnet.zip

    Or follow these steps

    # download the dataset
    wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
    
    # extract the files
    unzip sarnet.zip
    

    Note (added by Roboflow): you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it into Roboflow after unzipping the folder to get started on your project.

    Getting started

    Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb

    Source Code for Paper

    Source code for the paper is located here: SaRNet_train_test.ipynb

    Cite this dataset

    @misc{thoreau2021sarnet,
       title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery}, 
       author={Michael Thoreau and Frazer Wilson},
       year={2021},
       eprint={2107.12469},
       archivePrefix={arXiv},
       primaryClass={eess.IV}
    }
    

    Acknowledgment

    The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.

  14. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
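    A hedged sketch of querying the BigQuery copy of this dataset with the official Python client (the `patents-public-data.patents.publications` table and its columns are assumptions based on the data-origin link below, not confirmed by this page):

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT country_code, COUNT(*) AS n_publications
        FROM `patents-public-data.patents.publications`
        GROUP BY country_code
        ORDER BY n_publications DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.country_code, row.n_publications)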

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  15. Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data

    • catalog.data.gov
    • data.transportation.gov
    • +5more
    Updated Mar 16, 2025
    Cite
    Federal Highway Administration (2025). Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data [Dataset]. https://catalog.data.gov/dataset/next-generation-simulation-ngsim-vehicle-trajectories-and-supporting-data
    Explore at:
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    Federal Highway Administration
    Description

    Click “Export” on the right to download the vehicle trajectory data. The associated metadata and additional data can be downloaded below under "Attachments". Researchers for the Next Generation Simulation (NGSIM) program collected detailed vehicle trajectory data on southbound US 101 and Lankershim Boulevard in Los Angeles, CA, eastbound I-80 in Emeryville, CA and Peachtree Street in Atlanta, Georgia. Data was collected through a network of synchronized digital video cameras. NGVIDEO, a customized software application developed for the NGSIM program, transcribed the vehicle trajectory data from the video. This vehicle trajectory data provided the precise location of each vehicle within the study area every one-tenth of a second, resulting in detailed lane positions and locations relative to other vehicles. Click the "Show More" button below to find additional contextual data and metadata for this dataset. For site-specific NGSIM video file datasets, please see the following: - NGSIM I-80 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-I-80-Vide/2577-gpny - NGSIM US-101 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-US-101-Vi/4qzi-thur - NGSIM Lankershim Boulevard Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Lankershi/uv3e-y54k - NGSIM Peachtree Street Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Peachtree/mupt-aksf

  16. Dataset download service: Administrative Boundary

    • ckan.mobidatalab.eu
    wfs
    Updated Apr 28, 2023
    + more versions
    Cite
    GeoDatiGovIt RNDT (2023). Dataset download service: Administrative Boundary [Dataset]. https://ckan.mobidatalab.eu/dataset/administrative-boundary-dataset-download-service
    Explore at:
    Available download formats: wfs
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    GeoDatiGovIt RNDT
    Description

    The level structure is harmonized according to the standards of the INSPIRE 2007/2/EC directive of 14 March 2007 starting from the regional information level "Administrative limits". The Inspire level with linear geometry consists of a hierarchical representation of the three types of administrative area present: 4thOrder (Municipality), 3rdOrder (Province), 2ndOrder (Region). The information level has been updated with a change in the toponymy of the Municipality of Ortonovo in Luni L.R. n.5/2017 and the merger of the geometries of the municipalities of Montalto Ligure and Carpasio into the new municipality Montalto Carpasio L.R. 21/2017.

  17. Strandings of Oceania Database

    • pacificdata.org
    • png-data.sprep.org
    • +14more
    docx, pdf, xls
    Updated Feb 11, 2022
    Cite
    Secretariat of the Pacific Regional Environment Programme (2022). Strandings of Oceania Database [Dataset]. https://pacificdata.org/data/dataset/strandings-of-oceania-databasec57c9884-597b-4fce-8532-4c27f63791df
    Explore at:
    Available download formats: xls, pdf, docx
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Secretariat of the Pacific Regional Environment Programme
    License

    https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588

    Description

    The Strandings of Oceania database is a collaborative project between SPREP, WildMe and the South Pacific Whale Research Consortium to record stranding and beachcast data for whales, dolphins and dugongs throughout the Pacific. We use a platform called Flukebook. An account is needed to view or use data within Flukebook, but the data is available for download here. You can submit data directly into Flukebook (preferably while logged in) or send a completed data form to SPREP for upload. Guidance on using the database is available:

  18. Stock Market Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2020
    + more versions
    Cite
    Oleh Onyshchak (2020). Stock Market Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1054465
    Explore at:
    Available download formats: zip (547714524 bytes)
    Dataset updated
    Apr 2, 2020
    Authors
    Oleh Onyshchak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.

    It contains prices up to April 1, 2020. If you need more up-to-date data, just fork and re-run the data collection script, also available from Kaggle.

    Data Structure

    The data for every symbol is saved in CSV format with common fields:

    • Date - specifies trading date
    • Open - opening price
    • High - maximum price during the day
    • Low - minimum price during the day
    • Close - close price adjusted for splits
    • Adj Close - adjusted close price adjusted for both dividends and splits.
    • Volume - the number of shares that changed hands during a given day

    Each ticker's data is stored in either the ETFs or the stocks folder, depending on its type. Moreover, each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
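    A small sketch of reading one ticker file from the stocks folder and computing daily returns from the adjusted close (the ticker chosen here is just an example; any file in stocks/ or ETFs/ follows the same layout):

    import pandas as pd

    df = pd.read_csv("stocks/AAPL.csv", parse_dates=["Date"]).set_index("Date")
    df["daily_return"] = df["Adj Close"].pct_change()
    print(df[["Adj Close", "daily_return"]].tail())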

  19. Inspire Download Service (predefined ATOM) for dataset operations under...

    • data.europa.eu
    atom feed
    Cite
    Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz, Inspire Download Service (predefined ATOM) for dataset operations under mountain control in Saarland, locations, quartz sand [Dataset]. https://data.europa.eu/data/datasets/a55a5cc9-80cd-4a74-a649-98929dbacb44
    Explore at:
    Available download formats: atom feed
    Dataset authored and provided by
    Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz
    Description

    Description of the INSPIRE Download Service (predefined Atom): operations under mining supervision in Saarland, locations, quartz sand, under the Federal Mining Act (Bundesberggesetz, BBergG). The link(s) for downloading the datasets are generated dynamically from GetFeature requests to a WFS 1.1.0.

  20. Roanoke, IN Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Roanoke, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b24fd620-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Roanoke
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Roanoke by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Roanoke across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of female population, with 51.9% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in Roanoke is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Roanoke's total population. Please note that the sum of all percentages may not equal 100 due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Roanoke Population by Race & Ethnicity. You can refer to it here.
