100+ datasets found
  1. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?
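    One way to start on this question from the BigQuery copy of the data (a rough sketch; the `bigquery-public-data.world_bank_intl_education.international_education` table and the indicator code below are assumptions based on the links above, not confirmed by this page):

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT country_name, year, value
        FROM `bigquery-public-data.world_bank_intl_education.international_education`
        -- Assumed indicator: government expenditure on education as % of total government expenditure
        WHERE indicator_code = 'SE.XPD.TOTL.GB.ZS'
        ORDER BY country_name, year
    """
    for row in client.query(query).result():
        print(row.country_name, row.year, row.value)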

  2. MovieLens full 25-million recommendation data 🎬

    • kaggle.com
    Updated Apr 15, 2023
    + more versions
    Cite
    iulia (2023). MovieLens full 25-million recommendation data 🎬 [Dataset]. https://www.kaggle.com/datasets/patriciabrezeanu/movielens-full-25-million-recommendation-data
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    iulia
    Description

    Summary

    This dataset (ml-25m) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies. These data were created by 162,541 users between January 09, 1995, and November 21, 2019. This dataset was generated on November 21, 2019.

    Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

    The data are contained in the files genome-scores.csv, genome-tags.csv, links.csv, movies.csv, ratings.csv, and tags.csv. More details about the contents and use of all these files follow. This and other GroupLens data sets are publicly available for download at
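    A minimal sketch for a first look at the ratings file, assuming the ml-25m archive has been downloaded and unzipped locally (the local path is an assumption; column names follow the standard MovieLens file layout):

    import pandas as pd

    ratings = pd.read_csv("ml-25m/ratings.csv")   # userId, movieId, rating, timestamp
    movies = pd.read_csv("ml-25m/movies.csv")     # movieId, title, genres

    # Average rating and rating count per movie, joined back to titles.
    stats = (ratings.groupby("movieId")["rating"]
             .agg(["mean", "count"])
             .reset_index()
             .merge(movies, on="movieId"))
    print(stats.sort_values("count", ascending=False).head())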

  3. Data from: ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Jan 27, 2022
    Cite
    Hossein Keshavarz; Meiyappan Nagappan (2022). ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction [Dataset]. http://doi.org/10.5281/zenodo.5907002
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hossein Keshavarz; Meiyappan Nagappan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction
    
    This archive contains the ApacheJIT dataset presented in the paper "ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction" as well as the replication package. The paper is submitted to the MSR 2022 Data Showcase Track.
    
    The datasets are available under directory dataset. There are 4 datasets in this directory. 
    
    1. apachejit_total.csv: This file contains the entire dataset. Commits are specified by their identifier and a set of commit metrics that are explained in the paper are provided as features. Column buggy specifies whether or not the commit introduced any bug into the system. 
    2. apachejit_train.csv: This file is a subset of the entire dataset. It provides a balanced set that we recommend for models that are sensitive to class imbalance. This set is obtained from the first 14 years of data (2003 to 2016).
    3. apachejit_test_large.csv: This file is a subset of the entire dataset. The commits in this file are the commits from the last 3 years of data. This set is not balanced to represent a real-life scenario in a JIT model evaluation where the model is trained on historical data to be applied on future data without any modification.
    4. apachejit_test_small.csv: This file is a subset of the test file explained above. Since the test file has more than 30,000 commits, we also provide a smaller test set which is still unbalanced and from the last 3 years of data.
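    As a minimal illustration (not part of the replication package) of how the train and test splits listed above might be used: only the buggy label column is named in this description, so the feature columns are selected generically and the exact metric column names are assumptions about the released CSVs.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    train = pd.read_csv("dataset/apachejit_train.csv")
    test = pd.read_csv("dataset/apachejit_test_large.csv")

    # Use every numeric column except the label as a feature.
    features = [c for c in train.select_dtypes("number").columns if c != "buggy"]

    model = LogisticRegression(max_iter=1000)
    model.fit(train[features], train["buggy"])
    pred = model.predict_proba(test[features])[:, 1]
    print("AUC on the large test set:", roc_auc_score(test["buggy"], pred))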
    
    In addition to the dataset, we also provide the scripts using which we built the dataset. These scripts are written in Python 3.8. Therefore, Python 3.8 or above is required. To set up the environment, we have provided a list of required packages in file requirements.txt. Additionally, one filtering step requires GumTree [1]. For Java, GumTree requires Java 11. For other languages, external tools are needed. Installation guide and more details can be found here.
    
    The scripts consist of Python scripts under directory src and Python notebooks under directory notebooks. The Python scripts are mainly responsible for conducting GitHub searches via the GitHub search API and for collecting commits through the PyDriller package [2]. The notebooks link the fixed issue reports with their corresponding fixing commits and apply some filtering steps. The bug-inducing candidates are then filtered again using the gumtree.py script, which utilizes the GumTree package. Finally, the remaining bug-inducing candidates are combined with the clean commits in the dataset_construction notebook to form the entire dataset.
    
    More specifically, git_token.py handles the GitHub API token that is necessary for requests to the GitHub API. Script collector.py performs the GitHub search. Tracing changed lines and git annotate is done in gitminer.py using PyDriller. Finally, gumtree.py applies 4 filtering steps (number of lines, number of files, language, and change significance).
    
    References:
    
    1. GumTree
    
    * https://github.com/GumTreeDiff/gumtree
    
    Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, Västerås, Sweden, September 15–19, 2014, 313–324.
    
    2. PyDriller
    
    * https://pydriller.readthedocs.io/en/latest/
    
    * Davide Spadini, Maurício Aniche, and Alberto Bacchelli. 2018. PyDriller: Python Framework for Mining Software Repositories. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). Association for Computing Machinery, New York, NY, USA, 908–911.
    
    
  4. National Hydrography Data - NHD and 3DHP

    • data.cnra.ca.gov
    • data.ca.gov
    • +3more
    Updated Oct 15, 2024
    + more versions
    Cite
    California Department of Water Resources (2024). National Hydrography Data - NHD and 3DHP [Dataset]. https://data.cnra.ca.gov/dataset/national-hydrography-dataset-nhd
    Explore at:
    Available download formats: pdf(1634485), pdf(9867020), pdf(182651), pdf(3684753), website, pdf(4856863), zip(578260992), pdf, zip(15824984), csv(12977), arcgis geoservices rest api, zip(10029073), zip(1647291), zip(972664), zip(128966494), pdf(1175775), zip(13901824), zip(73817620), zip(4657694), pdf(1436424), zip(39288832)
    Dataset updated
    Oct 15, 2024
    Dataset authored and provided by
    California Department of Water Resources
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes information about naturally occurring and constructed bodies of surface water (lakes, ponds, and reservoirs), paths through which water flows (canals, ditches, streams, and rivers), and related entities such as point features (springs, wells, stream gages, and dams). The information encoded about these features includes classification and other characteristics, delineation, geographic name, position and related measures, a "reach code" through which other information can be related to the NHD, and the direction of water flow. The network of reach codes delineating water and transported material flow allows users to trace movement in upstream and downstream directions. In addition to this geographic information, the dataset contains metadata that supports the exchange of future updates and improvements to the data. The NHD supports many applications, such as making maps, geocoding observations, flow modeling, data maintenance, and stewardship. For additional information on NHD, go to https://www.usgs.gov/core-science-systems/ngp/national-hydrography.

    DWR was the steward for NHD and Watershed Boundary Dataset (WBD) in California. We worked with other organizations to edit and improve NHD and WBD, using the business rules for California. California's NHD improvements were sent to USGS for incorporation into the national database. The most up-to-date products are accessible from the USGS website. Please note that the California portion of the National Hydrography Dataset is appropriate for use at the 1:24,000 scale.

    For additional derivative products and resources, including the major features in geopackage format, please go to this page: https://data.cnra.ca.gov/dataset/nhd-major-features Archives of previous statewide extracts of the NHD going back to 2018 may be found at https://data.cnra.ca.gov/dataset/nhd-archive.

    In September 2022, USGS officially notified DWR that the NHD would become static as USGS resources will be devoted to the transition to the new 3D Hydrography Program (3DHP). 3DHP will consist of LiDAR-derived hydrography at a higher resolution than NHD. Upon completion, 3DHP data will be easier to maintain, based on a modern data model and architecture, and better meet the requirements of users that were documented in the Hydrography Requirements and Benefits Study (2016). The initial releases of 3DHP will be the NHD data cross-walked into the 3DHP data model. It will take several years for the 3DHP to be built out for California. Please refer to the resources on this page for more information.

    The FINAL, STATIC version of the National Hydrography Dataset for California was published for download by USGS on December 27, 2023. This dataset can no longer be edited by the state stewards.

    The first public release of the 3D Hydrography Program map service may be accessed at https://hydro.nationalmap.gov/arcgis/rest/services/3DHP_all/MapServer.

    Questions about the California stewardship of these datasets may be directed to nhd_stewardship@water.ca.gov.

  5. criteo

    • tensorflow.org
    Updated Dec 22, 2022
    + more versions
    Cite
    (2022). criteo [Dataset]. https://www.tensorflow.org/datasets/catalog/criteo
    Explore at:
    Dataset updated
    Dec 22, 2022
    Description

    Criteo Uplift Modeling Dataset

    This dataset is released along with the paper: “A Large Scale Benchmark for Uplift Modeling” Eustache Diemert, Artem Betlei, Christophe Renaudin; (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP)

    This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.

    Data description

    This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 12 features, a treatment indicator and 2 labels (visits and conversions).

    Fields

    Here is a detailed description of the fields (they are comma-separated in the file):

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)

    Key figures

    • Format: CSV
    • Size: 459MB (compressed)
    • Rows: 25,309,483
    • Average Visit Rate: .04132
    • Average Conversion Rate: .00229
    • Treatment Ratio: .846

    Tasks

    The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we can foresee related usages, such as (but not limited to):

    • benchmark for causal inference
    • uplift modeling
    • interactions between features and treatment
    • heterogeneity of treatment
    • benchmark for observational causality methods
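    As a rough sketch of the core uplift quantity on this data (assuming a local CSV export with the fields listed above; the filename here is an assumption):

    import pandas as pd

    df = pd.read_csv("criteo-uplift.csv")

    treated = df[df["treatment"] == 1]
    control = df[df["treatment"] == 0]

    # Naive uplift estimate: difference in conversion rate between treated and control users.
    uplift = treated["conversion"].mean() - control["conversion"].mean()
    print(f"naive conversion uplift: {uplift:.5f}")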

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print a few example records.
    ds = tfds.load('criteo', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  6. An Open-Source Iterative python Module for the Automated Identification of...

    • data.bris.ac.uk
    Updated Apr 22, 2022
    + more versions
    Cite
    (2022). An Open-Source Iterative python Module for the Automated Identification of Photopeaks in Photon Spectra v2.0 - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/n3cm8fnce5ri2k55dlipee3st
    Explore at:
    Dataset updated
    Apr 22, 2022
    Description

    Code and isotopic libraries for the Python identification and peak-fitting module. See also: An Open-Source Iterative python Module for the Automated Identification of Photopeaks in Photon Spectra v1.0, https://data.bris.ac.uk/data/dataset/28ssj76dp5rx02tf8edowbfe3n. Complete download: zip, 170.1 KiB.

  7. Data from: Signature Detection Dataset

    • universe.roboflow.com
    zip
    Updated Jun 12, 2024
    + more versions
    Cite
    minaehyeon (2024). Signature Detection Dataset [Dataset]. https://universe.roboflow.com/minaehyeon/signature-detection-a75nf/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2024
    Dataset authored and provided by
    minaehyeon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Signature Mh7y Bounding Boxes
    Description

    Signature Detection

    ## Overview
    
    Signature Detection is a dataset for object detection tasks - it contains Signature Mh7y annotations for 206 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. Simple download service (Atom) of the dataset: Small agricultural regions in...

    • data.europa.eu
    Updated Apr 21, 2022
    + more versions
    Cite
    (2022). Simple download service (Atom) of the dataset: Small agricultural regions in the region of New Aquitaine [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-fc630f8a-203c-42c2-bdf7-c68f0c06feab
    Explore at:
    Available download formats: INSPIRE download service
    Dataset updated
    Apr 21, 2022
    Description

    The Small Agricultural Regions (PRA) are the intersections of the Agricultural Regions with the departments. The Agricultural Regions (RA) are regions with the same dominant agricultural vocation. They cover an entire number of municipalities forming a homogeneous agricultural area. They were delimited by INSEE in 1946. The last update was in 1981.

    WMS and WFS addresses: Warnings — position the cursor on the address, right-click, copy the link address, and paste it into the WMS or WFS server connection dialog box; using any other method introduces stray ("parasitic") spaces. — Problems displaying multipolygons via WFS (being resolved); prefer downloading the data if multipolygons are present. — Displaying more than 500 objects via WFS is not possible at the moment.

    WMS address for integration into a GIS from Geoide_Carto (add layers after another): http://data.geo-ide.application.developpement-durable.gouv.fr/WMS/228/PRA_R75?

    WFS address for GIS integration: http://ogc.geo-ide.developpement-durable.gouv.fr/cartes/mapserv?map=/opt/data/carto/geoide-catalogue/REG072A/JDD.www.map
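    A minimal sketch of connecting to the WMS endpoint above from Python with OWSLib, as an alternative to a desktop GIS (the service version and available layer names are assumptions; list wms.contents to see what is actually served):

    from owslib.wms import WebMapService

    wms = WebMapService(
        "http://data.geo-ide.application.developpement-durable.gouv.fr/WMS/228/PRA_R75?",
        version="1.3.0",
    )
    print(list(wms.contents))  # available layer names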

  9. Dataset metadata of known Dataverse installations

    • search.datacite.org
    • dataverse.harvard.edu
    • +1more
    Updated 2019
    + more versions
    Cite
    Julian Gautier (2019). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/dvn/dcdkzq
    Explore at:
    Dataset updated
    2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Julian Gautier
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation).csv
    │   ├── basic.csv
    │   ├── contributor(citation).csv
    │   ├── ...
    │   └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2022.10.02_17.11.19.zip
    │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │   ├── ...
    │   ├── metadatablocks_v5.6
    │   ├── astrophysics_v5.6.json
    │   ├── biomedical_v5.6.json
    │   ├── citation_v5.6.json
    │   ├── ...
    │   ├── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │   ├── Arca_Dados_2022.10.02_17.44.35.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    ├── dataset_pids_from_most_known_dataverse_installations.csv
    ├── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected this data, 36 installations were running versions of the Dataverse software that allow depositors to choose a license or data use agreement from a dropdown menu in the dataset deposit form. For more information, see https://guides.dataverse.org/en/5.11.1/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations.csv file contains the metadata block names, field names and child field names (if the field is a compound field) of the 77 Dataverse installations' metadata blocks. It is useful for comparing each installation's dataset metadata model (the metadata fields and the metadata blocks that each installation uses). The CSV file was created using a Python script at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_csv_file_with_metadata_block_fields_of_all_installations.py, which takes as inputs the directories and files created by the get_dataset_metadata_of_all_installations.py script.

    Known errors

    The metadata of two datasets from one of the known installations could not be downloaded because the datasets' pages and metadata could not be accessed with the Dataverse APIs.

    About metadata blocks

    Read about the Dataverse software's metadata blocks system at http://guides.dataverse.org/en/latest/admin/metadatacustomization.html
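    A small sketch for a first pass over the combined PID list described above (only the filename comes from the description; the exact column names are assumptions, so inspect them first):

    import pandas as pd

    pids = pd.read_csv("dataset_pids_from_most_known_dataverse_installations.csv")
    print(pids.columns.tolist())       # check the actual column names
    print(len(pids), "dataset PIDs across the 77 installations")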

  10. Raccoon Range - CWHR M153 [ds1936]

    • data.ca.gov
    • data.cnra.ca.gov
    • +5more
    Updated Mar 17, 2020
    Cite
    California Department of Fish and Wildlife (2020). Raccoon Range - CWHR M153 [ds1936] [Dataset]. https://data.ca.gov/dataset/raccoon-range-cwhr-m153-ds1936
    Explore at:
    Available download formats: arcgis geoservices rest api, kml, csv, geojson, zip, html
    Dataset updated
    Mar 17, 2020
    Dataset authored and provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  11. Dataset Direct Download Service (WFS): N_PERIMETRE_PPRN_20140454_S_032

    • data.europa.eu
    Updated Feb 18, 2022
    + more versions
    Cite
    (2022). Dataset Direct Download Service (WFS): N_PERIMETRE_PPRN_20140454_S_032 [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-d2d85ece-e42e-45d5-9cc4-30ffc18a45c7
    Explore at:
    Dataset updated
    Feb 18, 2022
    Description

    Perimeter of the Natural Risk Prevention Plan (PPR) for clay shrinkage ("retrait des argiles") in the commune of Salles-d’Armagnac in the department of Gers. This dataset contains the boundaries at the different stages of the development of the PPR. The characteristic of these perimeters is that they result from an official act and produce their effects from a specified date. These are: the prescribed perimeter set out in a PPR’s prescription order; the risk exposure perimeter, which corresponds to the perimeter regulated by the approved PPR (this approved perimeter is a utility easement); and the study perimeter, which corresponds to the envelope in which the hazards were studied.

  12. Fixed Broadband Deployment Data: December 2020

    • catalog.data.gov
    • opendata.fcc.gov
    • +1more
    Updated Sep 14, 2023
    + more versions
    Cite
    opendata.fcc.gov (2023). Fixed Broadband Deployment Data: December 2020 [Dataset]. https://catalog.data.gov/dataset/fixed-broadband-deployment-data-december-2020
    Explore at:
    Dataset updated
    Sep 14, 2023
    Dataset provided by
    Federal Communications Commission (http://fcc.gov/)
    Description

    The data collection used to create this dataset was in place through June 2021. For more recent broadband availability data, please see https://broadbandmap.fcc.gov; for more information about the related data collection, please see https://www.fcc.gov/BroadbandData. All facilities-based broadband providers are required to file data with the FCC twice a year (Form 477) on where they offer Internet access service at speeds exceeding 200 kbps in at least one direction. Fixed providers file lists of census blocks in which they can or do offer service to at least one location, with additional information about the service. Data Download Page: https://www.fcc.gov/general/broadband-deployment-data-fcc-form-477. Resources page: https://www.fcc.gov/general/form-477-resources-filers

  13. Sarnet Search And Rescue Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2022
    Cite
    Roboflow Public (2022). Sarnet Search And Rescue Dataset [Dataset]. https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Roboflow
    Authors
    Roboflow Public
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SaR Bounding Boxes
    Description

    Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)

    Satellite Imagery for Search And Rescue Dataset - ArXiv

    This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspect may be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.

    Image: https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg (the anomaly)

    The dataset contains the following:

    Set       Images   Annotations
    Train     1808     3048
    Validate  490      747
    Test      254      411
    Total     2552     4206

    The data is in the COCO format, and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
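    A minimal sketch for registering the unzipped COCO-format data with Detectron2 (the annotation-file and image-directory paths inside sarnet.zip are assumptions; adjust them to the actual layout after extraction):

    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "sarnet_train", {},
        "sarnet/train/annotations.json",   # assumed path to the COCO annotation file
        "sarnet/train",                    # assumed image directory
    )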

    Getting hold of the Data

    Download the data here: sarnet.zip

    Or follow these steps

    # download the dataset
    wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
    
    # extract the files
    unzip sarnet.zip
    

    Note (added by Roboflow): you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it into Roboflow after unzipping the folder to get started on your project.

    Getting started

    Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb

    Source Code for Paper

    Source code for the paper is located here: SaRNet_train_test.ipynb

    Cite this dataset

    @misc{thoreau2021sarnet,
       title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery}, 
       author={Michael Thoreau and Frazer Wilson},
       year={2021},
       eprint={2107.12469},
       archivePrefix={arXiv},
       primaryClass={eess.IV}
    }
    

    Acknowledgment

    The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.

  14. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/bigquery/patents
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
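    A hedged sketch of querying the BigQuery copy of this dataset with the official Python client (the `patents-public-data.patents.publications` table and its columns are assumptions based on the data-origin link below, not confirmed by this page):

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT country_code, COUNT(*) AS n_publications
        FROM `patents-public-data.patents.publications`
        GROUP BY country_code
        ORDER BY n_publications DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.country_code, row.n_publications)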

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  15. Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data

    • catalog.data.gov
    • data.transportation.gov
    • +5more
    Updated Mar 16, 2025
    Cite
    Federal Highway Administration (2025). Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data [Dataset]. https://catalog.data.gov/dataset/next-generation-simulation-ngsim-vehicle-trajectories-and-supporting-data
    Explore at:
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    Federal Highway Administration
    Description

    Click “Export” on the right to download the vehicle trajectory data. The associated metadata and additional data can be downloaded below under "Attachments". Researchers for the Next Generation Simulation (NGSIM) program collected detailed vehicle trajectory data on southbound US 101 and Lankershim Boulevard in Los Angeles, CA, eastbound I-80 in Emeryville, CA and Peachtree Street in Atlanta, Georgia. Data was collected through a network of synchronized digital video cameras. NGVIDEO, a customized software application developed for the NGSIM program, transcribed the vehicle trajectory data from the video. This vehicle trajectory data provided the precise location of each vehicle within the study area every one-tenth of a second, resulting in detailed lane positions and locations relative to other vehicles. Click the "Show More" button below to find additional contextual data and metadata for this dataset. For site-specific NGSIM video file datasets, please see the following: - NGSIM I-80 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-I-80-Vide/2577-gpny - NGSIM US-101 Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-US-101-Vi/4qzi-thur - NGSIM Lankershim Boulevard Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Lankershi/uv3e-y54k - NGSIM Peachtree Street Videos: https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Program-Peachtree/mupt-aksf

  16. Dataset download service: Administrative Boundary

    • ckan.mobidatalab.eu
    wfs
    Updated Apr 28, 2023
    + more versions
    Cite
    GeoDatiGovIt RNDT (2023). Dataset download service: Administrative Boundary [Dataset]. https://ckan.mobidatalab.eu/dataset/administrative-boundary-dataset-download-service
    Explore at:
    Available download formats: wfs
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    GeoDatiGovIt RNDT
    Description

    The level structure is harmonized according to the standards of the INSPIRE 2007/2/EC directive of 14 March 2007 starting from the regional information level "Administrative limits". The Inspire level with linear geometry consists of a hierarchical representation of the three types of administrative area present: 4thOrder (Municipality), 3rdOrder (Province), 2ndOrder (Region). The information level has been updated with a change in the toponymy of the Municipality of Ortonovo in Luni L.R. n.5/2017 and the merger of the geometries of the municipalities of Montalto Ligure and Carpasio into the new municipality Montalto Carpasio L.R. 21/2017.

  17. Strandings of Oceania Database

    • pacificdata.org
    • png-data.sprep.org
    • +14more
    docx, pdf, xls
    Updated Feb 11, 2022
    Cite
    Secretariat of the Pacific Regional Environment Programme (2022). Strandings of Oceania Database [Dataset]. https://pacificdata.org/data/dataset/strandings-of-oceania-databasec57c9884-597b-4fce-8532-4c27f63791df
    Explore at:
    Available download formats: xls, pdf, docx
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Secretariat of the Pacific Regional Environment Programme
    License

    https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588

    Description

    The Strandings of Oceania database is a collaborative project between SPREP, WildMe and the South Pacific Whale Research Consortium to record stranding and beachcast data for whales, dolphins and dugongs throughout the Pacific. We use a platform called Flukebook. An account is needed to view or use data within Flukebook, but the data is available for download here. You can submit data directly into Flukebook (preferably while logged in) or send a completed data form to SPREP for upload. Guidance on using the database is available:

  18. Stock Market Dataset

    • kaggle.com
    zip
    Updated Apr 2, 2020
    + more versions
    Cite
    Oleh Onyshchak (2020). Stock Market Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1054465
    Explore at:
    Available download formats: zip (547714524 bytes)
    Dataset updated
    Apr 2, 2020
    Authors
    Oleh Onyshchak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.

    It contains prices up to April 1, 2020. If you need more up-to-date data, just fork and re-run the data collection script, also available from Kaggle.

    Data Structure

    The data for every symbol is saved in CSV format with common fields:

    • Date - specifies trading date
    • Open - opening price
    • High - maximum price during the day
    • Low - minimum price during the day
    • Close - close price adjusted for splits
    • Adj Close - adjusted close price adjusted for both dividends and splits.
    • Volume - the number of shares that changed hands during a given day

    Each ticker's data is stored in either the ETFs or the stocks folder, depending on its type. Moreover, each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
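    A small sketch of reading one ticker file from the stocks folder and computing daily returns from the adjusted close (the ticker chosen here is just an example; any file in stocks/ or ETFs/ follows the same layout):

    import pandas as pd

    df = pd.read_csv("stocks/AAPL.csv", parse_dates=["Date"]).set_index("Date")
    df["daily_return"] = df["Adj Close"].pct_change()
    print(df[["Adj Close", "daily_return"]].tail())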

  19. Inspire Download Service (predefined ATOM) for dataset operations under...

    • data.europa.eu
    atom feed
    Cite
    Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz, Inspire Download Service (predefined ATOM) for dataset operations under mountain control in Saarland, locations, quartz sand [Dataset]. https://data.europa.eu/data/datasets/a55a5cc9-80cd-4a74-a649-98929dbacb44
    Explore at:
    Available download formats: atom feed
    Dataset authored and provided by
    Ministerium für Umwelt, Klima, Mobilität, Agrar und Verbraucherschutz
    Description

    Description of the INSPIRE Download Service (predefined Atom): operations under mining supervision in Saarland, locations, quartz sand, under the Federal Mining Act (Bundesberggesetz, BBergG). The link(s) for downloading the datasets are generated dynamically from GetFeature requests to a WFS 1.1.0.

  20. Roanoke, IN Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Roanoke, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b24fd620-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Roanoke
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Roanoke by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Roanoke across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of female population, with 51.9% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in Roanoke is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Roanoke's total population. Please note that the sum of all percentages may not equal 100 due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Roanoke Population by Race & Ethnicity. You can refer to it here.
