Meta-Dataset is a large-scale few-shot learning benchmark consisting of multiple datasets with different data distributions. It does not restrict few-shot tasks to fixed numbers of ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:
- ILSVRC-2012 (the ImageNet dataset; natural images, 1000 categories)
- Omniglot (hand-written characters, 1623 classes)
- Aircraft (aircraft images, 100 classes)
- CUB-200-2011 (birds, 200 classes)
- Describable Textures (texture images, 43 categories)
- Quick Draw (black-and-white sketches, 345 categories)
- Fungi (a large dataset of mushrooms, 1500 categories)
- VGG Flower (flower images, 102 categories)
- Traffic Signs (German traffic sign images, 43 classes)
- MSCOCO (images collected from Flickr, 80 classes)
All datasets except Traffic Signs and MSCOCO have training, validation, and test splits (roughly 70% / 15% / 15%). Traffic Signs and MSCOCO are reserved for testing only.
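The variable-way, variable-shot episode protocol can be sketched as follows. This is a minimal illustration only, not the benchmark's actual sampler; the bounds, function name, and class names are invented for the example:

```python
import random

def sample_episode(class_pool, max_ways=50, max_shots=10, query_size=10):
    """Sample one variable-way, variable-shot episode.

    Unlike fixed N-way K-shot benchmarks, both the number of classes
    (ways) and the support size per class (shots) vary per episode.
    """
    ways = random.randint(5, min(max_ways, len(class_pool)))
    classes = random.sample(sorted(class_pool), ways)
    episode = {}
    for c in classes:
        shots = random.randint(1, max_shots)  # support size varies per class
        episode[c] = {"support": shots, "query": query_size}
    return episode

random.seed(0)
ep = sample_episode({f"class_{i}" for i in range(100)})
print(len(ep))  # number of ways in this episode
```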
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Face For Small Large is a dataset for object detection tasks; it contains 389 images annotated with faces.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description of the INSPIRE Download Service (predefined Atom): Development plan "Große Wiesen 2. Änderung" of the municipality of Rettert. The link(s) for downloading the datasets are dynamically generated from GetMap calls to a WMS interface.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was the 45 to 49 years group, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over group, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Excel Population by Age. You can refer to it here.
The Microsoft PowerPivot add-on for Excel can be used to handle larger data sets. The add-on is available using the link in the 'Related Links' section: https://www.microsoft.com/en-us/download/details.aspx?id=43348

Once PowerPivot has been installed, to load the large files, please follow the instructions below:
1. Start Excel as normal
2. Click on the PowerPivot tab
3. Click on the PowerPivot Window icon (top left)
4. In the PowerPivot Window, click on the "From Other Sources" icon
5. In the Table Import Wizard, scroll to the bottom and select Text File
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV

Please read the notes below to ensure correct understanding of the data.

Fewer than 5 items: please be aware that the exact number of items is not released where the total number of items falls below 5 for certain drug/patient combinations. Where suppression has been applied, a * is shown in place of the number of items; please read this as 1-4 items.
Suppressions have been applied to items and NIC where items are lower than 5, and to quantity where both quantity and items are lower than 5, for the following drugs and identified genders as per the sensitive drug list:
- BNF Paragraph 60401 (Female Sex Hormones & Their Modulators), where the gender identified on the prescription is Male
- BNF Paragraph 60402 (Male Sex Hormones And Antagonists), where the gender identified on the prescription is Female
- BNF Paragraph 70201 (Preparations For Vaginal/Vulval Changes), where the gender identified on the prescription is Male
- BNF Paragraph 70202 (Vaginal And Vulval Infections), where the gender identified on the prescription is Male
- BNF Paragraph 70301 (Combined Hormonal Contraceptives/Systems), where the gender identified on the prescription is Male
- BNF Paragraph 70302 (Progestogen-only Contraceptives), where the gender identified on the prescription is Male
- BNF Paragraph 80302 (Progestogens), where the gender identified on the prescription is Male
- BNF Paragraph 70405 (Drugs For Erectile Dysfunction), where the gender identified on the prescription is Female
- BNF Paragraph 70406 (Drugs For Premature Ejaculation), where the gender identified on the prescription is Female

This is because the patients could be identified when this data is combined with other information that may be in the public domain or reasonably available. This information falls under the exemption in section 40, subsections 2 and 3A (a), of the Freedom of Information Act. This is because it would breach the first data protection principle, as: a. it is not fair to disclose patients' personal details to the world, and doing so is likely to cause damage or distress; b. these details are not of sufficient interest to the public to warrant an intrusion into the privacy of the patients. Please click the below web link to see the exemption in full.
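As an illustration, the "fewer than 5 items" rule for sensitive code/gender pairs could be applied in code along these lines. This is a hedged sketch only; the field names (`bnf_paragraph`, `gender`, `items`) are invented for the example and are not the dataset's actual column names:

```python
# Sensitive (BNF paragraph code, prescription gender) pairs from the drug list.
SENSITIVE = {
    ("60401", "Male"), ("60402", "Female"), ("70201", "Male"),
    ("70202", "Male"), ("70301", "Male"), ("70302", "Male"),
    ("80302", "Male"), ("70405", "Female"), ("70406", "Female"),
}

def suppress(row):
    """Replace small item counts with '*' for sensitive code/gender pairs.

    Field names are illustrative. Read '*' as 1-4 items.
    """
    row = dict(row)  # avoid mutating the caller's record
    if (row["bnf_paragraph"], row["gender"]) in SENSITIVE and row["items"] < 5:
        row["items"] = "*"
    return row

print(suppress({"bnf_paragraph": "70405", "gender": "Female", "items": 3}))
```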
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description of the INSPIRE Download Service (predefined Atom): Development Plan Große Garten Böhl-Iggelheim. The link(s) for downloading the datasets are dynamically generated from GetMap calls to a WMS interface.
S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, desk, chair, bookcase, and board).
The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (6020 square meters in total). These areas show diverse properties in architectural style and appearance and mainly include office areas, educational and exhibition spaces, and conference rooms; personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The point clouds were generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements: structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).
NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications.
Dataset features include:
And additional features:
This dataset can be downloaded upon request. Unzip all the contents (train.jsonl, dev.jsonl, test.jsonl) into the tfds folder.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('newsroom', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
This dataset is released along with the paper "A Large Scale Benchmark for Uplift Modeling" by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab) and Massih-Reza Amini (LIG, Grenoble INP).
This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 11 features, a treatment indicator, and 2 labels (visits and conversions).
Here is a detailed description of the fields (they are comma-separated in the file):
The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we can foresee related uses such as (but not limited to):
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('criteo', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
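As a sketch of the core uplift task on such data, the quantity of interest is the difference in outcome rate between treated and untreated users. The toy rows below are invented for illustration (the real file has 11 features plus a visits label in addition to the treatment indicator and conversion label):

```python
# Toy rows: (treatment, conversion). Illustrative only.
rows = [(1, 1), (1, 0), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]

def uplift(rows):
    """Average treatment effect estimate: treated rate minus control rate."""
    treated = [c for t, c in rows if t == 1]
    control = [c for t, c in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(uplift(rows))  # 0.5 - 0.25 = 0.25
```

Uplift models then try to predict this effect per user rather than per population, so that advertising can be targeted at users it actually influences.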
Open Government Licence: http://reference.data.gov.uk/id/open-government-licence
A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. 10m and 50m DTMs are available. This is a large dataset and will take some time to download. Please be patient. By download or use of this dataset you agree to abide by the Open Government Data Licence.
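To illustrate how such a grid of regularly spaced heights yields a continuous surface, here is a minimal bilinear-interpolation sketch. The grid values, spacing, and function name are invented for the example, not taken from the dataset:

```python
def height_at(grid, spacing, x, y):
    """Bilinearly interpolate a height from a regular DTM grid.

    `grid[row][col]` holds heights at points `spacing` metres apart;
    (x, y) is a position in metres inside the grid.
    """
    cx, cy = x / spacing, y / spacing
    c0, r0 = int(cx), int(cy)          # lower-left grid cell corner
    fx, fy = cx - c0, cy - r0          # fractional position within the cell
    h00 = grid[r0][c0]
    h10 = grid[r0][c0 + 1]
    h01 = grid[r0 + 1][c0]
    h11 = grid[r0 + 1][c0 + 1]
    top = h00 * (1 - fx) + h10 * fx
    bot = h01 * (1 - fx) + h11 * fx
    return top * (1 - fy) + bot * fy

grid = [[10.0, 12.0], [14.0, 16.0]]    # 2x2 toy DTM, 10 m spacing
print(height_at(grid, 10.0, 5.0, 5.0))  # cell midpoint -> 13.0
```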
The Census of Agriculture, produced by the United States Department of Agriculture (USDA), provides a complete count of Texas' farms, ranches and the people who grow our food. The census is conducted every five years, most recently in 2022, and provides an in-depth look at the agricultural industry. The complete census includes over 260 separate commodities; this dataset is a subset of 23 commodities selected for publishing. This layer was produced from data obtained from the USDA National Agriculture Statistics Service (NASS) Large Datasets download page. The data were transformed and prepared for publishing using the Pivot Table geoprocessing tool in ArcGIS Pro and joined to county boundaries. The county boundaries are 2022 vintage and come from Living Atlas ACS 2022 feature layers.

Attributes: note that some values are suppressed as "Withheld to avoid disclosing data for individual operations", "Not applicable", or "Less than half the rounding unit". These have been coded in the data as -999, -888, and -777 respectively.

Commodities: Almonds, Animal Totals, Barley, Cattle, Chickens, Corn, Cotton, Crop Totals, Govt Programs, Grain, Grapes, Hay, Hogs, Labor, Machinery Totals, Rice, Sorghum, Soybean, Tractors, Trucks, Turkeys, Wheat, Winter Wheat
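When analyzing this layer, the sentinel codes above should be distinguished from real counts. A small lookup handles this; the sketch below is illustrative (the field handling is not part of the published layer):

```python
# Sentinel codes used in the layer, per the dataset description.
CODES = {
    -999: "Withheld to avoid disclosing data for individual operations",
    -888: "Not applicable",
    -777: "Less than half the rounding unit",
}

def decode(value):
    """Map sentinel values to None (missing); pass real counts through."""
    return None if value in CODES else value

print([decode(v) for v in [1200, -999, -777]])  # [1200, None, None]
```

Treating the sentinels as missing values (rather than leaving them as negative numbers) avoids skewing any sums or averages computed over counties.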
GNU Free Documentation License: https://choosealicense.com/licenses/gfdl/
This is a subset of the wikimedia/wikipedia dataset. Code for creating this dataset:

from datasets import load_dataset, Dataset
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

dataset = load_dataset(
    "wikimedia/wikipedia", "20231101.en", split="train", streaming=True
)

data = Dataset.from_dict({})
for i, entry in… See the full description on the dataset page: https://huggingface.co/datasets/not-lain/wikipedia-small-3000-embedded.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Taupō District Council large file download application. Source lidar, contour and imagery files are available for download. Flood Hazard data relating to Plan Change 34 of the Taupō District Plan is also available for download. Taupō District Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data provided for download via this application. The data provided is indicative only and does not purport to be a complete database of all information in Taupō District Council's possession or control. Taupō District Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the accompanying dataset to the following paper: https://www.nature.com/articles/s41597-023-01975-w
Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.
If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, to pay respect to the amount of work that went into creating them and that made Caravan possible in the first place.
All current development and additional community extensions can be found at https://github.com/kratzert/Caravan
Channel Log:
23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.
24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.
15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).
1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added more metadata (station name and country).
16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan
10 May 2023: Version 1.1 - No data change, just update data description.
17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.
16 April 2024: Version 1.4 - Added 9130 gauges from the original source datasets that were initially not included because of the area thresholds (i.e. basins smaller than 100 sq km or larger than 2000 sq km). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two download options that include timeseries data only, as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.
16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. This download contains OSNI 10k sheet numbers 201-250. This is a large dataset and will take some time to download. Please be patient. This service is published for OpenData. By download or use of this dataset you agree to abide by the LPS Open Government Data Licence. Please note for Open Data NI users: the Esri REST API is not broken; it will not open on its own in a web browser, but the URL can be copied and used in desktop applications and web maps.
A large-scale hierarchical dataset of diverse student activities collected by Santa, a multi-platform self-study solution equipped with an artificial intelligence tutoring system. EdNet contains 131,441,538 interactions from 784,309 students collected over more than 2 years, making it the largest ITS dataset released to the public so far.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Big Rock by race. It includes the population of Big Rock across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Big Rock across relevant racial categories.
Key observations
The percent distribution of Big Rock population by race (across all racial categories recognized by the U.S. Census Bureau): 93.04% are white, 0.16% are American Indian and Alaska Native, 1.80% are Asian, 0.25% are some other race and 4.75% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Big Rock Population by Race & Ethnicity. You can refer to it here.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Open Government Licence: http://reference.data.gov.uk/id/open-government-licence
A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. This download contains OSNI 10k sheet numbers 1-50.
https://crawlfeeds.com/privacy_policy
The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.
This large dataset is ideal for:
Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.
The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.