100+ datasets found
  1. Meta-Dataset Dataset

    • paperswithcode.com
    Cite
    Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle, Meta-Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/meta-dataset
    Authors
    Eleni Triantafillou; Tyler Zhu; Vincent Dumoulin; Pascal Lamblin; Utku Evci; Kelvin Xu; Ross Goroshin; Carles Gelada; Kevin Swersky; Pierre-Antoine Manzagol; Hugo Larochelle
    Description

    The Meta-Dataset benchmark is a large few-shot learning benchmark consisting of multiple datasets with different data distributions. It does not restrict few-shot tasks to a fixed number of ways and shots, which makes it a more realistic scenario. It consists of 10 datasets from diverse domains:

    • ILSVRC-2012 (the ImageNet dataset, consisting of natural images with 1000 categories)
    • Omniglot (hand-written characters, 1623 classes)
    • Aircraft (dataset of aircraft images, 100 classes)
    • CUB-200-2011 (dataset of birds, 200 classes)
    • Describable Textures (different kinds of texture images with 47 categories)
    • Quick Draw (black-and-white sketches of 345 different categories)
    • Fungi (a large dataset of mushrooms with 1500 categories)
    • VGG Flower (dataset of flower images with 102 categories)
    • Traffic Signs (German traffic sign images with 43 classes)
    • MSCOCO (images collected from Flickr, 80 classes)

    All datasets except Traffic Signs and MSCOCO have training, validation and test splits (proportioned roughly 70%, 15% and 15%). Traffic Signs and MSCOCO are reserved for testing only.
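
    To make "tasks without fixed ways and shots" concrete, the sketch below splits a set of class labels roughly 70/15/15 and then samples one episode whose number of classes (ways) and per-class shot counts are drawn at random. It is a simplified, hypothetical illustration: the function names, the 5 to 50 ways range and the 1 to 10 shots range are assumptions, splitting over classes (rather than images) is assumed as is usual in few-shot benchmarks, and Meta-Dataset's official sampler is more involved.

```python
import random


def split_classes(classes, train_frac=0.70, val_frac=0.15, seed=0):
    """Partition class labels into disjoint train/val/test lists (~70/15/15)."""
    rng = random.Random(seed)
    shuffled = list(classes)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def sample_episode(class_pool, min_ways=5, max_ways=50, max_shots=10, seed=None):
    """Sample one task with a random number of ways and a random shot count per class.

    Hypothetical, simplified sampler; the ranges are assumptions.
    """
    rng = random.Random(seed)
    upper = min(max_ways, len(class_pool))
    lower = min(min_ways, upper)
    n_ways = rng.randint(lower, upper)
    chosen = rng.sample(class_pool, n_ways)
    # Shots differ per class, so an episode can be class-imbalanced.
    return {c: rng.randint(1, max_shots) for c in chosen}


train, val, test = split_classes(range(100))
episode = sample_episode(train, seed=1)
print(len(train), len(val), len(test), len(episode))
```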

  2. Face For Small Large Dataset

    • universe.roboflow.com
    zip
    Updated May 13, 2024
    Cite
    ok (2024). Face For Small Large Dataset [Dataset]. https://universe.roboflow.com/ok-4sjtq/face-for-small-large/dataset/1
    Available download formats: zip
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    ok
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Faces, Bounding Boxes
    Description

    Face For Small Large

    ## Overview
    
    Face For Small Large is a dataset for object detection tasks. It contains face annotations for 389 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. INSPIRE Download Service (predefined ATOM) for dataset Large meadows 2. |...

    • gimi9.com
    Cite
    INSPIRE Download Service (predefined ATOM) for dataset Large meadows 2. | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_da5acc6d-b0c0-0002-6a1f-d8d41d0d9a96/
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Description of the INSPIRE Download Service (predefined Atom): Development plan "Große Wiesen 2. Änderung" (Large Meadows, 2nd amendment) of the municipality of Rettert. The link(s) for downloading the datasets is/are dynamically generated from GetMap calls to a WMS interface.
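
    The download links are described as generated from GetMap calls to a WMS interface. As a hedged sketch of what such a call looks like, the function below assembles a GetMap request URL from the standard OGC WMS 1.3.0 parameter names. The endpoint URL, layer name, CRS and bounding box in the usage line are hypothetical placeholders, not the actual service behind this dataset.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, crs, width=800, height=600, fmt="image/png"):
    """Build a WMS 1.3.0 GetMap request URL from standard OGC parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "CRS": crs,
        # Four bounding-box values; in WMS 1.3.0 the axis order follows the CRS.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, layer and extent, for illustration only.
url = wms_getmap_url("https://example.org/wms", ["bebauungsplan"],
                     (400000, 5550000, 401000, 5551000), "EPSG:25832")
print(url)
```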

  4. Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Excel
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. Ages between 0 and 85 were divided into roughly 5-year buckets, and all ages 85 and over were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age; for example, it can be used to identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was the 45 to 49 years group, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. The smallest age group in Excel, AL was the 85 years and over group, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over
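
    The 18 groups follow a simple rule: 5-year buckets from 0 to 84, plus one open-ended group for 85 and over. A minimal sketch of that bucketing, assuming ages are given in years (the function name is hypothetical):

```python
def age_group(age):
    """Map an age in years to one of the 18 age-group labels listed above."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "Under 5 years"
    if age >= 85:
        return "85 years and over"
    low = (int(age) // 5) * 5  # start of the 5-year bucket
    return f"{low} to {low + 4} years"


labels = {age_group(a) for a in range(0, 100)}
print(len(labels))  # 18
```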

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population of the specific age group in Excel is shown in this column.
    • % of Total Population: This column displays the population of each age group as a proportion of Excel's total population. Please note that the percentages may not sum to exactly 100% due to rounding.
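
    The rounding note can be seen with a small calculation: each share is rounded to two decimals on its own, so the shares need not add up to exactly 100%. The sketch below computes shares and the largest group from a dictionary of counts; the group names and counts are toy values, not the published table, and the function names are hypothetical.

```python
def age_shares(counts):
    """Map each group to (population, percent of total rounded to 2 decimals)."""
    total = sum(counts.values())
    return {group: (n, round(100 * n / total, 2)) for group, n in counts.items()}


def largest_group(counts):
    """Return the group with the highest population."""
    return max(counts, key=counts.get)


toy = {"group A": 74, "group B": 2, "group C": 397}  # hypothetical counts
shares = age_shares(toy)
print(shares)
print(sum(pct for _, pct in shares.values()))  # close to, but not exactly, 100
```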

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Excel Population by Age. You can refer to it here.

  5. FOI 30990 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Feb 13, 2023
    Cite
    (2023). FOI 30990 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-30990
    Dataset updated
    Feb 13, 2023
    Description

    The Microsoft PowerPivot add-on for Excel can be used to handle larger data sets. It is available using the link in the 'Related Links' section - https://www.microsoft.com/en-us/download/details.aspx?id=43348

    Once PowerPivot has been installed, please follow the instructions below to load the large files:

    1. Start Excel as normal.
    2. Click on the PowerPivot tab.
    3. Click on the PowerPivot Window icon (top left).
    4. In the PowerPivot Window, click on the "From Other Sources" icon.
    5. In the Table Import Wizard, scroll to the bottom and select Text File.
    6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.

    Please read the notes below to ensure a correct understanding of the data.

    Fewer than 5 Items

    Please be aware that I have decided not to release the exact number of items for certain drug/patient combinations where the total number of items falls below 5. Where suppression has been applied, a * is shown in place of the number of items; please read this as 1-4 items.

    Suppression has been applied to items and NIC (net ingredient cost) where items are lower than 5, and to quantity where quantity and items are both lower than 5, for the following drugs and identified genders, as per the sensitive drug list:

    • BNF Paragraph Code 60401 (Female Sex Hormones & Their Modulators), where the gender identified on the prescription is Male
    • BNF Paragraph Code 60402 (Male Sex Hormones And Antagonists), where the gender identified on the prescription is Female
    • BNF Paragraph Code 70201 (Preparations For Vaginal/Vulval Changes), where the gender identified on the prescription is Male
    • BNF Paragraph Code 70202 (Vaginal And Vulval Infections), where the gender identified on the prescription is Male
    • BNF Paragraph Code 70301 (Combined Hormonal Contraceptives/Systems), where the gender identified on the prescription is Male
    • BNF Paragraph Code 70302 (Progestogen-only Contraceptives), where the gender identified on the prescription is Male
    • BNF Paragraph Code 80302 (Progestogens), where the gender identified on the prescription is Male
    • BNF Paragraph Code 70405 (Drugs For Erectile Dysfunction), where the gender identified on the prescription is Female
    • BNF Paragraph Code 70406 (Drugs For Premature Ejaculation), where the gender identified on the prescription is Female

    This is because the patients could be identified when this information is combined with other information that may be in the public domain or reasonably available. This information falls under the exemption in section 40 subsections 2 and 3A (a) of the Freedom of Information Act, because disclosing it would breach the first data protection principle:

    a. it is not fair to disclose patients' personal details to the world, and doing so is likely to cause damage or distress;
    b. these details are not of sufficient interest to the public to warrant an intrusion into the privacy of the patients.

    Please click the web link below to see the exemption in full.
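
    Two of these rules translate directly into code when reading the released CSV: a * in the items column stands for 1-4 items, and suppression applies to the listed BNF paragraph codes when the prescription gender matches. A minimal sketch, assuming cells arrive as strings and that the BNF paragraph code and gender are available per row; the function names are hypothetical and the NIC and quantity rules are not modelled.

```python
# Sensitive BNF paragraph codes and the prescription gender for which
# low counts are suppressed, copied from the list in the notes.
SENSITIVE_BNF = {
    "60401": "Male",
    "60402": "Female",
    "70201": "Male",
    "70202": "Male",
    "70301": "Male",
    "70302": "Male",
    "80302": "Male",
    "70405": "Female",
    "70406": "Female",
}


def is_suppressed(bnf_paragraph, gender, items):
    """True if the item count for this drug/gender pair would be shown as '*'."""
    return SENSITIVE_BNF.get(bnf_paragraph) == gender and items < 5


def parse_items(cell):
    """Return (low, high) bounds for an items cell; '*' means 1-4 items."""
    cell = cell.strip()
    if cell == "*":
        return (1, 4)
    n = int(cell)
    return (n, n)


print(parse_items("*"), parse_items("12"))
print(is_suppressed("70405", "Female", 3), is_suppressed("70405", "Male", 3))
```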

  6. INSPIRE Download Service (predefined ATOM) for dataset Large Gardens |...

    • gimi9.com
    Cite
    INSPIRE Download Service (predefined ATOM) for dataset Large Gardens | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_723061d2-ea3f-0002-2b64-31e9ac312e5e/
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Description of the INSPIRE Download Service (predefined Atom): Development Plan "Große Garten" (Large Gardens), Böhl-Iggelheim. The link(s) for downloading the datasets is/are dynamically generated from GetMap calls to a WMS interface.

  7. Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)

    • redivis.com
    Updated Jun 28, 2024
    Cite
    Stanford Doerr School of Sustainability (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
    Available download formats: avro, sas, arrow, csv, application/jsonl, parquet, spss, stata
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Doerr School of Sustainability
    Time period covered
    Jun 27, 2024
    Description

    Abstract

    S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, table, chair, bookcase, and board).

    Methodology

    The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870 and 1100 square meters respectively (6020 square meters in total). These areas differ in architectural style and appearance and mainly comprise office areas and educational and exhibition spaces; conference rooms, personal offices, restrooms, open spaces, lobbies, stairways and hallways are commonly found within them. The point clouds were generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements: structural elements (ceiling, floor, wall, beam, column, window and door) and commonly found items and furniture (table, chair, sofa, bookcase and board).
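
    Per-point semantic labels like these are commonly scored with intersection over union (IoU) per class and its mean (mIoU). The metric is not described on this page; the sketch below is a plain-Python illustration assuming ground-truth and predicted labels are given as parallel lists of class names from the 12 categories above. Classes absent from both labellings are skipped here, whereas benchmark code usually accumulates counts over a whole test area.

```python
S3DIS_CLASSES = ["ceiling", "floor", "wall", "beam", "column", "window", "door",
                 "table", "chair", "sofa", "bookcase", "board"]


def per_class_iou(gt, pred, classes=S3DIS_CLASSES):
    """IoU = |gt & pred| / |gt | pred| per class; None if the class never occurs."""
    ious = {}
    for c in classes:
        inter = sum(1 for g, p in zip(gt, pred) if g == c and p == c)
        union = sum(1 for g, p in zip(gt, pred) if g == c or p == c)
        ious[c] = inter / union if union else None
    return ious


def mean_iou(ious):
    """Average IoU over classes that occur in either labelling."""
    present = [v for v in ious.values() if v is not None]
    return sum(present) / len(present)


gt = ["wall", "wall", "floor", "chair"]     # toy per-point labels
pred = ["wall", "floor", "floor", "chair"]
ious = per_class_iou(gt, pred)
print(ious["wall"], ious["floor"], ious["chair"], mean_iou(ious))
```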


  8. newsroom

    • tensorflow.org
    • opendatalab.com
    Updated Dec 14, 2022
    Cite
    (2022). newsroom [Dataset]. https://www.tensorflow.org/datasets/catalog/newsroom
    Dataset updated
    Dec 14, 2022
    Description

    NEWSROOM is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications.

    Dataset features include:

    • text: Input news text.
    • summary: Summary of the news article.

    Additional features:

    • title: News title.
    • url: URL of the news article.
    • date: date of the article.
    • density: extractive density.
    • coverage: extractive coverage.
    • compression: compression ratio.
    • density_bin: low, medium, high.
    • coverage_bin: extractive, abstractive.
    • compression_bin: low, medium, high.
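
    The density, coverage and compression features come from the NEWSROOM paper (Grusky et al., 2018), which greedily matches shared token sequences ("extractive fragments") between article and summary. The sketch below follows that definition as we understand it: coverage is the fraction of summary tokens that fall inside fragments, density is the sum of squared fragment lengths divided by the summary length, and compression is the article-to-summary word ratio. Whitespace tokenization and the function names are simplifying assumptions.

```python
def extractive_fragments(article, summary):
    """Greedily collect the longest token sequences shared by article and summary."""
    fragments = []
    i = 0
    while i < len(summary):
        best = 0
        for j in range(len(article)):
            k = 0
            while (i + k < len(summary) and j + k < len(article)
                   and summary[i + k] == article[j + k]):
                k += 1
            best = max(best, k)
        if best > 0:
            fragments.append(summary[i:i + best])
            i += best  # skip past the matched fragment
        else:
            i += 1     # token not in the article: move on
    return fragments


def fragment_stats(article_text, summary_text):
    article, summary = article_text.split(), summary_text.split()
    frags = extractive_fragments(article, summary)
    return {
        "coverage": sum(len(f) for f in frags) / len(summary),
        "density": sum(len(f) ** 2 for f in frags) / len(summary),
        "compression": len(article) / len(summary),
    }


stats = fragment_stats("the cat sat on the mat", "the cat sat quietly")
print(stats)
```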

    This dataset can be downloaded on request. Unzip all of its contents (train.jsonl, dev.jsonl and test.jsonl) into the TFDS manual download directory (by default ~/tensorflow_datasets/downloads/manual/).

    To use this dataset:

    import tensorflow_datasets as tfds

    # Requires the manually downloaded files described above.
    ds = tfds.load('newsroom', split='train')
    for ex in ds.take(4):  # print the first four examples
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  9. criteo

    • tensorflow.org
    Updated Dec 22, 2022
    Cite
    (2022). criteo [Dataset]. https://www.tensorflow.org/datasets/catalog/criteo
    Dataset updated
    Dec 22, 2022
    Description

    Criteo Uplift Modeling Dataset

    This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab) and Massih-Reza Amini (LIG, Grenoble INP).

    This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.

    Data description

    This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure in which a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 11 features, a treatment indicator and 2 labels (visits and conversions).

    Fields

    Here is a detailed description of the fields (they are comma-separated in the file):

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)

    Key figures

    • Format: CSV
    • Size: 459MB (compressed)
    • Rows: 25,309,483
    • Average Visit Rate: .04132
    • Average Conversion Rate: .00229
    • Treatment Ratio: .846

    Tasks

    The dataset was collected and prepared with uplift prediction as the main task. Related uses include, but are not limited to:

    • benchmark for causal inference
    • uplift modeling
    • interactions between features and treatment
    • heterogeneity of treatment
    • benchmark for observational causality methods
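
    Because treatment is assigned at random in these incrementality tests, the simplest uplift estimate is the difference in mean outcome between treated and control users. The sketch below computes that average effect from rows shaped like the fields above; the toy rows and the function name are hypothetical, and actual uplift models go further by estimating the effect per user from the features f0 to f11.

```python
def average_uplift(rows, outcome="visit"):
    """Difference in mean outcome between treated (1) and control (0) rows."""
    treated = [r[outcome] for r in rows if r["treatment"] == 1]
    control = [r[outcome] for r in rows if r["treatment"] == 0]
    if not treated or not control:
        raise ValueError("need both treated and control rows")
    return sum(treated) / len(treated) - sum(control) / len(control)


toy_rows = [  # hypothetical rows, not taken from the dataset
    {"treatment": 1, "visit": 1, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 1, "visit": 1, "conversion": 1},
    {"treatment": 0, "visit": 0, "conversion": 0},
    {"treatment": 0, "visit": 0, "conversion": 0},
    {"treatment": 0, "visit": 0, "conversion": 0},
    {"treatment": 0, "visit": 1, "conversion": 0},
]
print(average_uplift(toy_rows))                # 0.5 - 0.25
print(average_uplift(toy_rows, "conversion"))  # 0.25 - 0.0
```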

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('criteo', split='train')
    for ex in ds.take(4):  # print the first four examples
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  10. OSNI Open Data 50m Digital Terrain Model CSV

    • data.europa.eu
    • data.wu.ac.at
    csv
    Updated Oct 11, 2021
    Cite
    OpenDataNI (2021). OSNI Open Data 50m Digital Terrain Model CSV [Dataset]. https://data.europa.eu/data/datasets/osni-open-data-50m-digital-terrain-model-csv1
    Available download formats: csv
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    OpenDataNI
    License

    Open Government Licence: http://reference.data.gov.uk/id/open-government-licence

    Description

    A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. 10m and 50m DTMs are available. This is a large dataset and will take some time to download. Please be patient. By downloading or using this dataset you agree to abide by the Open Government Data Licence.
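
    Because a DTM is a regular grid of known heights, the height at an arbitrary location is usually estimated by interpolating between the four surrounding grid points. A minimal bilinear-interpolation sketch, assuming the CSV has been read into a dictionary keyed by (easting, northing) with grid nodes every 50 m at multiples of the spacing; the actual column layout of the OSNI CSV is not described here, and the function name is hypothetical.

```python
import math


def bilinear_height(grid, x, y, spacing=50.0):
    """Interpolate the height at (x, y) from the four surrounding grid nodes.

    grid maps (easting, northing) of each node to its height. Points on the
    grid's last row or column would need a node beyond it; not handled here.
    """
    x0 = math.floor(x / spacing) * spacing
    y0 = math.floor(y / spacing) * spacing
    x1, y1 = x0 + spacing, y0 + spacing
    tx, ty = (x - x0) / spacing, (y - y0) / spacing  # fractional position in cell
    z00, z10 = grid[(x0, y0)], grid[(x1, y0)]
    z01, z11 = grid[(x0, y1)], grid[(x1, y1)]
    return (z00 * (1 - tx) * (1 - ty) + z10 * tx * (1 - ty)
            + z01 * (1 - tx) * ty + z11 * tx * ty)


# Toy 2 x 2 grid (hypothetical heights in metres).
toy_grid = {(0.0, 0.0): 0.0, (50.0, 0.0): 50.0, (0.0, 50.0): 50.0, (50.0, 50.0): 100.0}
print(bilinear_height(toy_grid, 10.0, 40.0))
```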

  11. Animal Totals - Expense, Measured in US Dollars

    • impactmap-smudallas.hub.arcgis.com
    Updated May 29, 2024
    Cite
    SMU (2024). Animal Totals - Expense, Measured in US Dollars [Dataset]. https://impactmap-smudallas.hub.arcgis.com/datasets/animal-totals-expense-measured-in-us-dollars
    Dataset updated
    May 29, 2024
    Dataset authored and provided by
    SMU
    Area covered
    Description

    The Census of Agriculture, produced by the United States Department of Agriculture (USDA), provides a complete count of Texas' farms, ranches and the people who grow our food. The census is conducted every five years, most recently in 2022, and provides an in-depth look at the agricultural industry. The complete census includes over 260 separate commodities; this dataset is a subset of 23 commodities selected for publishing.

    This layer was produced from data obtained from the USDA National Agricultural Statistics Service (NASS) Large Datasets download page. The data were transformed and prepared for publishing using the Pivot Table geoprocessing tool in ArcGIS Pro and joined to county boundaries. The county boundaries are 2022 vintage and come from Living Atlas ACS 2022 feature layers.

    Attributes

    Note that some values are suppressed as "Withheld to avoid disclosing data for individual operations", "Not applicable", or "Less than half the rounding unit". These have been coded in the data as -999, -888, and -777 respectively.

    • Almonds
    • Animal Totals
    • Barley
    • Cattle
    • Chickens
    • Corn
    • Cotton
    • Crop Totals
    • Govt Programs
    • Grain
    • Grapes
    • Hay
    • Hogs
    • Labor
    • Machinery Totals
    • Rice
    • Sorghum
    • Soybean
    • Tractors
    • Trucks
    • Turkeys
    • Wheat
    • Winter Wheat
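
    Since the suppression markers are stored as numbers, they have to be filtered out before summing or averaging, or they will distort totals. A minimal sketch that maps the three codes to their meanings (the function names are hypothetical):

```python
SUPPRESSION_CODES = {
    -999: "Withheld to avoid disclosing data for individual operations",
    -888: "Not applicable",
    -777: "Less than half the rounding unit",
}


def decode_value(value):
    """Return (number, None) for real values, (None, reason) for suppression codes."""
    # -999.0 == -999 and both hash the same, so float cells match too.
    if value in SUPPRESSION_CODES:
        return None, SUPPRESSION_CODES[value]
    return value, None


def total_reported(values):
    """Sum only the values that are not suppression codes."""
    return sum(v for v, _ in map(decode_value, values) if v is not None)


print(total_reported([120, -999, 30.5, -777]))
```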

  12. wikipedia-small-3000-embedded

    • huggingface.co
    Updated Apr 6, 2024
    Cite
    Hafedh Hichri (2024). wikipedia-small-3000-embedded [Dataset]. https://huggingface.co/datasets/not-lain/wikipedia-small-3000-embedded
    Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2024
    Authors
    Hafedh Hichri
    License

    GNU Free Documentation License (GFDL): https://choosealicense.com/licenses/gfdl/

    Description

    This is a subset of the wikimedia/wikipedia dataset. Code for creating this dataset:

    from datasets import load_dataset, Dataset
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

    # load dataset in streaming mode (no download and it's fast)
    dataset = load_dataset(
        "wikimedia/wikipedia", "20231101.en", split="train", streaming=True
    )

    # select 3000 samples
    from tqdm import tqdm
    data = Dataset.from_dict({})
    for i, entry in…

    See the full description on the dataset page: https://huggingface.co/datasets/not-lain/wikipedia-small-3000-embedded.
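
    A dataset that stores a precomputed embedding next to each article is typically used for retrieval by semantic similarity. A plain-Python sketch of cosine-similarity search, assuming each row has been reduced to a (title, embedding) pair; the row layout and function names are hypothetical, and in practice the query would be embedded with the same SentenceTransformer model as the articles.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """Rank (title, embedding) rows by cosine similarity to query_vec."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [title for title, _ in ranked[:k]]


# Toy 2-D "embeddings"; real mxbai-embed-large-v1 vectors are much longer.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(top_k([1.0, 0.1], rows, k=2))
```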

  13. Large File Download Application - Dataset - data.govt.nz - discover and use...

    • portal.zero.govt.nz
    Updated Nov 5, 2023
    Cite
    (2023). Large File Download Application - Dataset - data.govt.nz - discover and use data [Dataset]. https://portal.zero.govt.nz/77d6ef04507c10508fcfc67a7c24be32/dataset/large-file-download-application5
    Dataset updated
    Nov 5, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Taupō District Council large file download application. Source lidar, contour and imagery files are available for download. Flood Hazard data relating to Plan Change 34 of the Taupō District Plan is also available for download. Taupō District Council does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data provided for download via this application. The data provided is indicative only and does not purport to be a complete database of all information in Taupō District Council's possession or control. Taupō District Council shall not be liable for any loss, damage, cost or expense (whether direct or indirect) arising from reliance upon or use of any data provided, or Council's failure to provide this data.

  14. Data from: Caravan - A global community dataset for large-sample hydrology

    • data.niaid.nih.gov
    • biorxiv.org
    Updated Jan 16, 2025
    Cite
    Shalev, Guy (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Erickson, Tyler
    Addor, Nans
    Kratzert, Frederik
    Shalev, Guy
    Nearing, Grey
    Matias, Yossi
    Gilon, Oren
    Gudmundsson, Lukas
    Klotz, Daniel
    Nevo, Sella
    Gauch, Martin
    Hassidim, Avinatan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

    Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

    If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, out of respect for the work that went into creating them and that made Caravan possible in the first place.

    All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

    Change log:

    23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

    24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

    15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

    1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added more metadata (station name and country).

    16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

    10 May 2023: Version 1.1 - No data change, just update data description.

    17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

    16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially excluded because of the area thresholds (i.e. basins smaller than 100 km² or larger than 2000 km²). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two download options that include timeseries data only, as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.

    16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
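
    As a hedged illustration of a PET-related climate index, the sketch below computes an aridity index as mean PET divided by mean precipitation, skipping days where either value is missing (NaN). This is one common definition and Caravan's own index definitions may differ; the toy series and the function name are hypothetical rather than values read from the Caravan files.

```python
import math


def aridity_index(pet, precip):
    """Mean PET divided by mean precipitation over days where both are present."""
    pairs = [(e, p) for e, p in zip(pet, precip)
             if not (math.isnan(e) or math.isnan(p))]
    if not pairs:
        raise ValueError("no days with both PET and precipitation")
    mean_pet = sum(e for e, _ in pairs) / len(pairs)
    mean_p = sum(p for _, p in pairs) / len(pairs)
    return mean_pet / mean_p


# Hypothetical daily series (mm/day); NaN marks a missing observation.
pet_fao_pm = [2.0, 4.0, float("nan")]
precip = [1.0, 3.0, 5.0]
print(aridity_index(pet_fao_pm, precip))
```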

  15. OSNI Open Data - 10M DTM - Sheets 201-250

    • hub.arcgis.com
    • data.europa.eu
    Updated Jun 3, 2020
    Cite
    SpatialNI (2020). OSNI Open Data - 10M DTM - Sheets 201-250 [Dataset]. https://hub.arcgis.com/documents/fabaca5127ae4e92abb48eac13421c41
    Dataset updated
    Jun 3, 2020
    Dataset authored and provided by
    SpatialNI
    Description

    A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. This download contains OSNI 10k sheet numbers 201-250. This is a large dataset and will take some time to download. Please be patient. This service is published for OpenData. By downloading or using this dataset you agree to abide by the LPS Open Government Data Licence.

    Please note for Open Data NI users: the Esri REST API is not broken. It will not open on its own in a web browser, but it can be copied and used in desktop applications and web maps.

  16. EdNet Dataset

    • paperswithcode.com
    Updated Apr 4, 2023
    Cite
    Youngduck Choi; Youngnam Lee; Dongmin Shin; Junghyun Cho; Seoyon Park; Seewoo Lee; Jineon Baek; Chan Bae; Byung-soo Kim; Jaewe Heo (2023). EdNet Dataset [Dataset]. https://paperswithcode.com/dataset/ednet
    Dataset updated
    Apr 4, 2023
    Authors
    Youngduck Choi; Youngnam Lee; Dongmin Shin; Junghyun Cho; Seoyon Park; Seewoo Lee; Jineon Baek; Chan Bae; Byung-soo Kim; Jaewe Heo
    Description

    A large-scale hierarchical dataset of diverse student activities collected by Santa, a multi-platform self-study solution equipped with an artificial-intelligence tutoring system. EdNet contains 131,441,538 interactions from 784,309 students collected over more than 2 years, making it the largest intelligent tutoring system (ITS) dataset released to the public so far.

  17. Big Rock, IL Population Breakdown By Race (Excluding Ethnicity) Dataset:...

    • neilsberg.com
    csv, json
    Updated Jul 7, 2024
    Cite
    Neilsberg Research (2024). Big Rock, IL Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2dad4987-230c-11ef-bd92-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Jul 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Big Rock, Illinois
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Big Rock by race. It includes the population of Big Rock across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Big Rock across relevant racial categories.

    Key observations

    The percent distribution of Big Rock population by race (across all racial categories recognized by the U.S. Census Bureau): 93.04% are white, 0.16% are American Indian and Alaska Native, 1.80% are Asian, 0.25% are some other race and 4.75% are multiracial.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Big Rock.
    • Population: This column shows the population of each racial category (excluding ethnicity) in Big Rock.
    • % of Total Population: This column displays each race's share of the total Big Rock population. Please note that the percentages may not sum to exactly 100% due to rounding.
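    The "% of Total Population" column is each category's count divided by the total, times 100, then rounded. A minimal Python sketch of that calculation follows; the function name and the example counts are hypothetical (not Neilsberg's code or Big Rock data) and are chosen only to show why rounded shares need not add up to exactly 100.

```python
from typing import Dict


def percent_distribution(counts: Dict[str, int], decimals: int = 2) -> Dict[str, float]:
    """Return each category's share of the total as a percentage rounded to `decimals` places."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {race: round(100 * n / total, decimals) for race, n in counts.items()}


# Hypothetical counts chosen only to show the rounding effect (not Big Rock data).
example = {"White": 1, "Asian": 1, "Two or more races": 1}
shares = percent_distribution(example)
print(shares)                          # {'White': 33.33, 'Asian': 33.33, 'Two or more races': 33.33}
print(round(sum(shares.values()), 2))  # 99.99, not 100.0

# The published Big Rock shares listed under "Key observations" do total 100.
published = [93.04, 0.16, 1.80, 0.25, 4.75]
print(round(sum(published), 2))        # 100.0
```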

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Big Rock Population by Race & Ethnicity. You can refer to it here.

  18. c

    SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset...

    • cancerimagingarchive.net
    csv, n/a +1
    Updated Oct 29, 2023
    Cite
    The Cancer Imaging Archive (2023). SAROS - A large, heterogeneous, and sparsely annotated segmentation dataset on CT imaging data [Dataset]. http://doi.org/10.25737/SZ96-ZG60
    Available download formats: csv, n/a, nifti and zip
    Dataset updated
    Oct 29, 2023
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Mar 7, 2024
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description
    Sparsely Annotated Region and Organ Segmentation (SAROS) contributes a large, heterogeneous semantic segmentation annotation dataset for existing CT imaging cases on TCIA. The goal of this dataset is to provide high-quality annotations for building body composition analysis tools (references: Koitka 2020 and Haubold 2023). Existing in-house segmentation models were employed to generate annotation candidates on randomly selected cases. All generated annotations were manually reviewed and corrected by medical residents and students on every fifth axial slice, while the other slices were set to an ignore label (numeric value 255).

    900 CT series from 882 patients were randomly selected from the following TCIA collections (number of CTs per collection in parentheses): ACRIN-FLT-Breast (32), ACRIN-HNSCC-FDG-PET/CT (48), ACRIN-NSCLC-FDG-PET (129), Anti-PD-1_Lung (12), Anti-PD-1_MELANOMA (2), C4KC-KiTS (175), COVID-19-NY-SBU (1), CPTAC-CM (1), CPTAC-LSCC (3), CPTAC-LUAD (1), CPTAC-PDA (8), CPTAC-UCEC (26), HNSCC (17), Head-Neck Cetuximab (12), LIDC-IDRI (133), Lung-PET-CT-Dx (17), NSCLC Radiogenomics (7), NSCLC-Radiomics (56), NSCLC-Radiomics-Genomics (20), Pancreas-CT (58), QIN-HEADNECK (94), Soft-tissue-Sarcoma (6), TCGA-HNSC (1), TCGA-LIHC (33), TCGA-LUAD (2), TCGA-LUSC (3), TCGA-STAD (2), TCGA-UCEC (1).

    The GitHub repository https://github.com/UMEssen/saros-dataset provides a script to download and resample the images. The annotations are provided in NIfTI format and were performed on 5 mm slice thickness. The annotation files define foreground labels on the same axial slices and match each other pixel-perfectly. In total, 13 semantic body regions and 6 body part labels were annotated, each with an index that corresponds to a numeric value in the segmentation file.
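    Because only every fifth axial slice was reviewed and all other slices carry the ignore value 255, any loss or metric should skip those voxels. The sketch below is a minimal NumPy illustration under stated assumptions: the function names are hypothetical, the axial direction is assumed to be the last array axis, and a real file could be read with nibabel, for example `np.asanyarray(nibabel.load(path).dataobj)`.

```python
import numpy as np

IGNORE_LABEL = 255  # value assigned to axial slices that were not manually reviewed


def valid_mask(seg: np.ndarray) -> np.ndarray:
    """Boolean mask of voxels that carry a reviewed label (i.e. are not IGNORE_LABEL)."""
    return seg != IGNORE_LABEL


def annotated_slice_indices(seg: np.ndarray, axial_axis: int = 2) -> np.ndarray:
    """Indices of axial slices containing at least one reviewed voxel.

    Assumes the axial direction is `axial_axis` (often the last axis of a NIfTI
    array); check the orientation of your own files before relying on this.
    """
    other_axes = tuple(a for a in range(seg.ndim) if a != axial_axis)
    return np.flatnonzero(np.any(valid_mask(seg), axis=other_axes))


# Synthetic 4 x 4 x 10 volume: every fifth slice reviewed, all others ignored.
seg = np.full((4, 4, 10), IGNORE_LABEL, dtype=np.uint8)
seg[:, :, ::5] = 0       # background on reviewed slices
seg[1:3, 1:3, ::5] = 2   # a small patch with label value 2
print(annotated_slice_indices(seg))  # [0 5]
print(int(valid_mask(seg).sum()))    # 32
```

    In a PyTorch training loop the same effect is commonly obtained with `torch.nn.CrossEntropyLoss(ignore_index=255)`, which excludes those voxels from the loss.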

    Body Regions

    1. Subcutaneous Tissue
    2. Muscle
    3. Abdominal Cavity
    4. Thoracic Cavity
    5. Bones
    6. Parotid Glands
    7. Pericardium
    8. Breast Implant
    9. Mediastinum
    10. Brain
    11. Spinal Cord
    12. Thyroid Glands
    13. Submandibular Glands

    Body Parts

    1. Torso
    2. Head
    3. Right Leg
    4. Left Leg
    5. Right Arm
    6. Left Arm
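    To turn numeric label values into the names listed above, a lookup table is enough. This is a hedged sketch: it assumes the list numbers are exactly the voxel values, that body regions and body parts are stored in separate segmentation files with their own numbering, and that 0 is unlabeled background; `named_voxel_counts` is a hypothetical helper, not part of the SAROS tooling.

```python
import numpy as np

IGNORE_LABEL = 255

# Assumption: list numbers above are the voxel values in the corresponding file.
BODY_REGIONS = {
    1: "Subcutaneous Tissue", 2: "Muscle", 3: "Abdominal Cavity", 4: "Thoracic Cavity",
    5: "Bones", 6: "Parotid Glands", 7: "Pericardium", 8: "Breast Implant",
    9: "Mediastinum", 10: "Brain", 11: "Spinal Cord", 12: "Thyroid Glands",
    13: "Submandibular Glands",
}
BODY_PARTS = {1: "Torso", 2: "Head", 3: "Right Leg", 4: "Left Leg", 5: "Right Arm", 6: "Left Arm"}


def named_voxel_counts(seg: np.ndarray, names: dict) -> dict:
    """Count voxels per named label, skipping the ignore label and unnamed values such as 0."""
    values, counts = np.unique(seg[seg != IGNORE_LABEL], return_counts=True)
    return {names[int(v)]: int(c) for v, c in zip(values, counts) if int(v) in names}


seg = np.array([[0, 1, 2], [2, 255, 13]], dtype=np.uint8)
print(named_voxel_counts(seg, BODY_REGIONS))
# {'Subcutaneous Tissue': 1, 'Muscle': 2, 'Submandibular Glands': 1}
```

    Counts taken this way cover only the reviewed slices, so they are not whole-body volumes.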
    The labels which were modified or require further commentary are listed and explained below:
    • Subcutaneous Adipose Tissue: The cutis was included in this label due to its limited differentiation in 5mm-CT.
    • Muscle: All muscular tissue was segmented contiguously and not separated into single muscles. Thus, fascias and intermuscular fat were included in the label. Inter- and intramuscular fat is subtracted automatically in the process.
    • Abdominal Cavity: This label includes the pelvis. The label does not distinguish structures by their position relative to the peritoneum.
    • Mediastinum: The International Thymic Malignancy Group (ITMIG) scheme was used for the segmentation guidelines.
    • Head + Neck: The neck is confined by the base of the trapezius muscle.
    • Right + Left Leg: The legs are separated from the torso by the line between the two lowest points of the Rami ossa pubis.
    • Right + Left Arm: The arms are separated from the torso by the diagonal between the most lateral point of the acromion and the tuberculum infraglenoidale.
    For reproducibility on downstream tasks, five cross-validation folds and a test set were predefined; they are described in the provided spreadsheet. Segmentation was conducted strictly in accordance with anatomical guidelines and was only modified where required to gain segmentation efficiency.
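    Using predefined folds means selecting one fold as validation, training on the remaining folds, and never touching the fixed test set. The sketch below assumes a hypothetical spreadsheet layout (a `case_id` column and a `split` column holding "0" to "4" or "test"); the real SAROS spreadsheet may use different column names, so inspect it before adapting this.

```python
import csv
import io
from typing import Dict, List, Tuple


def fold_split(rows: List[Dict[str, str]], k: int) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, validation, test) case IDs for cross-validation fold k.

    Hypothetical columns: "case_id" and "split" ("0".."4" or "test").
    """
    fold = str(k)
    test_ids = [r["case_id"] for r in rows if r["split"] == "test"]
    val_ids = [r["case_id"] for r in rows if r["split"] == fold]
    train_ids = [r["case_id"] for r in rows if r["split"] not in ("test", fold)]
    return train_ids, val_ids, test_ids


# Made-up rows standing in for the spreadsheet, for illustration only.
demo_csv = """case_id,split
case_a,0
case_b,1
case_c,test
case_d,0
"""
rows = list(csv.DictReader(io.StringIO(demo_csv)))
train_ids, val_ids, test_ids = fold_split(rows, 0)
print(train_ids, val_ids, test_ids)  # ['case_b'] ['case_a', 'case_d'] ['case_c']
```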

  19. e

    OSNI Open Data - 10M DTM - Sheets 1-50

    • data.europa.eu
    • hub.arcgis.com
    • +2 more
    html, json
    Updated Jun 30, 2022
    Cite
    OpenDataNI (2022). OSNI Open Data - 10M DTM - Sheets 1-50 [Dataset]. https://data.europa.eu/data/datasets/osni-open-data-10m-dtm-sheets-1-503?locale=et
    Available download formats: html, json
    Dataset updated
    Jun 30, 2022
    Dataset authored and provided by
    OpenDataNI
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    A Digital Terrain Model (DTM) is a digital file consisting of a grid of regularly spaced points of known height which, when used with other digital data such as maps or orthophotographs, can provide a 3D image of the land surface. This download contains OSNI 10k sheet numbers 1-50.
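    Since a DTM is a grid of regularly spaced heights, the height at an arbitrary point between grid nodes is usually estimated by interpolation. The following is a minimal bilinear-interpolation sketch, not OSNI tooling; it assumes (from the "10M" in the title) a 10 m spacing, rows increasing northward from a known origin, and at least a 2 x 2 grid. Many raster files store row 0 at the northern edge, in which case `np.flipud` the array first.

```python
import numpy as np


def bilinear_height(heights: np.ndarray, origin_x: float, origin_y: float,
                    spacing: float, x: float, y: float) -> float:
    """Interpolate the terrain height at (x, y) from a regular grid of spot heights.

    Assumes heights[row, col] is the height at
    (origin_x + col * spacing, origin_y + row * spacing), i.e. rows increase northward.
    """
    n_rows, n_cols = heights.shape
    col_f = (x - origin_x) / spacing
    row_f = (y - origin_y) / spacing
    if not (0 <= col_f <= n_cols - 1 and 0 <= row_f <= n_rows - 1):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so points on the last row/column work.
    c0 = min(int(np.floor(col_f)), n_cols - 2)
    r0 = min(int(np.floor(row_f)), n_rows - 2)
    dc, dr = col_f - c0, row_f - r0
    south = heights[r0, c0] * (1 - dc) + heights[r0, c0 + 1] * dc
    north = heights[r0 + 1, c0] * (1 - dc) + heights[r0 + 1, c0 + 1] * dc
    return float(south * (1 - dr) + north * dr)


# Hypothetical 2 x 2 grid with 10 m spacing (heights in metres).
grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])
print(bilinear_height(grid, 0.0, 0.0, 10.0, 5.0, 5.0))  # 15.0
```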


    This is a large dataset and will take some time to download. Please be patient. This service is published for OpenData. By downloading or using this dataset you agree to abide by the LPS Open Government Data Licence.

    Please note for Open Data NI users: the Esri REST API is not broken. It will not open on its own in a web browser, but it can be copied and used in desktop applications and web maps.

  20. c

    Fox News dataset is for analyzing media trends and narratives

    • crawlfeeds.com
    csv, zip
    Updated May 19, 2025
    Cite
    Crawl Feeds (2025). Fox News dataset is for analyzing media trends and narratives [Dataset]. https://crawlfeeds.com/datasets/fox-news-dataset
    Available download formats: zip, csv
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Fox News Dataset is a comprehensive collection of over 1 million news articles, offering an unparalleled resource for analyzing media narratives, public discourse, and political trends. Covering articles up to the year 2023, this dataset is a treasure trove for researchers, analysts, and businesses interested in gaining deeper insights into the topics and trends covered by Fox News.

    Key Features of the Fox News Dataset

    • Extensive Coverage: Contains more than 1 million articles spanning various topics and events up to 2023.
    • Research-Ready: Perfect for text classification, natural language processing (NLP), and other research purposes.
    • Format: Provided in CSV format for seamless integration into analytical and research tools.

    Why Use This Dataset?

    This large dataset is ideal for:

    • Text Classification: Develop machine learning models to classify and categorize news content.
    • Natural Language Processing (NLP): Conduct sentiment analysis, keyword extraction, or topic modeling.
    • Media and Political Research: Analyze media narratives, public opinion, and political trends reflected in Fox News articles.
    • Trend Analysis: Identify shifts in public discourse and media focus over time.
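    As a starting point for trend analysis, one can count how many articles mention a term in each year. This is a minimal standard-library sketch; the column names `published_date` and `content` and the date format are hypothetical, since the dataset's actual CSV schema is not described here.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, Iterable


def mentions_per_year(rows: Iterable[Dict[str, str]], term: str,
                      date_field: str = "published_date",
                      text_field: str = "content") -> Dict[str, int]:
    """Count articles per year whose text contains `term` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts: Counter = Counter()
    for row in rows:
        year = row[date_field][:4]  # assumes ISO-like dates such as "2021-03-04"
        if pattern.search(row[text_field]):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up rows standing in for the CSV, for illustration only.
demo_csv = """published_date,content
2021-01-05,Senate debates the budget
2021-06-11,Weather update
2022-02-20,Budget talks resume
"""
rows = list(csv.DictReader(io.StringIO(demo_csv)))
print(mentions_per_year(rows, "budget"))  # {'2021': 1, '2022': 1}
```

    Full article bodies can exceed the `csv` module's default field size limit; `csv.field_size_limit` can raise it before reading the real file.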

    Explore More News Datasets

    Discover additional resources for your research needs by visiting our news dataset collection. These datasets are tailored to support diverse analytical applications, including sentiment analysis and trend modeling.

    The Fox News Dataset is a must-have for anyone interested in exploring large-scale media data and leveraging it for advanced analysis. Ready to dive into this wealth of information? Download the dataset now in CSV format and start uncovering the stories behind the headlines.

