100+ datasets found
  1. Excel dataset

    • kaggle.com
    zip
    Updated Jun 29, 2023
    Cite
    Pinky Verma (2023). Excel dataset [Dataset]. https://www.kaggle.com/datasets/pinkyverma0256/excel-dataset
    Explore at:
    Available download formats: zip (13123 bytes)
    Dataset updated
    Jun 29, 2023
    Authors
    Pinky Verma
    Description

    Dataset

    This dataset was created by Pinky Verma

    Contents

  2. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    Available download formats: zip (49305 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded off excelbianalytics.com created off of random VBA logic. I recently performed an extensive exploratory data analysis on it and I included new columns to it, namely: Unit margin, Order year, Order month, Order weekday and Order_Ship_Days which I think can help with analysis on the data. I shared it because I thought it was a great dataset to practice analytical processes on for newbies like myself.
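    The derived columns listed above could be computed along these lines. This is only a sketch: the input field names ("Order Date", "Ship Date", "Unit Price", "Unit Cost"), the definition of unit margin as price minus cost, and the sample values are assumptions for illustration, not taken from the dataset itself.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def add_derived_columns(record):
    """Return a copy of a sales record with the five derived columns added.

    The input keys used here are assumed names, not confirmed column headers.
    """
    order_date = record["Order Date"]
    ship_date = record["Ship Date"]
    enriched = dict(record)
    # Assumption: unit margin means unit price minus unit cost.
    enriched["Unit margin"] = round(record["Unit Price"] - record["Unit Cost"], 2)
    enriched["Order year"] = order_date.year
    enriched["Order month"] = order_date.month
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    enriched["Order weekday"] = WEEKDAYS[order_date.weekday()]
    # Number of days between placing the order and shipping it.
    enriched["Order_Ship_Days"] = (ship_date - order_date).days
    return enriched


# Made-up record, for illustration only.
sample = {
    "Order Date": date(2023, 1, 2),
    "Ship Date": date(2023, 1, 12),
    "Unit Price": 9.33,
    "Unit Cost": 6.92,
}
enriched_sample = add_derived_columns(sample)
```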

  3. Walmart Products Dataset – Free Product Data CSV

    • crawlfeeds.com
    csv, zip
    Updated Dec 2, 2025
    Cite
    Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

    Key Features

    Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

    CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

    Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

    Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

    Who Benefits?

    • Data analysts & researchers exploring e-commerce trends or product catalog data.
    • Developers & data scientists building price-comparison tools, recommendation engines or ML models.
    • E-commerce strategists/marketers needing product metadata for competitive analysis or market research.
    • Students/hobbyists needing a free dataset for learning or demo projects.

    Why Use This Dataset Instead of Manual Scraping?

    • Time-saving: No need to write scrapers or deal with rate limits.
    • Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.
    • Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
    • Instant access: Free and immediately downloadable.
  4. Feature Engineering Data

    • kaggle.com
    zip
    Updated Jul 23, 2019
    Cite
    Mat Leonard (2019). Feature Engineering Data [Dataset]. https://www.kaggle.com/datasets/matleonard/feature-engineering-data
    Explore at:
    Available download formats: zip (205019058 bytes)
    Dataset updated
    Jul 23, 2019
    Authors
    Mat Leonard
    Description

    This dataset is a sample from the TalkingData AdTracking competition. I kept all the positive examples (where is_attributed == 1), while discarding 99% of the negative samples. The sample has roughly 20% positive examples.
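    The sampling scheme described above (keep every positive, discard about 99% of the negatives) can be sketched as follows. The function name, the seed, and the synthetic click records are hypothetical; the author's actual sampling code is not shown in the source.

```python
import random


def downsample_negatives(rows, keep_fraction=0.01, seed=0):
    """Keep every positive row and roughly keep_fraction of the negative rows."""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        # Positives are always kept; each negative survives with
        # probability keep_fraction.
        if row["is_attributed"] == 1 or rng.random() < keep_fraction:
            sampled.append(row)
    return sampled


# Synthetic click log for illustration: 1 positive per 500 clicks.
clicks = [{"is_attributed": 1 if i % 500 == 0 else 0} for i in range(100_000)]
sample = downsample_negatives(clicks)
positives = sum(row["is_attributed"] for row in sample)
positive_share = positives / len(sample)
```

    Because only the negatives are thinned, the share of positives in the sample rises sharply compared with the full click log.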

    For this competition, your objective was to predict whether a user will download an app after clicking a mobile app advertisement.

    File descriptions

    train_sample.csv - Sampled data

    Data fields

    Each row of the training data contains a click record, with the following features.

    • ip: ip address of click.
    • app: app id for marketing.
    • device: device type id of user mobile phone (e.g., iphone 6 plus, iphone 7, huawei mate 7, etc.)
    • os: os version id of user mobile phone
    • channel: channel id of mobile ad publisher
    • click_time: timestamp of click (UTC)
    • attributed_time: if the user downloaded the app after clicking an ad, this is the time of the app download
    • is_attributed: the target that is to be predicted, indicating the app was downloaded

    Note that ip, app, device, os, and channel are encoded.

    I'm also including Parquet files with various features for use within the course.

  5. Lending club train

    • figshare.com
    txt
    Updated Jun 7, 2022
    + more versions
    Cite
    Deepchecks Data (2022). Lending club train [Dataset]. http://doi.org/10.6084/m9.figshare.20015915.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Deepchecks Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  6. Gratis, OH Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Gratis, OH Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235d8fd-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gratis
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of female population, with 50.0% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in Gratis is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Gratis's total population. Please note that the sum of all percentages may not equal one due to rounding of values.
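    As a rough illustration of how the "% of Total Population" column relates to the "Population" column, the sketch below computes each group's share of the total. The function name and the counts are made up for the example and are not values from the dataset.

```python
def percent_of_total(population_by_sex):
    """Return each group's share of the total population, in percent."""
    total = sum(population_by_sex.values())
    return {
        sex: round(100 * count / total, 1)
        for sex, count in population_by_sex.items()
    }


# Made-up counts, for illustration only.
shares = percent_of_total({"Male": 3, "Female": 4})
```

    Each share is rounded independently, which is why the rounded values are not guaranteed to add up exactly to the whole.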

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity, which you can refer to for further research.

  7. Mars rover dataset

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Gaurav Kumar (2025). Mars rover dataset [Dataset]. https://www.kaggle.com/datasets/gauravkumar2525/mars-rover-dataset
    Explore at:
    Available download formats: zip (101820038 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Gaurav Kumar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    1. Description

    The description section is crucial for helping users understand the purpose, context, and potential applications of your dataset. It should include the following details:

    • Dataset Overview: Provide a clear summary of what the dataset contains. For example, "This dataset includes images captured by NASA’s Curiosity Rover on Mars, along with metadata such as the camera used, the Martian sol (day) when the photo was taken, and the corresponding Earth date."
    • Source of the Data: Explain where the data comes from. If it’s obtained via an API, mention the source (e.g., "The images and metadata are retrieved from NASA's Mars Rover Photos API."). If the dataset is curated from multiple sources, list them.
    • Purpose and Use Cases: Describe why this dataset was created and how it can be used. For example:
      • Machine Learning: Train models for image classification, object detection, and anomaly detection.
      • Scientific Research: Analyze Martian surface patterns, study terrain features, or examine rover camera performance.
      • Space Exploration: Understand Mars' environmental conditions and assist in future exploration planning.
    • Data Format and Organization: Briefly mention the format of the files (e.g., CSV file for metadata, ZIP file containing images) and how they are structured.
    • Licensing and Permissions: Specify if the dataset has any restrictions on usage. Since NASA data is typically public domain, state that users are free to use it for research and development.
    • Limitations or Considerations: Mention any potential challenges, such as missing data, limited coverage, or resolution constraints.

    2. File Information

    This section provides details about the files included in your dataset, helping users navigate and use them efficiently. Key points to include:

    • List of Files: Clearly mention all the files and their formats, such as:
      • mars_rover_dataset.csv (CSV file containing metadata of images)
      • mars_images.zip (Compressed folder containing all images)
    • Purpose of Each File:
      • CSV File: Contains structured data, including image IDs, timestamps, camera details, and URLs.
      • ZIP File: Stores actual Mars images, which can be extracted and used for ML training or visualization.
    • File Dependencies: Explain how files relate to each other. For example, "The img_src column in mars_rover_dataset.csv corresponds to the images stored in mars_images.zip. Users should extract the images before using the dataset for model training."
    • How to Access the Files: Provide instructions on downloading and extracting files. Example (bash):
        unzip mars_images.zip
      This ensures that users can quickly set up the dataset in their working environment.

    3. Column Descriptions

    This section explains the meaning of each column in the dataset, ensuring users can analyze and interpret the data correctly. A well-structured table format is often useful:

    Column Name        Description
    id                 Unique identifier for each image.
    sol                Martian sol (day) when the image was captured.
    camera_name        Abbreviated name of the rover's camera (e.g., "FHAZ" for Front Hazard Camera).
    camera_full_name   Full descriptive name of the camera.
    img_src            URL link to the image. Users can download images using this link.
    earth_date         The Earth date corresponding to the Martian sol.
    rover_name         Name of the rover that captured the image (e.g., "Curiosity").
    rover_status       Current operational status of the rover (e.g., "Active" or "Complete").
    landing_date       Date when the rover landed on Mars.
    launch_date        Date when the rover was launched from Earth.

    Additional Details:

    • Data Types: Indicate whether a column contains numbers, text, or dates.
    • Data Format: Example: earth_date is in YYYY-MM-DD format.
    • Special Notes: If any column has missing values or requires preprocessing, mention it.

    This section helps users quickly understand the dataset's structure, making it easier for them to work with the data effectively.
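    A minimal sketch of reading a metadata file with the columns described above and converting them to typed values. The sample row and its URL are invented for illustration, and treating landing_date and launch_date as YYYY-MM-DD is an assumption (the description only states that format for earth_date).

```python
import csv
import io
from datetime import date

# One invented row using the documented column names; values are illustrative.
SAMPLE_CSV = (
    "id,sol,camera_name,camera_full_name,img_src,earth_date,"
    "rover_name,rover_status,landing_date,launch_date\n"
    "1,100,FHAZ,Front Hazard Camera,https://example.com/images/1.jpg,"
    "2012-11-16,Curiosity,Active,2012-08-06,2011-11-26\n"
)


def parse_row(row):
    """Convert the string fields of one CSV row to integers and dates."""
    parsed = dict(row)
    parsed["id"] = int(row["id"])
    parsed["sol"] = int(row["sol"])
    # Assumption: all three date columns use the YYYY-MM-DD format.
    for column in ("earth_date", "landing_date", "launch_date"):
        parsed[column] = date.fromisoformat(row[column])
    return parsed


rows = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

    With a real download, io.StringIO(SAMPLE_CSV) would be replaced by an open file handle for mars_rover_dataset.csv.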

  8. Addresses (Open Data)

    • catalog.data.gov
    • data-academy.tempe.gov
    • +11 more
    Updated Nov 22, 2025
    + more versions
    Cite
    City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
    Explore at:
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    City of Tempe
    Description

    This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location, the official address (as defined by The Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. There are several additional attributes that may be populated for an address, but they may not be populated for every address.

    Contact: Lynn Flaaen-Hanna, Development Services Specialist
    Contact E-mail
    Link: Map that Lets You Explore and Export Address Data
    Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by The Building Safety Division of Community Development.
    Data Source Type: ESRI ArcGIS Enterprise Geodatabase
    Preparation Method: N/A
    Publish Frequency: Weekly
    Publish Method: Automatic
    Data Dictionary

  9. SWAMP Data Dashboard

    • data.cnra.ca.gov
    • data.ca.gov
    • +2 more
    csv, pdf
    Updated Nov 17, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California State Water Resources Control Board (2025). SWAMP Data Dashboard [Dataset]. https://data.cnra.ca.gov/dataset/swamp-data-dashboard
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Nov 17, 2025
    Dataset authored and provided by
    California State Water Resources Control Board
    Description

    This dataset supports the SWAMP Data Dashboard, a public-facing tool developed by the Surface Water Ambient Monitoring Program (SWAMP) to provide accessible, user-friendly access to water quality monitoring data across California. The dashboard and its associated datasets are designed to help the public, researchers, and decision-makers explore and download monitoring data collected from California’s surface waters.

    This dataset includes five distinct resources:

    • SWAMP Stations – Geospatial and descriptive information about SWAMP monitoring sites.
    • Water Quality Results – Field and lab analysis results for chemical and physical parameters measured in water samples.
    • Toxicity Summary Results – Summarized results from aquatic toxicity tests. Summary records are entries in the database that summarize the results from multiple replicate toxicity tests of the same sample water.
    • Habitat Results – Data on physical habitat conditions typically collected alongside biological monitoring to provide context for interpreting water quality conditions. Includes scores for the California Stream Condition Index (CSCI) and Algal Stream Condition Index (ASCI).
    • Tissue Summary Results – Annual summary statistics of contaminant concentrations in aquatic organism tissue samples. The data are derived from raw individual and composite tissue sample results.

    These data are collected by SWAMP and its partners to support water quality assessments, identify trends, and inform water resource management. The SWAMP Data Dashboard provides interactive visualizations and filtering tools to explore this data by region, parameter, and more.

    The SWAMP dataset is sourced from the California Environmental Data Exchange Network (CEDEN), which serves as the central repository for water quality data collected by various monitoring programs throughout the state. As such, there is some overlap between this dataset and the broader CEDEN datasets also published on the California Open Data Portal (see Related Resources). This SWAMP dataset represents a curated subset of CEDEN data, specifically tailored for use in the SWAMP Data Dashboard.

    Access the SWAMP Data Dashboard: https://gispublic.waterboards.ca.gov/swamp-data/

    *This dataset is provisional and subject to revision. It should not be used for regulatory purposes.

  10. The Home Depot products dataset

    • kaggle.com
    zip
    Updated Dec 13, 2021
    Cite
    Crawl Feeds (2021). The Home Depot products dataset [Dataset]. https://www.kaggle.com/datasets/crawlfeeds/the-home-depot-products-dataset
    Explore at:
    Available download formats: zip (1979687 bytes)
    Dataset updated
    Dec 13, 2021
    Authors
    Crawl Feeds
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Home Depot retail products dataset

    Content

    The sample Home Depot dataset includes more than 3,500 records.
    Total Fields: 13
    Format: CSV
    Fields: url, title, images, description, product_id, sku, gtin13, brand, price, currency, availability, uniq_id, scraped_at

    Acknowledgements

    The Crawl Feeds team extracted the data from The Home Depot. The complete dataset, with more than 1 million products, is available for download in CSV format.

    Inspiration

    The Home Depot dataset is useful for research and analysis purposes.

  11. SMILE trial public dataset - Datasets - data.bris

    • data.bris.ac.uk
    Updated Apr 18, 2019
    Cite
    (2019). SMILE trial public dataset - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/2c1pfur00h0p52c7s8cnpg31hb
    Explore at:
    Dataset updated
    Apr 18, 2019
    Description

    This data set contains a number of variables collected on children and their parents who took part in the SMILE trial at assessment and follow up. It does not include data on age and gender as we want to be certain that no child or parent can be identified through the data. Researchers can apply to access a fuller data set (https://data.bris.ac.uk/data/dataset/1myzti8qnv48g2sxtx6h5nice7) containing age and gender through application to the University of Bristol's Data Access Committee; please refer to the data access request form (http://bit.ly/data-bris-request) for details on how to apply for access.

    Complete download (zip, 1.5 MiB)

  12. Data from: Customized Dataset

    • universe.roboflow.com
    zip
    Updated May 12, 2025
    + more versions
    Cite
    Datasets (2025). Customized Dataset [Dataset]. https://universe.roboflow.com/datasets-4hpre/customized-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    May 12, 2025
    Dataset authored and provided by
    Datasets
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    Customized Dataset

    ## Overview
    
    Customized Dataset is a dataset for object detection tasks - it contains Objects annotations for 779 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Data from: Animal Faces Dataset

    • gts.ai
    zip
    Updated Dec 2, 2023
    Cite
    GTS (2023). Animal Faces Dataset [Dataset]. https://gts.ai/dataset-download/animal-faces-dataset-ai-data-collection/
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 2, 2023
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Animal Faces Dataset is a collection of animal face images across multiple species, designed for AI, machine learning, and computer vision applications such as wildlife monitoring and image recognition.

  14. INSPIRE Download Service (predefined ATOM) for dataset Auf der Buech...

    • data.europa.eu
    atom feed
    Updated Oct 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    LVermGeo im Auftrag von Bremm (2024). INSPIRE Download Service (predefined ATOM) for dataset Auf der Buech (Klaerwerk) [Dataset]. https://data.europa.eu/data/datasets/e7dd9d42-8911-0002-9403-c19bac8e9120
    Explore at:
    Available download formats: atom feed
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    LVermGeo im Auftrag von Bremm
    Description

    Description of the INSPIRE Download Service (predefined Atom): Local municipality Bremm Development plan Auf der Buech (Klaerwerk) - The link(s) for downloading the data sets is/are dynamically generated from Get Map calls to a WMS interface

  15. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
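    To make the "gain ratio" splitting criterion mentioned above more concrete, here is a small pure-Python sketch of information gain and gain ratio for a discrete feature. It is not Orange's implementation, it assumes the feature has already been discretized (the dataset's features are numeric), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

    A feature that separates two balanced classes perfectly scores 1.0, while a feature unrelated to the class (or one with a single constant value) scores 0.0.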

  16. Dataset metadata of known Dataverse installations, August 2025

    • dataverse.harvard.edu
    Updated Sep 29, 2025
    + more versions
    Cite
    Julian Gautier (2025). Dataset metadata of known Dataverse installations, August 2025 [Dataset]. http://doi.org/10.7910/DVN/RMAGSH
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Julian Gautier
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the metadata of the datasets published in 118 Dataverse installations, information about the metadata blocks of 118 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 100 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understandings about how certain Dataverse features and metadata fields are used and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and September 2, 2025 using a Python script that uses the Dataverse API.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │ ├── author(citation)_2025.08.25-2025.09.02.csv
    │ ├── contributor(citation)_2025.08.25-2025.09.02.csv
    │ ├── data_source(citation)_2025.08.25-2025.09.02.csv
    │ ├── ...
    │ └── topic_classification(citation)_2025.08.25-2025.09.02.csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │ ├── Abacus_2025.08.26_07.14.00.zip
    │ ├── dataset_pids_Abacus_2025.08.26_07.14.00.csv
    │ ├── Dataverse_JSON_metadata_2025.08.26_07.14.00
    │ ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
    │ ├── ...
    │ ├── metadatablocks_v5.9
    │ ├── astrophysics_v5.9.json
    │ ├── biomedical_v5.9.json
    │ ├── citation_v5.9.json
    │ ├── ...
    │ ├── socialscience_v5.6.json
    │ ├── ACSS_Dataverse_2025.08.25_15.45.25.zip
    │ ├── ...
    │ └── Yale_Dataverse_2025.08.25_11.51.29.zip
    └── dataverse_installations_summary_2025.09.02.csv
    └── dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv
    └── license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv
    └── metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv

    This dataset contains two directories and four CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" metadata block and "Geospatial" metadata block of datasets in the 118 Dataverse installations. For example, author(citation)_2025.08.25-2025.09.02.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in 118 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 118 zip files, one zip file for each of the 118 Dataverse installations whose sites were functioning when I attempted to collect their metadata and that have at least one published dataset. Each zip file contains:

    • A CSV file listing information about the datasets published in the installation, including a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset.
    • A directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
    • A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.

    The dataverse_installations_summary_2025.09.02.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset.

    The dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv file contains the dataset PIDs of published datasets in 118 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all "dataset_pids_....csv" files in each of the 118 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory.

    The license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv file contains information about the licenses and data use agreements that some installations let depositors choose when creating datasets. When I collected this data, 100 of the available 118 installations were running versions of the Dataverse software that allow depositors to choose a "predefined license or data use agreement" from a dropdown menu in the dataset deposit form. For more information about this Dataverse feature, see https://guides.dataverse.org/en/6.7/user/dataset-management.html#choosing-a-license.

    The metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv file contains the metadata block names, field names, child field names (if the field is a compound field), display names, descriptions/tooltip text, watermarks, and controlled vocabulary values of fields in the 118 Dataverse installations' metadata blocks. This file is useful for learning...

  17. Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Excel, Alabama
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. We divided ages between 0 and 85 into buckets of roughly five years each and aggregated all ages over 85 into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was 45 to 49 years, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. The smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group under consideration.
    • Population: This column shows the population of the specific age group in Excel.
    • % of Total Population: This column displays the population of each age group as a percentage of Excel's total population. Please note that the percentages may not sum to exactly 100% because the values are rounded.
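
    As a rough illustration of how a "% of Total Population" column relates to the "Population" column, the sketch below computes each group's share and shows why rounded shares may not add up exactly. The function name and the counts are hypothetical and are not taken from the dataset.

```python
from typing import Dict


def percent_of_total(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each group's share of the total as a percentage, rounded to 2 places."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


# Hypothetical counts for illustration only; not values from the dataset.
example_counts = {"Under 5 years": 1, "5 to 9 years": 1, "10 to 14 years": 1}
shares = percent_of_total(example_counts)
print(shares)
# The rounded shares can sum to slightly less or more than 100.
print(sum(shares.values()))
```

    Rounding each share independently is the simplest choice, and it is the reason a column of percentages can miss 100 by a few hundredths.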

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Excel Population by Age. You can refer to it here

  18. Africa-Rural-Population-Dataset

    • huggingface.co
    Cite
    Electric Sheep, Africa-Rural-Population-Dataset [Dataset]. https://huggingface.co/datasets/electricsheepafrica/Africa-Rural-Population-Dataset
    Explore at:
    Dataset authored and provided by
    Electric Sheep
    License

    https://choosealicense.com/licenses/gpl/

    Description

    Africa Rural Population Dataset

      Dataset Summary
    

    This dataset provides annual rural population counts for 54 African countries from 1960 to 2024. The data originates from the World Bank Development Indicators (indicator code SP.RUR.TOTL) and has been cleaned and re-formatted for machine-learning workflows.

      Source & Collection
    

    Original source: World Bank Open Data – Rural population (SP.RUR.TOTL)
    Data accessed via Excel download and processed on 2025-08-07.… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/Africa-Rural-Population-Dataset.

  19. imdb_reviews

    • tensorflow.org
    • kaggle.com
    Updated Sep 20, 2024
    + more versions
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Explore at:
    Dataset updated
    Sep 20, 2024
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('imdb_reviews', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  20. Wadsworth, OH Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Wadsworth, OH Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b25a151b-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ohio, Wadsworth
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Wadsworth by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Wadsworth across both sexes and to determine which sex constitutes the majority.

    Key observations

    The population has a slight female majority, with 52.19% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the gender (Male / Female).
    • Population: This column shows the population of each gender in Wadsworth.
    • % of Total Population: This column displays the population of each gender as a percentage of Wadsworth's total population. Please note that the percentages may not sum to exactly 100% because the values are rounded.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Wadsworth Population by Race & Ethnicity. You can refer to it here
