100+ datasets found
  1. A Journey through Data Cleaning

    • kaggle.com
    zip
    Updated Mar 22, 2024
    Cite
    kenanyafi (2024). A Journey through Data Cleaning [Dataset]. https://www.kaggle.com/datasets/kenanyafi/a-journey-through-data-cleaning
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 22, 2024
    Authors
    kenanyafi
    Description

    Embark on a transformative journey with our Data Cleaning Project, where we meticulously refine and polish raw data into valuable insights. Our project focuses on streamlining data sets, removing inconsistencies, and ensuring accuracy to unlock its full potential.

    Through advanced techniques and rigorous processes, we standardize formats, address missing values, and eliminate duplicates, creating a clean and reliable foundation for analysis. By enhancing data quality, we empower organizations to make informed decisions, drive innovation, and achieve strategic objectives with confidence.

    Join us as we embark on this essential phase of data preparation, paving the way for more accurate and actionable insights that fuel success.

  2. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  3. food data cleaning

    • kaggle.com
    zip
    Updated Apr 13, 2024
    Cite
    AbdElRahman16 (2024). food data cleaning [Dataset]. https://www.kaggle.com/datasets/abdelrahman16/food-n
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 13, 2024
    Authors
    AbdElRahman16
    Description

    Dataset

    This dataset was created by AbdElRahman16


  4. Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

    • qubeshub.org
    Updated Jul 16, 2020
    Cite
    Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
    Explore at:
    Dataset updated
    Jul 16, 2020
    Dataset provided by
    QUBES
    Authors
    Shelly Gaynor
    Description

    Access and clean an open source herbarium dataset using Excel or RStudio.

  5. Excel-project: Glassdoor Data Cleaning

    • kaggle.com
    zip
    Updated Sep 26, 2023
    Cite
    Luis Lira (2023). Excel-project: Glassdoor Data Cleaning [Dataset]. https://www.kaggle.com/datasets/luisliraportfolio/excel-project-clean-dataset/suggestions?status=pending&yourSuggestions=true
    Explore at:
    Available download formats: zip (12085049 bytes)
    Dataset updated
    Sep 26, 2023
    Authors
    Luis Lira
    Description

    Dataset

    This dataset was created by Luis Lira


  6. Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global /...

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Coresignal, Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global / 35M+ Records / Updated Weekly [Dataset]. https://datarade.ai/data-products/coresignal-clean-data-company-data-ai-enriched-datasets-coresignal
    Explore at:
    Available download formats: .json, .csv
    Dataset authored and provided by
    Coresignal
    Area covered
    Hungary, Guinea-Bissau, Guatemala, Chile, Niue, Panama, Saint Barthélemy, Namibia, Guadeloupe, Andorra
    Description

    This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.

    It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).

    AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.

    For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select suitable delivery frequency (quarterly, monthly, or weekly).

    Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.

  7. Clean Workplace Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2023
    Cite
    Roboflow (2023). Clean Workplace Dataset [Dataset]. https://universe.roboflow.com/roboflow-myov4/clean-workplace
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Workplace Bounding Boxes
    Description

    Clean Workplace

    ## Overview
    
    Clean Workplace is a dataset for object detection tasks - it contains Workplace annotations for 898 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. alpaca-cleaned

    • huggingface.co
    Updated Mar 30, 2023
    + more versions
    Cite
    alpaca-cleaned [Dataset]. https://huggingface.co/datasets/yahma/alpaca-cleaned
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 30, 2023
    Authors
    Gene Ruebsamen
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

    Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset referenced data on the internet, which caused GPT-3 to hallucinate an answer.

    "instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/yahma/alpaca-cleaned.

  9. Cleaning Code and Preprocessing

    • kaggle.com
    Updated Feb 7, 2024
    Cite
    MohameddAteff (2024). Cleaning Code and Preprocessing [Dataset]. https://www.kaggle.com/datasets/mohameddateff/cleaning-code-and-preprocessing
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MohameddAteff
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by MohameddAteff

    Released under MIT


  10. clean-dirty-unaugmented

    • kaggle.com
    zip
    Updated Feb 26, 2024
    Cite
    Amit Chauhan (2024). clean-dirty-unaugmented [Dataset]. https://www.kaggle.com/datasets/amitchauhan6339/clean-dirty-unaugmented/data
    Explore at:
    Available download formats: zip (458939321 bytes)
    Dataset updated
    Feb 26, 2024
    Authors
    Amit Chauhan
    Description

    Dataset

    This dataset was created by Amit Chauhan


  11. Retail Data to Clean

    • kaggle.com
    zip
    Updated Jan 29, 2024
    Cite
    JamilShelton (2024). Retail Data to Clean [Dataset]. https://www.kaggle.com/datasets/jamilshelton/retail-data-to-clean/suggestions
    Explore at:
    Available download formats: zip (8718 bytes)
    Dataset updated
    Jan 29, 2024
    Authors
    JamilShelton
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by JamilShelton

    Released under Apache 2.0


  12. Clean Dataset

    • universe.roboflow.com
    zip
    Updated Jun 19, 2024
    + more versions
    Cite
    cleanroom (2024). Clean Dataset [Dataset]. https://universe.roboflow.com/cleanroom-at4av/clean-g93xl/dataset/13
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 19, 2024
    Dataset authored and provided by
    cleanroom
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Variables measured
    Ok Bounding Boxes
    Description

    Clean

    ## Overview
    
    Clean is a dataset for object detection tasks - it contains Ok annotations for 290 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
    
  13. Boulder Workshop Clean Datasets [Walker]

    • data.ucar.edu
    excel
    Updated Dec 26, 2024
    + more versions
    Cite
    Marilyn Walker (2024). Boulder Workshop Clean Datasets [Walker] [Dataset]. http://doi.org/10.5065/D6B856C3
    Explore at:
    Available download formats: excel
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    Marilyn Walker
    Time period covered
    Feb 15, 2001 - Feb 19, 2001
    Area covered
    Boulder,
    Description

    This dataset contains all of the clean datasets that were submitted by March 15 from the Boulder Workshop, 2001. Note: upon ordering this data, all of the data files will be included.

  14. NYC Clean Heat Dataset (Historical)

    • catalog.data.gov
    • data.cityofnewyork.us
    Updated Jan 26, 2024
    + more versions
    Cite
    data.cityofnewyork.us (2024). NYC Clean Heat Dataset (Historical) [Dataset]. https://catalog.data.gov/dataset/nyc-clean-heat-dataset-historical
    Explore at:
    Dataset updated
    Jan 26, 2024
    Dataset provided by
    data.cityofnewyork.us
    Area covered
    New York
    Description

    NYC Clean Heat dataset

  15. Clean Transportation Program

    • catalog.data.gov
    • data.ca.gov
    • +5 more
    Updated Nov 27, 2024
    + more versions
    Cite
    California Energy Commission (2024). Clean Transportation Program [Dataset]. https://catalog.data.gov/dataset/clean-transportation-program
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Energy Commission (http://www.energy.ca.gov/)
    Description

    Clean Transportation Program Data 2022. The Clean Transportation Program (also known as Alternative and Renewable Fuel and Vehicle Technology Program) invests up to $100 million annually in a broad portfolio of transportation and fuel transportation projects throughout the state. The Energy Commission leverages public and private investments to support adoption of cleaner transportation powered by alternative and renewable fuels. The program plays an important role in achieving California’s ambitious goals on climate change, petroleum reduction, and adoption of zero-emission vehicles, as well as efforts to reach air quality standards. The program also supports the state’s sustainable, long-term economic development. Data within this application was last updated August 2024. For more information on the Clean Transportation Program, visit: https://www.energy.ca.gov/programs-and-topics/programs/clean-transportation-program

  16. Clean data set - tech layoffs

    • kaggle.com
    zip
    Updated Jun 6, 2024
    Cite
    Aditi Dash (2024). Clean data set - tech layoffs [Dataset]. https://www.kaggle.com/datasets/aditidash30/clean-data-set-tech-layoffs
    Explore at:
    Available download formats: zip (71836 bytes)
    Dataset updated
    Jun 6, 2024
    Authors
    Aditi Dash
    Description

    Dataset

    This dataset was created by Aditi Dash


  17. Clean Energy Technology Dataset | S&P Global Marketplace

    • marketplace.spglobal.com
    Updated Aug 11, 2024
    Cite
    S&P Global (2024). Clean Energy Technology Dataset | S&P Global Marketplace [Dataset]. https://www.marketplace.spglobal.com/en/datasets/clean-energy-technology-(1713558647)
    Explore at:
    Dataset updated
    Aug 11, 2024
    Dataset authored and provided by
    S&P Global (http://www.spglobal.com/)
    Description

    The Clean Energy Technology dataset provides crucial insights to decision-makers and business developers, enabling them to strategize their future activities and investments in emerging energy technologies at the forefront of the energy transition.

  18. Clean Lens Dataset

    • universe.roboflow.com
    zip
    Updated Feb 18, 2025
    Cite
    ai service (2025). Clean Lens Dataset [Dataset]. https://universe.roboflow.com/ai-service-ggosd/clean-lens
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 18, 2025
    Dataset authored and provided by
    ai service
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Motions Bounding Boxes
    Description

    Clean Lens

    ## Overview
    
    Clean Lens is a dataset for object detection tasks - it contains Motions annotations for 464 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. EC-LEDS Clean Energy Program Georgia Dataset

    • catalog.data.gov
    Updated Jun 25, 2024
    Cite
    data.usaid.gov (2024). EC-LEDS Clean Energy Program Georgia Dataset [Dataset]. https://catalog.data.gov/dataset/ec-leds-clean-energy-program-georgia-dataset-a3adb
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Development (https://usaid.gov/)
    Description

    The purpose of this strategy is to ensure a low emissions transition and sustainable path for Georgia’s economic and social development, through: the identification of main sources/sectors of emissions and their trends in development process, assessing and removing barriers to low emission development, defining goals/policies/measures within each sector in the context of sustainable development of the country, establishment of relevant legislation system, infrastructure and coordination process for implementation, and monitoring of results and mobilizing the national and international financial sources for implementation of LEDS.

  20. alpaca-cleaned-52k-th

    • huggingface.co
    Updated May 12, 2023
    + more versions
    Cite
    Thaweewat (2023). alpaca-cleaned-52k-th [Dataset]. https://huggingface.co/datasets/Thaweewat/alpaca-cleaned-52k-th
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 12, 2023
    Authors
    Thaweewat
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    Summary

    This is a Thai 🇹🇭 instruction dataset translated, using Google Cloud Translation, from the cleaned version of the original Alpaca Dataset released by Stanford. It contains 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make them follow instructions better. The following issues have been identified in the original release and fixed in this… See the full description on the dataset page: https://huggingface.co/datasets/Thaweewat/alpaca-cleaned-52k-th.

A Journey through Data Cleaning

Streamlining Data for Enhanced Analysis and Decision-Making

