Embark on a transformative journey with our Data Cleaning Project, where we meticulously refine and polish raw data into valuable insights. Our project focuses on streamlining data sets, removing inconsistencies, and ensuring accuracy to unlock their full potential.
Through advanced techniques and rigorous processes, we standardize formats, address missing values, and eliminate duplicates, creating a clean and reliable foundation for analysis. By enhancing data quality, we empower organizations to make informed decisions, drive innovation, and achieve strategic objectives with confidence.
Join us as we embark on this essential phase of data preparation, paving the way for more accurate and actionable insights that fuel success.
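The three steps named above (standardizing formats, addressing missing values, and eliminating duplicates) can be illustrated with a minimal sketch using only the Python standard library. The field names (`name`, `city`, `signup`), the accepted date formats, and the `"unknown"` default are assumptions made for illustration; they do not come from the project itself, and a real pipeline would likely differ.

```python
from datetime import datetime

# Hypothetical input date formats; adjust to whatever the raw data actually uses.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def standardize_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records, default_city="unknown"):
    """Standardize formats, fill missing values, and drop duplicate rows."""
    cleaned, seen = [], set()
    for row in records:
        # Standardize formats: collapse whitespace and normalize capitalization.
        name = " ".join((row.get("name") or "").split()).title()
        signup = standardize_date(row.get("signup") or "")
        # Address missing values: fill optional fields, skip rows lacking required ones.
        city = (row.get("city") or "").strip() or default_city
        if not name or signup is None:
            continue
        # Eliminate duplicates: keep the first row seen for each (name, date) pair.
        key = (name.lower(), signup)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "signup": signup})
    return cleaned


# Made-up sample rows, for illustration only.
raw = [
    {"name": "  ada   lovelace ", "city": "London", "signup": "2024-01-05"},
    {"name": "Ada Lovelace", "city": "", "signup": "05/01/2024"},
    {"name": "Alan Turing", "city": None, "signup": "07.01.2024"},
    {"name": "", "city": "Paris", "signup": "2024-02-01"},
]
cleaned = clean_records(raw)
for row in cleaned:
    print(row)
```

In this sketch the second row is dropped as a duplicate of the first once both dates are normalized, and the last row is dropped because its required `name` field is empty. Whether to drop or impute incomplete rows is a design choice that depends on the data set.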
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
This dataset was created by AbdElRahman16
Access and clean an open source herbarium dataset using Excel or RStudio.
This dataset was created by Luis Lira
This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.
It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).
AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.
For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select suitable delivery frequency (quarterly, monthly, or weekly).
Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Clean Workplace is a dataset for object detection tasks - it contains Workplace annotations for 898 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Alpaca-Cleaned
Repository: https://github.com/gururise/AlpacaDataCleaned
Dataset Description
This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:
Hallucinations: Many instructions in the original dataset referenced data on the internet, which caused GPT-3 to hallucinate an answer.
"instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/yahma/alpaca-cleaned.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by MohameddAteff
Released under MIT
This dataset was created by Amit Chauhan
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by JamilShelton
Released under Apache 2.0
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Clean is a dataset for object detection tasks - it contains Ok annotations for 290 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
This dataset contains all of the clean datasets that were submitted by March 15 from the Boulder Workshop, 2001. Note: upon ordering this data, all of the data files will be included.
NYC Clean Heat dataset
Clean Transportation Program Data 2022. The Clean Transportation Program (also known as Alternative and Renewable Fuel and Vehicle Technology Program) invests up to $100 million annually in a broad portfolio of transportation and fuel transportation projects throughout the state. The Energy Commission leverages public and private investments to support adoption of cleaner transportation powered by alternative and renewable fuels. The program plays an important role in achieving California’s ambitious goals on climate change, petroleum reduction, and adoption of zero-emission vehicles, as well as efforts to reach air quality standards. The program also supports the state’s sustainable, long-term economic development. Data within this application was last updated August 2024. For more information on the Clean Transportation Program, visit: https://www.energy.ca.gov/programs-and-topics/programs/clean-transportation-program
This dataset was created by Aditi Dash
The Clean Energy Technology dataset provides crucial insights to decision-makers and business developers, enabling them to strategize their future activities and investments in emerging energy technologies at the forefront of the energy transition.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Clean Lens is a dataset for object detection tasks - it contains Motions annotations for 464 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The purpose of this strategy is to ensure a low emissions transition and sustainable path for Georgia’s economic and social development, through: the identification of main sources/sectors of emissions and their trends in development process, assessing and removing barriers to low emission development, defining goals/policies/measures within each sector in the context of sustainable development of the country, establishment of relevant legislation system, infrastructure and coordination process for implementation, and monitoring of results and mobilizing the national and international financial sources for implementation of LEDS.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Summary
This is a Thai 🇹🇭 instruction dataset translated, using Google Cloud Translation, from the cleaned version of the original Alpaca Dataset released by Stanford. It contains 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used for instruction-tuning so that language models follow instructions better. The following issues have been identified in the original release and fixed in this… See the full description on the dataset page: https://huggingface.co/datasets/Thaweewat/alpaca-cleaned-52k-th.