71 datasets found
  1. Data Cleaning Portfolio Project

    • kaggle.com
    zip
    Updated Apr 2, 2024
    Cite
    Deepali Sukhdeve (2024). Data Cleaning Portfolio Project [Dataset]. https://www.kaggle.com/datasets/deepalisukhdeve/data-cleaning-portfolio-project
    Available download formats: zip (6053781 bytes)
    Dataset updated
    Apr 2, 2024
    Authors
    Deepali Sukhdeve
    Description

    This dataset was created by Deepali Sukhdeve.

  2. Cleaning Data in SQL Portfolio Project

    • kaggle.com
    zip
    Updated Apr 19, 2023
    Cite
    Austin Kennell (2023). Cleaning Data in SQL Portfolio Project [Dataset]. https://www.kaggle.com/austinkennell/cleaning-data-in-sql-portfolio-project
    Available download formats: zip (6054868 bytes)
    Dataset updated
    Apr 19, 2023
    Authors
    Austin Kennell
    Description

    The dataset contains housing data for the Nashville, TN area. I used SQL Server to clean it and make it easier to use: I converted dates to remove unnecessary timestamps; populated data for null values; split the column holding the full address, city, and state into separate columns; standardized a column that had different representations of the same data; removed duplicate rows; and deleted unused columns.
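
    A minimal T-SQL sketch of the first two steps, assuming a hypothetical NashvilleHousing table with UniqueID, ParcelID, SaleDate, and PropertyAddress columns (illustrative names, not taken from the dataset):

        -- Drop the time component (keeps the column type; adding a new
        -- DATE column is the safer route if the UPDATE does not persist).
        UPDATE NashvilleHousing
        SET SaleDate = CONVERT(DATE, SaleDate);

        -- Populate missing addresses from another row with the same ParcelID.
        UPDATE a
        SET PropertyAddress = ISNULL(a.PropertyAddress, b.PropertyAddress)
        FROM NashvilleHousing a
        JOIN NashvilleHousing b
          ON a.ParcelID = b.ParcelID
         AND a.UniqueID <> b.UniqueID
        WHERE a.PropertyAddress IS NULL;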

  3. Nashville Housing Data Cleaning Project

    • kaggle.com
    zip
    Updated Aug 20, 2024
    Cite
    Ahmed Elhelbawy (2024). Nashville Housing Data Cleaning Project [Dataset]. https://www.kaggle.com/datasets/elhelbawylogin/nashville-housing-data-cleaning-project/discussion
    Available download formats: zip (1282 bytes)
    Dataset updated
    Aug 20, 2024
    Authors
    Ahmed Elhelbawy
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    Nashville
    Description

    Project Overview: This project demonstrates a thorough data cleaning process for the Nashville Housing dataset using SQL. The script performs various data cleaning and transformation operations to improve the quality and usability of the data for further analysis.

    Technologies Used: SQL Server (T-SQL)

    Dataset: The project uses the Nashville Housing dataset, which contains information about property sales in Nashville, Tennessee. The original dataset includes various fields such as property addresses, sale dates, sale prices, and other relevant real estate information.

    Data Cleaning Operations: The script performs the following data cleaning operations:

    Date Standardization: Converts the SaleDate column to a standard Date format for consistency and easier manipulation.

    Populating Missing Property Addresses: Fills in NULL values in the PropertyAddress field using data from other records with the same ParcelID.

    Breaking Down Address Components: Separates the PropertyAddress and OwnerAddress fields into individual columns for Address, City, and State, improving data granularity and queryability (see the address-splitting sketch after this list).

    Standardizing Values: Converts 'Y' and 'N' values to 'Yes' and 'No' in the SoldAsVacant field for clarity and consistency.

    Removing Duplicates: Identifies and removes duplicate records based on specific criteria to ensure data integrity.

    Dropping Unused Columns: Removes unnecessary columns to streamline the dataset.
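
    The address split could look like this in T-SQL (a sketch; the column names follow the list above, the table name is an assumption):

        -- Split PropertyAddress at the comma into street and city parts.
        SELECT
          SUBSTRING(PropertyAddress, 1, CHARINDEX(',', PropertyAddress) - 1) AS SplitAddress,
          SUBSTRING(PropertyAddress, CHARINDEX(',', PropertyAddress) + 1, LEN(PropertyAddress)) AS SplitCity
        FROM NashvilleHousing;

        -- PARSENAME splits on periods and counts right to left, so swap commas first.
        SELECT
          PARSENAME(REPLACE(OwnerAddress, ',', '.'), 3) AS OwnerSplitAddress,
          PARSENAME(REPLACE(OwnerAddress, ',', '.'), 2) AS OwnerSplitCity,
          PARSENAME(REPLACE(OwnerAddress, ',', '.'), 1) AS OwnerSplitState
        FROM NashvilleHousing;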

    Key SQL Techniques Demonstrated:

    Data type conversion
    Self joins for data population
    String manipulation (SUBSTRING, CHARINDEX, PARSENAME)
    CASE statements
    Window functions (ROW_NUMBER; see the deduplication sketch after this list)
    Common Table Expressions (CTEs)
    Data deletion
    Table alterations (adding and dropping columns)
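
    The ROW_NUMBER/CTE combination is typically what drives the duplicate removal; a sketch, with the partitioning columns assumed:

        -- Rows sharing these columns are duplicates; keep only RowNum = 1.
        WITH RowNumCTE AS (
          SELECT *,
                 ROW_NUMBER() OVER (
                   PARTITION BY ParcelID, PropertyAddress, SalePrice, SaleDate
                   ORDER BY UniqueID
                 ) AS RowNum
          FROM NashvilleHousing
        )
        DELETE FROM RowNumCTE
        WHERE RowNum > 1;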

    Important Notes:

    The script includes cautionary comments about data deletion and column dropping, emphasizing the importance of careful consideration in a production environment. This project showcases various SQL data cleaning techniques and can serve as a template for similar data cleaning tasks.

    Potential Improvements:

    Implement error handling and transaction management for more robust execution (see the sketch after this list).
    Add data validation steps to ensure the cleaned data meets specific criteria.
    Consider creating indexes on frequently queried columns for performance optimization.
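
    The first improvement could take roughly this shape in T-SQL (a sketch only; the DELETE is a stand-in for any destructive cleaning step):

        BEGIN TRY
          BEGIN TRANSACTION;

          -- Stand-in for a destructive cleaning step.
          DELETE FROM NashvilleHousing
          WHERE PropertyAddress IS NULL;

          COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
          IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;
          THROW;  -- re-raise the original error (SQL Server 2012+)
        END CATCH;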

  4. SQLcleaning

    • kaggle.com
    zip
    Updated Mar 15, 2023
    Cite
    Stephen M Blake (2023). SQLcleaning [Dataset]. https://www.kaggle.com/datasets/stephenmblake/sqlcleaning
    Available download formats: zip (8206870 bytes)
    Dataset updated
    Mar 15, 2023
    Authors
    Stephen M Blake
    Description

    Using SQL, I was able to clean up the data so that it is easier to analyze. I used JOINs, SUBSTRING, PARSENAME, UPDATE and ALTER TABLE statements, CTEs, CASE statements, and ROW_NUMBER, and learned many different ways to clean the data.
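
    The UPDATE/ALTER TABLE pattern usually means adding a cleaned column, backfilling it, and dropping the original; a sketch against a hypothetical Housing table:

        -- GO separates batches so the new column is visible to the UPDATE.
        ALTER TABLE Housing ADD SaleDateClean DATE;
        GO
        UPDATE Housing
        SET SaleDateClean = CONVERT(DATE, SaleDate);
        GO
        ALTER TABLE Housing DROP COLUMN SaleDate;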

  5. SQL Data Cleaning Portfolio V2

    • kaggle.com
    zip
    Updated Jun 16, 2023
    Cite
    Mohammad Hurairah (2023). SQL Data Cleaning Portfolio V2 [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/sql-cleaning-portfolio-v2/discussion
    Available download formats: zip (6054498 bytes)
    Dataset updated
    Jun 16, 2023
    Authors
    Mohammad Hurairah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data Cleaning from Public Nashville Housing Data:

    1. Standardize the Date Format

    2. Populate Property Address data

    3. Breaking out Addresses into Individual Columns (Address, City, State)

    4. Change Y and N to Yes and No in the "Sold as Vacant" field (see the sketch after this list)

    5. Remove Duplicates

    6. Delete Unused Columns
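
    Step 4 is the classic CASE-statement fix; a minimal sketch, assuming the column is named SoldAsVacant:

        UPDATE NashvilleHousing
        SET SoldAsVacant = CASE
                             WHEN SoldAsVacant = 'Y' THEN 'Yes'
                             WHEN SoldAsVacant = 'N' THEN 'No'
                             ELSE SoldAsVacant
                           END;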

  6. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Available download formats: application/x-sqlite3
    Dataset updated
    Jul 31, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Florian Breit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

    1. Datasets

    The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

    1.1 CSV format

    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name.

    The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below):

    Label                 Data type  Description
    isogramy              int        The order of isogramy, e.g. "2" is a second-order isogram
    length                int        The length of the word in letters
    word                  text       The actual word/isogram in ASCII
    source_pos            text       The Part of Speech tag from the original corpus
    count                 int        Token count (total number of occurrences)
    vol_count             int        Volume count (number of different sources which contain the word)
    count_per_million     int        Token count per million words
    vol_count_as_percent  int        Volume count as percentage of the total number of volumes
    is_palindrome         bool       Whether the word is a palindrome (1) or not (0)
    is_tautonym           bool       Whether the word is a tautonym (1) or not (0)

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    Label               Data type  Description
    !total_1grams       int        The total number of words in the corpus
    !total_volumes      int        The total number of volumes (individual sources) in the corpus
    !total_isograms     int        The total number of isograms found in the corpus (before compacting)
    !total_palindromes  int        How many of the isograms found are palindromes
    !total_tautonyms    int        How many of the isograms found are tautonyms

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

    1.2 SQLite database format

    The SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:

    • Compacted versions of each dataset, where identical headwords are combined into a single entry.
    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.

    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.

    2. Scripts

    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second using SQLite 3 from the command line, and the third in R/RStudio (R version 3).

    2.1 Source data

    The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files; for BNC, the direct path to the *.gz file.

    2.2 Data preparation

    Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:

        python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
        python isograms.py --bnc --indir=INFILE --outfile=OUTFILE

    Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

    2.3 Isogram extraction

    After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:

        python isograms.py --batch --infile=INFILE --outfile=OUTFILE

    Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

    2.4 Creating a SQLite3 database

    The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:

    1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
    2. Copy the "create-database.sql" script into the same directory as the two data files.
    3. On the command line, go to the directory where the files and the SQL script are.
    4. Type: sqlite3 isograms.db
    5. This will create a database called "isograms.db".

    See section 1 for a basic description of the output data and how to work with the database.

    2.5 Statistical processing

    The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
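
    Once built, the database can be queried directly in SQLite's SQL dialect. A small sketch (the table name ngrams_isograms is an assumption based on the file naming above; the columns are those listed in section 1.1):

        -- How are palindromic isograms distributed by word length?
        SELECT length, COUNT(*) AS n_words
        FROM ngrams_isograms
        WHERE is_palindrome = 1
        GROUP BY length
        ORDER BY length;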

  7. NSText2SQL

    • huggingface.co
    • opendatalab.com
    Updated Feb 23, 2024
    Cite
    NumbersStation (2024). NSText2SQL [Dataset]. https://huggingface.co/datasets/NumbersStation/NSText2SQL
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 23, 2024
    Dataset authored and provided by
    NumbersStation
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Summary

    NSText2SQL is the dataset used to train NSQL models. The data is curated from more than 20 different public sources across the web with permissible licenses (listed below). All of these datasets come with existing text-to-SQL pairs. We apply various data cleaning and pre-processing techniques, including table schema augmentation, SQL cleaning, and instruction generation using existing LLMs. The resulting dataset contains around 290,000 samples of text-to-SQL pairs. For more… See the full description on the dataset page: https://huggingface.co/datasets/NumbersStation/NSText2SQL.

  8. MY SQL DATA CLEANING PROJECT

    • kaggle.com
    zip
    Updated Jun 20, 2024
    Cite
    George M122 (2024). MY SQL DATA CLEANING PROJECT [Dataset]. https://www.kaggle.com/georgem122/my-sql-data-cleaning-project
    Available download formats: zip (1421 bytes)
    Dataset updated
    Jun 20, 2024
    Authors
    George M122
    Description

    This dataset was created by George M122.

  9. StreetSweeping022819

    • splitgraph.com
    • data.cityofchicago.org
    • +2more
    Updated Apr 10, 2024
    Cite
    City of Chicago (2024). StreetSweeping022819 [Dataset]. https://www.splitgraph.com/cityofchicago/streetsweeping022819-jqxt-c6gd
    Available download formats: application/openapi+json, json, application/vnd.splitgraph.image
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    Street sweeping zones by Ward and Ward Section Number. For the corresponding schedule, see https://data.cityofchicago.org/d/k737-xg34.

    For more information about the City's Street Sweeping program, go to http://bit.ly/H2PHUP.

    The data can be viewed on the Chicago Data Portal with a web browser. However, to view or use the files outside of a web browser, you will need to use compression software and special GIS software, such as ESRI ArcGIS (shapefile) or Google Earth (KML or KMZ).

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
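
    A hypothetical query along these lines (the repository name comes from the citation above; the table and column names are assumptions):

        -- Hypothetical: fetch a few zones for one ward over the Splitgraph DDN.
        SELECT *
        FROM "cityofchicago/streetsweeping022819-jqxt-c6gd".streetsweeping022819
        WHERE ward = '1'
        LIMIT 10;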

    See the Splitgraph documentation for more information.

  10. SQL Data Cleaning & EDA Project

    • kaggle.com
    zip
    Updated Oct 15, 2024
    Cite
    Bilal424 (2024). SQL Data Cleaning & EDA Project [Dataset]. https://www.kaggle.com/datasets/bilal424/sql-data-cleaning-and-eda-project/code
    Available download formats: zip (5352 bytes)
    Dataset updated
    Oct 15, 2024
    Authors
    Bilal424
    Description

    This dataset is a comprehensive collection of healthcare facility ratings across multiple countries. It includes detailed information on various attributes such as facility name, location, type, total beds, accreditation status, and annual visits of hospitals throughout the world. This cleaned dataset is ideal for conducting trend analysis, comparative studies between countries, or developing predictive models for facility ratings based on various factors. It offers a foundation for exploratory data analysis, machine learning modelling, and data visualization projects aimed at uncovering insights in the healthcare industry. The Project consists of the Original dataset, Data Cleaning Script and an EDA script in the data explorer tab for further analysis.

  11. Street Sweeping Schedule - 2024

    • splitgraph.com
    • data.cityofchicago.org
    • +2more
    Updated Mar 29, 2024
    Cite
    City of Chicago (2024). Street Sweeping Schedule - 2024 [Dataset]. https://www.splitgraph.com/cityofchicago/street-sweeping-schedule-2024-3q8d-2t69
    Available download formats: application/vnd.splitgraph.image, application/openapi+json, json
    Dataset updated
    Mar 29, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    Street sweeping schedule by Ward and Ward section number. To find your Ward section, visit https://data.cityofchicago.org/d/ytfi-mzdz. For more information about the City's Street Sweeping program, go to https://www.chicago.gov/city/en/depts/streets/provdrs/streetssan/svcs/streetsweeping.html.

    Corrections are possible during the course of the sweeping season.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications (see the example under dataset 9). See the Splitgraph documentation for more information.

  12. Street Sweeping Zones - 2023

    • splitgraph.com
    • data.cityofchicago.org
    • +1more
    Updated Mar 31, 2023
    Cite
    City of Chicago (2023). Street Sweeping Zones - 2023 [Dataset]. https://www.splitgraph.com/cityofchicago/street-sweeping-zones-2023-6c59-kupn
    Available download formats: json, application/vnd.splitgraph.image, application/openapi+json
    Dataset updated
    Mar 31, 2023
    Dataset authored and provided by
    City of Chicago
    Description

    Street sweeping zones by Ward and Ward Section Number. For the corresponding schedule, see https://data.cityofchicago.org/d/3dx4-5j8t.

    For more information about the City's Street Sweeping program, go to https://www.chicago.gov/city/en/depts/streets/provdrs/streetssan/svcs/streetsweeping.html.

    This dataset is in a format for spatial datasets that is inherently tabular but allows for a map as a derived view. Please click the indicated link below for such a map.

    To export the data in either tabular or geographic format, please use the Export button on this dataset.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications (see the example under dataset 9). See the Splitgraph documentation for more information.

  13. IVMOOC 2017 - GloBI Data for Interactive Tableau Map of Spatial and Temporal...

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    • +2more
    Updated Jan 24, 2020
    Cite
    Cains, Mariana; Anand, Srini (2020). IVMOOC 2017 - GloBI Data for Interactive Tableau Map of Spatial and Temporal Distribution of Interactions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_814911
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Indiana University
    Authors
    Cains, Mariana; Anand, Srini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Biotic Interactions (GloBI, www.globalbioticinteractions.org) provides an infrastructure and data service that aggregates and archives known biotic interaction databases to provide easy access to species interaction data. This project explores the coverage of GloBI data against known taxonomic catalogues in order to identify 'gaps' in knowledge of species interactions. We examine the richness of GloBI's datasets using itself as a frame of reference for comparison and explore interaction networks according to geographic regions over time. The resulting analysis and visualizations intend to provide insights that may help to enhance GloBI as a resource for research and education.

    Spatial and temporal biotic interactions data were used in the construction of an interactive Tableau map. The raw data (IVMOOC 2017 GloBI Kingdom Data Extracted 2017 04 17.csv) was extracted from the project-specific SQL database server. The raw data was cleaned and preprocessed (IVMOOC 2017 GloBI Cleaned Tableau Data.csv) for use in the Tableau map. Data cleaning and preprocessing steps are detailed in the companion paper.

    The interactive Tableau map can be found here: https://public.tableau.com/profile/publish/IVMOOC2017-GloBISpatialDistributionofInteractions/InteractionsMapTimeSeries#!/publish-confirm

    The companion paper can be found here: doi.org/10.5281/zenodo.814979

    Complementary high resolution visualizations can be found here: doi.org/10.5281/zenodo.814922

    Project-specific data can be found here: doi.org/10.5281/zenodo.804103 (SQL server database)

  14. UK Power Networks Grid Substation Distribution Areas

    • ukpowernetworks.opendatasoft.com
    Updated Mar 31, 2025
    Cite
    (2025). UK Power Networks Grid Substation Distribution Areas [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-grid-postcode-area/
    Dataset updated
    Mar 31, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This dataset is a geospatial view of the areas fed by grid substations. The aim is to create an indicative map showing the extent to which individual grid substations feed areas based on MPAN data.

    Methodology

    Data Extraction and Cleaning: MPAN data is queried from SQL Server and saved as a CSV. Invalid values and incorrectly formatted postcodes are removed using a Test Filter in FME.

    Data Filtering and Assignment: MPAN data is categorized into EPN, LPN, and SPN based on the first two digits. Postcodes are assigned a Primary based on the highest number of MPANs fed from different Primary Sites (sketched in SQL after this list).

    Polygon Creation and Cleaning: Primary Feed Polygons are created and cleaned to remove holes and inclusions. Donut Polygons (holes) are identified, assigned to the nearest Primary, and merged.

    Grid Supply Point Integration: Primaries are merged into larger polygons based on Grid Site relationships. Any Primaries not fed from a Grid Site are marked as NULL and labeled.

    Functional Location Codes (FLOC) Matching: FLOC codes are extracted and matched to Primaries, Grid Sites and Grid Supply Points. Confirmed FLOCs are used to ensure accuracy, with any unmatched sites reviewed by the Open Data Team.
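
    A sketch of how the postcode-to-Primary assignment could be expressed in SQL (table and column names are assumptions, not UK Power Networks' actual schema):

        -- For each postcode, pick the Primary site feeding the most MPANs.
        WITH PostcodeCounts AS (
          SELECT Postcode,
                 PrimarySite,
                 COUNT(*) AS MpanCount,
                 ROW_NUMBER() OVER (
                   PARTITION BY Postcode
                   ORDER BY COUNT(*) DESC
                 ) AS rn
          FROM Mpans
          GROUP BY Postcode, PrimarySite
        )
        SELECT Postcode, PrimarySite, MpanCount
        FROM PostcodeCounts
        WHERE rn = 1;  -- the winning Primary for each postcode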
    

    Quality Control Statement

    Quality Control Measures include:

    • Verification steps to match features only with confirmed functional locations
    • Manual review and correction of data inconsistencies
    • Use of additional verification steps to ensure accuracy in the methodology
    • Regular updates and reviews documented in the version history

    Assurance Statement

    The Open Data Team and Network Data Team worked with the Geospatial Data Engineering Team to ensure data accuracy and consistency.

    Other

    Download dataset information: Metadata (JSON)

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary: https://ukpowernetworks.opendatasoft.com/pages/glossary/

    To view this data, please register and log in.

  15. Texas Commission on Environmental Quality - Historical Dry Cleaner...

    • splitgraph.com
    • data.texas.gov
    • +2more
    Updated Oct 15, 2024
    Cite
    Office of Waste (2024). Texas Commission on Environmental Quality - Historical Dry Cleaner Registrations [Dataset]. https://www.splitgraph.com/texas-gov/texas-commission-on-environmental-quality-xcc6-2a52
    Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
    Dataset updated
    Oct 15, 2024
    Dataset authored and provided by
    Office of Waste
    Description

    This dataset contains all historical Dry Cleaner Registrations in Texas. Note that most registrations listed are expired and are from previous years.

    View operating dry cleaners with current and valid (unexpired) registration certificates here: https://data.texas.gov/dataset/Texas-Commission-on-Environmental-Quality-Current-/qfph-9bnd/

    State law requires dry cleaning facilities and drop stations to register with TCEQ. Dry cleaning facilities and drop stations must renew their registration by August 1st of each year. The Dry Cleaners Registrations reflect self-reported registration information about whether a dry cleaning location is a facility or drop station, and whether they have opted out of the Dry Cleaning Environmental Remediation Fund. Distributors can find out whether to collect solvent fees from each registered facility as well as the registration status and delivery certificate expiration date of a location.

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications (see the example under dataset 9). See the Splitgraph documentation for more information.

  16. Streetsweeping_2015A

    • splitgraph.com
    • data.cityofchicago.org
    • +2more
    Updated Apr 10, 2024
    Cite
    City of Chicago (2024). Streetsweeping_2015A [Dataset]. https://www.splitgraph.com/cityofchicago/streetsweeping2015a-j58c-hv2e
    Available download formats: application/vnd.splitgraph.image, application/openapi+json, json
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    City of Chicago
    Description

    Street sweeping zones by Ward and Ward Section Number. The zones are the same as those used in 2014. For the corresponding schedule, see https://data.cityofchicago.org/d/waad-z968. Because the City of Chicago ward map will change on May 18, 2015, this dataset will be supplemented with an additional dataset to cover the remainder of 2015 (through November).

    For more information about the City's Street Sweeping program, go to http://bit.ly/H2PHUP. The data can be viewed on the Chicago Data Portal with a web browser. However, to view or use the files outside of a web browser, you will need to use compression software and special GIS software, such as ESRI ArcGIS (shapefile) or Google Earth (KML or KMZ).

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications (see the example under dataset 9). See the Splitgraph documentation for more information.

  17. Audible Dataset Cleaning SQL

    • kaggle.com
    zip
    Updated Oct 8, 2024
    Cite
    Filip Kobus (2024). Audible Dataset Cleaning SQL [Dataset]. https://www.kaggle.com/datasets/fkobus/audible-dataset-cleaning-sql/data
    Available download formats: zip (6590021 bytes)
    Dataset updated
    Oct 8, 2024
    Authors
    Filip Kobus
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    I took this data from another Kaggle dataset and cleaned it myself in MySQL.

  18. SQL Data Cleaning Project1

    • kaggle.com
    zip
    Updated Nov 12, 2024
    Cite
    christopher alverio (2024). SQL Data Cleaning Project1 [Dataset]. https://www.kaggle.com/datasets/christopheralverio/sql-data-cleaning-project1/code
    Available download formats: zip (1312 bytes)
    Dataset updated
    Nov 12, 2024
    Authors
    christopher alverio
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset was created by christopher alverio.

    Released under MIT.

  19. Annual Household Survey 2004 - Lao PDR

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    National Statistical Center (NSC) (2019). Annual Household Survey 2004 - Lao PDR [Dataset]. https://catalog.ihsn.org/catalog/study/LAO_2004_AHS_v01_M
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    National Statistical Center (NSC)
    Time period covered
    2004
    Area covered
    Laos
    Description

    Abstract

    The Annual Household Survey is conducted to meet the data requirements of the National Accounts of Lao PDR.

    Geographic coverage

    • National
    • Province
    • Urban and Rural
    • Three Regions: North, Central, South

    Analysis unit

    • Household
    • Individual

    Universe

    • All private households in Lao PDR
    • All persons aged 10 years and over

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    • Household sample size: 2,400 households
    • Village sample size: 240 villages
    • AHS 2004: for each village, a random sample of 5 households from the 10 AHS 2003 households, plus 5 new households from the list provided by the village chief to the enumerator.

    For details, please refer to the "AHS2004 Report - Final" manual (Lao version), page 2.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    AHS 2004 has 2 forms:

    (1) Household Survey: identification; household composition; labor force (participation in the last seven days; overview of work in the last seven days); construction activities in the past 12 months; household businesses (establishing the existence of non-farm enterprises); agriculture (crops harvested during the last 12 months; fishery; forestry; livestock and poultry); households' purchases and sales of durables during the last 12 months; income and transfers

    (2) Diary Expenditure and Consumption Household Survey: identification; households' diary sheet for household transactions

    Cleaning operations

    Data editing:
    • Office editing and coding
    • Microsoft Access for data entry and checking
    • SQL Server for the database
    • SPSS for analysis

  20. SQL PROJECT

    • kaggle.com
    zip
    Updated Jul 27, 2024
    Cite
    SHAW RICK (2024). SQL PROJECT [Dataset]. https://www.kaggle.com/datasets/shawrick/sql-project
    Available download formats: zip (69397 bytes)
    Dataset updated
    Jul 27, 2024
    Authors
    SHAW RICK
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a collection of SQL scripts and techniques developed by a business data analyst to assist with data optimization and cleaning tasks. The scripts cover a range of data management operations, including:

    1) Data cleansing: Identifying and addressing issues such as missing values, duplicate records, formatting inconsistencies, and outliers (see the audit sketch after this list).
    2) Data normalization: Designing optimized database schemas and normalizing data structures to minimize redundancy and improve data integrity.
    3) Data transformation and ETL: Developing efficient Extract, Transform, and Load (ETL) pipelines to integrate data from multiple sources and perform complex data transformations.
    4) Reporting and dashboarding: Creating visually appealing and insightful reports, dashboards, and data visualizations to support informed decision-making.
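
    A flavour of category 1: an audit query that counts common quality issues in one pass, assuming a hypothetical Customers table:

        -- Count missing, duplicate, and untrimmed values in a single scan.
        SELECT
          COUNT(*)                                       AS total_rows,
          SUM(CASE WHEN Email IS NULL THEN 1 ELSE 0 END) AS missing_emails,
          COUNT(Email) - COUNT(DISTINCT Email)           AS duplicate_emails,
          SUM(CASE WHEN Email LIKE ' %' OR Email LIKE '% '
                   THEN 1 ELSE 0 END)                    AS untrimmed_emails
        FROM Customers;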

    The scripts and techniques in this dataset are tailored to the needs of business data analysts and can be used to enhance the quality, efficiency, and value of data-driven insights.
