54 datasets found
  1. clean-up-table

    • huggingface.co
    Updated Jul 28, 2025
    Cite
    Daniel San Jose Pro (2025). clean-up-table [Dataset]. https://huggingface.co/datasets/danielsanjosepro/clean-up-table
    Explore at:
    Dataset updated
    Jul 28, 2025
    Authors
    Daniel San Jose Pro
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset was created using LeRobot.

      Dataset Structure
    

    meta/info.json: { "codebase_version": "v2.1", "robot_type": "franka", "total_episodes": 51, "total_frames": 28867, "total_tasks": 1, "total_videos": 0, "total_chunks": 1, "chunks_size": 1000, "fps": 20, "splits": { "train": "0:51" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path": null, "features": {… See the full description on the dataset page: https://huggingface.co/datasets/danielsanjosepro/clean-up-table.
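
    The data_path template and chunks_size in meta/info.json suggest how an episode index maps to a parquet file. Here is a minimal sketch, assuming the chunk number is the episode index integer-divided by chunks_size; the helper name episode_parquet_path is hypothetical and not part of LeRobot.

```python
# Values copied from the meta/info.json excerpt above.
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
CHUNKS_SIZE = 1000
TOTAL_EPISODES = 51
TOTAL_FRAMES = 28867
FPS = 20


def episode_parquet_path(episode_index: int, chunks_size: int = CHUNKS_SIZE) -> str:
    """Hypothetical helper: fill the data_path template for one episode."""
    # Assumption: episodes are grouped into chunks of `chunks_size` in index order.
    episode_chunk = episode_index // chunks_size
    return DATA_PATH.format(episode_chunk=episode_chunk, episode_index=episode_index)


# All 51 episodes of the "train" split ("0:51") fall into chunk 0 under this assumption.
paths = [episode_parquet_path(i) for i in range(TOTAL_EPISODES)]

# Total recording time implied by the frame count and frame rate, in seconds.
total_seconds = TOTAL_FRAMES / FPS

if __name__ == "__main__":
    print(paths[0])
    print(paths[-1])
    print(f"{total_seconds:.2f} s of data at {FPS} fps")
```

    With these values, the first and last episodes resolve to files under data/chunk-000/.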

  2. Clean Table Dataset

    • universe.roboflow.com
    zip
    Updated Jul 10, 2024
    Cite
    Rohith (2024). Clean Table Dataset [Dataset]. https://universe.roboflow.com/rohith-vr9nb/clean-table/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 10, 2024
    Dataset authored and provided by
    Rohith
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Tables Bounding Boxes
    Description

    Clean Table

    ## Overview
    
    Clean Table is a dataset for object detection tasks. It contains Tables annotations for 857 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Clean Meta Kaggle

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    Yoni Kremer (2023). Clean Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/yonikremer/clean-meta-kaggle
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yoni Kremer
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Cleaned Meta-Kaggle Dataset

    The Original Dataset - Meta-Kaggle

    Explore our public data on competitions, datasets, kernels (code / notebooks) and more. Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.


    This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

    Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

    August 2023 update

    In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here

    We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

    The Problems with the Original Dataset

    • The original dataset is 32 CSV files, with 268 columns and 7GB of compressed data. Having so many tables and columns makes it hard to understand the data.
    • The data is not normalized, so when you join tables you get a lot of errors.
    • Some values refer to non-existing values in other tables. For example, the UserId column in the ForumMessages table has values that do not exist in the Users table.
    • There are missing values.
    • There are duplicate values.
    • There are values that are not valid. For example, Ids that are not positive integers.
    • The date and time columns are not in the right format.
    • Some columns only have the same value for all rows, so they are not useful.
    • The boolean columns have string values True or False.
    • Incorrect values for the Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.
    • Users upvote their own messages.

    The Solution

    • To handle so many tables and columns, I use a relational database. I use MySQL, but you can use any relational database.
    • The steps to create the database are:
      • Create the database tables with the right data types and constraints by running the db_abd_create_tables.sql script.
      • Download the CSV files from Kaggle using the Kaggle API.
      • Clean the data using pandas by running the clean_data.py script. The script does the following for each table:
        • Drops the columns that are not needed.
        • Converts each column to the right data type.
        • Replaces foreign keys that do not exist with NULL.
        • Replaces some of the missing values with default values.
        • Removes rows where there are missing values in the primary key/not null columns.
        • Removes duplicate rows.
      • Load the data into the database using the LOAD DATA INFILE command.
      • Check that the number of rows in the database tables is the same as the number of rows in the CSV files.
      • Add foreign key constraints to the database tables by running the add_foreign_keys.sql script.
      • Update the Total columns in the database tables by running the update_totals.sql script.
      • Back up the database.
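
    Some of the per-table cleaning steps listed above can be sketched with pandas roughly as follows. This is not the author's clean_data.py script; the function clean_table, the sample tables, and their values are illustrative assumptions, and only the Id and UserId column names echo the description.

```python
import pandas as pd


def clean_table(df, primary_key, foreign_keys):
    """Illustrative cleaning pass for one table.

    `foreign_keys` maps a column name to the ids that exist in the
    referenced table (an assumption about how the lookup is supplied).
    """
    df = df.copy()
    # Convert id columns to a nullable integer type so missing keys can stay NULL.
    for col in [primary_key, *foreign_keys]:
        df[col] = df[col].astype("Int64")
    # Replace foreign keys that do not exist in the referenced table with NULL.
    for col, valid_ids in foreign_keys.items():
        df[col] = df[col].where(df[col].isin(list(valid_ids)))
    # Remove rows with a missing primary key, then remove duplicate rows.
    df = df.dropna(subset=[primary_key])
    return df.drop_duplicates().reset_index(drop=True)


# Made-up sample data standing in for the Users and ForumMessages tables.
users = pd.DataFrame({"Id": [1, 2, 3]})
messages = pd.DataFrame(
    {"Id": [10, 11, 11, None, 12], "UserId": [1, 2, 2, 3, 99]}
)

cleaned = clean_table(messages, primary_key="Id", foreign_keys={"UserId": users["Id"]})

if __name__ == "__main__":
    print(cleaned)
```

    In this toy run, the duplicate message row and the row without an Id are dropped, and the UserId that has no match in users becomes NULL.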
  4. Frequency of tidying up all rooms in the U.S. 2018

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Frequency of tidying up all rooms in the U.S. 2018 [Dataset]. https://www.statista.com/forecasts/1004981/frequency-of-tidying-up-all-rooms-in-the-us-2018
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 8, 2017 - Dec 13, 2017
    Area covered
    United States
    Description

    This statistic shows the results of a survey conducted in the United States in 2017 on the frequency of tidying up all rooms. Some ** percent of respondents stated in their household tidying up all rooms happens several times per week. The Survey Data Table for the Statista survey Cleaning Products in the United States 2018 contains the complete tables for the survey including various column headings.

  5. Wine Quality (UCI) — Clean CSV for ML

    • kaggle.com
    zip
    Updated Aug 1, 2025
    Cite
    Nishtha Sharma (2025). Wine Quality (UCI) — Clean CSV for ML [Dataset]. https://www.kaggle.com/datasets/nishtha711/wine-quality-uci-clean-csv-for-ml
    Explore at:
    Available download formats: zip (10711 bytes)
    Dataset updated
    Aug 1, 2025
    Authors
    Nishtha Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🍷 Wine Quality Dataset — Cleaned + Raw ZIP

    This dataset contains the classic Wine Recognition Dataset from the UCI Machine Learning Repository — now presented in two formats:

    • ✅ A cleaned and labeled CSV (wine_clean.csv) for fast ML workflows
    • 🗜️ The original UCI zip (wine_original.zip) for purists and explorers

    Perfect for learning K-Nearest Neighbors (KNN), exploring distance metrics like Euclidean, Manhattan, Cosine, and building visual + interactive ML notebooks.
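
    The three distance metrics named above, plus a bare-bones KNN vote, can be written in a few lines of plain Python. This is a minimal sketch for illustration: the function names are hypothetical and the toy points are invented, not rows from wine_clean.csv.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, labels, query, k=3, metric=euclidean):
    """Return the majority label among the k training points closest to `query`."""
    ranked = sorted(zip(train, labels), key=lambda pair: metric(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Invented two-feature points with class labels 1 and 2.
train = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (4.8, 5.3)]
labels = [1, 1, 2, 2]

if __name__ == "__main__":
    print(knn_predict(train, labels, (1.1, 1.0), k=3))
    print(knn_predict(train, labels, (1.1, 1.0), k=3, metric=manhattan))
```

    On real data the features would usually be scaled first, since columns with large ranges otherwise dominate the Euclidean and Manhattan distances.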

    📂 What's Inside?

    | File | Description |
    | --- | --- |
    | wine_clean.csv | Clean version with column names, no missing data, and ready-to-use |
    | wine.zip | Raw UCI files: wine.data, wine.names, etc. for reference or manual parsing |

    🧪 Features in Clean CSV

    | Feature | Description |
    | --- | --- |
    | Class | Target: Cultivar of wine (1, 2, or 3) |
    | Alcohol | Alcohol content |
    | Malic_Acid | Malic acid amount |
    | Ash | Ash content |
    | Alcalinity_of_Ash | Alkalinity of ash |
    | Magnesium | Magnesium content |
    | Total_Phenols | Total phenol compounds |
    | Flavanoids | Flavonoid concentration |
    | Nonflavanoid_Phenols | Non-flavonoid phenols |
    | Proanthocyanins | Amount of proanthocyanins |
    | Color_Intensity | Intensity of wine color |
    | Hue | Hue of wine |
    | OD280_OD315 | Optical density ratio |
    | Proline | Proline levels |

    📈 Why Use This Dataset?

    • Great for learning classification with distance-based algorithms (e.g., KNN)
    • Use for visualizations, feature scaling, normalization demos
    • Combines clean, beginner-friendly data with original reference files
    • Ideal for building educational projects, GitHub portfolios, and Streamlit apps

    🧠 Origin & Credit

    • 📍 Source: UCI Machine Learning Repository
    • 🧪 Collected by: Paulo Cortez et al., University of Minho, Portugal
    • 📝 Citation:
      > Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J. (2009).
      > Modeling wine preferences by data mining from physicochemical properties.
      > Decision Support Systems, Elsevier. DOI: 10.24432/C56S3T

    🔖 License

    Public Domain (CC0) — free to use, remix, and share 🌍

    If you're an ML student or early-career data scientist, this dataset is your 🍷 playpen. Dive in!

  6. Soundscape Attributes Translation Project (SATP) Dataset

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Jul 6, 2024
    Cite
    Oberman, Tin; Mitchell, Andrew; Aletta, Francesco; Almagro Pastor, José Antonio; Jambrošić, Kristian; Kang, Jian (2024). Soundscape Attributes Translation Project (SATP) Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6914433
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    University of Zagreb
    University College London
    Universidad de Granada
    Authors
    Oberman, Tin; Mitchell, Andrew; Aletta, Francesco; Almagro Pastor, José Antonio; Jambrošić, Kristian; Kang, Jian
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The data and audio included here were collected for the Soundscape Attributes Translation Project (SATP). First introduced in Aletta et al. (2020), the SATP is an attempt to provide validated translations of soundscape attributes in languages other than English. The recordings were used for headphone-based listening experiments.

    The data are provided to accompany publications resulting from this project and to provide a unique dataset of 1000s of perceptual responses to a standardised set of urban soundscape recordings. This dataset is the result of efforts from hundreds of researchers, students, assistants, PIs, and participants from institutions around the world. We have made an attempt to list every contributor to this Zenodo repo; if you feel you should be included, please get in touch.

    Citation: If you use the SATP dataset or part of it, please cite our paper describing the data collection and this dataset itself.

    Overview: The SATP dataset consists of 27 30-sec binaural audio recordings made in urban public spaces in London and one 60 sec stereo calibration signal.

    The recordings were made at locations as reported in Table 1 of the README.md (Recording locations), at various times of day by an operator wearing a binaural kit consisting of BHS II microphones and a SQobold (HEAD acoustics) device. Recordings were then exported to WAV via the ArtemiS SUITE software, using the original dynamic range from HDF. The listening experiment and the calibration procedure were intended for a headphone playback system (Sennheiser HD650 or similar open-back headphones recommended).

    The recordings were selected from an initial set of 80 recordings through a pilot study to ensure the test set had an even coverage of the soundscape circumplex space. These recordings were sent to the partner institutions (see Table 2 of the README.md) and assessed by approximately 30 participants in the institution's target language. The questionnaire used in each assessment is a translation of Method A Questionnaire, ISO 12913-2:2018. Each institution carried out its own lab experiment to collect data, then submitted its data to the team at UCL to compile into a single dataset. Some institutions included additional questions or translation options; the combined dataset (SATP Dataset v1.x.xlsx) includes only the base set of questions, while the extended set of questions from each institution is included in the Institution Datasets folder.

    In all, SATP Dataset v1.4 contains 19,089 samples, including 707 participants, for 27 recordings, in 18 languages with contributions from 29 institutions.

    Descriptions of the recordings, including GPS coordinates and sound sources, can be found in the README.md file.

    Format: The audio recordings are provided as 24 bit, 48 kHz, stereo WAV files. The combined dataset and Institutional datasets are provided as long tidy data tables in .xlsx files.

    Calibration: The recommended calibration approach is based on the open-circuit voltage (OCV) procedure, which was considered the most accessible, but other calibration procedures are also possible (Lam et al. (2022)). The provided calibration file is a computer-generated sine wave at 1 kHz, matching a sine wave recorded using the exact same setup at an SPL of 94 dB. If the calibration signal playback level is set to match an SPL of 94 dB at the eardrum, all 27 samples should be reproduced at realistic loudness. More details on the OCV calibration procedure and other options can be found in Lam et al. (2022) and the attached documentation. PLEASE DO NOT EXPOSE YOURSELF OR THE PARTICIPANTS TO THE CALIBRATION SIGNAL SET AT THE REALISTIC LEVEL, AS IT CAN CAUSE HARM.

    License and reuse: All SATP recordings are provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) License and are free to use. We encourage other researchers to replicate the SATP protocol and contribute new languages to the dataset. We also encourage the use of these recordings and the perceptual data for further soundscape research purposes. Please provide the proper attribution and get in touch with the authors if you would like to contribute a new translation or for any other collaborations.

  7. Environmental and Clean Technology Products Economic Account, supply and use...

    • www150.statcan.gc.ca
    Updated Dec 20, 2024
    Cite
    Government of Canada, Statistics Canada (2024). Environmental and Clean Technology Products Economic Account, supply and use table (x 1,000,000) [Dataset]. http://doi.org/10.25318/3610062901-eng
    Explore at:
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    Annual total supply, output margins, international and interprovincial imports, total use, intermediate input, domestic demand, inventories, and international and interprovincial exports of the environmental and clean technology product sector, per goods and services category, for Canada, provinces and territories.

  8. Attitude toward cleaning in the U.S. 2018

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Attitude toward cleaning in the U.S. 2018 [Dataset]. https://www.statista.com/forecasts/1006967/attitude-toward-cleaning-in-the-us
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 8, 2017 - Dec 13, 2017
    Area covered
    United States
    Description

    This statistic shows the results of a survey conducted in the United States in 2017 on attitudes towards cleaning. Some ** percent of respondents stated they are happy when everything is clean and tidy for their family. The Survey Data Table for the Statista survey Cleaning Products in the United States 2018 contains the complete tables for the survey including various column headings.

  9. Stake DAO - Strategies Clean Table

    • dune.com
    Updated Apr 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    stake-dao (2025). Stake DAO - Strategies Clean Table [Dataset]. https://dune.com/discover/content/relevant?q=author:stake-dao&resource-type=queries
    Explore at:
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    Stake DAO
    Authors
    stake-dao
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Blockchain data query: Stake DAO - Strategies Clean Table

  10. Environmental and Clean Technology Products Economic Account, supply and use...

    • open.canada.ca
    • data.wu.ac.at
    csv, html, xml
    Updated Jan 17, 2023
    Cite
    Statistics Canada (2023). Environmental and Clean Technology Products Economic Account, supply and use table, inactive [Dataset]. https://open.canada.ca/data/dataset/371f89ee-53d9-4a19-be4c-d784ebe9eccd
    Explore at:
    Available download formats: xml, csv, html
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    This table contains 25 series, with data for years 2007 - 2016 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 item: Canada); Economic variable (9 items: Total supply; Output; Margins; Imports; ...); Product and service (9 items: Total, products and services; Total, electricity; From nuclear; From renewable sources; ...).

  11. Radio Stations

    • kaggle.com
    zip
    Updated Nov 5, 2023
    Cite
    Sujay Kapadnis (2023). Radio Stations [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/radio-stations/data
    Explore at:
    Available download formats: zip (65930324 bytes)
    Dataset updated
    Nov 5, 2023
    Authors
    Sujay Kapadnis
    Description

    The data comes from Wikipedia.

    The included dataset was mined for all 50 states, followed by tidying column names, binding, and aggregating.

    Data Dictionary

    state_stations.csv

    | variable | class | description |
    | --- | --- | --- |
    | call_sign | character | Call Sign |
    | frequency | character | frequency |
    | city | character | city |
    | licensee | character | licensee |
    | format | character | format |
    | state | character | state |

    station_info.csv

    Can be joined:

    state_stations |> dplyr::right_join(station_info, by = c("call_sign"))
    
    | variable | class | description |
    | --- | --- | --- |
    | call_sign | character | Call sign |
    | facility_id | double | Facility id |
    | service | character | Service |
    | licensee | character | Licensee |
    | status | character | Status |
    | details | character | Details |

    citation("tidytuesdayR")

  12. Clean Steam Separator Market Analysis - Size, Share & Forecast 2025 to 2035 ...

    • futuremarketinsights.com
    html, pdf
    Updated Jun 6, 2025
    Cite
    Nikhil Kaitwade (2025). Clean Steam Separator Market Analysis - Size, Share & Forecast 2025 to 2035  [Dataset]. https://www.futuremarketinsights.com/reports/clean-steam-separator-market
    Explore at:
    Available download formats: html, pdf
    Dataset updated
    Jun 6, 2025
    Authors
    Nikhil Kaitwade
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2025 - 2035
    Area covered
    Worldwide
    Description

    The global clean steam separator market is expected to reach a value of USD 2.6 billion by the end of 2025 and is projected to expand steadily to USD 3.6 billion by 2035, growing at a CAGR of 3.1%.

    | Metric | Value |
    | --- | --- |
    | Industry Size (2025E) | USD 2.6 billion |
    | Industry Value (2035F) | USD 3.6 billion |
    | CAGR (2025 to 2035) | 3.1% |

    Clean Steam Separator Market Analyzed by Top Investment Segments

    | Segment | Category | CAGR (2025 to 2035) |
    | --- | --- | --- |
    | Type | Stainless Steel | 3.8% |
    | Structure Type | Fabricated | 3.9% |
    | End Use | Pharmaceuticals | 4.1% |

    Country-Wise Insights

    | Country | CAGR (2025 to 2035) |
    | --- | --- |
    | United States | 2.9% |
    | United Kingdom | 2.8% |
    | European Union | 3.0% |
    | China | 3.4% |
    | India | 3.6% |
    | Japan | 2.7% |
    | South Korea | 3.2% |

  13. High-frequency sensor data collected by Stroud Water Research Center in a...

    • portal.edirepository.org
    • search.dataone.org
    csv
    Updated Sep 12, 2020
    Cite
    Diana Oviedo-Vargas (2020). High-frequency sensor data collected by Stroud Water Research Center in a meadow reach of White Clay Creek from Janurary 2018 through December 2018 [Dataset]. http://doi.org/10.6073/pasta/db05cd0a83e6272085ae2657fd7d4725
    Explore at:
    Available download formats: csv (6354027 bytes), csv (1935219 bytes), csv (10861380 bytes), csv (61766485 bytes)
    Dataset updated
    Sep 12, 2020
    Dataset provided by
    EDI
    Authors
    Diana Oviedo-Vargas
    Time period covered
    Jan 1, 2018 - Jan 1, 2019
    Area covered
    Variables measured
    Unit, Level, Value, UTC_Time, depth_iq, utc_time, Parameter, depth_ysi, Instrument, velocity_iq, and 21 more
    Description

    High-frequency sensor data from a YSI 600 OMS Optical Monitoring System (every 15 minutes) and Sontek IQ (every 10 minutes) in a meadow reach at White Clay Creek from January 2018 through December 2018. Funded by NSF and DEB as part of the LTREB grant to study the recovery of stream ecosystem structure and function during reforestation, Stroud Water Research Center. The parameters in this data package are water temperature, depth, turbidity, conductivity, specific conductance, water pressure, discharge, river section area, and velocity. Data are presented in four tables which likely have significant overlap. The raw data table presents the data exactly as it was downloaded from the Aquarius Database. It is formatted as a "wide" human-readable table. IQ_stream and YSI_stream present only the data from the respective sensors. These tables are gapfilled so that there are no time gaps, and they are also formatted as "wide" human-readable tables. The full_stream table is all of the data, raw and cleaned, from both sensors. It is organized as a long, tidy table and is optimal for machine readability. All of the parameters and tables are further explained in the metadata.

  14. Russia Avg Consumer Price: Paper & Clean Articles: Paper Table Napkins

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). Russia Avg Consumer Price: Paper & Clean Articles: Paper Table Napkins [Dataset]. https://www.ceicdata.com/en/russia/average-consumer-price-paper-and-clean-articles-stationeries-publishing/avg-consumer-price-paper--clean-articles-paper-table-napkins
    Explore at:
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2024 - Dec 1, 2024
    Area covered
    Russia
    Variables measured
    Consumer Prices
    Description

    Russia Avg Consumer Price: Paper & Clean Articles: Paper Table Napkins data was reported at 52.720 RUB/100 Unit in Feb 2025. This records an increase from the previous number of 52.630 RUB/100 Unit for Jan 2025. Russia Avg Consumer Price: Paper & Clean Articles: Paper Table Napkins data is updated monthly, averaging 47.530 RUB/100 Unit from Jan 2021 (Median) to Feb 2025, with 50 observations. The data reached an all-time high of 52.720 RUB/100 Unit in Feb 2025 and a record low of 34.310 RUB/100 Unit in May 2021. Russia Avg Consumer Price: Paper & Clean Articles: Paper Table Napkins data remains active status in CEIC and is reported by Federal State Statistics Service. The data is categorized under Russia Premium Database’s Prices – Table RU.PA010: Average Consumer Price: Paper and Clean Articles, Stationeries, Publishing.

  15. SEC Edgar 10-K/10-Q Filings 2008 - 2023

    • kaggle.com
    zip
    Updated Sep 7, 2023
    Cite
    James G Lang (2023). SEC Edgar 10-K/10-Q Filings 2008 - 2023 [Dataset]. https://www.kaggle.com/datasets/jamesglang/sec-edgar-company-facts-september2023/discussion
    Explore at:
    Available download formats: zip (932450519 bytes)
    Dataset updated
    Sep 7, 2023
    Authors
    James G Lang
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    There are two files in this dataset. One dataset contains line items from 10-K and 10-Q forms filed between 2009-04-15 and 2023-09-06. The other dataset, "line_item_counts.csv", contains the frequency that each line item occurs, along with a description of the line item.

    I was originally looking for a dataset with up to date company information but couldn't find anything that was current and beginner friendly to use. So I decided to pull data directly from SEC Edgar to create a tidy table from their dataset. I have yet to use it but figured I would share what I have so far in case anyone was in my position.

    I'll release more info about my process in the near future, but for now I hope that you find some use from this dataset.

    I have also released a sample notebook that shows how to load the large dataset into Kaggle without exceeding memory limits. Hopefully this helps you get started if you want to work in Kaggle. Alternatively, you could download the dataset and work with it in your preferred IDE, where operations are limited by the memory available on your computer, or use a cloud computing platform such as AWS EC2 or GCP.

  16. NASA Meteorites Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2023
    Cite
    Sujay Kapadnis (2023). NASA Meteorites Dataset [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/meteorites-dataset/discussion
    Explore at:
    Available download formats: zip (682986 bytes)
    Dataset updated
    Oct 12, 2023
    Authors
    Sujay Kapadnis
    Description

    This week's dataset is a dataset all about meteorites, where they fell and when they fell! Data comes from the Meteoritical Society by way of NASA. H/t to #TidyTuesday community member Malin Axelsson for sharing this data as an issue on GitHub!

    If you want to find out more about meteorite classifications, Malin was kind enough to share a wikipedia article as well!

    Data Dictionary

    meteorites.csv

    | variable | class | description |
    | --- | --- | --- |
    | name | character | Meteorite name |
    | id | double | Meteorite numerical ID |
    | name_type | character | Name type either valid or relict, where relict = a meteorite that cannot be assigned easily to a class |
    | class | character | Class of the meteorite, please see Wikipedia for full context |
    | mass | double | Mass in grams |
    | fall | character | Fell or Found meteorite |
    | year | integer | Year found |
    | lat | double | Latitude |
    | long | double | Longitude |
    | geolocation | character | Geolocation |

    @misc{tidytuesday, title = {Tidy Tuesday: A weekly social data project}, author = {R4DS Online Learning Community}, url = {https://github.com/rfordatascience/tidytuesday}, year = {2023} }

  17. US Population by State - Comprehensive Data

    • kaggle.com
    zip
    Updated Sep 2, 2025
    Cite
    Rolf Hendriks (2025). US Population by State - Comprehensive Data [Dataset]. https://www.kaggle.com/datasets/rolfhendriks/us-population-by-state-comprehensive-data
    Explore at:
    Available download formats: zip (65754 bytes)
    Dataset updated
    Sep 2, 2025
    Authors
    Rolf Hendriks
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    United States
    Description

    A comprehensive collection of US Census state population totals by year - from 1790 to present. Includes all 50 states plus DC.

    Data from 1790 to 1900 is represented once per decade based on historic US Census data. Populations between 1900 and 1946 are backfilled estimates provided by the US census based on decennial Census data combined with external data including birth rates and death rates. Populations from 1947 onwards are based on population estimate surveys conducted by the US Census.

    Population data is published in a tidy / long format as well as a wide / columnar format:

    tidy format:

    Each row represents a total population for a particular year and state.

    This format is ideally suited for computation and for converting to other formats as needed.

    wide format:

    A pivot table of populations by year and state, with states as columns and years as rows. Each row represents populations for all states in a year.

    This format is more compact and human-readable than the tidy format.
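
    The relationship between the two layouts can be illustrated with a small conversion in plain Python. This is only a sketch: the field names year, state, and population, the helper names, and the numbers are made up for the example and are not taken from the dataset files.

```python
from collections import defaultdict

# Tidy / long layout: one row per (year, state) pair. Values are invented.
tidy_rows = [
    {"year": 1790, "state": "Alpha", "population": 100},
    {"year": 1790, "state": "Beta", "population": 200},
    {"year": 1800, "state": "Alpha", "population": 150},
    {"year": 1800, "state": "Beta", "population": 260},
]


def tidy_to_wide(rows):
    """Pivot tidy rows into {year: {state: population}}, with years as rows."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["year"]][row["state"]] = row["population"]
    return dict(sorted(wide.items()))


def wide_to_tidy(wide):
    """Unpivot the wide mapping back into one row per (year, state)."""
    return [
        {"year": year, "state": state, "population": population}
        for year, by_state in sorted(wide.items())
        for state, population in sorted(by_state.items())
    ]


wide = tidy_to_wide(tidy_rows)

if __name__ == "__main__":
    for year, by_state in wide.items():
        print(year, by_state)
```

    Converting back with wide_to_tidy recovers the original rows, which is why the tidy layout is a convenient starting point for other formats.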

  18. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    zip (1203633 bytes) Available download formats
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company that organizes sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: All UPPERCASE)
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data
    • Generate the EMAIL ADDRESS for those members who speak English in the prescribed format lastname.firstname@xyz.org (Note: All lowercase), and for all other members in the format lastname.firstname@xyz.com (Note: All lowercase)
    • Populate the SPORT LOCATION of the sport played by each player. Make use of SPORT sheet to get the required data
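The name and email rules in Task 1 can be sketched outside Excel as follows. This is a minimal illustration, not the author's workbook formulas; the function names and the sample member are hypothetical, and the check assumes the language is stored as the plain string "English".

```python
def full_name(prefix, first_name, last_name):
    """Build FULLNAME as PREFIX FIRSTNAME LASTNAME, all uppercase."""
    return f"{prefix} {first_name} {last_name}".upper()


def email_address(first_name, last_name, language):
    """English speakers get the .org domain, everyone else .com; all lowercase."""
    # Assumption: the language column holds the word "English" for English speakers.
    domain = "xyz.org" if language.strip().lower() == "english" else "xyz.com"
    return f"{last_name}.{first_name}@{domain}".lower()


# Hypothetical member used only for illustration.
example_name = full_name("Mr", "John", "Doe")
example_email = email_address("John", "Doe", "English")
```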

    TASK 2: DATA FORMATTING

    • Display MEMBER ID as always a 3-digit number (Note: 001, 002, ..., 020, ... etc.)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May'1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display data with 2 decimal places, else display data with one decimal place. In both cases units should be thousands (k) e.g. 87670 -> 87.67 k and 12 250 -> 123.2 k
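The formatting rules above are meant to be applied with Excel number formats, but the logic can be sketched in Python to make the thresholds explicit. The function names are hypothetical, and the rounding behaviour of Python's string formatting may differ from Excel's in edge cases.

```python
def format_member_id(member_id):
    """Zero-pad the member ID to three digits."""
    return f"{member_id:03d}"


def format_weight(weight):
    """Append the unit to the weight value."""
    return f"{weight} kg"


def format_salary(salary):
    """Show salary in thousands: 2 decimals below 100,000, otherwise 1 decimal."""
    decimals = 2 if salary < 100_000 else 1
    return f"{salary / 1000:.{decimals}f} k"
```

For instance, `format_salary(87670)` gives `87.67 k`, matching the first example in the task.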

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS; Group : GENDER.
    • In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES; calculate the count of candidates from each COUNTRY and GENDER type, Remove GRAND TOTALs.
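The country-by-gender count that this pivot table produces can be sketched in Python as a cross-tabulation. The roster below is invented for illustration, and the field names are assumptions rather than the workbook's actual column headers.

```python
from collections import Counter

# Hypothetical roster rows; not taken from the dataset.
roster = [
    {"country": "Country X", "gender": "F"},
    {"country": "Country X", "gender": "F"},
    {"country": "Country X", "gender": "M"},
    {"country": "Country Y", "gender": "M"},
]

# Count members per (country, gender) pair.
counts = Counter((row["country"], row["gender"]) for row in roster)

countries = sorted({country for country, _ in counts})
genders = sorted({gender for _, gender in counts})

# Rows are countries, columns are genders; no grand totals are computed.
table = {
    country: {gender: counts.get((country, gender), 0) for gender in genders}
    for country in countries
}
```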

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from range H4; get the distinct GENDER. Use remove duplicates option and transpose the data.
    • Starting from range G5; get the distinct COUNTRY (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates from each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified the data for missing values and anomalies, and resolved them.
    • Made sure the data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.

  19. u

    Environmental and Clean Technology Products Economic Account, supply and use...

    • data.urbandatacentre.ca
    Updated Oct 19, 2025
    + more versions
    Cite
    (2025). Environmental and Clean Technology Products Economic Account, supply and use table, inactive - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-371f89ee-53d9-4a19-be4c-d784ebe9eccd
    Explore at:
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    This table contains 25 series, with data for years 2007 - 2016 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 item: Canada); Economic variable (9 items: Total supply; Output; Margins; Imports; ...); Product and service (9 items: Total, products and services; Total, electricity; From nuclear; From renewable sources; ...).

  20. m

    Supplementary Table II REVISED CLEAN COPY

    • data.mendeley.com
    Updated Oct 1, 2024
    + more versions
    Cite
    Tejas Joshi (2024). Supplementary Table II REVISED CLEAN COPY [Dataset]. http://doi.org/10.17632/nkmvz5dh6f.1
    Explore at:
    Dataset updated
    Oct 1, 2024
    Authors
    Tejas Joshi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Revised supplementary table II for JAAD manuscript "Association of cutaneous leiomyosarcoma with subsequent primary malignancies: a population-based analysis"

Daniel San Jose Pro (2025). clean-up-table [Dataset]. https://huggingface.co/datasets/danielsanjosepro/clean-up-table

clean-up-table

danielsanjosepro/clean-up-table

Explore at:
474 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jul 28, 2025
Authors
Daniel San Jose Pro
License

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

This dataset was created using LeRobot.

  Dataset Structure

meta/info.json: { "codebase_version": "v2.1", "robot_type": "franka", "total_episodes": 51, "total_frames": 28867, "total_tasks": 1, "total_videos": 0, "total_chunks": 1, "chunks_size": 1000, "fps": 20, "splits": { "train": "0:51" }, "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet", "video_path": null, "features": {… See the full description on the dataset page: https://huggingface.co/datasets/danielsanjosepro/clean-up-table.
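The `data_path` entry in `meta/info.json` is a Python-style format template, so an episode's file path can be derived from its index. The sketch below assumes the chunk number is the episode index integer-divided by `chunks_size`; that mapping is an assumption here, not something stated in the description, and `episode_file` is a hypothetical helper name.

```python
# Values copied from the meta/info.json excerpt above.
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
CHUNKS_SIZE = 1000


def episode_file(episode_index, chunks_size=CHUNKS_SIZE):
    """Return the relative parquet path for one episode."""
    # Assumption: episodes are grouped into chunks of `chunks_size` by index.
    episode_chunk = episode_index // chunks_size
    return DATA_PATH.format(episode_chunk=episode_chunk, episode_index=episode_index)


first_path = episode_file(0)
last_path = episode_file(50)
```

With 51 episodes and a chunk size of 1000, every episode would land in `chunk-000` under this assumption.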
