100+ datasets found
  1. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    Dataset Name: Name of the dataset.
    Created By: Creator(s) of the dataset.
    Last Updated in number of days: Time elapsed since last update.
    Usability Score: Score indicating the ease of use.
    Number of File: Quantity of files included.
    Type of file: Format of files (e.g., CSV, JSON).
    Size: Size of the dataset.
    Total Votes: Number of votes received.
    Category: Categorization of the dataset's subject matter.
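
As a rough illustration of the trend analysis described above, the sketch below averages the usability score per category with Python's standard `csv` module. The sample rows are invented, and the header spellings ("Category", "Usability Score") are assumed to match the column definitions listed here; the real file may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; header names are assumed from the column list.
SAMPLE = """Dataset Name,Category,Usability Score,Total Votes
A,Health,10.0,120
B,Health,8.0,80
C,Finance,9.0,50
"""

def mean_usability_by_category(csv_text):
    """Return {category: mean usability score} for the given CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Category"]] += float(row["Usability Score"])
        counts[row["Category"]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

result = mean_usability_by_category(SAMPLE)
print(result)
```

To run this against the downloaded file, the `io.StringIO` wrapper would be replaced with an opened file handle.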

  2. Customer Dataset csv

    • kaggle.com
    zip
    Updated Mar 22, 2023
    Cite
    Moses Moncy (2023). Customer Dataset csv [Dataset]. https://www.kaggle.com/datasets/mosesmoncy/customer-dataset-csv
    Explore at:
    zip (348492 bytes)
    Dataset updated
    Mar 22, 2023
    Authors
    Moses Moncy
    Description

    Dataset

    This dataset was created by Moses Moncy

    Contents

  3. Top 1000 Kaggle Datasets

    • kaggle.com
    zip
    Updated Jan 3, 2022
    Cite
    Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
    Explore at:
    zip (34269 bytes)
    Dataset updated
    Jan 3, 2022
    Authors
    Trrishan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

    Source: Kaggle

  4. Images in CSV datasets

    • kaggle.com
    zip
    Updated Oct 14, 2024
    Cite
    Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
    Explore at:
    zip (347504240 bytes)
    Dataset updated
    Oct 14, 2024
    Authors
    Pascal
    Description

    Images stored as CSV files, for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.

    "mnist_big.csv"

    Recognition of handwritten digit images

    "mnist_small.csv" is a version with less data that can also serve as a test set

    Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

    "sign_mnist_big.csv"

    Recognition of sign-language gesture images

    "sign_mnist_small.csv" is a version with less data that can also serve as a test set

    Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist

    "zalando_small.csv"

    Recognition of clothing and shoes (Zalando)

    Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "hmnist_8_8_RGB.csv"

    Recognition of skin tumors (color images, three values R, G, B per pixel)

    Other versions with smaller and/or grayscale images

    Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "cifar10_small.csv"

    Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset

    Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
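
Since each image in these files is stored as one flat CSV row, a reader usually has to fold the values back into a grid before viewing them. The sketch below shows one way to do that in plain Python. It assumes row-major pixel order with interleaved channel values and that the label column has already been removed; neither detail is confirmed by the listing, and `row_to_image` is a hypothetical helper name.

```python
def row_to_image(values, height, width, channels=1):
    """Reshape a flat list of pixel values into nested rows of pixels.

    Assumes row-major order with channel values interleaved per pixel.
    """
    expected = height * width * channels
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    image = []
    for r in range(height):
        row = []
        for c in range(width):
            start = (r * width + c) * channels
            pixel = values[start:start + channels]
            # Grayscale pixels stay scalars; color pixels become tuples.
            row.append(pixel[0] if channels == 1 else tuple(pixel))
        image.append(row)
    return image

# Hypothetical 2x2 RGB example (12 values), standing in for an 8x8 RGB row.
flat = [255, 0, 0, 0, 255, 0, 0, 0, 255, 9, 9, 9]
image = row_to_image(flat, height=2, width=2, channels=3)
print(image)
```

For a file such as "hmnist_8_8_RGB.csv" the call would presumably use `height=8, width=8, channels=3`, giving 192 pixel values per row.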

  5. Url Dataset

    • kaggle.com
    zip
    Updated May 18, 2018
    Cite
    TeseRact (2018). Url Dataset [Dataset]. https://www.kaggle.com/datasets/teseract/urldataset
    Explore at:
    zip (6911526 bytes)
    Dataset updated
    May 18, 2018
    Authors
    TeseRact
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by TeseRact

    Released under CC0: Public Domain

    Contents

  6. Sample CSV Datasets

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    SOURAV S V (2023). Sample CSV Datasets [Dataset]. https://www.kaggle.com/datasets/souravsv/sample-csv-datasets
    Explore at:
    zip (14455964 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    SOURAV S V
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by SOURAV S V

    Released under CC0: Public Domain

    Contents

  7. train csv file

    • kaggle.com
    zip
    Updated May 5, 2018
    Cite
    Emmanuel Arias (2018). train csv file [Dataset]. https://www.kaggle.com/datasets/eamanu/train
    Explore at:
    zip (33695 bytes)
    Dataset updated
    May 5, 2018
    Authors
    Emmanuel Arias
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Emmanuel Arias

    Released under Database: Open Database, Contents: Database Contents

    Contents

  8. sales dataset

    • kaggle.com
    zip
    Updated Feb 18, 2025
    Cite
    VINOTH KANNA S (2025). sales dataset [Dataset]. https://www.kaggle.com/datasets/vinothkannaece/sales-dataset
    Explore at:
    zip (27634 bytes)
    Dataset updated
    Feb 18, 2025
    Authors
    VINOTH KANNA S
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Sales Data Description

    This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.

    Columns in the Dataset

    1. Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes.
    2. Sale_Date: The date when the sale occurred. Randomly selected from the year 2023.
    3. Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve).
    4. Region: The region where the sale took place. The possible regions are North, South, East, and West.
    5. Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units).
    6. Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50.
    7. Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food.
    8. Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units.
    9. Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost.
    10. Customer_Type: Indicates whether the customer is a New or Returning customer.
    11. Discount: The discount applied to the sale, randomly chosen between 0% and 30%.
    12. Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer).
    13. Sales_Channel: The channel through which the sale occurred. Either Online or Retail.
    14. Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.
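
The column descriptions above imply a couple of simple checks, sketched below with the standard library: totalling Sales_Amount per Region and flagging any row where Unit_Price is not higher than Unit_Cost. The sample rows are invented for illustration and `summarize` is a hypothetical helper; only the column names come from the list above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using a subset of the documented column names.
SAMPLE = """Product_ID,Region,Sales_Amount,Unit_Cost,Unit_Price
1001,North,500.0,100.0,120.0
1002,South,250.5,60.0,75.0
1003,North,100.0,50.0,55.0
"""

def summarize(csv_text):
    """Return (total sales per region, product IDs violating price > cost)."""
    totals = defaultdict(float)
    bad_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Region"]] += float(row["Sales_Amount"])
        # The description states Unit_Price is calculated to exceed Unit_Cost.
        if float(row["Unit_Price"]) <= float(row["Unit_Cost"]):
            bad_rows.append(row["Product_ID"])
    return dict(totals), bad_rows

totals, bad_rows = summarize(SAMPLE)
print(totals, bad_rows)
```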

    Disclaimer

    Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.

  9. Power BI dataset

    • kaggle.com
    zip
    Updated Oct 31, 2023
    Cite
    Ahmadali Jamali (2023). Power BI dataset [Dataset]. https://www.kaggle.com/datasets/ahmadalijamali/dataset
    Explore at:
    zip (1642 bytes)
    Dataset updated
    Oct 31, 2023
    Authors
    Ahmadali Jamali
    License

    https://www.licenses.ai/ai-licenses

    Description

    Tabular dataset for data analysis and machine learning practice. The dataset is about the market and is usable for Power BI practice and data science.

  10. Data from: Global Superstore Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2023
    Cite
    Fatih İlhan (2023). Global Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset
    Explore at:
    zip (3349507 bytes)
    Dataset updated
    Nov 16, 2023
    Authors
    Fatih İlhan
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    About this file

    The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.

    Here is a description of the columns in the dataset:

    category: The category of products sold in the superstore.

    city: The city where the order was placed.

    country: The country in which the superstore is located.

    customer_id: A unique identifier for each customer.

    customer_name: The name of the customer who placed the order.

    discount: The discount applied to the order.

    market: The market or region where the superstore operates.

    ji_lu_shu: An unknown or unspecified column.

    order_date: The date when the order was placed.

    order_id: A unique identifier for each order.

    order_priority: The priority level of the order.

    product_id: A unique identifier for each product.

    product_name: The name of the product.

    profit: The profit generated from the order.

    quantity: The quantity of products ordered.

    region: The region where the order was placed.

    row_id: A unique identifier for each row in the dataset.

    sales: The total sales amount for the order.

    segment: The customer segment (e.g., consumer, corporate, or home office).

    ship_date: The date when the order was shipped.

    ship_mode: The shipping mode used for the order.

    shipping_cost: The cost of shipping for the order.

    state: The state or region within the country.

    sub_category: The sub-category of products within the main category.

    year: The year in which the order was placed.

    market2: Another column related to market information.

    weeknum: The week number when the order was placed.

    This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
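
One analysis the order_date, ship_date, and ship_mode columns allow is the average shipping delay per shipping mode. The following is a minimal sketch under stated assumptions: the order records are invented, and the dates are assumed to be ISO-formatted (YYYY-MM-DD), which the listing does not confirm. `shipping_days` and `mean_delay_by_mode` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical orders; the ISO date format is an assumption.
orders = [
    {"order_id": "A-1", "order_date": "2014-01-03", "ship_date": "2014-01-07", "ship_mode": "Standard Class"},
    {"order_id": "A-2", "order_date": "2014-02-10", "ship_date": "2014-02-11", "ship_mode": "First Class"},
    {"order_id": "A-3", "order_date": "2014-03-01", "ship_date": "2014-03-07", "ship_mode": "Standard Class"},
]

def shipping_days(order):
    """Days between the order being placed and being shipped."""
    placed = date.fromisoformat(order["order_date"])
    shipped = date.fromisoformat(order["ship_date"])
    return (shipped - placed).days

def mean_delay_by_mode(rows):
    """Return {ship_mode: mean shipping delay in days}."""
    delays = defaultdict(list)
    for order in rows:
        delays[order["ship_mode"]].append(shipping_days(order))
    return {mode: mean(values) for mode, values in delays.items()}

delay_by_mode = mean_delay_by_mode(orders)
print(delay_by_mode)
```

If the real file uses another date format, `datetime.strptime` with the matching pattern would replace `date.fromisoformat`.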

  11. UCI-dataset

    • kaggle.com
    zip
    Updated Aug 17, 2022
    Cite
    Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist
    Explore at:
    zip (20774 bytes)
    Dataset updated
    Aug 17, 2022
    Authors
    Md Waquar Azam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a list of the datasets provided by UCI ML. If you are a learner and want data selected by year, category, profession, or some other criterion, you can search for it here.

    There are 8 columns in the dataset, in which all details are given: link, Data-Name, data type, default task, attribute-type, instances, attributes, year.

    Some missing values are also present.

    You can analyse the data as per your requirements.

    EDA

  12. SQUAD 2.0 - csv format

    • kaggle.com
    zip
    Updated Apr 15, 2020
    Cite
    Parth Chokhra (2020). SQUAD 2.0 - csv format [Dataset]. https://www.kaggle.com/datasets/parthplc/squad-20-csv-file
    Explore at:
    zip (9887206 bytes)
    Dataset updated
    Apr 15, 2020
    Authors
    Parth Chokhra
    Description

    Dataset

    This dataset was created by Parth Chokhra

    Contents

  13. People

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Aung M. Myat (2023). People [Dataset]. https://www.kaggle.com/datasets/aungdev/people-dataset
    Explore at:
    zip (581 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Aung M. Myat
    Description

    The dataset contains randomly generated persons' data. It is created to be used in explaining data science. It currently contains the following columns:
    - Name
    - Gender
    - Skin Color
    - Height(cm)
    - Weight(m)
    - Date of Birth

  14. Natural Questions Dataset

    • kaggle.com
    zip
    Updated Mar 15, 2024
    Cite
    fujoos (2024). Natural Questions Dataset [Dataset]. https://www.kaggle.com/datasets/frankossai/natural-questions-dataset
    Explore at:
    zip (116502047 bytes)
    Dataset updated
    Mar 15, 2024
    Authors
    fujoos
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context

    The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.

    Data Collection

    The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.

    Data Pre-processing

    The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
    - Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
    - Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.

    These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
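
The removal step described above can be approximated with the standard library alone. The sketch below is not the dataset author's pipeline: it substitutes plain `re` patterns and `html.unescape` for the "BeautifulSoup" and "regex" libraries named in the description, and every pattern and the `clean_text` helper are illustrative assumptions.

```python
import html
import re

# Illustrative patterns only; the original pipeline's rules are not published here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
WHITESPACE_RE = re.compile(r"\s+")

def clean_text(text):
    """Strip HTML tags, URLs, hashtags, and user mentions, then tidy spaces."""
    text = TAG_RE.sub(" ", text)        # drop HTML tags
    text = html.unescape(text)          # decode entities such as &amp;
    text = URL_RE.sub(" ", text)        # drop URLs
    text = MENTION_HASHTAG_RE.sub(" ", text)  # drop @mentions and #hashtags
    return WHITESPACE_RE.sub(" ", text).strip()

cleaned = clean_text("<p>Who wrote  Hamlet? see https://example.com #books @user</p>")
print(cleaned)
```

A regex-only approach is cruder than an HTML parser (it would also clip email addresses, for instance), so it is best read as a starting point rather than a replacement for the tools the author names.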

    Data Storage

    The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.

    Filtered Results

    The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.

    Flask CSV Reader App

    The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
    - Viewing questions and answers directly in your browser.
    - Filtering results based on criteria like question keywords or answer length.
    - A live demo, using the CSV files converted to an SQLite database, at 'https://fujoos.pythonanywhere.com/'

  15. Retail Product Dataset with Missing Values

    • kaggle.com
    zip
    Updated Feb 17, 2025
    Cite
    Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
    Explore at:
    zip (47826 bytes)
    Dataset updated
    Feb 17, 2025
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

    The dataset includes:
    - Category (Categorical): Product category (A, B, C, D)
    - Price (Numerical): Randomized product prices
    - Rating (Numerical): Ratings between 1 to 5
    - Stock (Categorical): Availability status (In Stock, Out of Stock)
    - Discount (Numerical): Discount percentage

    This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
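
To make the missing-data practice concrete, here is a minimal standard-library sketch that measures the share of missing values per column and fills a numeric column with its mean. It assumes missing cells appear as empty strings in the CSV, which the listing does not state, and the sample rows and helper names are invented.

```python
import csv
import io
from statistics import mean

# Hypothetical rows; empty cells stand in for missing values (an assumption).
SAMPLE = """Category,Price,Rating
A,10.0,4
B,,5
,30.0,
A,20.0,3
"""

def missing_percentages(rows, columns):
    """Return {column: percentage of rows where the cell is empty}."""
    return {
        col: 100.0 * sum(1 for r in rows if r[col] == "") / len(rows)
        for col in columns
    }

def impute_mean(rows, column):
    """Return the column as floats, replacing empty cells with the mean."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    fill = mean(observed)
    return [float(r[column]) if r[column] != "" else fill for r in rows]

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing = missing_percentages(rows, ["Category", "Price", "Rating"])
prices = impute_mean(rows, "Price")
print(missing, prices)
```

Mean imputation is only one option; for columns with a large share of missing values, such as the 63% case mentioned above, dropping the column or using a model-based method may be more appropriate.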

  16. Subjective Question Answer Dataset

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    Pokhrel Arahanta (2023). Subjective Question Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pokhrelarahanta/subjective-question-answer-dataset
    Explore at:
    zip (1993150 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    Pokhrel Arahanta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains 4118 entries in each of seven fields: Paragraphs, Question1, Question2, Question3, Answer1, Answer2, and Answer3. Each paragraph comes with 3 related questions and 3 answers. The dataset is stored as a Comma-Separated Values file (.csv). It was collected manually and subsequently cleaned and filtered, with the aim of producing a high-quality dataset specifically designed for generating extractive subjective questions and answers from the provided input paragraphs.
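
Because each row bundles one paragraph with three question-answer pairs, training code often flattens a row into three separate examples. The sketch below shows that reshaping. The key names ("Paragraphs", "Question1", "Answer1", and so on) are assumed from the description and may not match the file's actual header; `to_qa_pairs` is a hypothetical helper.

```python
def to_qa_pairs(row):
    """Expand one wide row into three (paragraph, question, answer) tuples."""
    return [
        (row["Paragraphs"], row[f"Question{i}"], row[f"Answer{i}"])
        for i in (1, 2, 3)
    ]

# Hypothetical row with placeholder text.
row = {
    "Paragraphs": "P",
    "Question1": "q1", "Answer1": "a1",
    "Question2": "q2", "Answer2": "a2",
    "Question3": "q3", "Answer3": "a3",
}
pairs = to_qa_pairs(row)
print(pairs)
```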

  17. Global Country Information Dataset 2023

    • kaggle.com
    zip
    Updated Jul 8, 2023
    Cite
    Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
    Explore at:
    zip (24063 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
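
As a small worked example of the first use case, population density can be recomputed as population divided by land area and compared with the Density (P/Km2) column. The sketch assumes numeric cells may arrive as strings with thousands separators or symbols such as "%" and "$"; that formatting, the sample figures, and the helper names are assumptions, not facts from the listing.

```python
def parse_number(text):
    """Convert a string such as '38,041,754' or '58.10%' to a float.

    Returns None for an empty cell. The separator and symbol handling
    is an assumption about how the file formats numbers.
    """
    cleaned = text.replace(",", "").replace("%", "").replace("$", "").strip()
    return float(cleaned) if cleaned else None

def density(population, land_area_km2):
    """Persons per square kilometer."""
    return population / land_area_km2

population = parse_number("38,041,754")
print(population, density(1000.0, 50.0))
```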

    Data Source: This dataset was compiled from multiple data sources

    If this was helpful, a vote is appreciated ❤️ Thank you 🙂

  18. MHEALTH Dataset Data Set CSV

    • kaggle.com
    zip
    Updated Jan 4, 2023
    Cite
    Nirmal Sankalana (2023). MHEALTH Dataset Data Set CSV [Dataset]. https://www.kaggle.com/datasets/nirmalsankalana/mhealth-dataset-data-set-csv
    Explore at:
    zip (78174751 bytes)
    Dataset updated
    Jan 4, 2023
    Authors
    Nirmal Sankalana
    Description

    Source:

    Oresti Banos, Department of Computer Architecture and Computer Technology, University of Granada
    Rafael Garcia, Department of Computer Architecture and Computer Technology, University of Granada
    Alejandro Saez, Department of Computer Architecture and Computer Technology, University of Granada

    Email to whom correspondence should be addressed: oresti '@' ugr.es (oresti.bl '@' gmail.com)

    Data Set Information:

    The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing several physical activities. Sensors placed on the subject's chest, right wrist, and left ankle are used to measure the motion experienced by diverse body parts, namely, acceleration, rate of turn, and magnetic field orientation. The sensor positioned on the chest also provides 2-lead ECG measurements, which can potentially be used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.

    DATASET SUMMARY:

    • Activities: 12
    • Sensor devices: 3
    • Subjects: 10

    EXPERIMENTAL SETUP

    The collected dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing 12 physical activities (Table 1). Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were respectively placed on the subject's chest, right wrist, and left ankle and attached by using elastic straps (as shown in the figure in the attachment). The use of multiple sensors permits us to measure the motion experienced by diverse body parts, namely, the acceleration, the rate of turn, and the magnetic field orientation, thus better capturing the body dynamics.

    The sensor positioned on the chest also provides 2-lead ECG measurements which are not used for the development of the recognition model but rather collected for future work purposes. This information can be used, for example, for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera.

    This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., the frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.

    ACTIVITY SET

    The activity set is listed in the following:
    L1: Standing still (1 min)
    L2: Sitting and relaxing (1 min)
    L3: Lying down (1 min)
    L4: Walking (1 min)
    L5: Climbing stairs (1 min)
    L6: Waist bends forward (20x)
    L7: Frontal elevation of arms (20x)
    L8: Knees bending (crouching) (20x)
    L9: Cycling (1 min)
    L10: Jogging (1 min)
    L11: Running (1 min)
    L12: Jump front & back (20x)
    NOTE: In brackets are the number of repetitions (Nx) or the duration of the exercises (min).

    A complete and illustrated description (including table of activities, sensor setup, etc.) of the dataset is provided in the papers presented in the section "Citation Requests".

    Attribute Information:

    The data collected for each subject is stored in a different log file: 'mHealth_subject.log'. Each file contains the samples (by rows) recorded for all sensors (by columns). The labels used to identify the activities are similar to the abovementioned (e.g., the label for walking is '4').

    The meaning of each column is detailed next:
    Column 1: acceleration from the chest sensor (X axis)
    Column 2: acceleration from the chest sensor (Y axis)
    Column 3: acceleration from the chest sensor (Z axis)
    Column 4: electrocardiogram signal (lead 1)
    Column 5: electrocardiogram signal (lead 2)
    Column 6: acceleration from the left-ankle sensor (X axis)
    Column 7: acceleration from the left-ankle sensor (Y axis)
    Column 8: acceleration from the left-ankle sensor (Z axis)
    Column 9: gyro from the left-ankle sensor (X axis)
    Column 10: gyro from the left-ankle sensor (Y axis)
    Column 11: gyro from the left-ankle sensor (Z axis)
    Column 12: magnetometer from the left-ankle sensor (X axis)
    Column 13: magnetometer from the left-ankle sensor (Y axis)
    Column 14: magnetometer from the left-ankle sensor (Z axis)
    Column 15: acceleration from the right-lower-arm sensor (X axis)
    Column 16: acceleration from the right-lower-arm sensor (Y axis)
    Column 17: acceleration from the right-lower-arm sensor (Z axis)
    Column 18: gyro from the right-lower-arm sensor (X axis)
    Column 19: gyro from the right-lower-arm sensor (Y axis)
    Column 20: gyro fro...
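
Given the column layout and the 50 Hz sampling rate stated above, one sample row can be turned into a chest-acceleration magnitude and a timestamp as sketched below. The sketch assumes whitespace-separated values per line, as in a plain log file; the CSV variant on Kaggle may use commas and a header instead. `parse_line`, `chest_accel_magnitude`, and `sample_time_seconds` are hypothetical helper names, and the sample line is invented.

```python
import math

SAMPLING_RATE_HZ = 50  # stated in the experimental setup

def parse_line(line):
    """Split one log line into floats (whitespace-separated is an assumption)."""
    return [float(value) for value in line.split()]

def chest_accel_magnitude(sample):
    """Euclidean norm of chest acceleration (columns 1-3, 1-based)."""
    x, y, z = sample[0], sample[1], sample[2]
    return math.sqrt(x * x + y * y + z * z)

def sample_time_seconds(index):
    """Elapsed time of the sample at a 0-based row index."""
    return index / SAMPLING_RATE_HZ

# Hypothetical truncated line: three chest acceleration values and two ECG leads.
sample = parse_line("3.0 4.0 0.0 0.1 0.2")
magnitude = chest_accel_magnitude(sample)
print(magnitude, sample_time_seconds(100))
```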

  19. Text Document Classification Dataset

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    sunil thite (2023). Text Document Classification Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/text-document-classification-dataset
    Explore at:
    Available download formats: zip (1941393 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    sunil thite
    Description

    This is a text document classification dataset containing 2225 text documents in five categories: politics, sport, tech, entertainment, and business. It can be used for document classification and document clustering.

    About Dataset
    - The dataset contains two features: text and label.
    - No. of Rows: 2225
    - No. of Columns: 2

    Text: Contains text data from the different categories.
    Label: Contains the labels for the five categories: 0, 1, 2, 3, 4.

    1. Politics = 0
    2. Sport = 1
    3. Technology = 2
    4. Entertainment = 3
    5. Business = 4
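
    The label encoding above can be expressed as a lookup table for decoding labels back to category names. This is a minimal sketch, not part of the dataset: `CATEGORY_NAMES`, `decode_labels` and `count_by_category` are hypothetical names, and the code assumes labels arrive as integers or numeric strings.

```python
from collections import Counter

# Mapping taken from the numbered list in the description.
CATEGORY_NAMES = {
    0: "Politics",
    1: "Sport",
    2: "Technology",
    3: "Entertainment",
    4: "Business",
}


def decode_labels(labels):
    """Convert numeric labels to category names; raises KeyError on unknown labels."""
    return [CATEGORY_NAMES[int(label)] for label in labels]


def count_by_category(labels):
    """Count how many documents fall into each category name."""
    return dict(Counter(decode_labels(labels)))
```

    Counting decoded labels this way gives a quick check of class balance before training a classifier.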
  20. Users dataset in csv format.

    • kaggle.com
    zip
    Updated Apr 24, 2022
    Cite
    VictorDaniloCastanedaPinzon (2022). Users dataset in csv format. [Dataset]. https://www.kaggle.com/datasets/vicdancastpinz/users-dataset-in-csv-format
    Explore at:
    Available download formats: zip (5347 bytes)
    Dataset updated
    Apr 24, 2022
    Authors
    VictorDaniloCastanedaPinzon
    Description

    Dataset

    This dataset was created by VictorDaniloCastanedaPinzon

    Contents

Top 2500 Kaggle Datasets

Explore, Analyze, Innovate: The Best of Kaggle's Data at Your Fingertips

