100+ datasets found

Top 2500 Kaggle Datasets
kaggle.com
Updated Feb 16, 2024
Cite
Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7637365
Dataset updated
Feb 16, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Saket Kumar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

Column Definitions:

Dataset Name: Name of the dataset.
Created By: Creator(s) of the dataset.
Last Updated in number of days: Time elapsed since last update.
Usability Score: Score indicating the ease of use.
Number of File: Quantity of files included.
Type of file: Format of files (e.g., CSV, JSON).
Size: Size of the dataset.
Total Votes: Number of votes received.
Category: Categorization of the dataset's subject matter.
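As a rough illustration of the research-analysis use case, the sketch below averages the usability score per category. It assumes the CSV headers match the column names listed above exactly, which the description does not confirm. The sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; NOT real data from the dataset.
# Header names are assumed to match the column definitions above.
SAMPLE_CSV = """Dataset Name,Created By,Usability Score,Total Votes,Category
Example A,alice,10.0,120,Health
Example B,bob,8.0,40,Health
Example C,carol,9.0,75,Finance
"""


def mean_usability_by_category(csv_text):
    """Return {category: mean usability score} for the rows in csv_text."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = totals[row["Category"]]
        bucket[0] += float(row["Usability Score"])
        bucket[1] += 1
    return {category: s / n for category, (s, n) in totals.items()}


print(mean_usability_by_category(SAMPLE_CSV))
```

To run it on the real file, the text of the downloaded CSV could be passed in place of `SAMPLE_CSV`, after adjusting the header names if they differ.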
Customer Dataset csv
kaggle.com
zip
Updated Mar 22, 2023
Cite
Moses Moncy (2023). Customer Dataset csv [Dataset]. https://www.kaggle.com/datasets/mosesmoncy/customer-dataset-csv
Available download formats: zip (348492 bytes)
Dataset updated
Mar 22, 2023
Authors
Moses Moncy
Description
Dataset

This dataset was created by Moses Moncy

Contents
Top 1000 Kaggle Datasets
kaggle.com
zip
Updated Jan 3, 2022
Cite
Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
Available download formats: zip (34269 bytes)
Dataset updated
Jan 3, 2022
Authors
Trrishan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
From wiki

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

Source: Kaggle
Images in CSV datasets
kaggle.com
zip
Updated Oct 14, 2024
Cite
Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
Available download formats: zip (347504240 bytes)
Dataset updated
Oct 14, 2024
Authors
Pascal
Description
Images stored as CSV files, for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.

"mnist_big.csv"

Recognition of images of handwritten digits

Version "mnist_small.csv" with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

"sign_mnist_big.csv"

Recognition of images of sign language gestures

Version "sign_mnist_small.csv" with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist

"zalando_small.csv"

Recognition of clothing and shoes (Zalando)

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"hmnist_8_8_RGB.csv"

Recognition of skin tumors (color images, three values R, G, B per pixel)

Other versions with smaller and/or grayscale images

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"cifar10_small.csv"

Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset

Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
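A minimal sketch of how one CSV row might be turned back into an image grid. The description does not state the column layout, so the code makes three assumptions: each row holds one label plus `width * height * channels` pixel values, pixels are stored row by row, and color channels are interleaved per pixel. The label position differs between source datasets, hence the hypothetical `label_first` switch.

```python
def row_to_image(row, width, height, channels=1, label_first=True):
    """Split one CSV row into (label, image).

    Assumed layout (not confirmed by the dataset description): one label
    cell plus width*height*channels pixel cells, stored row by row, with
    channel values interleaved per pixel.
    """
    cells = list(row)
    label = int(cells[0] if label_first else cells[-1])
    raw = cells[1:] if label_first else cells[:-1]
    values = [int(v) for v in raw]
    expected = width * height * channels
    if len(values) != expected:
        raise ValueError(f"expected {expected} pixel values, got {len(values)}")

    image = []
    for y in range(height):
        line = []
        for x in range(width):
            start = (y * width + x) * channels
            pixel = values[start:start + channels]
            # Grayscale pixels are plain ints; color pixels are tuples.
            line.append(pixel[0] if channels == 1 else tuple(pixel))
        image.append(line)
    return label, image


# Tiny synthetic row: label 7 followed by a 3x2 grayscale image.
label, image = row_to_image(["7", "0", "1", "2", "3", "4", "5"], width=3, height=2)
print(label, image)
```

For a file such as "hmnist_8_8_RGB.csv" the call would presumably use `width=8, height=8, channels=3`, with `label_first` set according to the actual header.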
Url Dataset
kaggle.com
zip
Updated May 18, 2018
Cite
TeseRact (2018). Url Dataset [Dataset]. https://www.kaggle.com/datasets/teseract/urldataset
Available download formats: zip (6911526 bytes)
Dataset updated
May 18, 2018
Authors
TeseRact
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by TeseRact

Released under CC0: Public Domain

Contents
Sample CSV Datasets
kaggle.com
zip
Updated Nov 30, 2023
Cite
SOURAV S V (2023). Sample CSV Datasets [Dataset]. https://www.kaggle.com/datasets/souravsv/sample-csv-datasets
Available download formats: zip (14455964 bytes)
Dataset updated
Nov 30, 2023
Authors
SOURAV S V
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by SOURAV S V

Released under CC0: Public Domain

Contents
train csv file
kaggle.com
zip
Updated May 5, 2018
Cite
Emmanuel Arias (2018). train csv file [Dataset]. https://www.kaggle.com/datasets/eamanu/train
Available download formats: zip (33695 bytes)
Dataset updated
May 5, 2018
Authors
Emmanuel Arias
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by Emmanuel Arias

Released under Database: Open Database, Contents: Database Contents

Contents
sales dataset
kaggle.com
zip
Updated Feb 18, 2025
Cite
VINOTH KANNA S (2025). sales dataset [Dataset]. https://www.kaggle.com/datasets/vinothkannaece/sales-dataset
Available download formats: zip (27634 bytes)
Dataset updated
Feb 18, 2025
Authors
VINOTH KANNA S
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Sales Data Description

This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.

Columns in the Dataset

Product_ID: Unique identifier for each product sold. Randomly generated for practice purposes.

Sale_Date: The date when the sale occurred. Randomly selected from the year 2023.

Sales_Rep: The sales representative responsible for the transaction. The dataset includes five random sales representatives (Alice, Bob, Charlie, David, Eve).

Region: The region where the sale took place. The possible regions are North, South, East, and West.

Sales_Amount: The total sales amount for the transaction, including discounts if any. Values range from 100 to 10,000 (in currency units).

Quantity_Sold: The number of units sold in that transaction, randomly generated between 1 and 50.

Product_Category: The category of the product sold. Categories include Electronics, Furniture, Clothing, and Food.

Unit_Cost: The cost per unit of the product sold, randomly generated between 50 and 5000 currency units.

Unit_Price: The selling price per unit of the product, calculated to be higher than the unit cost.

Customer_Type: Indicates whether the customer is a New or Returning customer.

Discount: The discount applied to the sale, randomly chosen between 0% and 30%.

Payment_Method: The method of payment used by the customer (e.g., Credit Card, Cash, Bank Transfer).

Sales_Channel: The channel through which the sale occurred. Either Online or Retail.

Region_and_Sales_Rep: A combined column that pairs the region and sales representative for easier tracking.
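The documented value ranges and categories can double as a sanity check when practicing with the file. The sketch below is one possible check and not part of the dataset; `check_row` is a hypothetical helper. It takes the column descriptions above literally and skips Discount, because the description does not say whether it is stored as a fraction or a percentage.

```python
# Allowed categories and numeric ranges, copied from the column descriptions.
ALLOWED = {
    "Region": {"North", "South", "East", "West"},
    "Product_Category": {"Electronics", "Furniture", "Clothing", "Food"},
    "Customer_Type": {"New", "Returning"},
    "Sales_Channel": {"Online", "Retail"},
}
RANGES = {
    "Sales_Amount": (100, 10_000),
    "Quantity_Sold": (1, 50),
    "Unit_Cost": (50, 5000),
}


def check_row(row):
    """Return a list of problems found in one row (a dict of column -> value)."""
    problems = []
    for column, allowed in ALLOWED.items():
        if row[column] not in allowed:
            problems.append(f"{column}: unexpected value {row[column]!r}")
    for column, (low, high) in RANGES.items():
        if not low <= float(row[column]) <= high:
            problems.append(f"{column}: {row[column]} outside [{low}, {high}]")
    # Unit_Price is described as being higher than Unit_Cost.
    if float(row["Unit_Price"]) <= float(row["Unit_Cost"]):
        problems.append("Unit_Price is not higher than Unit_Cost")
    return problems


# Invented example row, for illustration only.
example = {
    "Region": "North", "Product_Category": "Food", "Customer_Type": "New",
    "Sales_Channel": "Online", "Sales_Amount": "2500", "Quantity_Sold": "10",
    "Unit_Cost": "200", "Unit_Price": "250",
}
print(check_row(example))
```

Rows read with `csv.DictReader` can be passed to `check_row` directly, since it accepts string values.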

Disclaimer

Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.
Power BI dataset
kaggle.com
zip
Updated Oct 31, 2023
Cite
Ahmadali Jamali (2023). Power BI dataset [Dataset]. https://www.kaggle.com/datasets/ahmadalijamali/dataset
Available download formats: zip (1642 bytes)
Dataset updated
Oct 31, 2023
Authors
Ahmadali Jamali
License
https://www.licenses.ai/ai-licenses
Description
Tabular dataset for data analysis and machine learning practice. The dataset is about the market and is usable for Power BI practice and data science.
Data from: Global Superstore Dataset
kaggle.com
zip
Updated Nov 16, 2023
Cite
Fatih İlhan (2023). Global Superstore Dataset [Dataset]. https://www.kaggle.com/datasets/fatihilhan/global-superstore-dataset
Available download formats: zip (3349507 bytes)
Dataset updated
Nov 16, 2023
Authors
Fatih İlhan
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
About this file

The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.

Here is a description of the columns in the dataset:

category: The category of products sold in the superstore.

city: The city where the order was placed.

country: The country in which the superstore is located.

customer_id: A unique identifier for each customer.

customer_name: The name of the customer who placed the order.

discount: The discount applied to the order.

market: The market or region where the superstore operates.

ji_lu_shu: An unknown or unspecified column.

order_date: The date when the order was placed.

order_id: A unique identifier for each order.

order_priority: The priority level of the order.

product_id: A unique identifier for each product.

product_name: The name of the product.

profit: The profit generated from the order.

quantity: The quantity of products ordered.

region: The region where the order was placed.

row_id: A unique identifier for each row in the dataset.

sales: The total sales amount for the order.

segment: The customer segment (e.g., consumer, corporate, or home office).

ship_date: The date when the order was shipped.

ship_mode: The shipping mode used for the order.

shipping_cost: The cost of shipping for the order.

state: The state or region within the country.

sub_category: The sub-category of products within the main category.

year: The year in which the order was placed.

market2: Another column related to market information.

weeknum: The week number when the order was placed.

This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
UCI-dataset
kaggle.com
zip
Updated Aug 17, 2022
Cite
Md Waquar Azam (2022). UCI-dataset [Dataset]. https://www.kaggle.com/datasets/mdwaquarazam/ucidatasetlist
Available download formats: zip (20774 bytes)
Dataset updated
Aug 17, 2022
Authors
Md Waquar Azam
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset lists the datasets provided by UCI ML. If you are a learner and want data by year, category, profession, or some other criterion, you can search for it here.

The dataset has 8 columns, which together give all the details: link, Data-Name, data type, default task, attribute-type, instances, attributes, year.

Some missing values are present.

You can analyse the data as per your requirements.

EDA
SQUAD 2.0 - csv format
kaggle.com
zip
Updated Apr 15, 2020
Cite
Parth Chokhra (2020). SQUAD 2.0 - csv format [Dataset]. https://www.kaggle.com/datasets/parthplc/squad-20-csv-file
Available download formats: zip (9887206 bytes)
Dataset updated
Apr 15, 2020
Authors
Parth Chokhra
Description
Dataset

This dataset was created by Parth Chokhra

Contents
People
kaggle.com
zip
Updated Jan 27, 2023
Cite
Aung M. Myat (2023). People [Dataset]. https://www.kaggle.com/datasets/aungdev/people-dataset
Available download formats: zip (581 bytes)
Dataset updated
Jan 27, 2023
Authors
Aung M. Myat
Description
The dataset contains randomly generated persons' data. It is created to be used in explaining data science. It currently contains the following columns:
- Name
- Gender
- Skin Color
- Height(cm)
- Weight(m)
- Date of Birth
Natural Questions Dataset
kaggle.com
zip
Updated Mar 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
fujoos (2024). Natural Questions Dataset [Dataset]. https://www.kaggle.com/datasets/frankossai/natural-questions-dataset
Available download formats: zip (116502047 bytes)
Dataset updated
Mar 15, 2024
Authors
fujoos
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Context

The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.

Data Collection

The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.

Data Pre-processing

The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
- Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
- Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.

These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
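A rough sketch of the first cleaning step, for readers who want to try something similar. This is not the author's pipeline: it uses only the standard-library `re` module instead of BeautifulSoup, the patterns are simple assumptions, and it neither strips other special characters nor corrects grammar.

```python
import re

# Simplified patterns (assumptions, not the dataset author's actual rules).
TAG_RE = re.compile(r"<[^>]+>")                   # HTML tags such as <p> or </li>
URL_RE = re.compile(r"https?://\S+|www\.\S+")     # bare URLs
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")    # @user mentions and #hashtags


def clean_text(text):
    """Remove HTML tags, URLs, mentions, and hashtags, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)   # tags first, so a URL is not glued to a closing tag
    text = URL_RE.sub(" ", text)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


print(clean_text("<p>who wrote   it? see https://example.com #books @user</p>"))
```

A regular expression is a blunt tool for HTML, so an HTML parser is the safer choice when answers contain tables or nested markup.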

Data Storage

The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.

Filtered Results

The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.

Flask CSV Reader App

The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
- Viewing questions and answers directly in your browser.
- Filtering results based on criteria like question keywords or answer length.

See the live demo, which uses the CSV files converted to an SQLite db, at 'https://fujoos.pythonanywhere.com/'
Retail Product Dataset with Missing Values
kaggle.com
zip
Updated Feb 17, 2025
Cite
Himel Sarder (2025). Retail Product Dataset with Missing Values [Dataset]. https://www.kaggle.com/datasets/himelsarder/retail-product-dataset-with-missing-values
Available download formats: zip (47826 bytes)
Dataset updated
Feb 17, 2025
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).

The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage

This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
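As one way to practice the missing-data handling mentioned above, the sketch below measures the missing share of a column and fills gaps with the median (numerical) or the most frequent value (categorical). It assumes missing cells arrive as empty strings or `None`, which the description does not specify. The helper names and sample rows are invented for illustration.

```python
import statistics


def is_missing(value):
    """Treat empty strings and None as missing (an assumption about the file)."""
    return value is None or value == ""


def missing_share(rows, column):
    """Fraction of rows whose value in `column` is missing."""
    return sum(1 for r in rows if is_missing(r[column])) / len(rows)


def impute(rows, column, numeric):
    """Return new rows with gaps in `column` filled by median (numeric) or mode."""
    present = [r[column] for r in rows if not is_missing(r[column])]
    if numeric:
        fill = statistics.median(float(v) for v in present)
    else:
        fill = statistics.mode(present)
    result = []
    for r in rows:
        new = dict(r)  # copy so the input rows stay untouched
        if is_missing(new[column]):
            new[column] = fill
        elif numeric:
            new[column] = float(new[column])
        result.append(new)
    return result


# Invented rows for illustration only.
rows = [
    {"Price": "10", "Stock": "In Stock"},
    {"Price": "", "Stock": "In Stock"},
    {"Price": "30", "Stock": ""},
    {"Price": "20", "Stock": "Out of Stock"},
]
print(missing_share(rows, "Price"))
print(impute(rows, "Price", numeric=True)[1]["Price"])
```

Median and mode are only simple baselines. With a column that is 63% missing, dropping the column or modelling the missingness may be more defensible than filling it.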
Subjective Question Answer Dataset
kaggle.com
zip
Updated Nov 24, 2023
Cite
Pokhrel Arahanta (2023). Subjective Question Answer Dataset [Dataset]. https://www.kaggle.com/datasets/pokhrelarahanta/subjective-question-answer-dataset
Available download formats: zip (1993150 bytes)
Dataset updated
Nov 24, 2023
Authors
Pokhrel Arahanta
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset contains 4118 entries for each of the following fields: Paragraphs, Question1, Question2, Question3, Answer1, Answer2, and Answer3. It consists of paragraphs with related question-answer pairs; each paragraph has 3 questions and 3 answers. The dataset is stored as a Comma-Separated Values file (.csv). The dataset was collected manually and subsequently cleaned and filtered. This laborious and time-consuming process was undertaken with the utmost care and dedication to craft a high-quality dataset specifically designed for generating extractive subjective questions and answers from the provided input paragraphs.
Global Country Information Dataset 2023
kaggle.com
zip
Updated Jul 8, 2023
Cite
Nidula Elgiriyewithana ⚡ (2023). Global Country Information Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023
Available download formats: zip (24063 bytes)
Dataset updated
Jul 8, 2023
Authors
Nidula Elgiriyewithana ⚡
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description

This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

Key Features

Country: Name of the country.

Density (P/Km2): Population density measured in persons per square kilometer.

Abbreviation: Abbreviation or code representing the country.

Agricultural Land (%): Percentage of land area used for agricultural purposes.

Land Area (Km2): Total land area of the country in square kilometers.

Armed Forces Size: Size of the armed forces in the country.

Birth Rate: Number of births per 1,000 population per year.

Calling Code: International calling code for the country.

Capital/Major City: Name of the capital or major city.

CO2 Emissions: Carbon dioxide emissions in tons.

CPI: Consumer Price Index, a measure of inflation and purchasing power.

CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.

Currency_Code: Currency code used in the country.

Fertility Rate: Average number of children born to a woman during her lifetime.

Forested Area (%): Percentage of land area covered by forests.

Gasoline_Price: Price of gasoline per liter in local currency.

GDP: Gross Domestic Product, the total value of goods and services produced in the country.

Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.

Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.

Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.

Largest City: Name of the country's largest city.

Life Expectancy: Average number of years a newborn is expected to live.

Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.

Minimum Wage: Minimum wage level in local currency.

Official Language: Official language(s) spoken in the country.

Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.

Physicians per Thousand: Number of physicians per thousand people.

Population: Total population of the country.

Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.

Tax Revenue (%): Tax revenue as a percentage of GDP.

Total Tax Rate: Overall tax burden as a percentage of commercial profits.

Unemployment Rate: Percentage of the labor force that is unemployed.

Urban Population: Percentage of the population living in urban areas.

Latitude: Latitude coordinate of the country's location.

Longitude: Longitude coordinate of the country's location.

Potential Use Cases

Analyze population density and land area to study spatial distribution patterns.

Investigate the relationship between agricultural land and food security.

Examine carbon dioxide emissions and their impact on climate change.

Explore correlations between economic indicators such as GDP and various socio-economic factors.

Investigate educational enrollment rates and their implications for human capital development.

Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.

Study labor market dynamics through indicators such as labor force participation and unemployment rates.

Investigate the role of taxation and its impact on economic development.

Explore urbanization trends and their social and environmental consequences.

Data Source: This dataset was compiled from multiple data sources

If this was helpful, a vote is appreciated ❤️ Thank you 🙂
MHEALTH Dataset Data Set CSV
kaggle.com
zip
Updated Jan 4, 2023
Cite
Nirmal Sankalana (2023). MHEALTH Dataset Data Set CSV [Dataset]. https://www.kaggle.com/datasets/nirmalsankalana/mhealth-dataset-data-set-csv
Available download formats: zip (78174751 bytes)
Dataset updated
Jan 4, 2023
Authors
Nirmal Sankalana
Description
Source:

Oresti Banos, Department of Computer Architecture and Computer Technology, University of Granada
Rafael Garcia, Department of Computer Architecture and Computer Technology, University of Granada
Alejandro Saez, Department of Computer Architecture and Computer Technology, University of Granada

Email to whom correspondence should be addressed: oresti '@' ugr.es (oresti.bl '@' gmail.com)

Data Set Information:

The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing several physical activities. Sensors placed on the subject's chest, right wrist, and left ankle are used to measure the motion experienced by diverse body parts, namely, acceleration, rate of turn, and magnetic field orientation. The sensor positioned on the chest also provides 2-lead ECG measurements, which can be potentially used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.

DATASET SUMMARY:

Activities: 12

Sensor devices: 3

Subjects: 10

EXPERIMENTAL SETUP

The collected dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing 12 physical activities (Table 1). Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were respectively placed on the subject's chest, right wrist, and left ankle and attached by using elastic straps (as shown in the figure in the attachment). The use of multiple sensors permits us to measure the motion experienced by diverse body parts, namely, the acceleration, the rate of turn, and the magnetic field orientation, thus better capturing the body dynamics. The sensor positioned on the chest also provides 2-lead ECG measurements which are not used for the development of the recognition model but rather collected for future work purposes. This information can be used, for example, for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera. This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., the frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.

ACTIVITY SET

The activity set is listed in the following:
L1: Standing still (1 min)
L2: Sitting and relaxing (1 min)
L3: Lying down (1 min)
L4: Walking (1 min)
L5: Climbing stairs (1 min)
L6: Waist bends forward (20x)
L7: Frontal elevation of arms (20x)
L8: Knees bending (crouching) (20x)
L9: Cycling (1 min)
L10: Jogging (1 min)
L11: Running (1 min)
L12: Jump front & back (20x)
NOTE: In brackets are the number of repetitions (Nx) or the duration of the exercises (min).
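Because every modality is sampled at 50 Hz, a sample count converts directly into a duration: one minute of an activity corresponds to 50 × 60 = 3000 samples. The sketch below maps numeric labels to the activity names above and sums the time per activity. The function name is hypothetical, and any label outside 1 to 12 is reported as unknown.

```python
from collections import Counter

SAMPLING_RATE_HZ = 50  # stated in the experimental setup

# Label numbers follow the activity list above (L1..L12).
ACTIVITIES = {
    1: "Standing still",
    2: "Sitting and relaxing",
    3: "Lying down",
    4: "Walking",
    5: "Climbing stairs",
    6: "Waist bends forward",
    7: "Frontal elevation of arms",
    8: "Knees bending (crouching)",
    9: "Cycling",
    10: "Jogging",
    11: "Running",
    12: "Jump front & back",
}


def label_durations(labels, rate_hz=SAMPLING_RATE_HZ):
    """Return {activity name: seconds} from a sequence of per-sample labels."""
    counts = Counter(int(label) for label in labels)
    return {
        ACTIVITIES.get(label, f"unknown ({label})"): n / rate_hz
        for label, n in counts.items()
    }


# Synthetic label sequence: 3000 walking samples, then 1500 standing samples.
print(label_durations([4] * 3000 + [1] * 1500))
```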

A complete and illustrated description (including table of activities, sensor setup, etc.) of the dataset is provided in the papers presented in the section "Citation Requests".

Attribute Information:

The data collected for each subject is stored in a different log file: 'mHealth_subject.log'. Each file contains the samples (by rows) recorded for all sensors (by columns). The labels used to identify the activities are similar to the abovementioned (e.g., the label for walking is '4').

The meaning of each column is detailed next:
Column 1: acceleration from the chest sensor (X-axis)
Column 2: acceleration from the chest sensor (Y-axis)
Column 3: acceleration from the chest sensor (Z-axis)
Column 4: electrocardiogram signal (lead 1)
Column 5: electrocardiogram signal (lead 2)
Column 6: acceleration from the left-ankle sensor (X-axis)
Column 7: acceleration from the left-ankle sensor (Y-axis)
Column 8: acceleration from the left-ankle sensor (Z-axis)
Column 9: gyro from the left-ankle sensor (X-axis)
Column 10: gyro from the left-ankle sensor (Y-axis)
Column 11: gyro from the left-ankle sensor (Z-axis)
Column 12: magnetometer from the left-ankle sensor (X-axis)
Column 13: magnetometer from the left-ankle sensor (Y-axis)
Column 14: magnetometer from the left-ankle sensor (Z-axis)
Column 15: acceleration from the right-lower-arm sensor (X-axis)
Column 16: acceleration from the right-lower-arm sensor (Y-axis)
Column 17: acceleration from the right-lower-arm sensor (Z-axis)
Column 18: gyro from the right-lower-arm sensor (X-axis)
Column 19: gyro from the right-lower-arm sensor (Y-axis)
Column 20: gyro fro...
Text Document Classification Dataset
kaggle.com
zip
Updated Dec 4, 2023
Cite
sunil thite (2023). Text Document Classification Dataset [Dataset]. https://www.kaggle.com/datasets/sunilthite/text-document-classification-dataset
Available download formats: zip (1941393 bytes)
Dataset updated
Dec 4, 2023
Authors
sunil thite
Description
This is a text document classification dataset containing 2225 text documents in five categories: politics, sport, tech, entertainment, and business. It can be used for document classification and document clustering.

About Dataset
- The dataset contains two features: text and label.
- No. of Rows: 2225
- No. of Columns: 2

Text: Contains text data from the different categories.
Label: Contains the labels for the five categories: 0, 1, 2, 3, 4

Politics = 0
Sport = 1
Technology = 2
Entertainment = 3
Business = 4
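A small sketch of how the numeric labels might be decoded before checking class balance. The mapping is copied from the list above. The helper name is hypothetical, and the code assumes labels arrive as integers or numeric strings.

```python
from collections import Counter

# Label codes as documented in the dataset description.
LABEL_NAMES = {
    0: "Politics",
    1: "Sport",
    2: "Technology",
    3: "Entertainment",
    4: "Business",
}


def class_distribution(labels):
    """Count documents per category name from an iterable of label codes."""
    return Counter(LABEL_NAMES[int(label)] for label in labels)


# Synthetic labels for illustration only.
print(class_distribution(["0", "1", "1", 4]))
```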
Users dataset in csv format.
kaggle.com
zip
Updated Apr 24, 2022
Cite
VictorDaniloCastanedaPinzon (2022). Users dataset in csv format. [Dataset]. https://www.kaggle.com/datasets/vicdancastpinz/users-dataset-in-csv-format
Available download formats: zip (5347 bytes)
Dataset updated
Apr 24, 2022
Authors
VictorDaniloCastanedaPinzon
Description
Dataset

This dataset was created by VictorDaniloCastanedaPinzon

Contents
