http://opendatacommons.org/licenses/dbcl/1.0/
This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.
Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.
Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.
Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.
Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.
Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.
Column Definitions:
Dataset Name: Name of the dataset.
Created By: Creator(s) of the dataset.
Last Updated in number of days: Time elapsed since last update.
Usability Score: Score indicating the ease of use.
Number of File: Quantity of files included.
Type of file: Format of files (e.g., CSV, JSON).
Size: Size of the dataset.
Total Votes: Number of votes received.
Category: Categorization of the dataset's subject matter.
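As a minimal sketch of the kind of analysis described above, the following averages the usability score per category using only the Python standard library. The column names come from the definitions above; the sample rows and their values are made up for illustration, and the actual file may format these fields differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE = """Dataset Name,Category,Usability Score,Total Votes
Example A,Health,10.0,120
Example B,Health,8.0,40
Example C,Finance,9.0,75
"""

def mean_usability_by_category(csv_text):
    """Return {category: mean usability score} for the given CSV text."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["Category"]].append(float(row["Usability Score"]))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}

result = mean_usability_by_category(SAMPLE)
print(result)
```

The same grouping pattern would apply to Total Votes or any other numeric column, provided the values parse as numbers.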
This dataset was created by Moses Moncy
https://creativecommons.org/publicdomain/zero/1.0/
Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.[1][2]
Source: Kaggle
Images as CSV files for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.
Handwritten digit image recognition
A "mnist_small.csv" version with less data, which can also serve as a test set
Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv
Sign language gesture image recognition
A "sign_mnist_small.csv" version with less data, which can also serve as a test set
Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist
Clothing and footwear recognition (Zalando)
Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000
Skin tumor recognition (color images, three R, G, B values per pixel)
Other versions with smaller and/or grayscale images
Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000
Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset
Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
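To illustrate how an image stored as a CSV row can be turned back into a pixel grid, here is a rough sketch. It assumes a layout of one label followed by 784 grayscale values for a 28 x 28 image; that layout is an assumption, not something stated in the descriptions above. The color datasets would need three values per pixel and possibly a different image size.

```python
import csv
import io

SIDE = 28  # assumed image width and height in pixels

def row_to_image(row):
    """Split one CSV row into (label, SIDE x SIDE grid of pixel intensities)."""
    label = int(row[0])
    pixels = [int(v) for v in row[1:]]
    if len(pixels) != SIDE * SIDE:
        raise ValueError(f"expected {SIDE * SIDE} pixels, got {len(pixels)}")
    image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
    return label, image

# Synthetic row for illustration: label 7 followed by 784 pixel values.
fake_row = ["7"] + [str(i % 256) for i in range(SIDE * SIDE)]
line = ",".join(fake_row)
label, image = row_to_image(next(csv.reader(io.StringIO(line))))
print(label, len(image), len(image[0]))
```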
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by TeseRact
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by SOURAV S V
Released under CC0: Public Domain
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Emmanuel Arias
Released under Database: Open Database, Contents: Database Contents
https://creativecommons.org/publicdomain/zero/1.0/
Sales Data Description
This dataset represents synthetic sales data generated for practice purposes only. It is not real-time or based on actual business operations, and should be used solely for educational or testing purposes. The dataset contains information that simulates sales transactions across different products, regions, and customers. Each row represents an individual sale event with various details associated with it.
Columns in the Dataset
Disclaimer
Please note: This data was randomly generated and is intended solely for practice, learning, or testing. It does not reflect real-world sales, customers, or businesses, and should not be considered reliable for any real-time analysis or decision-making.
https://www.licenses.ai/ai-licenses
Tabular dataset for data analysis and machine learning practice. The dataset is about the market and is usable for Power BI practice and data science.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
About this file
The Kaggle Global Superstore dataset is a comprehensive dataset containing information about sales and orders in a global superstore. It is a valuable resource for data analysis and visualization tasks. This dataset has been processed and transformed from its original format (txt) to CSV using the R programming language. The original dataset is available here, and the transformed CSV file used in this analysis can be found here.
Here is a description of the columns in the dataset:
category: The category of products sold in the superstore.
city: The city where the order was placed.
country: The country in which the superstore is located.
customer_id: A unique identifier for each customer.
customer_name: The name of the customer who placed the order.
discount: The discount applied to the order.
market: The market or region where the superstore operates.
ji_lu_shu: An undocumented column; the name appears to be pinyin for "number of records".
order_date: The date when the order was placed.
order_id: A unique identifier for each order.
order_priority: The priority level of the order.
product_id: A unique identifier for each product.
product_name: The name of the product.
profit: The profit generated from the order.
quantity: The quantity of products ordered.
region: The region where the order was placed.
row_id: A unique identifier for each row in the dataset.
sales: The total sales amount for the order.
segment: The customer segment (e.g., consumer, corporate, or home office).
ship_date: The date when the order was shipped.
ship_mode: The shipping mode used for the order.
shipping_cost: The cost of shipping for the order.
state: The state or region within the country.
sub_category: The sub-category of products within the main category.
year: The year in which the order was placed.
market2: Another column related to market information.
weeknum: The week number when the order was placed.
This dataset can be used for various data analysis tasks, including understanding sales patterns, customer behavior, and profitability in the context of a global superstore.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset lists the datasets provided by UCI ML. Learners who want data by year, category, profession, or some other criterion can search for it here.
The dataset has 8 columns that give the details of each entry: link, Data-Name, data type, default task, attribute-type, instances, attributes, year.
Some values are missing.
You can analyze the data according to your requirements.
This dataset was created by Parth Chokhra
The dataset contains randomly generated persons' data. It was created for use in explaining data science. It currently contains the following columns:
- Name
- Gender
- Skin Color
- Height(cm)
- Weight(m)
- Date of Birth
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Natural Questions (NQ) dataset is a comprehensive collection of real user queries submitted to Google Search, with answers sourced from Wikipedia by expert annotators. Created by Google AI Research, this dataset aims to support the development and evaluation of advanced automated question-answering systems. The version provided here includes 89,312 meticulously annotated entries, tailored for ease of access and utility in natural language processing (NLP) and machine learning (ML) research.
The dataset is composed of authentic search queries from Google Search, reflecting the wide range of information sought by users globally. This approach ensures a realistic and diverse set of questions for NLP applications.
The NQ dataset underwent significant pre-processing to prepare it for NLP tasks:
- Removal of web-specific elements like URLs, hashtags, user mentions, and special characters using Python's "BeautifulSoup" and "regex" libraries.
- Grammatical error identification and correction using the "LanguageTool" library, an open-source grammar, style, and spell checker.
These steps were taken to clean and simplify the text while retaining the essence of the questions and their answers, divided into 'questions', 'long answers', and 'short answers'.
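The removal step described above might look roughly like the following sketch. It is not the dataset authors' code: it uses only the standard `re` module, omits the BeautifulSoup and LanguageTool stages, and the patterns and the set of allowed punctuation are assumptions chosen for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[#@]\w+")            # hashtags and user mentions
SPECIAL_RE = re.compile(r"[^\w\s.,?!'-]")  # anything outside a small allowed set
SPACE_RE = re.compile(r"\s+")

def clean_text(text):
    """Strip URLs, hashtags, mentions, and special characters, then tidy spaces."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

cleaned = clean_text("who wrote #hamlet? see https://example.com @user *now*")
print(cleaned)
```

Note that dropping a hashtag outright also drops the word it carries; a gentler variant could remove only the leading "#" instead.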
The unprocessed data, including answers with embedded HTML, empty or complex long and short answers, is stored in "Natural-Questions-Base.csv". This version retains the raw structure of the data, featuring HTML elements in answers, and varied answer formats such as tables and lists, providing a comprehensive view for those interested in the original dataset's complexity and richness. The processed data is compiled into a single CSV file named "Natural-Questions-Filtered.csv". The file is structured for easy access and analysis, with each record containing the processed question, a detailed answer, and concise answer snippets.
The filtered version is available where specific criteria, such as question length or answer complexity, were applied to refine the data further. This version allows for more focused research and application development.
The repository at 'https://github.com/fujoos/natural_questions' also includes a Flask-based CSV reader application designed to read and display contents from the "NaturalQuestions.csv" file. The app provides functionalities such as:
- Viewing questions and answers directly in your browser.
- Filtering results based on criteria like question keywords or answer length.
A live demo, using the CSV files converted to an SQLite database, is available at 'https://fujoos.pythonanywhere.com/'.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).
The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 to 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage
This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
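One simple way to practice the missing-data handling mentioned above is median imputation for numeric columns and most-frequent-value imputation for categorical ones. The sketch below uses the five column names from the list; the rows themselves are invented, and this strategy is only one option among many. A column that is missing most of its values may be better dropped than imputed.

```python
from collections import Counter
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"Category": "A", "Price": 10.0, "Rating": 4.0, "Stock": "In Stock", "Discount": None},
    {"Category": None, "Price": None, "Rating": 2.0, "Stock": None, "Discount": 5.0},
    {"Category": "A", "Price": 30.0, "Rating": None, "Stock": "In Stock", "Discount": 15.0},
    {"Category": "B", "Price": 20.0, "Rating": 3.0, "Stock": "Out of Stock", "Discount": None},
]
NUMERIC = ["Price", "Rating", "Discount"]
CATEGORICAL = ["Category", "Stock"]

def impute(rows):
    """Return copies of rows with missing values filled per column."""
    fill = {}
    for col in NUMERIC:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in CATEGORICAL:
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]
    return [{k: (fill[k] if v is None else v) for k, v in r.items()} for r in rows]

filled = impute(rows)
# Share of missing values per column, measured before imputation.
missing_share = {c: sum(r[c] is None for r in rows) / len(rows) for c in rows[0]}
print(missing_share)
print(filled[1])
```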
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 4118 records, each with a paragraph (Paragraphs), three questions (Question1, Question2, Question3), and three answers (Answer1, Answer2, Answer3). Each paragraph comes with three related question-answer pairs. The dataset is stored as a comma-separated values (.csv) file. It was collected manually and subsequently cleaned and filtered. This laborious and time-consuming process was undertaken with care to craft a high-quality dataset designed for generating extractive subjective questions and answers from the input paragraphs.
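Because each record holds three questions and three answers side by side, many question-answering pipelines would first reshape it into one row per question-answer pair. The sketch below shows that reshaping; the column names follow the description above, while the example record is invented.

```python
def to_qa_pairs(record):
    """Turn one wide record into a list of (paragraph, question, answer) tuples."""
    pairs = []
    for i in (1, 2, 3):
        pairs.append((record["Paragraphs"], record[f"Question{i}"], record[f"Answer{i}"]))
    return pairs

# Hypothetical record for illustration only.
record = {
    "Paragraphs": "Water boils at 100 degrees Celsius at sea level.",
    "Question1": "At what temperature does water boil?",
    "Answer1": "100 degrees Celsius",
    "Question2": "Where does this boiling point apply?",
    "Answer2": "at sea level",
    "Question3": "What substance is described?",
    "Answer3": "Water",
}
pairs = to_qa_pairs(record)
for paragraph, question, answer in pairs:
    print(question, "->", answer)
```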
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
- Country: Name of the country.
- Density (P/Km2): Population density measured in persons per square kilometer.
- Abbreviation: Abbreviation or code representing the country.
- Agricultural Land (%): Percentage of land area used for agricultural purposes.
- Land Area (Km2): Total land area of the country in square kilometers.
- Armed Forces Size: Size of the armed forces in the country.
- Birth Rate: Number of births per 1,000 population per year.
- Calling Code: International calling code for the country.
- Capital/Major City: Name of the capital or major city.
- CO2 Emissions: Carbon dioxide emissions in tons.
- CPI: Consumer Price Index, a measure of inflation and purchasing power.
- CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
- Currency_Code: Currency code used in the country.
- Fertility Rate: Average number of children born to a woman during her lifetime.
- Forested Area (%): Percentage of land area covered by forests.
- Gasoline_Price: Price of gasoline per liter in local currency.
- GDP: Gross Domestic Product, the total value of goods and services produced in the country.
- Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
- Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
- Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
- Largest City: Name of the country's largest city.
- Life Expectancy: Average number of years a newborn is expected to live.
- Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
- Minimum Wage: Minimum wage level in local currency.
- Official Language: Official language(s) spoken in the country.
- Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
- Physicians per Thousand: Number of physicians per thousand people.
- Population: Total population of the country.
- Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
- Tax Revenue (%): Tax revenue as a percentage of GDP.
- Total Tax Rate: Overall tax burden as a percentage of commercial profits.
- Unemployment Rate: Percentage of the labor force that is unemployed.
- Urban Population: Percentage of the population living in urban areas.
- Latitude: Latitude coordinate of the country's location.
- Longitude: Longitude coordinate of the country's location.
- Analyze population density and land area to study spatial distribution patterns.
- Investigate the relationship between agricultural land and food security.
- Examine carbon dioxide emissions and their impact on climate change.
- Explore correlations between economic indicators such as GDP and various socio-economic factors.
- Investigate educational enrollment rates and their implications for human capital development.
- Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
- Study labor market dynamics through indicators such as labor force participation and unemployment rates.
- Investigate the role of taxation and its impact on economic development.
- Explore urbanization trends and their social and environmental consequences.
Data Source: This dataset was compiled from multiple data sources
Oresti Banos, Department of Computer Architecture and Computer Technology, University of Granada
Rafael Garcia, Department of Computer Architecture and Computer Technology, University of Granada
Alejandro Saez, Department of Computer Architecture and Computer Technology, University of Granada
Email to whom correspondence should be addressed: oresti '@' ugr.es (oresti.bl '@' gmail.com)
The MHEALTH (Mobile HEALTH) dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing several physical activities. Sensors placed on the subject's chest, right wrist, and left ankle are used to measure the motion experienced by diverse body parts, namely, acceleration, rate of turn, and magnetic field orientation. The sensor positioned on the chest also provides 2-lead ECG measurements, which can potentially be used for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG.
The collected dataset comprises body motion and vital signs recordings for ten volunteers of diverse profile while performing 12 physical activities (Table 1). Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were placed on the subject's chest, right wrist, and left ankle, respectively, and attached using elastic straps (as shown in the figure in the attachment). The use of multiple sensors permits us to measure the motion experienced by diverse body parts, namely, the acceleration, the rate of turn, and the magnetic field orientation, thus better capturing the body dynamics. The sensor positioned on the chest also provides 2-lead ECG measurements, which are not used for the development of the recognition model but rather collected for future work purposes. This information can be used, for example, for basic heart monitoring, checking for various arrhythmias, or looking at the effects of exercise on the ECG. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera. This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., the frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.
The activity set is listed in the following:
L1: Standing still (1 min)
L2: Sitting and relaxing (1 min)
L3: Lying down (1 min)
L4: Walking (1 min)
L5: Climbing stairs (1 min)
L6: Waist bends forward (20x)
L7: Frontal elevation of arms (20x)
L8: Knees bending (crouching) (20x)
L9: Cycling (1 min)
L10: Jogging (1 min)
L11: Running (1 min)
L12: Jump front & back (20x)
NOTE: In brackets are the number of repetitions (Nx) or the duration of the exercises (min).
A complete and illustrated description (including table of activities, sensor setup, etc.) of the dataset is provided in the papers presented in the section “Citation Requests”.
The data collected for each subject is stored in a different log file: 'mHealth_subject.log'. Each file contains the samples (by rows) recorded for all sensors (by columns). The labels used to identify the activities are similar to the abovementioned (e.g., the label for walking is '4').
The meaning of each column is detailed next:
Column 1: acceleration from the chest sensor (X-axis)
Column 2: acceleration from the chest sensor (Y-axis)
Column 3: acceleration from the chest sensor (Z-axis)
Column 4: electrocardiogram signal (lead 1)
Column 5: electrocardiogram signal (lead 2)
Column 6: acceleration from the left-ankle sensor (X-axis)
Column 7: acceleration from the left-ankle sensor (Y-axis)
Column 8: acceleration from the left-ankle sensor (Z-axis)
Column 9: gyro from the left-ankle sensor (X-axis)
Column 10: gyro from the left-ankle sensor (Y-axis)
Column 11: gyro from the left-ankle sensor (Z-axis)
Column 12: magnetometer from the left-ankle sensor (X-axis)
Column 13: magnetometer from the left-ankle sensor (Y-axis)
Column 14: magnetometer from the left-ankle sensor (Z-axis)
Column 15: acceleration from the right-lower-arm sensor (X-axis)
Column 16: acceleration from the right-lower-arm sensor (Y-axis)
Column 17: acceleration from the right-lower-arm sensor (Z-axis)
Column 18: gyro from the right-lower-arm sensor (X-axis)
Column 19: gyro from the right-lower-arm sensor (Y-axis)
Column 20: gyro fro...
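As a rough sketch of how one of these log files might be read, the snippet below parses rows of numbers and picks out the chest acceleration (columns 1 to 3) and the ECG leads (columns 4 and 5), then derives the recording length from the 50 Hz sampling rate. The whitespace delimiter is an assumption, and the two sample rows are made up and truncated to five columns.

```python
SAMPLING_RATE_HZ = 50  # stated in the dataset description

# Column positions (1-based in the description, 0-based here).
CHEST_ACC = slice(0, 3)  # columns 1-3
ECG = slice(3, 5)        # columns 4-5

def parse_log(text):
    """Parse log text into rows of floats (whitespace-delimited: an assumption)."""
    return [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]

def duration_seconds(rows):
    """Recording length implied by the sample count and the sampling rate."""
    return len(rows) / SAMPLING_RATE_HZ

# Two made-up samples, truncated to the first five columns for brevity.
fake_log = "-9.8 0.1 0.3 0.02 0.01\n-9.7 0.2 0.4 0.03 0.00\n"
samples = parse_log(fake_log)
chest = [row[CHEST_ACC] for row in samples]
ecg = [row[ECG] for row in samples]
print(chest[0], ecg[1], duration_seconds(samples))
```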
This is a text document classification dataset containing 2225 text documents in five categories: politics, sport, tech, entertainment, and business. It can be used for document classification and document clustering.
About Dataset
- The dataset contains two features: text and label.
- No. of rows: 2225
- No. of columns: 2
Text: Text data from the different categories
Label: Labels for the five categories: 0, 1, 2, 3, 4
This dataset was created by VictorDaniloCastanedaPinzon