100+ datasets found

LLM prompts in the context of machine learning
kaggle.com
Updated Jul 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jordan Nelson (2024). LLM prompts in the context of machine learning [Dataset]. https://www.kaggle.com/datasets/jordanln/llm-prompts-in-the-context-of-machine-learning
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 1, 2024
Dataset provided by
Kaggle
Authors
Jordan Nelson
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence. At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously. This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.

KNN How would you create a KNN model to classify emails as spam or not spam based on their content and metadata? How could you implement a KNN model to classify handwritten digits using the MNIST dataset? How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences? How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc? Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width? How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments? Can you create a KNN model for me that could be used in malware classification? Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic? Can you make a KNN model that would predict the stock price of a given stock for the next week? Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

Decision Tree Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me? How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content? What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses? Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data? In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used? Can you create a decision tree classifier for me? Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies? Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget? How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website? How do I create a decision tree for time-series forecasting?

Random Forest Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me? In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...
home data for ml course
kaggle.com
zip
Updated Aug 27, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julián Pérez Pesce (2019). home data for ml course [Dataset]. https://www.kaggle.com/datasets/estrotococo/home-data-for-ml-course
Explore at:
zip(199207 bytes)Available download formats
Dataset updated
Aug 27, 2019
Authors
Julián Pérez Pesce
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Exercise: Machine Learning Competitions

When you click on Run / All, the notebook will give you an error: "Files doesn't exist" With this DataSet you fix that. It's the same from DanB. Please UPVOTE!

Enjoy!
a
Udemy - Machine Learning A-Z Become Kaggle Master
academictorrents.com
bittorrent
Updated Apr 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
None (2023). Udemy - Machine Learning A-Z Become Kaggle Master [Dataset]. https://academictorrents.com/details/9e378efb6e2f67de46c6c3660d9675be50bfc21f
Explore at:
bittorrent(15004863898)Available download formats
Dataset updated
Apr 24, 2023
Authors
None
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
A BitTorrent file to download data with the title 'Udemy - Machine Learning A-Z Become Kaggle Master'
Learn Images
kaggle.com
Updated Feb 26, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DanB (2019). Learn Images [Dataset]. https://www.kaggle.com/dansbecker/learn-images/tasks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 26, 2019
Dataset provided by
Kagglehttp://kaggle.com/
Authors
DanB
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by DanB

Released under CC0: Public Domain

Contents
ML Datasets
kaggle.com
Updated May 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bikram Saha (2023). ML Datasets [Dataset]. https://www.kaggle.com/datasets/imbikramsaha/ml-datasets/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 1, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bikram Saha
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset contains a diverse range of examples, including classification, regression, clustering, and dimensionality reduction problems, with varying levels of complexity and varying numbers of features. Each dataset comes with a detailed description of the problem and the corresponding features, making it easy to understand and work with. Additionally, the dataset provides an opportunity for machine learning enthusiasts to experiment with different SkLearn algorithms and evaluate their performance on different datasets. This dataset is perfect for both beginners and advanced practitioners looking to hone their skills in various machine learning techniques.
A
‘Deep Learning A-Z - ANN dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Deep Learning A-Z - ANN dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-deep-learning-a-z-ann-dataset-3c75/cb36262b/?iid=013-193&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Deep Learning A-Z - ANN dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/filippoo/deep-learning-az-ann on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This is the dataset used in the section "ANN (Artificial Neural Networks)" of the Udemy course from Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist), called Deep Learning A-Z™: Hands-On Artificial Neural Networks. The dataset is very useful for beginners of Machine Learning, and a simple playground where to compare several techniques/skills.

It can be freely downloaded here: https://www.superdatascience.com/deep-learning/

The story: A bank is investigating a very high rate of customer leaving the bank. Here is a 10.000 records dataset to investigate and predict which of the customers are more likely to leave the bank soon.

The story of the story: I'd like to compare several techniques (better if not alone, and with the experience of several Kaggle users) to improve my basic knowledge on Machine Learning.

Content

I will write more later, but the columns names are very self-explaining.

Acknowledgements

Udemy instructors Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist), and their efforts to provide this dataset to their students.

Inspiration

Which methods score best with this dataset? Which are fastest (or, executable in a decent time)? Which are the basic steps with such a simple dataset, very useful to beginners?

--- Original source retains full ownership of the source dataset ---
t
DAST Customer Analysis
test.researchdata.tuwien.at
bin, csv, png +3
Updated May 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joachim Boltz; Joachim Boltz; Joachim Boltz; Joachim Boltz (2025). DAST Customer Analysis [Dataset]. http://doi.org/10.70124/32sms-w5z07
Explore at:
text/x-python, bin, png, csv, text/markdown, txtAvailable download formats
Unique identifier
https://doi.org/10.70124/32sms-w5z07
Dataset updated
May 20, 2025
Dataset provided by
TU Wien
Authors
Joachim Boltz; Joachim Boltz; Joachim Boltz; Joachim Boltz
Time period covered
Apr 20, 2025
Description
We use the https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis dataset to predict whether customers buy in web, store or by catalog.
Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...
zenodo.org
csv
Updated Sep 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous authors; Anonymous authors (2023). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.6607065
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6607065
Dataset updated
Sep 15, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anonymous authors; Anonymous authors
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.

The data is organized in a table structure. Code4ML includes several main objects: competitions information, raw code blocks collected form Kaggle and manually marked up snippets. Each table has a .csv format.

Each competition has the text description and metadata, reflecting competition and used dataset characteristics as well as evaluation metrics (competitions.csv). The corresponding datasets can be loaded using Kaggle API and data sources.

The code blocks themselves and their metadata are collected to the data frames concerning the publishing year of the initial kernels. The current version of the corpus includes two code blocks files: snippets from kernels up to the 2020 year (сode_blocks_upto_20.csv) and those from the 2021 year (сode_blocks_21.csv) with corresponding metadata. The corpus consists of 2 743 615 ML code blocks collected from 107 524 Jupyter notebooks.

Marked up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for the code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).

As marked up code blocks data contains the numeric id of the code block semantic type, we also provide a mapping from this number to semantic type and subclass (actual_graph_2022-06-01.csv).

The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.
Kaggle's learning path map
kaggle.com
Updated May 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Omar Aboelwafa (2022). Kaggle's learning path map [Dataset]. https://www.kaggle.com/datasets/omaraboelwafa/kaggles-learning-path-map
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 1, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Omar Aboelwafa
Description
Dataset

This dataset was created by Omar Aboelwafa

Contents
Duolingo Spaced Repetition Data
kaggle.com
Updated Feb 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vinicius Araujo (2024). Duolingo Spaced Repetition Data [Dataset]. https://www.kaggle.com/datasets/aravinii/duolingo-spaced-repetition-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 11, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Vinicius Araujo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
PLEASE UPVOTE IF YOU LIKE THIS CONTENT! 😍

Duolingo is an American educational technology company that produces learning apps and provides language certification. There main app is considered the most popular language learning app in the world.

To progress in their learning journey, each user of the application needs to complete a set of lessons in which they are presented with the words of the language they want to learn. In an infinite set of lessons, each word is applied in a different context and, on top of that, Duolingo uses a spaced repetition approach, where the user sees an already known word again to reinforce their learning.

Each line in this file refers to a Duolingo lesson that had a target word to practice.

The columns are as follows:

p_recall - proportion of exercises from this lesson/practice where the word/lexeme was correctly recalled

timestamp - UNIX timestamp of the current lesson/practice

delta - time (in seconds) since the last lesson/practice that included this word/lexeme

user_id - student user ID who did the lesson/practice (anonymized)

learning_language - language being learned

ui_language - user interface language (presumably native to the student)

lexeme_id - system ID for the lexeme tag (i.e., word)

lexeme_string - lexeme tag (see below)

history_seen - total times user has seen the word/lexeme prior to this lesson/practice

history_correct - total times user has been correct for the word/lexeme prior to this lesson/practice

session_seen - times the user saw the word/lexeme during this lesson/practice

session_correct - times the user got the word/lexeme correct during this lesson/practice

The lexeme_string column contains a string representation of the "lexeme tag" used by Duolingo for each lesson/practice (data instance) in our experiments. The lexeme_string field uses the following format:

`surface-form/lemma
Human Activity Recognition (HAR - Video Dataset)
kaggle.com
Updated May 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sharjeel M. (2023). Human Activity Recognition (HAR - Video Dataset) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5722068
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5722068
Dataset updated
May 19, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Sharjeel M.
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset contains a comprehensive collection of human activity videos, spanning across 7 distinct classes. These classes include clapping, meeting and splitting, sitting, standing still, walking, walking while reading book, and walking while using the phone.

Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.

The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.
Introduction to C++
kaggle.com
Updated Nov 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Saloni Singhal (2021). Introduction to C++ [Dataset]. https://www.kaggle.com/datasets/salonisinghal16/introduction-to-c
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 16, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Saloni Singhal
Description
Dataset

This dataset was created by Saloni Singhal

Contents
learn-langchain
kaggle.com
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
cat_and_dog2 (2023). learn-langchain [Dataset]. https://www.kaggle.com/datasets/catanddog2/learn-langchain
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 30, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
cat_and_dog2
Description
Dataset

This dataset was created by cat_and_dog2

Contents
College Dataset (unsupervised learning)
kaggle.com
zip
Updated May 2, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nishant Kumar (2020). College Dataset (unsupervised learning) [Dataset]. https://www.kaggle.com/datasets/nishantpatyal/college-dataset-unsupervised-learning
Explore at:
zip(32388 bytes)Available download formats
Dataset updated
May 2, 2020
Authors
Nishant Kumar
Description
Dataset

This dataset was created by Nishant Kumar

Contents
machine learning datasets
kaggle.com
Updated Nov 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Roshan2429 (2022). machine learning datasets [Dataset]. https://www.kaggle.com/roshan2429/machine-learning-datasets/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 20, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Roshan2429
Description
Dataset

This dataset was created by Roshan2429

Contents
Machine Learning Datasets
kaggle.com
Updated Oct 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Devendra Parihar (2022). Machine Learning Datasets [Dataset]. https://www.kaggle.com/datasets/dev523/machine-learning-datasets/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 6, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Devendra Parihar
Description
Dataset

This dataset was created by Devendra Parihar

Contents
30 Days of ML
kaggle.com
Updated Mar 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luca Massaron (2022). 30 Days of ML [Dataset]. https://www.kaggle.com/datasets/lucamassaron/30-days-of-ml
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 6, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Luca Massaron
Description
Context

The data relative to the Kaggle learning competition 30 Days of ML (https://www.kaggle.com/thirty-days-of-ml) cannot be downloaded by Kagglers who have not initially participated to it. now you can download it from here and use it for testing the many tutorials and notebooks available from the learning competition.

Content

The dataset is used for this competition is synthetic (and generated using a CTGAN), but based on a real dataset. The original dataset deals with predicting the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

Acknowledgements

The data comes from a Kaggle competition, 30 Days of ML (https://www.kaggle.com/c/30-days-of-ml).
2018 Kaggle Machine Learning Challenge dataset
kaggle.com
Updated Nov 28, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sreenanda Sai Dasari (2021). 2018 Kaggle Machine Learning Challenge dataset [Dataset]. https://www.kaggle.com/sreenandasaidasari/2021-kaggle-machine-learning-challenge/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 28, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Sreenanda Sai Dasari
Description
Dataset

This dataset was created by Sreenanda Sai Dasari

Contents
Meta Kaggle Code
kaggle.com
Updated Aug 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
Dataset updated
Aug 14, 2025
Dataset authored and provided by
Kagglehttp://kaggle.com/
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Car Acceptability Dataset for Machine Learning
kaggle.com
Updated Sep 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ismetgocer (2023). Car Acceptability Dataset for Machine Learning [Dataset]. https://www.kaggle.com/datasets/ismetgocer/car-acceptability
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 11, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
ismetgocer
Description
Are you ready for a good challenge? There are many missing values, extraordinary symbols and problems Let's start wonderful analyses!

There are 1729 rows and 7 columns in this data set;

Features;

Column (buying): Buying price of the car (v-high, high, med, low). Examples: low, med, high Column (maint): Price of the maintenance of the car (v-high, high, med, low). Examples: low, med, high Column (doors): Number of doors (2, 3, 4, 5-more). Examples: 2, 3, 4 Column (persons): Capacity in terms of persons to carry (2, 4, more). Examples: 2, 4, more Column (lug_boot): The size of the luggage boot (small, med, big). Examples: small, med, big Column (safety): Estimated safety of the car (low, med, high). Examples: low, med, high

Target:

Column (class): Car acceptability (unacc: unacceptable, acc: acceptable, good: good, v-good: very good). Examples: unacc, acc, good

Facebook

Twitter

Click to copy link

Link copied

Cite

Jordan Nelson (2024). LLM prompts in the context of machine learning [Dataset]. https://www.kaggle.com/datasets/jordanln/llm-prompts-in-the-context-of-machine-learning

LLM prompts in the context of machine learning

Chatbot prompts relating to Machine Learning models

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jul 1, 2024

Dataset provided by

Kaggle

Authors

Jordan Nelson

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence. At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously. This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.

KNN How would you create a KNN model to classify emails as spam or not spam based on their content and metadata? How could you implement a KNN model to classify handwritten digits using the MNIST dataset? How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences? How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc? Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width? How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments? Can you create a KNN model for me that could be used in malware classification? Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic? Can you make a KNN model that would predict the stock price of a given stock for the next week? Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

Decision Tree Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me? How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content? What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses? Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data? In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used? Can you create a decision tree classifier for me? Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies? Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget? How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website? How do I create a decision tree for time-series forecasting?

Random Forest Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me? In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...

Clear search

Close search

Google apps

Main menu

LLM prompts in the context of machine learning

home data for ml course

Udemy - Machine Learning A-Z Become Kaggle Master

Learn Images

Dataset

Contents

ML Datasets

‘Deep Learning A-Z - ANN dataset’ analyzed by Analyst-2

Context

Content

Acknowledgements

Inspiration

DAST Customer Analysis

Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

Kaggle's learning path map

Dataset

Contents

Duolingo Spaced Repetition Data

Human Activity Recognition (HAR - Video Dataset)

Introduction to C++

Dataset

Contents

learn-langchain

Dataset

Contents

College Dataset (unsupervised learning)

Dataset

Contents

machine learning datasets

Dataset

Contents

Machine Learning Datasets

Dataset

Contents

30 Days of ML

Context

Content

Acknowledgements

2018 Kaggle Machine Learning Challenge dataset

Dataset

Contents

Meta Kaggle Code

Explore our public notebook content!

Why we’re releasing this dataset

Sensitive data

Joining with Meta Kaggle

File organization

Questions / Comments

Car Acceptability Dataset for Machine Learning

LLM prompts in the context of machine learning

Chatbot prompts relating to Machine Learning models