In this paper, we introduce a novel benchmarking framework designed specifically for evaluating data science agents. Our contributions are three-fold. First, we propose DSEval, an evaluation paradigm that enlarges the evaluation scope to the full lifecycle of LLM-based data science agents. It covers aspects including, but not limited to, the quality of the derived analytical solutions or machine learning models, as well as potential side effects such as unintentional changes to the original data. Second, we incorporate a novel bootstrapped annotation process that lets LLMs themselves generate and annotate the benchmarks with a ``human in the loop''. A novel language (i.e., DSEAL) is proposed, and the four derived benchmarks significantly improve benchmark scalability and coverage while largely reducing human labor. Third, based on DSEval and the four benchmarks, we conduct a comprehensive evaluation of various data science agents from different aspects. Our findings reveal common challenges and limitations of current works, providing useful insights and shedding light on future research on LLM-based data science agents.
This is one of the DSEval benchmarks.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
This dataset contains information about high school students and their actual and predicted performance on an exam. Most of the information, including some general information about high school students and their grade for an exam, was based on an already existing dataset, while the predicted exam performance was based on a human experiment. In this experiment, participants were shown short descriptions of the students (based on the information in the original data) and had to rank and grade them according to their expected performance. Prior to this task, some participants were exposed to some "Stereotype Activation", suggesting that boys perform worse in school than girls.
Based on this dataset (which is also available on Kaggle), we extracted a number of student profiles that participants had to make grade predictions for. For more information about this dataset, we refer to the corresponding Kaggle page: https://www.kaggle.com/datasets/uciml/student-alcohol-consumption
Note that we performed some preprocessing on the original data:
The original data consisted of two parts: the information about students following a Maths course and the information about students following a Portuguese course. Since both datasets recorded the same type of information, we merged them and added a column "subject" to show which course each student belongs to.
We excluded all data where G3 = 0 (i.e., the grade for the last exam = 0).
From original_data.csv we randomly sampled 856 students that participants in our study had to make grade predictions for.
index - this column corresponds to the indices in the file "original_data.csv". Through these indices, it is possible to add columns from the original data to the dataset with the grade predictions (see the merge sketch after the column list)
ParticipantID - the ID of the participant who made the performance predictions for the corresponding student. Predictions needed to be made for 856 students, and each participant made 8 predictions total. Thus there are 107 different participant IDs
name - to make the prediction task more engaging for participants, each of the 8 student profiles that participants had to grade and rank was randomly matched to one of four boys' or girls' names (depending on the sex of the student)
sex - the sex of each student, either female (F) or male (M). For benchmarking fair ML algorithms, this can be used as the sensitive attribute. We assume that in the fair version of the decision variable ("Pass"), no sex discrimination occurs. The biased versions of the variable ("Predicted Pass") are mostly discriminatory towards male students.
studytime - this variable is taken from the original dataset and denotes how long a student studied for their exam. In the original data this variable consisted of four levels (less than 2 hours vs. 2-5 hours vs. 5-10 hours vs. more than 10 hours). We binned the latter two levels together and encoded this column numerically from 1-3.
freetime - Originally, this variable ranged from 1 (very low) to 5 (very high). We binned this variable into three categories, where level 1 and 2 are binned, as well as level 4 and 5.
romantic - Binary variable, denoting whether the student is in a romantic relationship or not.
Walc - This variable shows how much alcohol each student consumes in the weekend. Originally it ranged from 1 to 5 (5 corresponding to the highest alcohol consumption), but we binned the last two levels together.
goout - This variable shows how often a student goes out in a week. Originally it ranged from 1 to 5 (5 corresponding to going out very often), but we binned the last two levels together.
Parents_edu - This variable was not present in the original dataset. Instead, the original dataset consisted of two variables "mum_edu" and "dad_edu". We obtained "Parents_edu" by taking the higher of the two. The variable consists of 4 levels, where 4 = highest level of education.
absences - This variable shows the number of absences per student. Originally it ranged from 0 - 93, but because large numbers of absences were infrequent, we binned all absences of >=7 into one level.
reason - The reason why a student chose to go to the school in question. The levels are close to home, school's reputation, course preference, and other
G3 - The actual grade each student received for the final exam of the course, ranging from 0-20.
Pass - A binary variable showing whether G3 is a passing grade (i.e. >=10) or not.
Predicted Grade - The grade the student was predicted to receive in our experiment
Predicted Rank - In our ex...
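A minimal merge sketch showing how the index column can link the two files. The prediction file name and the extra columns pulled in ("famrel", "health") are assumptions for illustration; only "original_data.csv" and the documented columns come from the description above.

```python
import pandas as pd

# original_data.csv is documented above; the prediction file name is assumed.
original = pd.read_csv("original_data.csv")
predictions = pd.read_csv("grade_predictions.csv")

# "index" in the prediction file points back to rows of original_data.csv,
# so further columns from the original data can be joined in directly.
enriched = predictions.merge(
    original[["famrel", "health"]],  # assumed: original columns not yet extracted
    left_on="index", right_index=True, how="left",
)

# Sanity check implied by the description: "Pass" should equal (G3 >= 10).
assert (enriched["Pass"] == (enriched["G3"] >= 10)).all()
```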
https://creativecommons.org/publicdomain/zero/1.0/
Benchmarks allow for easy comparison between multiple RAM kits by scoring their performance on a standardized series of tests, and they are useful in many instances, such as when buying or building a new PC.
Newest data as of May 3rd, 2022. This dataset contains benchmarks of DDR4 memory models.
Data scraped from PassMark.
https://creativecommons.org/publicdomain/zero/1.0/
This repository contains the results of three benchmarks that compare natural language understanding services:
1. Built-in intents (Apple's SiriKit, Amazon's Alexa, Microsoft's Luis, Google's API.ai, and Snips.ai) on a selection of various intents. This benchmark was performed in December 2016. Its results are described at length in the following post.
2. Custom intent engines (Google's API.ai, Facebook's Wit, Microsoft's Luis, Amazon's Alexa, and Snips' NLU) for seven chosen intents. This benchmark was performed in June 2017. Its results are described in a paper and a blog post.
3. An extension of Braun et al., 2017 (Google's API.AI, Microsoft's Luis, IBM's Watson, Rasa). This experiment replicates the analysis made by Braun et al., 2017, published in "Evaluating Natural Language Understanding Services for Conversational Question Answering Systems" as part of the SIGDIAL 2017 proceedings. Snips and Rasa are added. Details are available in a paper and a blog post.
The data is provided for each benchmark and more details about the methods are available in the README file in each folder.
Any publication based on these datasets must include a full citation to the following paper in which the results were published by the Snips Team:
Coucke A. et al., "Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces", 2018, https://arxiv.org/abs/1805.10190,
accepted for a spotlight presentation at the Privacy in Machine Learning and Artificial Intelligence workshop colocated with ICML 2018.
The Snips team joined Sonos in November 2019. These open datasets remain available, and their access is now managed by the Sonos Voice Experience Team. Please email sve-research@sonos.com with any questions.
https://spdx.org/licenses/CC0-1.0.html
The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, this study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metric values and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R2 score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.
Methods
Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.
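A hedged sketch of the regression pipeline described above. The file and column names follow the classic Kaggle diamonds dataset and are assumptions, as are the hyperparameters; this is an illustration, not the study's exact code.

```python
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Encode the ordinal gradings as integer codes: one simple preprocessing choice.
df = pd.read_csv("diamonds.csv")
for col in ["cut", "color", "clarity"]:
    df[col] = df[col].astype("category").cat.codes

X, y = df.drop(columns=["price"]), df["price"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# eXtreme Gradient Boosting, the study's recommended regressor.
model = XGBRegressor(n_estimators=300, learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)
print("Test R^2:", r2_score(y_te, model.predict(X_te)))
```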
This dataset was created by Anthony Goldbloom
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘STUDENTS PERFORMANCE DATASET’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/balavashan/students-performance-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains information about students and their performance, which one can try to predict using simple machine learning algorithms.
It contains specific information about the students' schooling, family issues, personal relationships, Internet access, and so on.
These data were gathered from numerous sources; many thanks to @uci_repository.
Try to find out the issues affecting the students, as they are future personalities, in order to safeguard their schooling and youth.
--- Original source retains full ownership of the source dataset ---
This data set contains 30 million chess positions along with a label that indicates whether the position is not check (0), check (1), or checkmate (2). In addition, we provide 3 reference explanations per data point consisting of 8×8 bit masks that highlight certain squares that are relevant for the decision. For each class, we identified one explanation type that characterizes it most accurately:
- No check (0): All squares that are controlled by the enemy player, i.e., all squares that can be reached or captured on by any enemy piece.
- Check (1): All squares (origin or target) of legal moves. As a checkmate is a check where the player under attack has no more legal moves, highlighting legal moves is sufficient to disprove a checkmate.
- Checkmate (2): All squares with pieces that are essential for creating the checkmate. This includes attackers, friendly pieces blocking the King, enemy pieces guarding escape squares, and enemy pieces protecting attackers.
The data is saved as a CSV file containing the chess positions in Forsyth–Edwards Notation (FEN) and the label (0-2) as columns.
The FEN string can be read by most chess software packages and encodes the current piece setup, whose turn it is and some more game-specific information (castling rights, en-passant squares).
The explanations are saved as 64-bit unsigned integers, which can be converted to SquareSet objects from the chess library.
We provide code for converting between different data and explanation representations.
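For instance, here is a minimal sketch of reading one row and decoding its explanation with the python-chess package; the CSV file and column names ("positions.csv", "fen", "label", "explanation") are assumptions based on the description above.

```python
import chess
import pandas as pd

df = pd.read_csv("positions.csv")  # file name assumed
row = df.iloc[0]

board = chess.Board(row["fen"])    # piece setup, side to move, castling rights, ...
label = int(row["label"])          # 0 = no check, 1 = check, 2 = checkmate

# Each explanation is a 64-bit unsigned integer bit mask over the 8x8 board;
# python-chess interprets such a mask directly as a set of squares.
highlighted = chess.SquareSet(int(row["explanation"]))
print(label, [chess.square_name(sq) for sq in highlighted])
```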
Our data set is based on the Lichess open database, which contains records of over 3 billion games of chess played online by human players on the free chess website Lichess. To read and process the games and to create the explanations, we used the Python package chess. We selected only those games that end in checkmate, excluding those that end by timeout or resignation. We also skip the first ten moves, as they lead to many duplicate positions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Body performance Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kukuroo3/body-performance-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset records a performance grade together with age and several exercise performance measurements.
Data shape: (13393, 12)
Source link: Korea Sports Promotion Foundation. Some post-processing and filtering has been done on the raw data.
--- Original source retains full ownership of the source dataset ---
http://opendatacommons.org/licenses/dbcl/1.0/
Clustering benchmark datasets published by School of Computing, University of Eastern Finland
2D scatter points and labels; the file formatting needs to be processed first.
find more in https://cs.joensuu.fi/sipu/datasets/
@misc{ClusteringDatasets,
  author = {Pasi Fränti and others},
  title = {Clustering datasets},
  year = {2015},
  url = {http://cs.uef.fi/sipu/datasets/}
}
With these standard, well-known benchmarks, various clustering algorithms can be run and compared through a number of kernels.
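A minimal comparison sketch under stated assumptions: the file names are placeholders for one of the UEF sets (e.g. the S-sets), and the ground-truth partition files are assumed to have already had their headers stripped, which is the formatting step noted above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X = np.loadtxt("s1.txt")                         # whitespace-separated 2D points
y_true = np.loadtxt("s1_labels.txt", dtype=int)  # one cluster id per point

# S1 has 15 ground-truth clusters; adjust n_clusters per dataset.
y_pred = KMeans(n_clusters=15, n_init=10, random_state=0).fit_predict(X)
print("Adjusted Rand Index:", adjusted_rand_score(y_true, y_pred))
```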
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data approaches student achievement in secondary education at two Portuguese schools. The data attributes include student grades, demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued in the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see the paper source for more details, and the modeling sketch after the column tables below).
Columns | Description |
---|---|
school | student's school (binary: 'GP' - Gabriel Pereira or 'MS' - Mousinho da Silveira) |
sex | student's sex (binary: 'F' - female or 'M' - male) |
age | student's age (numeric: from 15 to 22) |
address | student's home address type (binary: 'U' - urban or 'R' - rural) |
famsize | family size (binary: 'LE3' - less or equal to 3 or 'GT3' - greater than 3) |
Pstatus | parent's cohabitation status (binary: 'T' - living together or 'A' - apart) |
Medu | mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education) |
Fedu | father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education) |
Mjob | mother's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') |
Fjob | father's job (nominal: 'teacher', 'health' care related, civil 'services' (e.g. administrative or police), 'at_home' or 'other') |
reason | reason to choose this school (nominal: close to 'home', school 'reputation', 'course' preference or 'other') |
guardian | student's guardian (nominal: 'mother', 'father' or 'other') |
traveltime | home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) |
studytime | weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) |
failures | number of past class failures (numeric: n if 1<=n<3, else 4) |
schoolsup | extra educational support (binary: yes or no) |
famsup | family educational support (binary: yes or no) |
paid | extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) |
activities | extra-curricular activities (binary: yes or no) |
nursery | attended nursery school (binary: yes or no) |
higher | wants to take higher education (binary: yes or no) |
internet | Internet access at home (binary: yes or no) |
romantic | with a romantic relationship (binary: yes or no) |
famrel | quality of family relationships (numeric: from 1 - very bad to 5 - excellent) |
freetime | free time after school (numeric: from 1 - very low to 5 - very high) |
goout | going out with friends (numeric: from 1 - very low to 5 - very high) |
Dalc | workday alcohol consumption (numeric: from 1 - very low to 5 - very high) |
Walc | weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) |
health | current health status (numeric: from 1 - very bad to 5 - very good) |
absences | number of school absences (numeric: from 0 to 93) |
Grade | Description |
---|---|
G1 | first period grade (numeric: from 0 to 20) |
G2 | second period grade (numeric: from 0 to 20) |
G3 | final grade (numeric: from 0 to 20, output target) |
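A small sketch of the note above about G1/G2, comparing cross-validated prediction of G3 with and without them. The semicolon-separated file name follows the UCI distribution and is an assumption, and the model choice is illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("student-mat.csv", sep=";")  # file name assumed
y = df["G3"]

# Predicting G3 is much easier with the correlated period grades G1 and G2.
for dropped in ([], ["G1", "G2"]):
    X = pd.get_dummies(df.drop(columns=["G3"] + dropped))
    r2 = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=5).mean()
    print(f"dropping {dropped or 'nothing'}: mean CV R^2 = {r2:.3f}")
```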
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information on student engagement with Tableau, including quizzes, exams, and lessons. The data includes the course title, the rating of the course, the date the course was rated, the exam category, the exam duration, whether the answer was correct or not, the number of quizzes completed, the number of exams completed, the number of lessons completed, the date engaged, the exam result, and more.
The 'Student Engagement with Tableau' dataset offers insights into student engagement with the Tableau software. The data includes information on courses, exams, quizzes, and student learning.
This dataset can be used to examine how students use Tableau, what kind of engagement leads to better learning outcomes, and whether certain course or exam characteristics are associated with student engagement. Possible analyses include the following (a loading sketch follows the file listings below):
- Creating a heat map of student engagement by course and location
- Determining which courses are most popular among students from different countries
- Identifying patterns in students' exam results
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: 365_course_info.csv
| Column name | Description |
|:------------|:------------|
| course_title | The title of the course. (String) |

File: 365_course_ratings.csv
| Column name | Description |
|:------------|:------------|
| course_rating | The rating given to the course by the student. (Numeric) |
| date_rated | The date on which the course was rated. (Date) |

File: 365_exam_info.csv
| Column name | Description |
|:------------|:------------|
| exam_category | The category of the exam. (Categorical) |
| exam_duration | The duration of the exam in minutes. (Numerical) |

File: 365_quiz_info.csv
| Column name | Description |
|:------------|:------------|
| answer_correct | Whether or not the student answered the question correctly. (Boolean) |

File: 365_student_engagement.csv
| Column name | Description |
|:------------|:------------|
| engagement_quizzes | The number of times a student has engaged with quizzes. (Numeric) |
| engagement_exams | The number of times a student has engaged with exams. (Numeric) |
| engagement_lessons | The number of times a student has engaged with lessons. (Numeric) |
| date_engaged | The date of the student's engagement. (Date) |

File: 365_student_exams.csv
| Column name | Description |
|:------------|:------------|
| exam_result | The result of the exam. (Categorical) |
| exam_completion_time | The time it took to complete the exam. (Numerical) |
| date_exam_completed | The date the exam was completed. (Date) |

File: 365_student_hub_questions.csv
| Column name | Description |
|:------------|:------------|
| date_question_asked | The date the question was asked. (Date) |

File: 365_student_info.csv
| Column name | Description |
|:------------|:------------|
| student_country | The country of the student. (Categorical) |
| date_registered | The date the student registered for the course. (Date) |

File: 365_student_learning.csv
| Column name | Description |
|:------------|:------------|
...
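A loading sketch under a loud assumption: the files are joined on a shared student_id key, which is not visible in the truncated column listings above and may differ in the actual files.

```python
import pandas as pd

# student_id is assumed to be the join key across files.
engagement = pd.read_csv("365_student_engagement.csv", parse_dates=["date_engaged"])
info = pd.read_csv("365_student_info.csv", parse_dates=["date_registered"])
merged = engagement.merge(info, on="student_id", how="left")

# Example use case: total engagement per country.
by_country = (merged
              .groupby("student_country")[["engagement_quizzes",
                                           "engagement_exams",
                                           "engagement_lessons"]]
              .sum()
              .sort_values("engagement_lessons", ascending=False))
print(by_country.head())
```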
MIT License: https://opensource.org/licenses/MIT
Relevant links:
* leaderboard: Coming soon
* implementation:
* publication: https://arxiv.org/abs/2504.12516
* original repository: https://github.com/openai/simple-evals/tree/main
Abstract
We present BrowseComp, a simple yet challenging benchmark for measuring the ability of agents to browse the web. BrowseComp comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple and easy to use, as predicted answers are short and easily verifiable against reference answers. BrowseComp for browsing agents can be seen as analogous to how programming competitions are an incomplete but useful benchmark for coding agents. While BrowseComp sidesteps challenges of a true user query distribution, like generating long answers or resolving ambiguity, it measures the important core capability of exercising persistence and creativity in finding information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Visual Question Answering (VQA) poses the problem of answering a natural language question about a visual context. Bangla, despite being a widely spoken language, is considered low-resource in the realm of VQA due to the lack of a proper benchmark dataset. The absence of such datasets challenges models that are known to be performant in other languages. Furthermore, existing Bangla VQA datasets offer little cultural relevance and are largely adapted from their foreign counterparts. To address these challenges, we introduce a large-scale Bangla VQA dataset titled ChitroJera, totaling over 15k samples, where diverse and locally relevant data sources are used. We assess the performance of text encoders, image encoders, multimodal models, and our novel dual-encoder models. The experiments reveal that the pretrained dual-encoders outperform other models of their scale. We also evaluate the performance of large language models (LLMs) using prompt-based techniques, with LLMs achieving the best performance. Given the underdeveloped state of existing datasets, we envision ChitroJera expanding the scope of Vision-Language tasks in Bangla.
This dataset was created by Dmitry Sokolov
https://creativecommons.org/publicdomain/zero/1.0/
The M4 competition is a continuation of the Makridakis Competitions for forecasting and was conducted in 2018. This competition includes the prediction of both point forecasts and prediction intervals.
The paper describing the competition and the various benchmarks and approaches was published in a special issue of the International Journal of Forecasting; it is open access and can be found here.
The code for various benchmarks on this dataset can be found at the following GitHub repository.
The data is available at both the GitHub link and the official website of the MOFC.
This dataset was created by KoHanE
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
AbRank is a large-scale benchmark and evaluation framework that reframes affinity prediction as a pairwise ranking problem. It aggregates over 380,000 binding assays from nine heterogeneous sources, spanning diverse antibodies, antigens, and experimental conditions, and introduces standardized data splits that systematically increase distribution shift, from local perturbations such as point mutations to broad generalization across novel antigens and antibodies. To ensure robust supervision, AbRank defines a 10-confident ranking framework by filtering out comparisons with marginal affinity differences, focusing training on pairs with at least a 10-fold difference in measured binding strength.
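A toy sketch of the filtering rule described above; the column names and affinity values here are hypothetical, not the AbRank schema.

```python
import itertools
import pandas as pd

assays = pd.DataFrame({"assay_id": [1, 2, 3],
                       "affinity_kd_nM": [0.5, 80.0, 2.0]})

# Keep only pairs whose measured binding strengths differ by at least 10x,
# so the ranking label is confidently determined.
confident_pairs = [
    (a.assay_id, b.assay_id)
    for a, b in itertools.combinations(assays.itertuples(index=False), 2)
    if max(a.affinity_kd_nM, b.affinity_kd_nM)
       / min(a.affinity_kd_nM, b.affinity_kd_nM) >= 10.0
]
print(confident_pairs)  # [(1, 2), (2, 3)]; the (1, 3) pair is only 4x apart
```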
This dataset was created by Lisa Sharapova
MIT License: https://opensource.org/licenses/MIT
The "ZeroShot LLM4TS Benchmark" dataset is designed to evaluate the performance of large language models (LLMs) in zero-shot time series forecasting tasks. The dataset contains various time series data from different domains, providing a comprehensive benchmark for testing LLM capabilities in forecasting without prior training on the specific data.