Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Predicting Student Performance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/student-performance on 28 January 2022.
--- Dataset description provided by original source is as follows ---
- These data concern student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the source paper for more details).
- Predict students' future performance
- Understand the root causes of low performance
- More datasets
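The note about G1 and G2 translates into two different feature sets for predicting G3. A minimal sketch using only Python's standard library, assuming the data are available as CSV text with a header row that contains the columns G1, G2 and G3; the delimiter is left as a parameter because it depends on the file you download, and the function names are hypothetical.

```python
import csv
import io

TARGET = "G3"                 # final-year grade (3rd period)
PERIOD_GRADES = ("G1", "G2")  # 1st- and 2nd-period grades, strongly correlated with G3


def feature_columns(header, include_period_grades=False):
    """Return the predictor columns for G3.

    With include_period_grades=False, G1 and G2 are dropped: the harder but
    more useful early-prediction setting described above.
    """
    excluded = {TARGET}
    if not include_period_grades:
        excluded.update(PERIOD_GRADES)
    return [col for col in header if col not in excluded]


def split_features_target(csv_text, delimiter=",", include_period_grades=False):
    """Parse CSV text into (feature rows, integer G3 values)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    cols = feature_columns(reader.fieldnames, include_period_grades)
    features, target = [], []
    for row in reader:
        features.append({col: row[col] for col in cols})
        target.append(int(row[TARGET]))
    return features, target
```

Comparing a model trained on both feature sets shows how much of its accuracy comes from the earlier period grades.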
If you use this dataset in your research, please credit ewenme
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project description:
This publication contains the research data on the impact of digital games on students with ADHD. The data are provided in several formats, including CSV, Excel and JSON.
Data details:
- Surveys on the use of digital games in education.
- Students' socioeconomic profiles.
- Academic performance data.
Access to GitHub Repository:
For access to the GitHub repository of this project, please visit: https://github.com/luizclaudiomcz/ADHD_Games_ResearchData.
DOI:
The data is available with DOI: https://doi.org/10.5281/zenodo.13361882.
Additional notes:
We appreciate the contributions of all collaborators who helped in collecting and organizing the data. For feedback or issues, please open an issue in the GitHub repository.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset contains information on student engagement with Tableau, including quizzes, exams and lessons. The data include the course title, the course rating, the date the course was rated, the exam category, the exam duration, whether an answer was correct, the number of quizzes completed, the number of exams completed, the number of lessons completed, the date engaged, the exam result, and more.
The 'Student Engagement with Tableau' dataset offers insights into student engagement with the Tableau software. The data includes information on courses, exams, quizzes, and student learning.
This dataset can be used to examine how students use Tableau, which kinds of engagement lead to better learning outcomes, and whether certain course or exam characteristics are associated with student engagement.
- Creating a heat map of student engagement by course and location
- Determining which courses are most popular among students from different countries
- Identifying patterns in students' exam results
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: 365_course_info.csv

| Column name | Description |
|:------------|:------------|
| course_title | The title of the course. (String) |

File: 365_course_ratings.csv

| Column name | Description |
|:------------|:------------|
| course_rating | The rating given to the course by the student. (Numeric) |
| date_rated | The date on which the course was rated. (Date) |

File: 365_exam_info.csv

| Column name | Description |
|:------------|:------------|
| exam_category | The category of the exam. (Categorical) |
| exam_duration | The duration of the exam in minutes. (Numerical) |

File: 365_quiz_info.csv

| Column name | Description |
|:------------|:------------|
| answer_correct | Whether or not the student answered the question correctly. (Boolean) |

File: 365_student_engagement.csv

| Column name | Description |
|:------------|:------------|
| engagement_quizzes | The number of times a student has engaged with quizzes. (Numeric) |
| engagement_exams | The number of times a student has engaged with exams. (Numeric) |
| engagement_lessons | The number of times a student has engaged with lessons. (Numeric) |
| date_engaged | The date of the student's engagement. (Date) |

File: 365_student_exams.csv

| Column name | Description |
|:------------|:------------|
| exam_result | The result of the exam. (Categorical) |
| exam_completion_time | The time it took to complete the exam. (Numerical) |
| date_exam_completed | The date the exam was completed. (Date) |

File: 365_student_hub_questions.csv

| Column name | Description |
|:------------|:------------|
| date_question_asked | The date the question was asked. (Date) |

File: 365_student_info.csv

| Column name | Description |
|:------------|:------------|
| student_country | The country of the student. (Categorical) |
| date_registered | The date the student registered for the course. (Date) |

File: 365_student_learning.csv | Column name | Description | |:--------------------|:------------------------------...
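One simple analysis of the engagement file is to total the three engagement counts per day. A minimal sketch, assuming 365_student_engagement.csv is comma-separated, its header contains the column names listed for that file, and the counts are integers; the function name is hypothetical, and dates are kept as raw strings rather than parsed.

```python
import csv
import io
from collections import defaultdict

ENGAGEMENT_COLUMNS = ("engagement_quizzes", "engagement_exams", "engagement_lessons")


def engagement_by_date(csv_text):
    """Sum the quiz, exam and lesson engagement counts for each date_engaged value."""
    # Each new date starts with all three counters at zero.
    totals = defaultdict(lambda: dict.fromkeys(ENGAGEMENT_COLUMNS, 0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = totals[row["date_engaged"]]
        for col in ENGAGEMENT_COLUMNS:
            day[col] += int(row[col])  # assumes integer-valued counts
    return dict(totals)
```

The per-date totals can then feed a time series or a heat map of engagement.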
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the StudentLife dataset, converted from its original form (hosted at https://studentlife.cs.dartmouth.edu/dataset.html) into a series of R tibbles (which are similar to data frames) and stored in the RData format for compression, speed and ease of use with R. Note that in this RData format the dataset takes up much less space than the original.
These tibbles are suitable for use with the studentlife R package https://github.com/frycast/studentlife which is also available on CRAN, but we recommend installing the latest version from GitHub.
Studentlife dataset reference:
Wang, Rui, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. "StudentLife: Assessing Mental Health, Academic Performance and Behavioral Trends of College Students using Smartphones." In Proceedings of the ACM Conference on Ubiquitous Computing. 2014.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a valuable tool for evidence-based decision-making to prioritize environmental initiatives for California schools. It offers a comprehensive assessment of key factors such as tree canopy cover, demographics, school performance, vulnerability to climate change and pollution, poverty factors, and more. Using this GIS-based assessment, stakeholders can, for example, identify and prioritize schools in need of cooling and greening interventions. This data-driven approach ensures that vulnerable schools are given priority and that resources can be allocated more efficiently to make a positive impact on the well-being and education of California's students. This dataset was created using the following script (which excludes academic performance indicators): https://github.com/camipawlak/school_enviro/tree/main
The authors would like to acknowledge the support of Cal Poly's Office of University Diversity and Inclusion via the BEACoN Research Mentoring Program.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.
The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.
Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry.
After fetching, the datasets are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate time series statistics include publication count, open access status and citation count.
See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
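JSON Lines stores one JSON object per line, so a release file can be streamed record by record instead of being loaded whole. A minimal sketch with Python's standard json module; the file name and the "id" field in the usage comment are hypothetical placeholders, so check the schema at https://open.coki.ac/data/ for the real field names.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of a JSON Lines stream."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"invalid JSON on line {line_no}: {err}") from err


# Usage (hypothetical file name and field name):
# with open("country.jsonl", encoding="utf-8") as fh:
#     by_id = {rec["id"]: rec for rec in iter_jsonl(fh)}
```

Because a file object is itself an iterable of lines, the generator reads one record at a time, which keeps memory use flat for the larger institution files.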
Code
License
COKI Open Access Dataset © 2022 by Curtin University is licensed under CC BY 4.0.
Attributions
This work contains information from:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
SPAR-Benchmark: A Realistic Evaluation Dataset for Academic Search Systems
Paper: SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Code: https://github.com/xiaofengShi/SPAR
Benchmark Overview
SPAR-Benchmark is an evaluation dataset constructed for realistic academic search scenarios, aiming to provide a reliable and practical performance evaluation foundation for academic search systems. The dataset covers the complete process from… See the full description on the dataset page: https://huggingface.co/datasets/MonteXiaofeng/SPARBench.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Date: September 3, 2025; 13:30 - 15:00
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aim to collect, clean and store corpora of Fon and French sentences for Natural Language Processing research, including Neural Machine Translation and Named Entity Recognition, for Fon, a very low-resourced and endangered African native language.
Fon (also called Fongbe) is an indigenous African language spoken by about 2 million people, mostly in Benin, Togo and Nigeria.
As training data is crucial to the high performance of a machine learning model, the aim of this project is to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon.
Through crowdsourcing and Google Forms surveys, we gathered and cleaned 25,377 parallel Fon-French sentence pairs, all based on daily conversations.
The following people contributed to the crowdsourcing, creation and cleaning of this version:
1) Name: Bonaventure DOSSOU; Affiliation: MSc Student in Data Engineering, Jacobs University; Contact: femipancrace.dossou@gmail.com
2) Name: Ricardo AHOUNVLAME; Affiliation: Student in Linguistics; Contact: tontonjars@gmail.com
3) Name: Fabroni YOCLOUNON; Affiliation: Creator of the Label IamYourClounon; Contact: iamyourclounon@gmail.com
4) Name: BeninLangues; Affiliation: BeninLangues; Contact: https://beninlangues.com/
5) Name: Chris Emezue; Affiliation: MSc Student in Mathematics in Data Science, Technical University of Munich; Contact: chris.emezue@gmail.com
To join as a contributor, please reach us via 1) https://twitter.com/bonadossou 2) https://twitter.com/ChrisEmezue 3) https://twitter.com/edAIOfficial, or email Bonaventure Dossou (femipancrace.dossou@gmail.com) or Chris Emezue (chris.emezue@gmail.com).
Clavier Fongbé (WebView): https://bonaventuredossou.github.io/clavierfongbe/ (made by Bonaventure Dossou)
Clavier Fongbé (Mobile Android Version): https://play.google.com/store/apps/details?id=com.fulbertodev.clavierfongbe&hl=en&gl=US (Fabroni Yoclounon, Bonaventure Dossou et al.)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Musical Aptitude Standard Test (MAST) melody dataset is designed and shared to facilitate comparison of algorithms in the field of automatic music performance assessment.
The dataset includes melodic pattern reproduction performances (singing) by students, together with the reference melodic patterns played on piano and the assessment results. All recordings were collected during the entrance exams (in 2015 and 2016) of Istanbul Technical University (ITU).
The recordings were annotated in 2022 within the context of another research project supported by TUBITAK under grant number 121E198, as part of the Scientific and Technological Research Projects Funding Program (1001).
Annotations were performed by blind listening to individual performances, after the experts had listened to a few renditions of the melodic pattern. The files were presented in random order after grouping the samples by melodic pattern (i.e. the expert annotated all samples of one melodic pattern in random order and then moved on to the group of samples for the next melodic pattern).
A 4-level grading scale was used to evaluate the dataset:
1-Completely Off, 2-Major Mistakes, 3-Minor Mistakes, and 4-Perfect.
Annotations were carried out by three experts: a professor of musicology who has served as a jury member in entrance exam auditions, and two music conservatory students in graduate-level programs. The last two annotators re-annotated the whole collection a few months after the first annotation task. The csv file contains 5 annotations in 5 columns, two of which hold the repeated annotations. To facilitate analysis, we also added columns carrying a flag that indicates whether all annotations match (column: 'fullAgree'), the score/grade all annotations agreed on (column: 'fullAgree_score') and the majority score.
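The derived agreement columns can be recomputed from the five grade columns. A minimal sketch, assuming each recording's five annotations are available as integers on the 1 to 4 scale above; the function name and the "majority" key are hypothetical (the description does not name the majority column), and "majority" is taken here to mean a grade given by more than half of the annotations, with None when no grade reaches that.

```python
from collections import Counter
from typing import Sequence

# The 4-level scale used by the annotators.
GRADES = {1: "Completely Off", 2: "Major Mistakes", 3: "Minor Mistakes", 4: "Perfect"}


def agreement_summary(annotations: Sequence[int]) -> dict:
    """Derive fullAgree, fullAgree_score and a majority grade for one recording."""
    if not annotations:
        raise ValueError("no annotations given")
    for g in annotations:
        if g not in GRADES:
            raise ValueError(f"grade out of range 1-4: {g}")
    counts = Counter(annotations)
    grade, freq = counts.most_common(1)[0]  # most frequent grade and its count
    full_agree = len(counts) == 1           # every annotation gave the same grade
    return {
        "fullAgree": full_agree,
        "fullAgree_score": grade if full_agree else None,
        # Hypothetical definition: strict majority (at least 3 of 5 annotations).
        "majority": grade if freq > len(annotations) / 2 else None,
    }
```

A plurality vote would be an alternative definition; with five annotations and four grades it can tie, which is why the strict-majority rule returns None instead.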
In addition to the audio files and annotations, two commonly used features are also included: 1) f0 series extracted using the Crepe pitch tracker (https://github.com/marl/crepe) and 2) chroma features computed using the Librosa library's chroma_stft function (https://librosa.org/doc/main/generated/librosa.feature.chroma_stft.html).
The directory structure is:
If you use this dataset, please refer to the following paper which announced its original version:
Bozkurt, B., Baysal, O., Yuret, D. A Dataset and Baseline System for Singing Voice Assessment, 13th Int. Symposium on Computer Music Multidisciplinary Research, Porto, Sept. 25-28, 2017.
@inproceedings{inproceedings,
author={Bozkurt, B. and Baysal, O. and Yuret, D.},
title={A Dataset and Baseline System for Singing Voice Assessment},
year={2017},
booktitle={13th Int. Symposium on Computer Music Multidisciplinary Research, CMMR 2017}
}
This data set contains data on the interpreting accuracy of ten professional and ten student interpreters. The level of expertise is encoded in the filename: "student" refers to a rendition by a student, "professional" to a rendition by a professional interpreter. The data set further includes information on:
For data protection reasons and in compliance with the Declaration of Helsinki, the data set contains neither the source text nor a transcript of the rendition.
The data set is primarily used as an example data set to assess interpreting accuracy. Please also refer to the following publication:
-> If you use Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset, please cite: https://academic.oup.com/comjnl/advance-article-abstract/doi/10.1093/comjnl/bxab179/6425234
@article{10.1093/comjnl/bxab179, author = {Gözükara, Furkan and Özel, Selma Ayşe}, title = "{An Incremental Hierarchical Clustering Based System For Record Linkage In E-Commerce Domain}", journal = {The Computer Journal}, year = {2021}, month = {11}, abstract = "{In this study, a novel record linkage system for E-commerce products is presented. Our system aims to cluster the same products that are crawled from different E-commerce websites into the same cluster. The proposed system achieves a very high success rate by combining both semi-supervised and unsupervised approaches. Unlike the previously proposed systems in the literature, neither a training set nor structured corpora are necessary. The core of the system is based on Hierarchical Agglomerative Clustering (HAC); however, the HAC algorithm is modified to be dynamic such that it can efficiently cluster a stream of incoming new data. Since the proposed system does not depend on any prior data, it can cluster new products. The system uses bag-of-words representation of the product titles, employs a single distance metric, exploits multiple domain-based attributes and does not depend on the characteristics of the natural language used in the product records. To our knowledge, there is no commonly used tool or technique to measure the quality of a clustering task. Therefore in this study, we use ELKI (Environment for Developing KDD-Applications Supported by Index-Structures), an open-source data mining software, for performance measurement of the clustering methods; and show how to use ELKI for this purpose. To evaluate our system, we collect our own dataset and make it publicly available to researchers who study E-commerce product clustering. Our proposed system achieves 96.25\% F-Measure according to our experimental analysis. 
The other state-of-the-art clustering systems obtain the best 89.12\% F-Measure.}", issn = {0010-4620}, doi = {10.1093/comjnl/bxab179}, url = {https://doi.org/10.1093/comjnl/bxab179}, note = {bxab179}, eprint = {https://academic.oup.com/comjnl/advance-article-pdf/doi/10.1093/comjnl/bxab179/41133297/bxab179.pdf}, }
-> elki-bundle-0.7.2-SNAPSHOT.jar is the ELKI bundle that we compiled from the GitHub source code of ELKI. The source code is dated 6 June 2016. The compile command is: ->-> mvn -DskipTests -Dmaven.javadoc.skip=true -P svg,bundle package ->-> GitHub repository of ELKI: https://github.com/elki-project/elki ->-> This bundle file is used for all of the experiments presented in the article.
-> The Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset was composed as follows: ->-> The top 50 E-commerce websites operating in Turkey were crawled, and their attributes were extracted. ->-> The crawling was performed between 2015-01-13 15:12:46 and 2015-01-17 19:07:53. ->-> Then 250 product offers from Vatanbilgisayar were randomly selected. ->-> Then the entire dataset was manually scanned to find which products sold on other E-commerce websites are the same as the selected ones. ->-> Then each product was assigned to its corresponding class. ->-> This dataset contains these products along with their price (if available), title, categories (if available), free-text description (if available), wrapped features (if available) and crawled URL (the URL might have expired).
-> The dataset files are provided as used in the study. -> For All_Products and Only_Price_Having_Products, the ARFF files are generated with raw term frequencies rather than the weighting schemes used in the experiments. The reason is that we tested these datasets only with our own system, and since our system performs incremental clustering, any TF-IDF weightings we provided would not be the same as those used in the article. More information is provided in the article. ->-> For Macro_Average_Datasets, we provide both raw frequency and TF-IDF weightings, as used in the experiments.
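The point about weightings can be illustrated with a small bag-of-words sketch over product titles. This is not the authors' code: it assumes lowercase whitespace tokenisation and the plain idf = log(N / df) variant, since the description does not state which TF-IDF scheme the article used. Because idf depends on the whole collection, adding a title changes the weights of titles already seen, which is why precomputed TF-IDF files could not match an incremental clustering run.

```python
import math
from collections import Counter
from typing import Dict, List


def raw_frequency(titles: List[str]) -> List[Counter]:
    """Bag-of-words vectors holding raw term counts for each title."""
    return [Counter(title.lower().split()) for title in titles]


def tf_idf(titles: List[str]) -> List[Dict[str, float]]:
    """Weight raw counts by idf = log(N / df), with N the number of titles."""
    counts = raw_frequency(titles)
    n_docs = len(counts)
    # Document frequency: iterating a Counter yields each term once per title.
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]
```

For example, with three titles of which two contain "apple", the weight of "apple" is log(3/2); adding a fourth title containing "apple" changes it to log(4/3) in every earlier title as well.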
-> There are 3 main folders. -> All_Products: This folder contains 1800 products. ->-> This is the entire manually labeled collection. ->-> The products come from 250 different classes. -> Only_Price_Having_Products: This folder contains all of the products that have the price feature set. ->-> The collection has 1721 products from 250 classes. ->-> This is the dataset used in our experiments. -> Macro_Average_Datasets: This folder contains 100 datasets that we used to conduct more reliable experiments. ->-> Each dataset is composed by selecting 1000 different products from the price-having products dataset and then randomly ordering them...
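The Macro_Average_Datasets construction can be sketched as repeated sampling without replacement. This assumes simple random sampling from the price-having products, as the (truncated) description suggests; the function name and the seed parameter are hypothetical additions for reproducibility, not details from the article. The order of each sample presumably matters here because the system clusters products incrementally.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_macro_datasets(products: Sequence[T], n_datasets: int = 100,
                        size: int = 1000, seed: int = 0) -> List[List[T]]:
    """Draw n_datasets samples of `size` distinct products, each in random order.

    random.sample draws without replacement and returns items in selection
    order, which is already random, so no separate shuffle is needed.
    """
    if size > len(products):
        raise ValueError("sample size exceeds the number of products")
    rng = random.Random(seed)  # private generator, so results are reproducible
    return [rng.sample(products, size) for _ in range(n_datasets)]
```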