12 datasets found

‘ Predicting Student Performance’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 2, 2015
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘ Predicting Student Performance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-student-performance-ec1b/b7296868/?iid=058-803&v=presentation
Dataset updated
Mar 2, 2015
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘ Predicting Student Performance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/student-performance on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

This dataset addresses student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and the data were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
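The note about G1 and G2 can be made concrete with a small sketch. The Python below is illustrative only: the column names `G1`, `G2`, and `G3` come from the description, while `absences` and `studytime`, the sample rows, and the pass mark of 10 used for the binary task are assumptions made for this example, not values stated above.

```python
# Columns named in the description: G1 and G2 are period grades, G3 is the target.
PRIOR_GRADES = ("G1", "G2")
TARGET = "G3"


def split_features(row, use_prior_grades):
    """Return (features, target) for one student record.

    When use_prior_grades is False, G1 and G2 are dropped, which is the
    harder but more useful setting mentioned in the description.
    """
    features = {
        name: value
        for name, value in row.items()
        if name != TARGET and (use_prior_grades or name not in PRIOR_GRADES)
    }
    return features, row[TARGET]


def to_binary_label(g3, pass_mark=10):
    """Map a final grade to pass/fail. The pass mark is an assumption."""
    return "pass" if g3 >= pass_mark else "fail"


# Hypothetical records; 'absences' and 'studytime' are illustrative feature names.
rows = [
    {"absences": 4, "studytime": 2, "G1": 11, "G2": 12, "G3": 12},
    {"absences": 16, "studytime": 1, "G1": 7, "G2": 6, "G3": 5},
]

for row in rows:
    features, g3 = split_features(row, use_prior_grades=False)
    print(sorted(features), to_binary_label(g3))
```

Passing `use_prior_grades=True` keeps G1 and G2 in the feature set, so the same records can be compared under both settings.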

How to use this dataset

Predict students' future performance

Understand the root causes for low performance

More datasets

Acknowledgements

If you use this dataset in your research, please credit ewenme

--- Original source retains full ownership of the source dataset ---
luizclaudiomcz/ADHD_Games_ResearchData: Version 1.0: Initial Release of ADHD...
zenodo.org
zip
Updated Apr 24, 2025
Cite
Luiz Claudio Ferreira (2025). luizclaudiomcz/ADHD_Games_ResearchData: Version 1.0: Initial Release of ADHD Games Research Data [Dataset]. http://doi.org/10.5281/zenodo.13361882
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13361882
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Luiz Claudio Ferreira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Aug 22, 2024
Description
Project description:

This publication contains the research data on the impact of digital games on students with ADHD. The data includes various formats such as CSV, Excel, and JSON.

Data details:
- Surveys on the use of digital games in education.
- Students' socioeconomic profiles.
- Academic performance data.

Access to GitHub Repository:
For access to the GitHub repository of this project, please visit: https://github.com/luizclaudiomcz/ADHD_Games_ResearchData.

DOI:
The data is available with DOI: https://doi.org/10.5281/zenodo.13361882.

Additional notes:
We appreciate the contributions of all collaborators who helped in collecting and organizing the data. For feedback or issues, please open an issue in the GitHub repository.
Student Engagement
kaggle.com
Updated Nov 23, 2022
Cite
The Devastator (2022). Student Engagement [Dataset]. https://www.kaggle.com/datasets/thedevastator/student-engagement-with-tableau-a-data-science-p
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 23, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Student Engagement

Predicting Engagement and Exam Performance

By [source]

About this dataset

This dataset contains information on student engagement with Tableau, including quizzes, exams, and lessons. The data includes the course title, the rating of the course, the date the course was rated, the exam category, the exam duration, whether the answer was correct or not, the number of quizzes completed, the number of exams completed, the number of lessons completed, the date engaged, the exam result, and more.

How to use the dataset

The 'Student Engagement with Tableau' dataset offers insights into student engagement with the Tableau software. The data includes information on courses, exams, quizzes, and student learning.

This dataset can be used to examine how students use Tableau, what kind of engagement leads to better learning outcomes, and whether certain course or exam characteristics are associated with student engagement.

Research Ideas

Creating a heat map of student engagement by course and location

Determining which courses are most popular among students from different countries

Identifying patterns in students' exam results

Acknowledgements

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: 365_course_info.csv

| Column name | Description |
|:-----------------|:----------------------------------|
| course_title | The title of the course. (String) |

File: 365_course_ratings.csv

| Column name | Description |
|:------------------|:---------------------------------------------------------|
| course_rating | The rating given to the course by the student. (Numeric) |
| date_rated | The date on which the course was rated. (Date) |

File: 365_exam_info.csv

| Column name | Description |
|:------------------|:-------------------------------------------------|
| exam_category | The category of the exam. (Categorical) |
| exam_duration | The duration of the exam in minutes. (Numerical) |

File: 365_quiz_info.csv

| Column name | Description |
|:-------------------|:----------------------------------------------------------------------|
| answer_correct | Whether or not the student answered the question correctly. (Boolean) |

File: 365_student_engagement.csv

| Column name | Description |
|:-----------------------|:------------------------------------------------------------------|
| engagement_quizzes | The number of times a student has engaged with quizzes. (Numeric) |
| engagement_exams | The number of times a student has engaged with exams. (Numeric) |
| engagement_lessons | The number of times a student has engaged with lessons. (Numeric) |
| date_engaged | The date of the student's engagement. (Date) |

File: 365_student_exams.csv

| Column name | Description |
|:-------------------------|:---------------------------------------------------|
| exam_result | The result of the exam. (Categorical) |
| exam_completion_time | The time it took to complete the exam. (Numerical) |
| date_exam_completed | The date the exam was completed. (Date) |

File: 365_student_hub_questions.csv

| Column name | Description |
|:------------------------|:----------------------------------------|
| date_question_asked | The date the question was asked. (Date) |

File: 365_student_info.csv

| Column name | Description |
|:--------------------|:-------------------------------------------------------|
| student_country | The country of the student. (Categorical) |
| date_registered | The date the student registered for the course. (Date) |

File: 365_student_learning.csv

| Column name | Description |
|:--------------------|:------------------------------...
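As a rough illustration of working with the engagement columns listed above, the following Python sketch sums the three engagement counters per `date_engaged`. Only the column names are taken from the description of 365_student_engagement.csv; the sample rows, the date format, and the helper name `totals_by_date` are assumptions for this example.

```python
import csv
import io
from collections import defaultdict

# Made-up rows shaped like 365_student_engagement.csv; only the column
# names are taken from the description above.
SAMPLE = """engagement_quizzes,engagement_exams,engagement_lessons,date_engaged
1,0,1,2022-01-01
0,1,1,2022-01-01
1,1,0,2022-01-02
"""

COUNT_COLUMNS = ("engagement_quizzes", "engagement_exams", "engagement_lessons")


def totals_by_date(csv_text):
    """Sum each engagement counter per value of date_engaged."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0))
    for record in csv.DictReader(io.StringIO(csv_text)):
        day = totals[record["date_engaged"]]
        for column in COUNT_COLUMNS:
            day[column] += int(record[column])
    return dict(totals)


for date, counts in sorted(totals_by_date(SAMPLE).items()):
    print(date, counts)
```

To run this against the real file, the text could be read with `open(path, newline="")` and passed in the same way, provided the counters are stored as integers.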
studentlife data in RData format
data.niaid.nih.gov
Updated Jan 24, 2020
Cite
Daniel Fryer (2020). studentlife data in RData format [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3529252
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Daniel Fryer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the studentlife dataset, converted from its original form (hosted at https://studentlife.cs.dartmouth.edu/dataset.html) into a series of R tibbles (which are similar to a data.frame) and stored in the RData format, for compression, speed, and ease of use with R. Note that in this RData format the dataset takes up much less space than the original.

These tibbles are suitable for use with the studentlife R package https://github.com/frycast/studentlife which is also available on CRAN, but we recommend installing the latest version from GitHub.

Studentlife dataset reference:

Wang, Rui, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. "StudentLife: Assessing Mental Health, Academic Performance and Behavioral Trends of College Students using Smartphones." In Proceedings of the ACM Conference on Ubiquitous Computing. 2014.
Mapping Social and Environmental Justice Across California Schools
dataverse.harvard.edu
Updated Sep 30, 2023
Cite
Jessica Baiza; Camille Pawlak; Alice Baehr; Jenn Yost; Matt Ritter; Andrew Fricker (2023). Mapping Social and Environmental Justice Across California Schools [Dataset]. http://doi.org/10.7910/DVN/7NNBJD
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/7NNBJD
Dataset updated
Sep 30, 2023
Dataset provided by
Harvard Dataverse
Authors
Jessica Baiza; Camille Pawlak; Alice Baehr; Jenn Yost; Matt Ritter; Andrew Fricker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Dec 17, 2022 - May 3, 2023
Area covered
California
Dataset funded by
US Forest Service
CAL FIRE
Description
This dataset provides a valuable tool for evidence-based decision-making to prioritize environmental initiatives for California schools. It offers a comprehensive assessment of key factors such as tree canopy cover, demographics, school performance, vulnerability to climate change and pollution, poverty factors, and more. By utilizing this GIS-based assessment, stakeholders can, for example, identify and prioritize schools in need of cooling and greening interventions. This data-driven approach ensures that vulnerable schools are given priority and resources can be allocated more efficiently to make a positive impact on the well-being and education of California's students. This dataset was created using the following script (the script excludes academic performance indicators): https://github.com/camipawlak/school_enviro/tree/main
The authors would like to acknowledge the support of Cal Poly's Office of University Diversity and Inclusion via the BEACoN Research Mentoring Program.
COKI Open Access Dataset
zenodo.org
explore.openaire.eu
zip
Updated Oct 3, 2023
Cite
Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon (2023). COKI Open Access Dataset [Dataset]. http://doi.org/10.5281/zenodo.7090742
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7090742
Dataset updated
Oct 3, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.

The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.

Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry.

After fetching the datasets, they are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate timeseries statistics include publication count, open access status and citation count.

See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
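Because the dataset is distributed as JSON Lines, each line of a file is one standalone JSON object. The Python sketch below shows one way such a file could be parsed. The field names (`id`, `name`, `n_outputs`, `n_outputs_open`) and the sample records are hypothetical placeholders; the actual schema is the one published at https://open.coki.ac/data/ and is not reproduced here.

```python
import json

# Two made-up records; the real field names are defined by the published
# schema and are NOT reproduced here.
SAMPLE_JSONL = """{"id": "AAA", "name": "Example Country", "n_outputs": 100, "n_outputs_open": 40}
{"id": "BBB", "name": "Example Institution", "n_outputs": 50, "n_outputs_open": 30}
"""


def read_json_lines(text):
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def open_share(record):
    """Share of outputs that are open, using the hypothetical field names above."""
    return record["n_outputs_open"] / record["n_outputs"]


for record in read_json_lines(SAMPLE_JSONL):
    print(record["name"], f"{open_share(record):.0%}")
```

For large files, iterating over the file handle line by line avoids loading the whole dataset into memory at once.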

Code

The COKI Academic Observatory data collection pipeline is used to create the dataset.

The COKI OA Website Github project contains the code for the web app that visualises the dataset at open.coki.ac. It can be found on Zenodo here.

License
COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

Attributions
This work contains information from:

Microsoft Academic Graph which is made available under the ODC Attribution Licence.

Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details.

Unpaywall. The Unpaywall Data Feed is used under license. Data is freely available from Unpaywall via the API, data dumps and as a data feed.

Research Organization Registry which is made available under a CC0 licence.
SPARBench
huggingface.co
Updated Jul 23, 2025
Cite
XiaofengShi (2025). SPARBench [Dataset]. https://huggingface.co/datasets/MonteXiaofeng/SPARBench
Dataset updated
Jul 23, 2025
Authors
XiaofengShi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
SPAR-Benchmark: A Realistic Evaluation Dataset for Academic Search Systems

Paper: SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
Code: https://github.com/xiaofengShi/SPAR

Benchmark Overview

SPAR-Benchmark is an evaluation dataset constructed for realistic academic search scenarios, aiming to provide a reliable and practical performance evaluation foundation for academic search systems. The dataset covers the complete process from… See the full description on the dataset page: https://huggingface.co/datasets/MonteXiaofeng/SPARBench.
Data EO summer school 2025
zenodo.org
bin, tiff
Updated Aug 1, 2025
Cite
Ku Ou (2025). Data EO summer school 2025 [Dataset]. http://doi.org/10.5281/zenodo.16613694
Available download formats: bin, tiff
Unique identifier
https://doi.org/10.5281/zenodo.16613694
Dataset updated
Aug 1, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ku Ou
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 2025
Description
Data for workshop: Tutorial: High performance computing with Python/RS-DAT, EO summer school 2025

Date: September 3, 2025; 13:30 - 15:00

For more details of the workshop, please check the GitHub repository:

RS-DAT/2025-09-03-EO-summer-school: Material for the EO Summer School 2025 by OpenGeoHub

This dataset contains the following files:

sentinel2_rgb_res_20_size_8000_cog.tif: Sentinel-2 image with RGB bands of the area of interest, resolution 20 m, size 8000x8000 pixels

sentinel2_rgb_res_20_cutout.tif: a cutout of the above image

waterbody_labels.gpkg: manually created waterbody polygons in the cutout

none_waterbody_labels.gpkg: manually created non-waterbody polygons in the cutout
Fon French Daily Dialogues Parallel Data
live.european-language-grid.eu
huggingface.co
csv
Updated Apr 11, 2024
Cite
(2024). Fon French Daily Dialogues Parallel Data [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7709
Available download formats: csv
Dataset updated
Apr 11, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
We aim to collect, clean, and store corpora of Fon and French sentences for Natural Language Processing research, including Neural Machine Translation, Named Entity Recognition, etc., for Fon, a very low-resourced and endangered African native language.
Fon (also called Fongbe) is an African-indigenous language spoken mostly in Benin, Togo, and Nigeria - by about 2 million people.
As training data is crucial to the high performance of a machine learning model, the aim of this project is to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon.
Through crowdsourcing and Google Form surveys, we gathered and cleaned 25377 parallel Fon-French sentences, all based on daily conversations.
The following people and organizations contributed to the crowdsourcing, creation, and cleaning of this version:
1) Name: Bonaventure DOSSOU; Affiliation: MSc Student in Data Engineering, Jacobs University; Contact: femipancrace.dossou@gmail.com
2) Name: Ricardo AHOUNVLAME; Affiliation: Student in Linguistics; Contact: tontonjars@gmail.com
3) Name: Fabroni YOCLOUNON; Affiliation: Creator of the Label IamYourClounon; Contact: iamyourclounon@gmail.com
4) Name: BeninLangues; Affiliation: BeninLangues; Contact: https://beninlangues.com/
5) Name: Chris Emezue; Affiliation: MSc Student in Mathematics in Data Science, Technical University of Munich; Contact: chris.emezue@gmail.com
To join as a contributor, please contact us at: 1) https://twitter.com/bonadossou 2) https://twitter.com/ChrisEmezue 3) https://twitter.com/edAIOfficial
Or contact Bonaventure Dossou (femipancrace.dossou@gmail.com) or Chris Emezue (chris.emezue@gmail.com).
Clavier Fongbé (WebView): https://bonaventuredossou.github.io/clavierfongbe/ (made by Bonaventure Dossou)
Clavier Fongbé (Mobile Android Version): https://play.google.com/store/apps/details?id=com.fulbertodev.clavierfongbe&hl=en&gl=US (Fabroni Yoclounon, Bonaventure Dossou et al.)
MAST melody dataset
zenodo.org
zip
Updated Jun 6, 2023
Cite
Baris Bozkurt; Ozan Baysal (2023). MAST melody dataset [Dataset]. http://doi.org/10.5281/zenodo.8007358
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8007358
Dataset updated
Jun 6, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Baris Bozkurt; Ozan Baysal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Musical Aptitude Standard Test (MAST) melody dataset is designed and shared to facilitate comparison of algorithms in the field of automatic music performance assessment.

The dataset includes melodic pattern reproduction performances by students (singing) together with the reference melodic pattern played on piano and assessment results. All recordings are collected during entrance exams (in 2015 and 2016) of Istanbul Technical University (ITU).

The recordings were annotated in 2022 within the context of another research project supported by TUBITAK under grant number 121E198, as part of the Scientific and Technological Research Projects Funding Program (1001).

Annotations are performed via blind listening of individual performances after listening to a few renditions of the melodic pattern by the experts. The files were presented in random order (after grouping samples in terms of melodic patterns) (i.e. the expert annotated all samples of a melodic pattern in random order and moved to the next group of samples for the next melodic pattern).

A 4-level grading system was used during the evaluation of the dataset: 1-Completely Off, 2-Major Mistakes, 3-Minor Mistakes, and 4-Perfect.

Annotations were carried out by 3 experts: a professor of musicology who has taken part as a jury member in entrance exam auditions, and two graduate-level music conservatory students. The last two annotators re-annotated the entire collection a few months after the first annotation task. The csv file contains 5 annotations in 5 columns, two of which hold the repeated annotations. To facilitate analysis, we also added columns that carry a flag if all annotations match (column: 'fullAgree'), the score/grade all annotations agreed on (column: 'fullAgree_score'), and the majority score.
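The agreement columns described above can be illustrated with a short Python sketch. It is not the authors' code: the function name `summarize_annotations` is hypothetical, and the rule that a majority needs more than half of the annotations (returning `None` otherwise) is an assumption, since the description does not say how ties are handled.

```python
from collections import Counter


def summarize_annotations(grades):
    """Summarize one performance's grades (each an integer from 1 to 4).

    Returns (full_agree, full_agree_score, majority_score). A score is None
    when it is undefined under the assumptions stated in the text.
    """
    counts = Counter(grades)
    top_grade, top_count = counts.most_common(1)[0]
    full_agree = len(counts) == 1
    full_agree_score = top_grade if full_agree else None
    # Assumption: a majority requires more than half of the annotations.
    majority_score = top_grade if top_count > len(grades) / 2 else None
    return full_agree, full_agree_score, majority_score


print(summarize_annotations([4, 4, 4, 4, 4]))  # all five annotations match
print(summarize_annotations([3, 3, 2, 3, 4]))  # three of five agree
print(summarize_annotations([1, 1, 2, 2, 3]))  # no strict majority
```

With five annotations on a four-level scale, a strict majority does not always exist, which is why the sketch allows the majority score to be missing.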

In addition to audio files and annotations, two commonly used features are also included: 1) f0-series extracted using Crepe Pitch Tracker (https://github.com/marl/crepe), 2) chroma features computed using Librosa library's chroma_stft function (https://librosa.org/doc/main/generated/librosa.feature.chroma_stft.html)

The directory structure is:

annotations: Contains 5 distinct annotations by 3 experts on a scale of 1-4 in a csv file.

f0data_crepe: MASTmelody dataset, latest version (f0 series extracted using Crepe Pitch Tracker)

audioFiles: Audio files sampled at 8kHz

chroma: chroma features computed using Librosa library's chroma_stft function using its default settings except n_chroma which is set to 24.

If you use this dataset, please refer to the following paper which announced its original version:

Bozkurt, B., Baysal, O., Yuret, D. A Dataset and Baseline System for Singing Voice Assessment, 13th Int. Symposium on Computer Music Multidisciplinary Research, Porto, Sept. 25-28, 2017.

@inproceedings{inproceedings,
  author={Bozkurt, B. and Baysal, O. and Yuret, D.},
  title={A Dataset and Baseline System for Singing Voice Assessment},
  year={2017},
  booktitle={13th Int. Symposium on Computer Music Multidisciplinary Research, CMMR 2017}
}
Interpreting accuracy revisted: A refined approach to interpreting...
zenodo.org
Updated Sep 28, 2024
Cite
Anne Catherine Gieshoff; Michaela Albl-Mikasa (2024). Interpreting accuracy revisted: A refined approach to interpreting performance analysis. Data set [Dataset]. http://doi.org/10.5281/zenodo.5764021
Unique identifier
https://doi.org/10.5281/zenodo.5764021
Dataset updated
Sep 28, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anne Catherine Gieshoff; Michaela Albl-Mikasa
Description
This data set contains data on the interpreting accuracy of ten professional and ten student interpreters. Information on the level of expertise is contained in the filename: "student" refers to a student's rendition, "professional" to a rendition by a professional interpreter. The data set further includes information on:

unit: The meaning unit in the source text (anonymized for data protection reasons)

identfier: a number from 1 to 488 to identify each unit

sentence: The full sentence that contains the unit (anonymized for data protection reasons)

rating: whether the unit was correctly (1) rendered, or incorrectly/missing (0),

category: information about the category of the unit

weighing: weight assigned to each category

score: score obtained for the unit
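To show how the listed fields might be combined, the Python sketch below computes two summary figures for one rendition. It is an illustration under stated assumptions, not the authors' method: the description does not say how `score` is derived, so the weight-adjusted formula here (sum of `rating * weighing` divided by the sum of `weighing`) is an assumption, as are the function name and the sample values. The authors' own analysis is in the R script linked from the description.

```python
def accuracy_summary(units):
    """Return (share of units rated 1, weight-adjusted share).

    The weight-adjusted share is an assumed formula for illustration:
    sum(rating * weighing) / sum(weighing).
    """
    total_weight = sum(u["weighing"] for u in units)
    unweighted = sum(u["rating"] for u in units) / len(units)
    weighted = sum(u["rating"] * u["weighing"] for u in units) / total_weight
    return unweighted, weighted


# Made-up units; field names are spelled as in the description above.
units = [
    {"identfier": 1, "rating": 1, "weighing": 2.0},
    {"identfier": 2, "rating": 0, "weighing": 1.0},
    {"identfier": 3, "rating": 1, "weighing": 1.0},
]

print(accuracy_summary(units))
```

Comparing the two figures shows how much the category weights change the picture relative to a plain count of correctly rendered units.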

For data protection reasons and in compliance with the declaration of Helsinki, the data set does not contain information about the source text or the rendition as a transcript.

The data set is primarily used as an example data set to assess interpreting accuracy. Please also refer to the following publication:

Gieshoff, A. C., & Albl-Mikasa, M. (2022). Interpreting accuracy revisited: a refined approach to interpreting performance analysis. Perspectives, 32(2), 210–228. https://doi.org/10.1080/0907676X.2022.2088296

The data set can be re-analysed using the following R-script: https://github.com/ac-gieshoff/interpreting-accuracy
E-Commerce Products Dataset For Record Linkage
kaggle.com
Updated Jul 8, 2025
Cite
Furkan Gözükara (2025). E-Commerce Products Dataset For Record Linkage [Dataset]. https://www.kaggle.com/furkangozukara/ecommerce-products-dataset-for-record-linkage/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 8, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Furkan Gözükara
Description
-> If you use Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset, please cite: https://academic.oup.com/comjnl/advance-article-abstract/doi/10.1093/comjnl/bxab179/6425234

@article{10.1093/comjnl/bxab179, author = {Gözükara, Furkan and Özel, Selma Ayşe}, title = "{An Incremental Hierarchical Clustering Based System For Record Linkage In E-Commerce Domain}", journal = {The Computer Journal}, year = {2021}, month = {11}, abstract = "{In this study, a novel record linkage system for E-commerce products is presented. Our system aims to cluster the same products that are crawled from different E-commerce websites into the same cluster. The proposed system achieves a very high success rate by combining both semi-supervised and unsupervised approaches. Unlike the previously proposed systems in the literature, neither a training set nor structured corpora are necessary. The core of the system is based on Hierarchical Agglomerative Clustering (HAC); however, the HAC algorithm is modified to be dynamic such that it can efficiently cluster a stream of incoming new data. Since the proposed system does not depend on any prior data, it can cluster new products. The system uses bag-of-words representation of the product titles, employs a single distance metric, exploits multiple domain-based attributes and does not depend on the characteristics of the natural language used in the product records. To our knowledge, there is no commonly used tool or technique to measure the quality of a clustering task. Therefore in this study, we use ELKI (Environment for Developing KDD-Applications Supported by Index-Structures), an open-source data mining software, for performance measurement of the clustering methods; and show how to use ELKI for this purpose. To evaluate our system, we collect our own dataset and make it publicly available to researchers who study E-commerce product clustering. Our proposed system achieves 96.25\% F-Measure according to our experimental analysis. 
The other state-of-the-art clustering systems obtain the best 89.12\% F-Measure.}", issn = {0010-4620}, doi = {10.1093/comjnl/bxab179}, url = {https://doi.org/10.1093/comjnl/bxab179}, note = {bxab179}, eprint = {https://academic.oup.com/comjnl/advance-article-pdf/doi/10.1093/comjnl/bxab179/41133297/bxab179.pdf}, }

-> elki-bundle-0.7.2-SNAPSHOT.jar is the ELKI bundle that we compiled from the GitHub source code of ELKI. The date of the source code is 6 June 2016. The compile command is as below:
->-> mvn -DskipTests -Dmaven.javadoc.skip=true -P svg,bundle package
->-> GitHub repository of ELKI: https://github.com/elki-project/elki
->-> This bundle file is used for all of the experiments that are presented in the article

-> The Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset is composed as below:
->-> The top 50 E-commerce websites that operate in Turkey were crawled, and their attributes were extracted.
->-> The crawling took place between 2015-01-13 15:12:46 and 2015-01-17 19:07:53.
->-> Then 250 product offers from Vatanbilgisayar were randomly selected.
->-> Then the entire dataset was manually scanned to find which products sold on other E-commerce websites are the same as the selected ones.
->-> Then each product was classified accordingly.
->-> This dataset contains these products along with their price (if available), title, categories (if available), free text description (if available), wrapped features (if available), and crawled URL (the URL might have expired) attributes

-> The dataset files are provided as used in the study.
-> For All_Products and Only_Price_Having_Products, ARFF files are generated with the raw frequency of terms rather than with the weighting schemes used in the study. The reason is that we tested these datasets only with our system, and since our system does incremental clustering, even if we provided TF-IDF weightings, they would not be the same as those used in the article. More information is provided in the article.
->-> For Macro_Average_Datasets we provide both raw frequency and TF-IDF scheme weightings as used in the experiments
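The difference between raw term frequency and TF-IDF weighting can be sketched as follows. TF-IDF weights depend on document frequencies across the whole collection, so they shift whenever new products arrive, which is consistent with the remark about incremental clustering above. This Python example is only an illustration: the whitespace tokenizer, the `tf * log(N / df)` variant, and the product titles are assumptions, not the weighting scheme or data used in the article.

```python
import math
from collections import Counter


def raw_frequency(title):
    """Bag-of-words vector: lower-cased whitespace tokens with raw counts."""
    return Counter(title.lower().split())


def tf_idf(titles):
    """Weight each raw count by log(N / df); one common TF-IDF variant."""
    bags = [raw_frequency(t) for t in titles]
    doc_freq = Counter(term for bag in bags for term in bag)
    n = len(bags)
    return [
        {term: count * math.log(n / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


# Hypothetical product titles, not taken from the dataset.
titles = ["acme laptop 15 inch", "acme laptop bag", "acme phone case"]
print(raw_frequency(titles[0]))
print(tf_idf(titles)[0])
```

Adding a fourth title and recomputing changes the weights of the first three, whereas their raw frequency vectors stay the same.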

-> There are 3 main folders
-> All_Products: This folder contains 1800 products.
->-> This is the entire collection that is manually labeled.
->-> They are from 250 different classes.
-> Only_Price_Having_Products: This folder contains all of the products that have the price feature set.
->-> The collection has 1721 products from 250 classes.
->-> This is the dataset that we have experimented with.
-> Macro_Average_Datasets: This folder contains 100 datasets that we have used to conduct more reliable experiments.
->-> Each dataset is composed of selecting 1000 different products from the price having products dataset and then randomly ordering them...