12 datasets found
  1. ‘Predicting Student Performance’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 2, 2015
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Predicting Student Performance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-student-performance-ec1b/b7296868/?iid=058-803&v=presentation
    Dataset updated
    Mar 2, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Predicting Student Performance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/student-performance on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    • These data address student achievement in secondary education at two Portuguese schools. The attributes include student grades and demographic, social, and school-related features, and they were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see paper source for more details).
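
    The note about G1, G2, and G3 can be illustrated with a minimal sketch. It assumes rows are plain dictionaries; the feature names `studytime` and `absences` and all values below are invented for illustration and are not taken from the dataset. The sketch measures the G2/G3 correlation and builds a feature set that excludes the earlier period grades.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_features_target(rows, target="G3", exclude=("G1", "G2")):
    """Return (features, targets), dropping the target and the earlier period grades."""
    dropped = set(exclude) | {target}
    features = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


# Toy rows invented for illustration only; they are NOT real records.
toy_rows = [
    {"studytime": 2, "absences": 6, "G1": 5, "G2": 6, "G3": 6},
    {"studytime": 3, "absences": 2, "G1": 14, "G2": 14, "G3": 15},
    {"studytime": 1, "absences": 10, "G1": 8, "G2": 9, "G3": 9},
    {"studytime": 4, "absences": 0, "G1": 17, "G2": 18, "G3": 18},
]

features, targets = split_features_target(toy_rows)
r_g2_g3 = pearson([row["G2"] for row in toy_rows], targets)
```

    Any model trained on `features` then has to predict G3 without seeing G1 or G2, which is the harder but more useful setting the description refers to.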

    How to use this dataset

    • Predict students' future performance
    • Understand the root causes of low performance

    Acknowledgements

    If you use this dataset in your research, please credit ewenme

    --- Original source retains full ownership of the source dataset ---

  2. luizclaudiomcz/ADHD_Games_ResearchData: Version 1.0: Initial Release of ADHD...

    • zenodo.org
    zip
    Updated Apr 24, 2025
    Cite
    Luiz Claudio Ferreira (2025). luizclaudiomcz/ADHD_Games_ResearchData: Version 1.0: Initial Release of ADHD Games Research Data [Dataset]. http://doi.org/10.5281/zenodo.13361882
    Available download formats: zip
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Luiz Claudio Ferreira
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 22, 2024
    Description

    Project description:

    This publication contains the research data on the impact of digital games on students with ADHD. The data includes various formats such as CSV, Excel, and JSON.

    Data details:
    - Surveys on the use of digital games in education.
    - Students' socioeconomic profiles.
    - Academic performance data.

    Access to GitHub Repository:
    For access to the GitHub repository of this project, please visit: https://github.com/luizclaudiomcz/ADHD_Games_ResearchData.

    DOI:
    The data is available with DOI: https://doi.org/10.5281/zenodo.13361882.

    Additional notes:
    We appreciate the contributions of all collaborators who helped in collecting and organizing the data. For feedback or issues, please open an issue in the GitHub repository.

  3. Student Engagement

    • kaggle.com
    Updated Nov 23, 2022
    Cite
    The Devastator (2022). Student Engagement [Dataset]. https://www.kaggle.com/datasets/thedevastator/student-engagement-with-tableau-a-data-science-p
    Dataset updated
    Nov 23, 2022
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Student Engagement

    Predicting Engagement and Exam Performance

    About this dataset

    This dataset contains information on student engagement with Tableau, including quizzes, exams, and lessons. The data includes the course title, the rating of the course, the date the course was rated, the exam category, the exam duration, whether the answer was correct or not, the number of quizzes completed, the number of exams completed, the number of lessons completed, the date engaged, the exam result, and more.

    How to use the dataset

    The 'Student Engagement with Tableau' dataset offers insights into student engagement with the Tableau software. The data includes information on courses, exams, quizzes, and student learning.

    This dataset can be used to examine how students use Tableau, what kind of engagement leads to better learning outcomes, and whether certain course or exam characteristics are associated with student engagement.

    Research Ideas

    • Creating a heat map of student engagement by course and location
    • Determining which courses are most popular among students from different countries
    • Identifying patterns in students' exam results

    Acknowledgements

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: 365_course_info.csv

    | Column name  | Description                       |
    |:-------------|:----------------------------------|
    | course_title | The title of the course. (String) |

    File: 365_course_ratings.csv

    | Column name   | Description                                              |
    |:--------------|:---------------------------------------------------------|
    | course_rating | The rating given to the course by the student. (Numeric) |
    | date_rated    | The date on which the course was rated. (Date)           |

    File: 365_exam_info.csv

    | Column name   | Description                                      |
    |:--------------|:-------------------------------------------------|
    | exam_category | The category of the exam. (Categorical)          |
    | exam_duration | The duration of the exam in minutes. (Numerical) |

    File: 365_quiz_info.csv

    | Column name    | Description                                                           |
    |:---------------|:----------------------------------------------------------------------|
    | answer_correct | Whether or not the student answered the question correctly. (Boolean) |

    File: 365_student_engagement.csv

    | Column name        | Description                                                       |
    |:-------------------|:------------------------------------------------------------------|
    | engagement_quizzes | The number of times a student has engaged with quizzes. (Numeric) |
    | engagement_exams   | The number of times a student has engaged with exams. (Numeric)   |
    | engagement_lessons | The number of times a student has engaged with lessons. (Numeric) |
    | date_engaged       | The date of the student's engagement. (Date)                      |

    File: 365_student_exams.csv

    | Column name          | Description                                        |
    |:---------------------|:---------------------------------------------------|
    | exam_result          | The result of the exam. (Categorical)              |
    | exam_completion_time | The time it took to complete the exam. (Numerical) |
    | date_exam_completed  | The date the exam was completed. (Date)            |

    File: 365_student_hub_questions.csv

    | Column name         | Description                             |
    |:--------------------|:----------------------------------------|
    | date_question_asked | The date the question was asked. (Date) |

    File: 365_student_info.csv

    | Column name     | Description                                            |
    |:----------------|:-------------------------------------------------------|
    | student_country | The country of the student. (Categorical)              |
    | date_registered | The date the student registered for the course. (Date) |

    File: 365_student_learning.csv | Column name | Description | |:--------------------|:------------------------------...
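
    As a rough sketch of how the engagement columns listed above might be aggregated, the example below sums the three engagement counts per student. The `student_id` column and the sample rows are assumptions made for illustration; the listing does not show an identifier column, and the values are invented.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the shape of 365_student_engagement.csv.
# The student_id column is hypothetical; it is not named in the column listing.
SAMPLE = """student_id,engagement_quizzes,engagement_exams,engagement_lessons,date_engaged
1,2,0,3,2022-01-01
1,1,1,0,2022-01-02
2,0,0,4,2022-01-01
"""


def total_engagement(csv_file):
    """Sum quiz, exam, and lesson engagement counts per student."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["student_id"]] += (
            int(row["engagement_quizzes"])
            + int(row["engagement_exams"])
            + int(row["engagement_lessons"])
        )
    return dict(totals)


totals = total_engagement(io.StringIO(SAMPLE))
```

    To run this on the real file, pass an open file handle instead of the in-memory sample and adjust the identifier column to whatever the file actually uses.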

  4. studentlife data in RData format

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Daniel Fryer (2020). studentlife data in RData format [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3529252
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Daniel Fryer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the studentlife dataset, converted from its original form (hosted at https://studentlife.cs.dartmouth.edu/dataset.html) into a series of R tibbles (which are similar to data frames) and stored in the RData format for compression, speed, and ease of use with R. Note that in this RData format the dataset takes up much less space than the original.

    These tibbles are suitable for use with the studentlife R package https://github.com/frycast/studentlife which is also available on CRAN, but we recommend installing the latest version from GitHub.

    Studentlife dataset reference:

    Wang, Rui, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T. Campbell. "StudentLife: Assessing Mental Health, Academic Performance and Behavioral Trends of College Students using Smartphones." In Proceedings of the ACM Conference on Ubiquitous Computing. 2014.

  5. Mapping Social and Environmental Justice Across California Schools

    • dataverse.harvard.edu
    Updated Sep 30, 2023
    Cite
    Jessica Baiza; Camille Pawlak; Alice Baehr; Jenn Yost; Matt Ritter; Andrew Fricker (2023). Mapping Social and Environmental Justice Across California Schools [Dataset]. http://doi.org/10.7910/DVN/7NNBJD
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Jessica Baiza; Camille Pawlak; Alice Baehr; Jenn Yost; Matt Ritter; Andrew Fricker
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Dec 17, 2022 - May 3, 2023
    Area covered
    California
    Dataset funded by
    US Forest Service
    CAL FIRE
    Description

    This dataset provides a valuable tool for evidence-based decision-making to prioritize environmental initiatives for California schools. It offers a comprehensive assessment of key factors such as tree canopy cover, demographics, school performance, vulnerability to climate change and pollution, poverty factors, and more. By utilizing this GIS-based assessment, stakeholders can, for example, identify and prioritize schools in need of cooling and greening interventions. This data-driven approach ensures that vulnerable schools are given priority and resources can be allocated more efficiently to make a positive impact on the well-being and education of California's students. The dataset was produced with a script that excludes academic performance indicators (https://github.com/camipawlak/school_enviro/tree/main). The authors would like to acknowledge the support of Cal Poly's Office of University Diversity and Inclusion via the BEACoN Research Mentoring Program.

  6. COKI Open Access Dataset

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Oct 3, 2023
    Cite
    Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon (2023). COKI Open Access Dataset [Dataset]. http://doi.org/10.5281/zenodo.7090742
    Available download formats: zip
    Dataset updated
    Oct 3, 2023
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.

    The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.

    Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry.

    After fetching the datasets, they are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate timeseries statistics include publication count, open access status and citation count.

    See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
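
    Since the dataset is distributed as JSON Lines (one JSON object per line), a minimal reading sketch might look like the following. The field names `id`, `year`, `n_outputs`, and `n_outputs_open` and the sample values are hypothetical placeholders; the actual schema is the one documented at https://open.coki.ac/data/.

```python
import io
import json


def read_json_lines(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Invented records with hypothetical field names, for illustration only.
sample = io.StringIO(
    '{"id": "AAA", "year": 2020, "n_outputs": 10, "n_outputs_open": 4}\n'
    '{"id": "AAA", "year": 2021, "n_outputs": 20, "n_outputs_open": 15}\n'
)

records = list(read_json_lines(sample))

# Share of open access outputs per year for the sample entity.
oa_share = {r["year"]: r["n_outputs_open"] / r["n_outputs"] for r in records}
```

    Reading line by line keeps memory use low, which matters if a release file is large.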

    Code

    License
    COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

    Attributions
    This work contains information from:

  7. SPARBench

    • huggingface.co
    Updated Jul 23, 2025
    Cite
    XiaofengShi (2025). SPARBench [Dataset]. https://huggingface.co/datasets/MonteXiaofeng/SPARBench
    Dataset updated
    Jul 23, 2025
    Authors
    XiaofengShi
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    SPAR-Benchmark: A Realistic Evaluation Dataset for Academic Search Systems

    Paper: SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
    Code: https://github.com/xiaofengShi/SPAR

    Benchmark Overview

    SPAR-Benchmark is an evaluation dataset constructed for realistic academic search scenarios, aiming to provide a reliable and practical performance evaluation foundation for academic search systems. The dataset covers the complete process from… See the full description on the dataset page: https://huggingface.co/datasets/MonteXiaofeng/SPARBench.

  8. Data EO summer school 2025

    • zenodo.org
    bin, tiff
    Updated Aug 1, 2025
    Cite
    Ku Ou (2025). Data EO summer school 2025 [Dataset]. http://doi.org/10.5281/zenodo.16613694
    Available download formats: bin, tiff
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Ku Ou
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 2025
    Description
    Data for workshop: Tutorial: High performance computing with Python/RS-DAT, EO summer school 2025


    Date: September 3, 2025; 13:30 - 15:00

    For more details of the workshop, please check the GitHub repository:
    This dataset contains the following files:

  9. Fon French Daily Dialogues Parallel Data

    • live.european-language-grid.eu
    • huggingface.co
    • +1more
    csv
    Updated Apr 11, 2024
    Cite
    (2024). Fon French Daily Dialogues Parallel Data [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7709
    Available download formats: csv
    Dataset updated
    Apr 11, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    We aim to collect, clean, and store corpora of Fon and French sentences for Natural Language Processing research, including Neural Machine Translation, Named Entity Recognition, etc., for Fon, a very low-resourced and endangered African native language.

    Fon (also called Fongbe) is an African-indigenous language spoken mostly in Benin, Togo, and Nigeria - by about 2 million people.

    As training data is crucial to the high performance of a machine learning model, the aim of this project is to compile the largest set of training corpora for the research and design of translation and NLP models involving Fon.

    Through crowdsourcing and Google Forms surveys, we gathered and cleaned 25377 parallel Fon-French sentences, all based on daily conversations.

    The following contributed to the crowdsourcing, creation, and cleaning of this version:

    1) Name: Bonaventure DOSSOU; Affiliation: MSc Student in Data Engineering, Jacobs University; Contact: femipancrace.dossou@gmail.com

    2) Name: Ricardo AHOUNVLAME; Affiliation: Student in Linguistics; Contact: tontonjars@gmail.com

    3) Name: Fabroni YOCLOUNON; Affiliation: Creator of the Label IamYourClounon; Contact: iamyourclounon@gmail.com

    4) Name: BeninLangues; Affiliation: BeninLangues; Contact: https://beninlangues.com/

    5) Name: Chris Emezue; Affiliation: MSc Student in Mathematics in Data Science, Technical University of Munich; Contact: chris.emezue@gmail.com

    To join as a contributor, please contact us at: 1) https://twitter.com/bonadossou 2) https://twitter.com/ChrisEmezue 3) https://twitter.com/edAIOfficial
    Or contact Bonaventure Dossou (femipancrace.dossou@gmail.com) or Chris Emezue (chris.emezue@gmail.com).

    Clavier Fongbé (WebView): https://bonaventuredossou.github.io/clavierfongbe/ (Made by Bonaventure Dossou)
    Clavier Fongbé (Mobile Android Version): https://play.google.com/store/apps/details?id=com.fulbertodev.clavierfongbe&hl=en&gl=US (Fabroni Yoclounon, Bonaventure Dossou et al.)

  10. MAST melody dataset

    • zenodo.org
    zip
    Updated Jun 6, 2023
    Cite
    Baris Bozkurt; Ozan Baysal (2023). MAST melody dataset [Dataset]. http://doi.org/10.5281/zenodo.8007358
    Available download formats: zip
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Baris Bozkurt; Ozan Baysal
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Musical Aptitude Standard Test (MAST) melody dataset is designed and shared to facilitate comparison of algorithms in the field of automatic music performance assessment.

    The dataset includes melodic pattern reproduction performances by students (singing) together with the reference melodic pattern played on piano and assessment results. All recordings were collected during entrance exams (in 2015 and 2016) of Istanbul Technical University (ITU).

    The recordings were annotated in 2022 within the context of another research project supported by TUBITAK with grant number 121E198 as a part of the Scientific and Technological Research Projects Funding Program (1001).

    Annotations are performed via blind listening of individual performances after listening to a few renditions of the melodic pattern by the experts. The files were presented in random order (after grouping samples in terms of melodic patterns) (i.e. the expert annotated all samples of a melodic pattern in random order and moved to the next group of samples for the next melodic pattern).

    A 4-level grading system was used during the evaluation of the data set:
    1-Completely Off, 2-Major Mistakes, 3-Minor Mistakes, and 4-Perfect.

    Annotations were carried out by 3 experts: a professor of musicology who has served as a jury member in entrance exam auditions, and two graduate-level music conservatory students. The last two annotators re-annotated the whole collection a few months after the first annotation task. The csv file contains 5 annotations in 5 columns, two of which hold the repeated annotations. To facilitate analysis, we also added columns that carry a flag if all annotations match (column: 'fullAgree'), the score/grade all annotations agreed on (column: 'fullAgree_score'), and the majority score.
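
    The derived columns described above could be recomputed from the five annotations roughly as follows. This is a sketch under assumptions: the description does not say how the majority score is defined, so a strict majority (more than half of the annotations) is assumed here, and the key `majority_score` is a hypothetical name.

```python
from collections import Counter


def summarize_annotations(grades):
    """Derive agreement summaries from a list of grades on the 1-4 scale."""
    counts = Counter(grades)
    full_agree = len(counts) == 1
    top_grade, top_count = counts.most_common(1)[0]
    return {
        "fullAgree": full_agree,
        # Only defined when every annotation gives the same grade.
        "fullAgree_score": grades[0] if full_agree else None,
        # Assumption: a grade counts as the majority only with more than half the votes.
        "majority_score": top_grade if top_count > len(grades) / 2 else None,
    }


unanimous = summarize_annotations([4, 4, 4, 4, 4])
split = summarize_annotations([3, 3, 4, 3, 2])
no_majority = summarize_annotations([1, 2, 3, 4, 4])
```

    Comparing such recomputed values with the columns shipped in the csv file is one way to check which majority rule the authors actually used.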

    In addition to audio files and annotations, two commonly used features are also included: 1) f0-series extracted using Crepe Pitch Tracker (https://github.com/marl/crepe), 2) chroma features computed using Librosa library's chroma_stft function (https://librosa.org/doc/main/generated/librosa.feature.chroma_stft.html)

    The directory structure is:

    • annotations: Contains 5 distinct annotations by 3 experts on a scale of 1-4 in a csv file.
    • f0data_crepe: MASTmelody dataset, latest version (f0 series extracted using Crepe Pitch Tracker)
    • audioFiles: Audio files sampled at 8kHz
    • chroma: chroma features computed using Librosa library's chroma_stft function using its default settings except n_chroma which is set to 24.

    If you use this dataset, please refer to the following paper which announced its original version:

    Bozkurt, B., Baysal, O., Yuret, D. A Dataset and Baseline System for Singing Voice Assessment, 13th Int. Symposium on Computer Music Multidisciplinary Research, Porto, Sept. 25-28, 2017.

    @inproceedings{inproceedings,
     author={Bozkurt, B. and Baysal, O. and Yuret, D.},
     title={A Dataset and Baseline System for Singing Voice Assessment},
     year={2017},
     booktitle={13th Int. Symposium on Computer Music Multidisciplinary Research, CMMR 2017}
    }

  11. Interpreting accuracy revisted: A refined approach to interpreting...

    • zenodo.org
    Updated Sep 28, 2024
    Cite
    Anne Catherine Gieshoff; Michaela Albl-Mikasa (2024). Interpreting accuracy revisted: A refined approach to interpreting performance analysis. Data set [Dataset]. http://doi.org/10.5281/zenodo.5764021
    Dataset updated
    Sep 28, 2024
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Anne Catherine Gieshoff; Michaela Albl-Mikasa
    Description

    This data set contains data on the interpreting accuracy of ten professional and ten student interpreters. Information on the level of expertise is contained in the filename: "student" refers to a student's rendition, "professional" to a rendition by a professional interpreter. The data set further includes information on:

    • unit: The meaning unit in the source text (anonymized for data protection reasons)
    • identfier: a number from 1 to 488 to identify each unit
    • sentence: The full sentence that contains the unit (anonymized for data protection reasons)
    • rating: whether the unit was rendered correctly (1) or was incorrect/missing (0)
    • category: information about the category of the unit
    • weighing: weighting assigned to each category
    • score: score obtained for the unit
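
    As a small illustration of how the `rating` and `weighing` columns listed above might be summarized, the sketch below computes the plain share of correctly rendered units and a weighted share. The weighted formula is an assumption made for illustration and is not confirmed as the authors' scoring method; the toy rows are invented.

```python
def accuracy_summary(units):
    """Return (unweighted, weighted) share of correctly rendered units."""
    n_correct = sum(u["rating"] for u in units)
    unweighted = n_correct / len(units)
    total_weight = sum(u["weighing"] for u in units)
    # Assumption: a unit contributes its weight only when rated correct (1).
    weighted = sum(u["rating"] * u["weighing"] for u in units) / total_weight
    return unweighted, weighted


# Invented rows for illustration only; they are NOT taken from the data set.
toy_units = [
    {"identfier": 1, "rating": 1, "weighing": 2},
    {"identfier": 2, "rating": 0, "weighing": 1},
    {"identfier": 3, "rating": 1, "weighing": 1},
    {"identfier": 4, "rating": 1, "weighing": 4},
]

unweighted, weighted = accuracy_summary(toy_units)
```

    For the scoring actually used, refer to the publication and the R script referenced in this entry.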

    For data protection reasons and in compliance with the Declaration of Helsinki, the data set does not contain information about the source text or the rendition as a transcript.

    The data set is primarily used as an example data set to assess interpreting accuracy. Please also refer to the following publication:

    Gieshoff, A. C., & Albl-Mikasa, M. (2022). Interpreting accuracy revisited: a refined approach to interpreting performance analysis. Perspectives, 32(2), 210–228. https://doi.org/10.1080/0907676X.2022.2088296
    The data set can be re-analysed using the following R-script: https://github.com/ac-gieshoff/interpreting-accuracy

  12. E-Commerce Products Dataset For Record Linkage

    • kaggle.com
    Updated Jul 8, 2025
    Cite
    Furkan Gözükara (2025). E-Commerce Products Dataset For Record Linkage [Dataset]. https://www.kaggle.com/furkangozukara/ecommerce-products-dataset-for-record-linkage/discussion
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Furkan Gözükara
    Description

    -> If you use Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset, please cite: https://academic.oup.com/comjnl/advance-article-abstract/doi/10.1093/comjnl/bxab179/6425234

    @article{10.1093/comjnl/bxab179,
      author = {Gözükara, Furkan and Özel, Selma Ayşe},
      title = "{An Incremental Hierarchical Clustering Based System For Record Linkage In E-Commerce Domain}",
      journal = {The Computer Journal},
      year = {2021},
      month = {11},
      abstract = "{In this study, a novel record linkage system for E-commerce products is presented. Our system aims to cluster the same products that are crawled from different E-commerce websites into the same cluster. The proposed system achieves a very high success rate by combining both semi-supervised and unsupervised approaches. Unlike the previously proposed systems in the literature, neither a training set nor structured corpora are necessary. The core of the system is based on Hierarchical Agglomerative Clustering (HAC); however, the HAC algorithm is modified to be dynamic such that it can efficiently cluster a stream of incoming new data. Since the proposed system does not depend on any prior data, it can cluster new products. The system uses bag-of-words representation of the product titles, employs a single distance metric, exploits multiple domain-based attributes and does not depend on the characteristics of the natural language used in the product records. To our knowledge, there is no commonly used tool or technique to measure the quality of a clustering task. Therefore in this study, we use ELKI (Environment for Developing KDD-Applications Supported by Index-Structures), an open-source data mining software, for performance measurement of the clustering methods; and show how to use ELKI for this purpose. To evaluate our system, we collect our own dataset and make it publicly available to researchers who study E-commerce product clustering. Our proposed system achieves 96.25\% F-Measure according to our experimental analysis. The other state-of-the-art clustering systems obtain the best 89.12\% F-Measure.}",
      issn = {0010-4620},
      doi = {10.1093/comjnl/bxab179},
      url = {https://doi.org/10.1093/comjnl/bxab179},
      note = {bxab179},
      eprint = {https://academic.oup.com/comjnl/advance-article-pdf/doi/10.1093/comjnl/bxab179/41133297/bxab179.pdf},
    }

    -> elki-bundle-0.7.2-SNAPSHOT.jar is the ELKI bundle that we compiled from the GitHub source code of ELKI. The date of the source code is 6 June 2016. The compile command is as below:
    ->-> mvn -DskipTests -Dmaven.javadoc.skip=true -P svg,bundle package
    ->-> GitHub repository of ELKI: https://github.com/elki-project/elki
    ->-> This bundle file is used for all of the experiments that are presented in the article

    -> The Turkish_Ecommerce_Products_by_Gozukara_and_Ozel_2016 dataset was composed as follows:
    ->-> The top 50 E-commerce websites that operate in Turkey were crawled, and their attributes were extracted.
    ->-> The crawl ran between 2015-01-13 15:12:46 and 2015-01-17 19:07:53.
    ->-> Then 250 product offers from Vatanbilgisayar were randomly selected.
    ->-> Then the entire dataset was manually scanned to find which products sold on other E-commerce websites are the same as the selected ones.
    ->-> Then each product was classified accordingly.
    ->-> This dataset contains these products along with their price (if available), title, categories (if available), free text description (if available), wrapped features (if available), and crawled URL (the URL might have expired) attributes

    -> The dataset files are provided as used in the study.
    -> For All_Products and Only_Price_Having_Products, the ARFF files are generated with the raw frequency of terms rather than with the weighting schemes used in the article. The reason is that we tested these datasets only with our system, and since our system does incremental clustering, even if we provided TF-IDF weightings, they would not be the same as those used in the article. More information is provided in the article.
    ->-> For Macro_Average_Datasets we provide both raw frequency and TF-IDF weightings as used in the experiments
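
    The point about raw frequencies versus TF-IDF under incremental clustering can be illustrated with a small sketch. It uses one common TF-IDF variant (raw term count times log(N / document frequency)), which is an assumption and not necessarily the weighting used in the article; the product titles are invented.

```python
import math
from collections import Counter


def raw_tf(title):
    """Raw term frequencies of a whitespace-tokenized, lower-cased title."""
    return Counter(title.lower().split())


def tf_idf(titles):
    """TF-IDF weights per title, using tf * log(N / df) over the given titles."""
    tfs = [raw_tf(t) for t in titles]
    n = len(titles)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for tf in tfs for term in tf)
    return [
        {term: count * math.log(n / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]


# Invented product titles, for illustration only.
titles = ["acme laptop 15 inch", "acme laptop sleeve"]

before = tf_idf(titles)[0]["inch"]
after = tf_idf(titles + ["acme phone case"])[0]["inch"]
```

    The raw frequency of "inch" in the first title stays at 1, while its TF-IDF weight changes from `before` to `after` once a third title arrives. This is why weights computed on a fixed corpus cannot match those seen by a system that clusters a stream of incoming records.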

    -> There are 3 main folders
    -> All_Products: This folder contains 1800 products.
    ->-> This is the entire collection that is manually labeled.
    ->-> They are from 250 different classes.
    -> Only_Price_Having_Products: This folder contains all of the products that have the price feature set.
    ->-> The collection has 1721 products from 250 classes.
    ->-> This is the dataset that we have experimented.
    -> Macro_Average_Datasets: This folder contains 100 datasets that we have used to conduct more reliable experiments.
    ->-> Each dataset is composed of selecting 1000 different products from the price having products dataset and then randomly ordering them...
