100+ datasets found
  1. Student Performance Data Set

    • kaggle.com
    zip
    Updated Mar 27, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    zip(12353 bytes)Available download formats
    Dataset updated
    Mar 27, 2020
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

  2. Student Sample Dataset

    • kaggle.com
    zip
    Updated Sep 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gabru1912 (2023). Student Sample Dataset [Dataset]. https://www.kaggle.com/datasets/gabru1912/student-sample-dataset
    Explore at:
    zip(306 bytes)Available download formats
    Dataset updated
    Sep 6, 2023
    Authors
    Gabru1912
    Description

    Dataset

    This dataset was created by Gabru1912

    Contents

  3. Predict students' dropout and academic success

    • kaggle.com
    zip
    Updated Jan 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
    Explore at:
    zip(89332 bytes)Available download formats
    Dataset updated
    Jan 3, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predict students' dropout and academic success

    Investigating the Impact of Social and Economic Factors

    By [source]

    About this dataset

    This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, social-economic factors and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. This dataset contains multiple disjoint databases consisting of relevant information available at the time of enrollment, such as application mode, marital status, course chosen and more. Additionally, this data can be used to estimate overall student performance at the end of each semester by assessing curricular units credited/enrolled/evaluated/approved as well as their respective grades. Finally, we have unemployment rate, inflation rate and GDP from the region which can help us further understand how economic factors play into student dropout rates or academic success outcomes. This powerful analysis tool will provide valuable insight into what motivates students to stay in school or abandon their studies for a wide range of disciplines such as agronomy, design, education nursing journalism management social service or technologies

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

    Using this dataset, researchers can investigate two key questions: - which specific predictive factors are linked with student dropout or completion? - how do different features interact with each other? For example, researchers could explore if there any demographic characteristics (e.g., gender, age at enrollment etc.) or immersion conditions (e.g., unemployment rate in region) are associated with higher student success rates, as well as understand what implications poverty has for educational outcomes. By answering these questions, research insight is generated which can provide critical information for administrators on formulating strategies that promote successful degree completion among students from diverse backgrounds in their institutions.

    In order to use this dataset effectively it is important that scientists familiarize themselves with all variables provided in the dataset including categorical (qualitative) variables such as gender or application mode; numerical variables such as number of curricular units at the beginning of semesters or age at enrollment; ordinal data measurement type variables such as marital status; studied trends over time such as inflation rate or GDP; frequency measurements variables like percentage of scholarship holders; etc.. Additionally scientists should make sure they aware off all potential bias included in the data prior running analysis–for example understanding if one population is underrepresented compared another -as this phenomenon could lead unexpected results if not taken into consideration while conducting research undertaken using this data set.. Finally it would be important for practitioners realize that this current Kaggle Dataset contains only one semester-worth information on each admission intake whereas additional studies conducted for a longer time period might be able provide more accurate results related selected topic area due further deterioration retention achievement coefficients obtained from those gradually accurate experiments unfolding different year-long admissions seasons

    Research Ideas

    • Prediction of Student Retention: This dataset can be used to develop predictive models that can identify student risk factors for dropout and take early interventions to improve student retention rate.
    • Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.
    • Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
  4. d

    School Attendance by District, 2020-2021

    • catalog.data.gov
    • data.ct.gov
    • +2more
    Updated Jun 28, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.ct.gov (2025). School Attendance by District, 2020-2021 [Dataset]. https://catalog.data.gov/dataset/school-attendance-by-district-2020-2021
    Explore at:
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    data.ct.gov
    Description

    This dataset includes the attendance rate for public school students PK-12 by district during the 2020-2021 school year. Attendance rates are provided for each district for the overall student population and for the high needs student population. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch. When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  5. c

    Student Enrollment - Datasets - CTData.org

    • data.ctdata.org
    Updated Mar 16, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). Student Enrollment - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/student-enrollment
    Explore at:
    Dataset updated
    Mar 16, 2016
    Description

    Student Enrollment reports the number of enrolled students, per grade.

  6. d

    School Attendance by Student Group and District, 2021-2022

    • catalog.data.gov
    • data.ct.gov
    • +2more
    Updated Jun 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.ct.gov (2025). School Attendance by Student Group and District, 2021-2022 [Dataset]. https://catalog.data.gov/dataset/school-attendance-by-student-group-and-district-2021-2022
    Explore at:
    Dataset updated
    Jun 21, 2025
    Dataset provided by
    data.ct.gov
    Description

    This dataset includes the attendance rate for public school students PK-12 by student group and by district during the 2021-2022 school year. Student groups include: Students experiencing homelessness Students with disabilities Students who qualify for free/reduced lunch English learners All high needs students Non-high needs students Students by race/ethnicity (Hispanic/Latino of any race, Black or African American, White, All other races) Attendance rates are provided for each student group by district and for the state. Students who are considered high needs include students who are English language learners, who receive special education, or who qualify for free and reduced lunch. When no attendance data is displayed in a cell, data have been suppressed to safeguard student confidentiality, or to ensure that statistics based on a very small sample size are not interpreted as equally representative as those based on a sufficiently larger sample size. For more information on CSDE data suppression policies, please visit http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf.

  7. h

    Data from: sample-datasets

    • huggingface.co
    Updated Mar 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nimish (2023). sample-datasets [Dataset]. https://huggingface.co/datasets/nsanghi/sample-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 18, 2023
    Authors
    Nimish
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    nsanghi/sample-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. B

    Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  9. h

    Data from: sample-datasets

    • huggingface.co
    Updated Apr 11, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amir (2023). sample-datasets [Dataset]. https://huggingface.co/datasets/amir7d0/sample-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 11, 2023
    Authors
    Amir
    Description

    amir7d0/sample-datasets dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. u

    Steam Video Game and Bundle Data

    • cseweb.ucsd.edu
    json
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCSD CSE Research Project, Steam Video Game and Bundle Data [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    jsonAvailable download formats
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain reviews from the Steam video game platform, and information about which games were bundled together.

    Metadata includes

    • reviews

    • purchases, plays, recommends (likes)

    • product bundles

    • pricing information

    Basic Statistics:

    • Reviews: 7,793,069

    • Users: 2,567,538

    • Items: 15,474

    • Bundles: 615

  11. m

    Raw data outputs 1-18

    • bridges.monash.edu
    • researchdata.edu.au
    xlsx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

  12. Sample Students Grades Dataset Tutorial Notebook

    • kaggle.com
    zip
    Updated Jun 7, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arya Shah (2022). Sample Students Grades Dataset Tutorial Notebook [Dataset]. https://www.kaggle.com/datasets/aryashah2k/sample-students-grades-dataset-tutorial-notebook/discussion?sort=undefined
    Explore at:
    zip(360 bytes)Available download formats
    Dataset updated
    Jun 7, 2022
    Authors
    Arya Shah
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a sample Dataset used for the Tutorial Notebook Titled:

    Mistakes To Avoid In Data Science⚠️| Python🐍

    Follow the Notebook here: https://www.kaggle.com/code/aryashah2k/mistakes-to-avoid-in-data-science-python

  13. h

    sample-data-upload

    • huggingface.co
    Updated Apr 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    aman (2025). sample-data-upload [Dataset]. https://huggingface.co/datasets/ns-1/sample-data-upload
    Explore at:
    Dataset updated
    Apr 9, 2025
    Authors
    aman
    Description

    This directory includes a few sample datasets to get you started.

    california_housing_data*.csv is California housing data from the 1990 US Census; more information is available at: https://docs.google.com/document/d/e/2PACX-1vRhYtsvc5eOR2FWNCwaBiKL6suIOrxJig8LcSBbmCbyYsayia_DvPOOBlXZ4CAlQ5nlDD8kTaIDRwrN/pub

    mnist_*.csv is a small sample of the MNIST database, which is described at: http://yann.lecun.com/exdb/mnist/

    anscombe.json contains a copy of Anscombe's quartet; it was originally… See the full description on the dataset page: https://huggingface.co/datasets/ns-1/sample-data-upload.

  14. g

    Sample dataset

    • carlvlewis.github.io
    • siciliahub.github.io
    • +7more
    api, csv, shp
    Updated Sep 22, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2018). Sample dataset [Dataset]. https://carlvlewis.github.io/jkan/datasets/sample-dataset/
    Explore at:
    shp, api, csvAvailable download formats
    Dataset updated
    Sep 22, 2018
    Description

    This is an example dataset that comes with a new installation of JKAN

  15. z

    Requirements data sets (user stories)

    • zenodo.org
    • data.mendeley.com
    txt
    Updated Jan 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fabiano Dalpiaz; Fabiano Dalpiaz (2025). Requirements data sets (user stories) [Dataset]. http://doi.org/10.17632/7zbk8zsd8y.1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Mendeley Data
    Authors
    Fabiano Dalpiaz; Fabiano Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of 22 data set of 50+ requirements each, expressed as user stories.

    The dataset has been created by gathering data from web sources and we are not aware of license agreements or intellectual property rights on the requirements / user stories. The curator took utmost diligence in minimizing the risks of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss the possibility of removal of that dataset [see Zenodo's policies]

    The data sets have been originally used to conduct experiments about ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light

    This collection has been originally published in Mendeley data: https://data.mendeley.com/datasets/7zbk8zsd8y/1

    Overview of the datasets [data and links added in December 2024]

    The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.

    Public administration and transparency

    g02-federalspending.txt (2018) originates from early data in the Federal Spending Transparency project, which pertain to the website that is used to share publicly the spending data for the U.S. government. The website was created because of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains a system called DAIMS or Data Broker, which stands for DATA Act Information Model Schema. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted in GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal spending related websites, including many more projects than the one described in the shared collection, can be found here.

    g03-loudoun.txt (2018) is a set of extracted requirements from a document, by the Loudoun County Virginia, that describes the to-be user stories and use cases about a system for land management readiness assessment called Loudoun County LandMARC. The source document can be found here and it is part of the Electronic Land Management System and EPlan Review Project - RFP RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.

    g04-recycling.txt(2017) concerns a web application where recycling and waste disposal facilities can be searched and located. The application operates through the visualization of a map that the user can interact with. The dataset has obtained from a GitHub website and it is at the basis of a students' project on web site design; the code is available (no license).

    g05-openspending.txt (2018) is about the OpenSpending project (www), a project of the Open Knowledge foundation which aims at transparency about how local governments spend money. At the time of the collection, the data was retrieved from a Trello board that is currently unavailable. The sample focuses on publishing, importing and editing datasets, and how the data should be presented. Currently, OpenSpending is managed via a GitHub repository which contains multiple sub-projects with unknown license.

    g11-nsf.txt (2018) refers to a collection of user stories referring to the NSF Site Redesign & Content Discovery project, which originates from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website. The user stories can be found as closed Issues.

    (Research) data and meta-data management

    g08-frictionless.txt (2016) regards the Frictionless Data project, which offers an open source dataset for building data infrastructures, to be used by researchers, data scientists, and data engineers. Links to the many projects within the Frictionless Data project are on GitHub (with a mix of Unlicense and MIT license) and web. The specific set of user stories has been collected in 2016 by GitHub user @danfowler and are stored in a Trello board.

    g14-datahub.txt (2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has Apache License 2.0). DataHub is a data discovery platform which has been developed over multiple years. The specific data set is an initial set of user stories, which we can date back to 2013 thanks to a comment therein.

    g16-mis.txt (2015) is a collection of user stories that pertains a repository for researchers and archivists. The source of the dataset is a public Trello repository. Although the user stories do not have explicit links to projects, it can be inferred that the stories originate from some project related to the library of Duke University.

    g17-cask.txt (2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework which can be used for distributed processing of large datasets. The user stories are extracted from a document that includes requirements regarding dataset management for Cask 4.0, which includes the scenarios, user stories and a design for the implementation of these user stories. The raw data is available in the following environment.

    g18-neurohub.txt (2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis and collaboration platform for researchers in neuroscience to collect, store, and share data with colleagues or with the research community. The user stories were collected at a time NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.

    g22-rdadmp.txt (2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains a collection of user stories that were created by asking the community to suggest functionality that should part of a website that manages data management plans. Each user story is stored as an issue on the GitHub's page.

    g23-archivesspace.txt (2012-2013) refers to ArchivesSpace: an open source, web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and
    born digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. ArchivesSpace is open source and its

  16. e

    resource metadata sample - Datasets - OpenData.eol.org

    • opendata.eol.org
    Updated Aug 30, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). resource metadata sample - Datasets - OpenData.eol.org [Dataset]. https://opendata.eol.org/dataset/resource-metadata-sample
    Explore at:
    Dataset updated
    Aug 30, 2022
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    For registration on https://opentraits.org/ of all branch painting datasets as of Aug 30, 2022

  17. h

    Data from: Rotor37

    • huggingface.co
    Updated May 5, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    PLAID-datasets (2025). Rotor37 [Dataset]. https://huggingface.co/datasets/PLAID-datasets/Rotor37
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 5, 2025
    Dataset authored and provided by
    PLAID-datasets
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card

    This dataset contains a single huggingface split, named 'all_samples'. The samples contains a single huggingface feature, named called "sample". Samples are instances of plaid.containers.sample.Sample. Mesh objects included in samples follow the CGNS standard, and can be converted in Muscat.Containers.Mesh.Mesh. Example of commands: from datasets import load_dataset from plaid.bridges.huggingface_bridge import huggingface_dataset_to_plaid

    hf_dataset =… See the full description on the dataset page: https://huggingface.co/datasets/PLAID-datasets/Rotor37.

  18. Student Performance Dataset

    • kaggle.com
    Updated Aug 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ghulam Muhammad Nabeel
    Description

    📊 Student Performance Dataset (Synthetic, Realistic)

    Overview

    This dataset contains 1000000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

    Each row represents one student with features like study hours, attendance, class participation, and final score.
    The dataset is small, clean, and structured to be beginner-friendly.

    🔑 Columns Description

    • student_id → Unique identifier for each student.
    • weekly_self_study_hours → Average weekly self-study hours (0–40). Generated using a normal distribution centered around 15 hours.
    • attendance_percentage → Attendance percentage (50–100). Simulated with a normal distribution around 85%.
    • class_participation → Score between 0–10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.
    • total_score → Final performance score (0–100). Calculated as a function of study hours + random noise, then clipped between 0–100. Stronger correlation with study hours.
    • grade → Categorical label (A, B, C, D, F) derived from total_score.

    📐 Data Generation Logic

    1. Weekly Study Hours: Modeled using a normal distribution (mean ≈ 15, std ≈ 7), capped between 0 and 40 hours.
    2. Scores: More study hours → higher score. Formula:

    Random noise simulates differences in learning ability, motivation, etc.

    1. Attendance & Participation: Independent but realistic variations added.
    2. Grades: Assigned from scores using thresholds:
    • A: ≥ 85
    • B: ≥ 70
    • C: ≥ 55
    • D: ≥ 40
    • F: < 40

    🎯 How to Use This Dataset

    Regression Tasks

    • Predict total_score from weekly_self_study_hours.
    • Train and evaluate Linear Regression models.
    • Extend to multiple regression using attendance_percentage and class_participation.

    Classification Tasks

    • Predict grade (A–F) using study hours, attendance, and participation.

    Model Evaluation Practice

    • Apply train-test split and cross-validation.
    • Evaluate with MAE, RMSE, R².
    • Compare simple vs. multiple regression.

    ✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).

  19. d

    (HS 3) Large Spatial Sample Datasets in Maryland

    • search.dataone.org
    Updated Dec 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Young-Don Choi (2023). (HS 3) Large Spatial Sample Datasets in Maryland [Dataset]. https://search.dataone.org/view/sha256%3A0c1fb040332273a6f82c7474485554a47dd112d2f39225f9ff00889d5db26581
    Explore at:
    Dataset updated
    Dec 30, 2023
    Dataset provided by
    Hydroshare
    Authors
    Young-Don Choi
    Area covered
    Description

    This HydroShare resource was created to share large spatial sample datasets in Maryland on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).

    Users can check the uploaded LSS datasets on HydroShare-GeoServer and THREDDS using this HS resource id.

    Then, through the RHESSys workflows, users can subset LSS datasets using OWSLib and xarray.

  20. m

    PISA 2018 - Sample of Spanish student data

    • data.mendeley.com
    • portalinvestigacion.uniovi.es
    Updated Jul 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miguel Alvarez-Garcia (2023). PISA 2018 - Sample of Spanish student data [Dataset]. http://doi.org/10.17632/hwj6bfb899.1
    Explore at:
    Dataset updated
    Jul 11, 2023
    Authors
    Miguel Alvarez-Garcia
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for manuscript "A comprehensive framework for explainable cluster analysis".

    This dataset contains student data collected for the OECD’s Programme for International Student Assessment (PISA). Specifically, we use a sample of 5,000 students randomly selected from the 35,943 Spanish students who took the PISA 2018 survey [1]. A total of 80 variables are selected.

    The dataset contains two files: - A CSV file containing the student data to which an additional column, stu_original_order has been added as unique identifier. - An Excel file containing a description of all variables.

    References [1] OECD, PISA 2018 Technical Report, 2020. URL https://www.oecd.org/pisa/data/pisa2018technicalreport/

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
Organization logo

Student Performance Data Set

Student achievement in secondary education of two Portuguese schools.

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)
zip(12353 bytes)Available download formats
Dataset updated
Mar 27, 2020
Authors
Data-Science Sean
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

Search
Clear search
Close search
Google apps
Main menu