100+ datasets found

👨‍🎓 Open University Learning Analytics
kaggle.com
zip
Updated Mar 5, 2024
Cite
mexwell (2024). 👨‍🎓 Open University Learning Analytics [Dataset]. https://www.kaggle.com/datasets/mexwell/open-university-learning-analytics
Available download formats: zip (44198573 bytes)
Dataset updated
Mar 5, 2024
Authors
mexwell
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset introduces the anonymised Open University Learning Analytics Dataset (OULAD). It contains data about courses, students and their interactions with Virtual Learning Environment (VLE) for seven selected courses (called modules). Presentations of courses start in February and October - they are marked by “B” and “J” respectively. The dataset consists of tables connected using unique identifiers. All tables are stored in the csv format.

Database schema

https://analyse.kmi.open.ac.uk/resources/images/model.png

courses.csv File contains the list of all available modules and their presentations. The columns are:

code_module – code name of the module, which serves as the identifier.

code_presentation – code name of the presentation. It consists of the year and “B” for the presentation starting in February and “J” for the presentation starting in October.

length – length of the module-presentation in days.

The structure of B and J presentations may differ, so it is good practice to analyse the B and J presentations separately. Nevertheless, for some presentations the corresponding previous B/J presentation does not exist, and therefore the J presentation must be used to inform the B presentation or vice versa. In the dataset this is the case for the CCC, EEE and GGG modules.
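To analyse B and J presentations separately, the presentation code first has to be split into its parts. The following is a minimal sketch, assuming the code is a four-digit year followed by a single letter (for example "2013J"); the exact format and the helper names `parse_presentation` and `split_by_start` are assumptions made for illustration.

```python
import re

# Assumed format, based on the description above: a four-digit year
# followed by "B" (February start) or "J" (October start), e.g. "2013J".
_PRESENTATION = re.compile(r"^(\d{4})([BJ])$")
_START_MONTH = {"B": 2, "J": 10}


def parse_presentation(code):
    """Return (year, start_month) for a code_presentation value."""
    match = _PRESENTATION.match(code.strip())
    if match is None:
        raise ValueError(f"unexpected code_presentation: {code!r}")
    year, letter = match.groups()
    return int(year), _START_MONTH[letter]


def split_by_start(codes):
    """Group presentation codes into February (B) and October (J) starts."""
    groups = {"B": [], "J": []}
    for code in codes:
        parse_presentation(code)  # raises if the format is unexpected
        cleaned = code.strip()
        groups[cleaned[-1]].append(cleaned)
    return groups
```

Grouping this way keeps the two kinds of presentation apart before any comparison across years is made.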

assessments.csv This file contains information about assessments in module-presentations. Usually, every presentation has a number of assessments followed by the final exam. CSV contains columns:

code_module – identification code of the module, to which the assessment belongs.

code_presentation - identification code of the presentation, to which the assessment belongs.

id_assessment – identification number of the assessment.

assessment_type – type of assessment. Three types of assessments exist: Tutor Marked Assessment (TMA), Computer Marked Assessment (CMA) and Final Exam (Exam).

date – information about the final submission date of the assessment calculated as the number of days since the start of the module-presentation. The starting date of the presentation has number 0 (zero).

weight - weight of the assessment in %. Typically, Exams are treated separately and have the weight 100%; the sum of all other assessments is 100%. If the information about the final exam date is missing, it is at the end of the last presentation week.
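The weight rule above (non-exam assessments typically sum to 100% per module-presentation) can be checked with a short script. This is only a sketch: the sample rows are made up, and since the description says "typically", deviations in the real file should be inspected rather than treated as errors.

```python
import csv
import io
from collections import defaultdict


def non_exam_weight_totals(rows):
    """Sum the weights of non-exam assessments per module-presentation."""
    totals = defaultdict(float)
    for row in rows:
        if row["assessment_type"] == "Exam":
            continue  # exams are weighted separately
        key = (row["code_module"], row["code_presentation"])
        totals[key] += float(row["weight"])
    return dict(totals)


# Made-up sample rows in the column layout described above.
sample = io.StringIO(
    "code_module,code_presentation,id_assessment,assessment_type,date,weight\n"
    "XXX,2013J,1,TMA,19,40\n"
    "XXX,2013J,2,CMA,54,60\n"
    "XXX,2013J,3,Exam,,100\n"
)
totals = non_exam_weight_totals(csv.DictReader(sample))
print(totals)
```

For the real data, the same function could be fed `csv.DictReader(open("assessments.csv", newline=""))`.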

vle.csv The csv file contains information about the available materials in the VLE. Typically these are html pages, pdf files, etc. Students have access to these materials online and their interactions with the materials are recorded. The vle.csv file contains the following columns:

id_site – an identification number of the material.

code_module – an identification code for module.

code_presentation - the identification code of presentation.

activity_type – the role associated with the module material.

week_from – the week from which the material is planned to be used.

week_to – week until which the material is planned to be used.

studentInfo.csv This file contains demographic information about the students together with their results. File contains the following columns:

code_module – an identification code for a module on which the student is registered.

code_presentation - the identification code of the presentation during which the student is registered on the module.

id_student – a unique identification number for the student.

gender – the student’s gender.

region – identifies the geographic region, where the student lived while taking the module-presentation.

highest_education – highest student education level on entry to the module presentation.

imd_band – specifies the Index of Multiple Deprivation band of the place where the student lived during the module-presentation.

age_band – band of the student’s age.

num_of_prev_attempts – the number of times the student has attempted this module.

studied_credits – the total number of credits for the modules the student is currently studying.

disability – indicates whether the student has declared a disability.

final_result – student’s final result in the module-presentation.

studentRegistration.csv This file contains information about the time when the student registered for the module presentation. For students who unregistered the date of unregistration is also recorded. File contains five columns:

code_module – an identification code for a module.

code_presentation - the identification code of the presentation.

id_student – a unique identification number for the student.

date_registration – the date of student’s registration on the module presentation, this is the number of days measured relative to the start of the module-presentation (e.g. the negative value -30 means that the student registered to module presentation 30 days before it started).

date_unr...
Data from: Dataset of Computer Science Course Queries from Students:...
data.mendeley.com
Updated Jan 5, 2024
+ more versions
Cite
Khandoker Ashik Uz Zaman (2024). Dataset of Computer Science Course Queries from Students: Categorized and Scored According to Bloom's Taxonomy [Dataset]. http://doi.org/10.17632/w5zt9n6vsc.1
Explore at:
Unique identifier
https://doi.org/10.17632/w5zt9n6vsc.1
Dataset updated
Jan 5, 2024
Authors
Khandoker Ashik Uz Zaman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of three .csv files: 1. Data_Structure.csv 2. Introduction_to_Computers_and_Research.csv 3. Irrelevant_Questions.csv.

Each of the files consists of questions asked by students of Independent University, Bangladesh in the Summer 2023 semester in Computer Science courses.

The questions have been manually pre-processed and categorized according to their course and topics. The questions have also been scored using the six question levels of Bloom's taxonomy: remember (5 points), understand (10 points), apply (15 points), analyze (20 points), evaluate (20 points), create (30 points).
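The scoring scheme can be written down as a small lookup table. This is an illustrative sketch only: the point values are copied from the description, but how the level labels are actually stored in the CSV files is not stated, so the lower-case keys and the `total_score` helper are assumptions.

```python
# Points per Bloom's taxonomy level, copied from the description above.
BLOOM_POINTS = {
    "remember": 5,
    "understand": 10,
    "apply": 15,
    "analyze": 20,
    "evaluate": 20,
    "create": 30,
}


def total_score(levels):
    """Sum the points for a sequence of Bloom level labels."""
    return sum(BLOOM_POINTS[level.strip().lower()] for level in levels)
```

For example, `total_score(["Remember", "create"])` adds 5 and 30.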

File-1 consists of the scored and categorized questions from the "Data Structure" course. File-2 consists of the scored and categorized questions from the "Introduction to Computers and Research" course. File-3 consists of the irrelevant questions which do not belong to the courses above but were asked by the students from those courses.
U.S. Education Datasets: Unification Project
kaggle.com
zip
Updated Apr 13, 2020
Cite
Roy Garrard (2020). U.S. Education Datasets: Unification Project [Dataset]. https://www.kaggle.com/noriuk/us-education-datasets-unification-project
Available download formats: zip (155201337 bytes)
Dataset updated
Apr 13, 2020
Authors
Roy Garrard
Area covered
United States
Description
Author's Note 2019/04/20: Revisiting this project, I recently discovered the incredibly comprehensive API produced by the Urban Institute. It achieves all of the goals laid out for this dataset in wonderful detail. I recommend that users interested pay a visit to their site.

Context

This dataset is designed to bring together multiple facets of U.S. education data into one convenient CSV (states_all.csv).

Contents

states_all.csv: The primary data file. Contains aggregates from all state-level sources in one CSV.

output_files/states_all_extended.csv: The contents of states_all.csv with additional data related to race and gender.

Column Breakdown

Identification

PRIMARY_KEY: A combination of the year and state name.

YEAR

STATE

Enrollment

A breakdown of students enrolled in schools by school year.

GRADES_PK: Number of students in Pre-Kindergarten education.

GRADES_4: Number of students in fourth grade.

GRADES_8: Number of students in eighth grade.

GRADES_12: Number of students in twelfth grade.

GRADES_1_8: Number of students in the first through eighth grades.

GRADES_9_12: Number of students in the ninth through twelfth grades.

GRADES_ALL: The count of all students in the state. Comparable to ENROLL in the financial data (which is the U.S. Census Bureau's estimate for students in the state).

The extended version of states_all contains additional columns that break down enrollment by race and gender. For example:

G06_A_A: Total number of sixth grade students.

G06_AS_M: Number of sixth grade male students whose ethnicity was classified as "Asian".

G08_AS_A_READING: Average reading score of eighth grade students whose ethnicity was classified as "Asian".

The represented races include AM (American Indian or Alaska Native), AS (Asian), HI (Hispanic/Latino), BL (Black or African American), WH (White), HP (Hawaiian Native/Pacific Islander), and TR (Two or More Races). The represented genders include M (Male) and F (Female).
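The naming scheme of the extended columns can be decoded mechanically. The sketch below is inferred from the three example names above and is not taken from the dataset's own code: it assumes a two-digit grade, that "A" stands for "all" in both the race and gender positions (as in G06_A_A), and that an optional trailing part such as READING names a subject. Columns that follow other patterns are rejected.

```python
import re

# Race and gender codes as listed in the description above. "A" for "all"
# is inferred from the G06_A_A example and is an assumption.
RACES = {
    "A": "All",
    "AM": "American Indian or Alaska Native",
    "AS": "Asian",
    "HI": "Hispanic/Latino",
    "BL": "Black or African American",
    "WH": "White",
    "HP": "Hawaiian Native/Pacific Islander",
    "TR": "Two or More Races",
}
GENDERS = {"A": "All", "M": "Male", "F": "Female"}

_COLUMN = re.compile(r"^G(\d{2})_([A-Z]{1,2})_([AMF])(?:_([A-Z]+))?$")


def decode_column(name):
    """Split an extended column name into grade, race, gender and subject."""
    match = _COLUMN.match(name)
    if match is None or match.group(2) not in RACES:
        raise ValueError(f"unrecognised column name: {name!r}")
    grade, race, gender, subject = match.groups()
    return {
        "grade": int(grade),
        "race": RACES[race],
        "gender": GENDERS[gender],
        "subject": subject,  # None for plain enrollment counts
    }
```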

Financials

A breakdown of states by revenue and expenditure.

ENROLL: The U.S. Census Bureau's count for students in the state. Should be comparable to GRADES_ALL (which is the NCES's estimate for students in the state).

TOTAL_REVENUE: The total amount of revenue for the state.

FEDERAL_REVENUE

STATE_REVENUE

LOCAL_REVENUE

TOTAL_EXPENDITURE: The total expenditure for the state.

INSTRUCTION_EXPENDITURE

SUPPORT_SERVICES_EXPENDITURE

CAPITAL_OUTLAY_EXPENDITURE

OTHER_EXPENDITURE

Academic Achievement

A breakdown of student performance as assessed by the corresponding exams (math and reading, grades 4 and 8).

AVG_MATH_4_SCORE: The state's average score for fourth graders taking the NAEP math exam.

AVG_MATH_8_SCORE: The state's average score for eighth graders taking the NAEP math exam.

AVG_READING_4_SCORE: The state's average score for fourth graders taking the NAEP reading exam.

AVG_READING_8_SCORE: The state's average score for eighth graders taking the NAEP reading exam.

Data Processing

The original sources can be found here:

Enrollment: https://nces.ed.gov/ccd/stnfis.asp

Financials: https://www.census.gov/programs-surveys/school-finances/data/tables.html

Academic Achievement: https://www.nationsreportcard.gov/ndecore/xplore/NDE

Data was aggregated using a Python program I wrote. The code (as well as additional project information) can be found [here][1].

Methodology Notes

Spreadsheets for NCES enrollment data for 2014, 2011, 2010, and 2009 were modified to place key data on the same sheet, making scripting easier.

The column 'ENROLL' represents the U.S. Census Bureau data value (financial data), while the column 'GRADES_ALL' represents the NCES data value (demographic data). Though the two organizations correspond on this matter, these values (which are ostensibly the same) do vary. Their documentation chalks this up to differences in membership (i.e. what is and is not a fourth grade student).
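One way to see how far the two enrollment figures diverge is to compute their relative difference per row. This is a minimal sketch, assuming rows are read with `csv.DictReader` so that values arrive as strings and missing values appear as empty strings; the helper name `enrollment_gap` is hypothetical.

```python
def enrollment_gap(row):
    """Relative difference between ENROLL and GRADES_ALL, or None if missing."""
    try:
        enroll = float(row["ENROLL"])
        grades_all = float(row["GRADES_ALL"])
    except (KeyError, TypeError, ValueError):
        return None  # one of the two values is absent or not numeric
    if grades_all == 0:
        return None
    return (enroll - grades_all) / grades_all
```

A positive result means the Census Bureau figure is larger than the NCES figure for that state and year.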

Enrollment data from NCES has seen a number of changes across survey years. One of the more notable is that data on student gender does not appear to have been collected until 2009. The information in states_all_extended.csv reflects this.

NAEP test score data is only available for certain years.

The current version of this data is concerned with state-level patterns. It is the author's hope that future versions will allow for school district-level granularity.

Acknowledgements

Data is sourced from the U.S. Census Bureau and the National Center for Education Statistics (NCES).

Licensing Notes

The licensing of these datasets state that it must not be us...
Dataset for Paper "Towards Increased Diversity in STEM Education: Five...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Jul 17, 2024
Cite
Anonymous; Anonymous (2024). Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4737551
Explore at:
Dataset updated
Jul 17, 2024
Dataset provided by
Anonymous
Authors
Anonymous; Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset for Paper "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort" - Rev #1

This is the dataset for the paper titled "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort".

In case of questions, feel free to contact the authors, anonymised, ORCID: https://orcid.org/*anonymised*, current affiliation and email: anonymised

Survey 2019

The raw survey data for the initial 2019 survey is available in the file survey2019_anon.csv. Note that the data is anonymised as free-text comments have been removed. Explanations on the variables and their levels are given in the files variables_survey2019.csv and values_survey2019.csv. The questionnaire for the 2019 survey is contained in survey2019_instrument.pdf.

Survey 2020

The raw survey data for the 2020 survey is available in the file rdata_anon_survey2020.csv. Additional scripts are supplied to reproduce the exploratory factor analysis. The main entry is the file EFA.R, which imports the data. The file contains some comments on the process. The questionnaire for the 2020 survey is contained in survey2020_instrument.pdf.

Interviews

The interview guide used for the five interviews is available in the file interview_instrument.pdf.

XAI-FUNGI: Dataset from the user study on comprehensibility of XAI...

zenodo.org

csv, pdf, zip

Updated Oct 15, 2024

+ more versions

Cite

Szymon Bobek; Paloma Korycińska; Monika Krakowska; Maciej Mozolewski; Dorota Rak; Magdalena Zych; Magdalena Wójcik; Grzegorz J. Nalepa (2024). XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms [Dataset]. http://doi.org/10.5281/zenodo.11448395

Available download formats: csv, zip, pdf

Unique identifier

https://doi.org/10.5281/zenodo.11448395

Dataset updated

Oct 15, 2024

Dataset provided by

Zenodo: http://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms

We present the dataset which was created during a user study on evaluation of explainability of artificial intelligence (AI) at the Jagiellonian University as a collaborative work of computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore effective explanations of AI model patterns to diverse audiences.

The dataset contains material collected from 39 participants during the interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups that represented domain experts in the field of mycology (DE), students with data science and visualization background (IT) and students from social sciences and humanities (SSH). Each group was given an explanation of a machine learning model trained to predict edible and non-edible mushrooms and asked to interpret the explanations and answer various questions during the interview. The machine learning model and explanations for its decision were prepared by the computer science research team.

The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results from thematic analysis, and original explanations with modifications suggested by the participants. The dataset is complemented with the source code allowing one to reproduce the initial machine learning model and explanations.

The general structure of the dataset is described in the following table. Files whose names contain [RR]_[SS]_[NN] hold the individual results obtained from a particular participant. The meaning of the prefix is as follows:

RR - initials of the researcher conducting the interview,
SS - type of the participant (DE for domain expert, SSH for social sciences and humanities students, or IT for computer science students),
NN - number of the participant
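The participant prefix can be pulled out of a file name with a regular expression. The sketch below assumes that the researcher initials consist of letters only and that the participant number is written in digits; neither detail is confirmed by the description, and `parse_participant_id` is a hypothetical helper name.

```python
import re
from pathlib import Path

# [RR]_[SS]_[NN]: researcher initials, participant type, participant number.
# Letters-only initials and a digits-only number are assumptions.
_PARTICIPANT = re.compile(r"^([A-Za-z]+)_(DE|SSH|IT)_(\d+)$")


def parse_participant_id(filename):
    """Return (initials, group, number) from a name like 'AB_SSH_07.csv'."""
    stem = Path(filename).stem  # drop the extension, if any
    match = _PARTICIPANT.match(stem)
    if match is None:
        raise ValueError(f"not a participant file name: {filename!r}")
    initials, group, number = match.groups()
    return initials, group, int(number)
```

This makes it straightforward to group transcripts and modified slides by participant type (DE, SSH or IT).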

File	Description
SURVEY.csv	The results from a survey that was filled in by 149 participants, out of which 39 were selected to form the final group of participants.
CODEBOOK.csv	The codebook used in thematic analysis and MAXQDA coding
QUESTIONS.csv	List of questions that the participants were asked during interviews.
SLIDES.csv	List of slides used in the study with their interpretation and reference to MAXQDA themes and VISUAL_MODIFICATIONS tables.
MAXQDA_SUMMARY.csv	Summary of thematic analysis performed with codes used in CODEBOOK for each participant
PROBLEMS.csv	List of problems that participants were asked to solve during interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from explanations.
PROBLEMS_RESPONSES.csv	Each participant's responses to the problems listed in PROBLEMS.csv
VISUALIZATION_MODIFICATIONS.csv	Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested.
ORIGINAL_VISUZALIZATIONS.pdf	The PDF file containing the visualization of explanations presented to the participants during the interviews
VISUALIZATION_MODIFICATIONS.zip	An archive of PDF files containing the original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant. Each file is named with the participant ID, i.e. [RR]_[SS]_[NN].pdf
TRANSCRIPTS.zip	The anonymized transcripts of interviews for each participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv.

The detailed structure of the files presented in the previous Table is given in the Technical info section.

The source code used to train the ML model and to generate explanations is available on GitLab.

US Dept of Education: College Scorecard
kaggle.com
zip
Updated Nov 9, 2017
Cite
Kaggle (2017). US Dept of Education: College Scorecard [Dataset]. https://www.kaggle.com/forums/f/810/us-dept-of-education-college-scorecard
Available download formats: zip (589617678 bytes)
Dataset updated
Nov 9, 2017
Dataset authored and provided by
Kaggle
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
It's no secret that US university students often graduate with debt repayment obligations that far outstrip their employment and income prospects. While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky. In an effort to make educational investments less speculative, the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset.

Kaggle is hosting the College Scorecard dataset in order to facilitate shared learning and collaboration. Insights from this dataset can help make the returns on higher education more transparent and, in turn, more fair.

Data Description

Here's a script showing an exploratory overview of some of the data.

college-scorecard-release-*.zip contains a compressed version of the same data available through Kaggle Scripts.

It consists of three components:

All the raw data files released in version 1.40 of the college scorecard data

Scorecard.csv, a single CSV file with all years' data combined. In it, we've converted categorical variables represented by integer keys in the original data to their labels and added a Year column

database.sqlite, a SQLite database containing a single Scorecard table that contains the same information as Scorecard.csv
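The SQLite file can be queried with Python's standard `sqlite3` module. The sketch below relies only on the table name (Scorecard) and the Year column mentioned above; it runs against a small in-memory stand-in, and the `name` column in that stand-in is made up for the demonstration.

```python
import sqlite3


def rows_per_year(conn):
    """Count the rows of the Scorecard table for each value of Year."""
    cursor = conn.execute(
        "SELECT Year, COUNT(*) FROM Scorecard GROUP BY Year ORDER BY Year"
    )
    return cursor.fetchall()


# In-memory stand-in for database.sqlite; "name" is a made-up column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scorecard (Year INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO Scorecard VALUES (?, ?)",
    [(2012, "a"), (2013, "b"), (2013, "c")],
)
counts = rows_per_year(conn)
print(counts)
conn.close()
```

With the downloaded archive unpacked, `sqlite3.connect("database.sqlite")` would take the place of the in-memory connection.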

New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.
Drop Project Student Plugin for IntelliJ IDEA - Evaluation Survey
zenodo.org
data.niaid.nih.gov
csv
Updated May 30, 2024
Cite
Bruno Pereira Cipriano; Bernardo Baltazar; Pedro Alves; Nuno Fachada (2024). Drop Project Student Plugin for IntelliJ IDEA - Evaluation Survey [Dataset]. http://doi.org/10.5281/zenodo.8432997
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8432997
Dataset updated
May 30, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Bruno Pereira Cipriano; Bernardo Baltazar; Pedro Alves; Nuno Fachada
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Contains a CSV file with students' replies to the survey used to evaluate the Drop Project Student plugin for IntelliJ IDEA.

To support international readers, the question names (CSV headers) were translated to English and/or match the numbering that appears in the paper. However, the students' textual replies to the open-ended questions were left in their original language, Portuguese.
AP Computer Science A Exam Dataset
kaggle.com
zip
Updated Nov 13, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institute for Computing Education at Georgia Tech (2016). AP Computer Science A Exam Dataset [Dataset]. https://www.kaggle.com/iceatgt/ap-computer-science-a-exam-dataset
Available download formats: zip (10410 bytes)
Dataset updated
Nov 13, 2016
Dataset authored and provided by
Institute for Computing Education at Georgia Tech
Description
Context

The datasets contain all the data for the number of CS AP A exams taken in each state from 1998 to 2013, and detailed data on pass rates, race, and gender from 2006-2013. The data was compiled from the data available at http://research.collegeboard.org/programs/ap/data. This data was originally gathered by the CSTA board, but Barb Ericson of Georgia Tech keeps adding to it each year.

Content

historical.csv contains data for the number of CS AP A exams taken in each state from 1998 to 2013:

state: US states

1998-2013

Pop: population
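Because historical.csv holds one column per year, trend analysis is usually easier after reshaping it into one row per state and year. The following is a sketch under assumptions: the year columns are taken to be headed by the plain year ("1998", "1999", ...), counts are taken to be integers, and the sample rows are made up.

```python
import csv
import io


def wide_to_long(rows):
    """Turn one-column-per-year rows into (state, year, exams) tuples."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            # Only columns whose header is a year are reshaped; empty
            # cells are skipped rather than treated as zero.
            if column and column.isdigit() and value:
                long_rows.append((row["state"], int(column), int(value)))
    return long_rows


# Made-up sample in the assumed layout.
sample = io.StringIO(
    "state,1998,1999,Pop\n"
    "XX,10,12,1000\n"
    "YY,,3,2000\n"
)
long_rows = wide_to_long(csv.DictReader(sample))
print(long_rows)
```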

pass_06_13.csv contains exam pass rates, race and gender data from 2006 to 2013 for selected states.

pass_12_13.csv contains exam pass rates, race and gender information for every state for 2012 and 2013.

Acknowledgements

The original datasets can be found here and here.

Inspiration

Using the datasets, can you examine the temporal trends in the exam pass rates by race, gender, and geographical location?
Data from: Automatic composition of descriptive music: A case study of the...
figshare.com
txt
Updated May 31, 2023
Cite
Lucía Martín-Gómez (2023). Automatic composition of descriptive music: A case study of the relationship between image and sound [Dataset]. http://doi.org/10.6084/m9.figshare.6682998.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6682998.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Lucía Martín-Gómez
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
FANTASIA

This repository contains the data related to image descriptors and sound associated with a selection of frames of the films Fantasia and Fantasia 2000 produced by Disney.

About

This repository contains the data used in the article Automatic composition of descriptive music: A case study of the relationship between image and sound published in the 6th International Workshop on Computational Creativity, Concept Invention, and General Intelligence (C3GI). Data structure is explained in detail in the article.

Abstract

Human beings establish relationships with the environment mainly through sight and hearing. This work focuses on the concept of descriptive music, which makes use of sound resources to narrate a story. The Fantasia film, produced by Walt Disney was used in the case study. One of its musical pieces is analyzed in order to obtain the relationship between image and music. This connection is subsequently used to create a descriptive musical composition from a new video. Naive Bayes, Support Vector Machine and Random Forest are the three classifiers studied for the model induction process. After an analysis of their performance, it was concluded that Random Forest provided the best solution; the produced musical composition had a considerably high descriptive quality.

Data

Nutcracker_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in film Fantasia. Data stored into ARFF format.

Firebird_data.arff: Image descriptors of each frame from the fragment "The Firebird" in film Fantasia 2000. Data stored into ARFF format.

Firebird_midi_prediction.csv: Frame number of the fragment "The Firebird" in film Fantasia 2000 and the sound predicted by the system encoded in MIDI. Data stored into CSV format.

Firebird_prediction.mp3: Audio file with the synthesizing of the prediction data for the fragment "The Firebird" of film Fantasia 2000.

License

Data is available under MIT License. To make use of the data the article must be cited.
Annotated Benchmark of Real-World Data for Approximate Functional Dependency...
zenodo.org
data.niaid.nih.gov
csv
Updated Jul 1, 2023
Cite
Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8098909
Dataset updated
Jul 1, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The column table names the relation we refer to; lhs and rhs reference two columns of that relation for which we found that, semantically, lhs implies rhs.

The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
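How well one column implies another can be quantified with an error measure. The sketch below computes the classic g3 error: the smallest fraction of tuples that would have to be removed for lhs to determine rhs exactly. It is offered for illustration only; the g3_prime value mentioned above may be defined differently, and the sample rows are made up.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Fraction of tuples to remove so that lhs -> rhs holds exactly."""
    groups = defaultdict(Counter)
    total = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue  # skip tuples where either attribute has no value
        groups[left][right] += 1
        total += 1
    if total == 0:
        return None  # no tuple has both attributes
    # Per lhs value, keep only the tuples carrying the most common rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / total


# Made-up rows using the column names from the claims.csv example.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F Kennedy"},
    {"AirportCode": "JFK", "AirportName": "John F Kennedy"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles"},
]
error = g3_error(rows, "AirportCode", "AirportName")
print(error)
```

An error of 0 means the dependency holds exactly; larger values mean more violating tuples.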

Dataset References

adult.csv: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

claims.csv: TSA Claims Data 2002 to 2006, published by the U.S. Department of Homeland Security.

dblp10k.csv: Frequency-aware Similarity Measures. Lange, Dustin; Naumann, Felix (2011). 243–248. Made available as DBLP Dataset 2.

hospital.csv: Hospital dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

t_biocase_... files: t_bioc_... files used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.

tax.csv: Tax dataset used in Johann Birnick, Thomas Bläsius, Tobias Friedrich, Felix Naumann, Thorsten Papenbrock, and Martin Schirneck. 2020. Hitting set enumeration with partial information for unique column combination discovery. Proc. VLDB Endow. 13, 12 (August 2020), 2270–2283. https://doi.org/10.14778/3407790.3407824. Made available as part of the dataset collection to that paper.
Data from: 2024 dataset on independent researchers collected from OpenAlex
zenodo.org
repository.uantwerpen.be
+1 more
csv, tsv
Updated Apr 22, 2024
Cite
Eline Vandewalle; Camilla Hertil Lindelöw (2024). 2024 dataset on independent researchers collected from OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.10925112
Available download formats: csv, tsv
Unique identifier
https://doi.org/10.5281/zenodo.10925112
Dataset updated
Apr 22, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Eline Vandewalle; Camilla Hertil Lindelöw
License
https://creativecommons.org/public-domain
Description
This dataset belongs to a paper about independent researchers submitted for the STI conference 2024 (https://sti2024.org/). It consists of several files described below. The data is from OpenAlex, collected through the InSySPo instance of the February snapshot of OpenAlex, hosted on Google Cloud. Since Topics are a new feature of OpenAlex data and therefore not part of the snapshot, this data as well as some other data not available at the InSySPo instance at the time of collection have been collected through the OpenAlex API, and incorporated in the files. Data from Scopus and Web of Science may be retrieved by using the search string in the appendix of the article.

Files all domains

240307_open_alex_works.tsv

contains all works retrieved with the search string for Independent researchers in OpenAlex in the article's appendix.

Files Social Sciences and/or Arts & Humanities

240312_open_alex_works_soc_sci_arts_2010.tsv

contains articles by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.

240312_open_alex_authors_soc_sci_arts_2010.tsv

contains authors who are Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.

240313_open_alex_authors_all_works_soc_sci_arts_2010.tsv

contains all works by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex. "All works" means that the researcher has indicated independent status in the affiliation at least once, and that the author's other works are also included.

author_distribution_domain1.csv

contains number of works per number of authors in the domain Social Sciences (includes Arts & Humanities).

author_distribution_field33.csv

contains number of works per number of authors in the field Social Sciences.

author_distribution_field12.csv

contains number of works per number of authors in the field Arts & Humanities.

all_ssh_oa.csv

contains data for analyzing open access patterns for the domain Social Sciences (includes Arts & Humanities).
1. data all field studies CSV.csv
psycharchives.org
Updated Aug 5, 2022
+ more versions
(2022). 1. data all field studies CSV.csv [Dataset]. https://psycharchives.org/en/item/5bb80531-2812-4a0a-9b75-b396c8543d34
Dataset updated
Aug 5, 2022
License
https://doi.org/10.23668/psycharchives.4988
Description
Citizen Science (CS) projects play a crucial role in engaging citizens in conservation efforts. While implicitly mostly considered as an outcome of CS participation, citizens may also have a certain attitude toward engagement in CS when starting to participate in a CS project. Moreover, there is a lack of CS studies that consider changes over longer periods of time. Therefore, this research presents two-wave data from four field studies of a CS project about urban wildlife ecology using cross-lagged panel analyses. We investigated the influence of attitudes toward engagement in CS on self-related, ecology-related, and motivation-related outcomes. We found that positive attitudes toward engagement in CS at the beginning of the CS project had positive influences on participants’ psychological ownership and pride in their participation, their attitudes toward and enthusiasm about wildlife, and their internal and external motivation two months later. We discuss the implications for CS research and practice. Dataset for: Greving, H., Bruckermann, T., Schumann, A., Stillfried, M., Börner, K., Hagen, R., Kimmig, S. E., Brandt, M., & Kimmerle, J. (2023). Attitudes Toward Engagement in Citizen Science Increase Self-Related, Ecology-Related, and Motivation-Related Outcomes in an Urban Wildlife Project. BioScience, 73(3), 206–219. https://doi.org/10.1093/biosci/biad003: Data (CSV format) collected for all field studies
Dataset for algorithmic thinking skills assessment: Results from the virtual...
zenodo.org
data.niaid.nih.gov
csv
Updated Apr 24, 2025
+ more versions
Giorgia Adorni (2025). Dataset for algorithmic thinking skills assessment: Results from the virtual CAT large-scale study in Swiss compulsory education [Dataset]. http://doi.org/10.5281/zenodo.10912340
Unique identifier
https://doi.org/10.5281/zenodo.10912340
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Giorgia Adorni
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 3, 2024
Area covered
Switzerland
Description
Overview
This dataset was collected during a main study that evaluated the virtual Cross Array Task (CAT) platform as an assessment tool for algorithmic thinking (AT) skills among K-12 students in Swiss compulsory education.
As algorithmic thinking becomes increasingly vital in our digital age, this study bridges the gap between traditional assessments and the needs of today's learners by introducing a digital platform. The virtual CAT, a digital adaptation of an unplugged assessment activity, offers scalable, automated assessments with reduced human intervention.

Study Context, Location and Participants
In Spring 2023, we conducted an experimental study with the virtual CAT to investigate algorithmic competencies within compulsory education, explore their variations, and determine the factors influencing them.
The sample comprises 129 students (65 girls and 64 boys), selected from nine classes across five public schools in Ticino and Solothurn cantons.

Data Collection
During the data collection process, session and participant details were manually recorded by the administrator.
Each session has been assigned a unique identifier, and specific details, such as the date, canton, school name and type, and the students’ HarmoS grade (HG) level, have been recorded.
Student information is limited to sex and date of birth; birth dates are used to calculate ages, a significant factor in our demographic analysis.
To protect student privacy, unique identifiers have been assigned to each participant, keeping the data anonymous and secure.
The assessment tool automatically tracked all user interaction within the platform.
All data collected have been pseudonymised, aligning with prevailing open science practices in Switzerland (SNSF, 2021).
Data collection was integrated into a validation module of the app.

Data Features
The dataset comprises the following files:

STUDENTS_SESSIONS.csv

RESULTS.csv

LOGS.csv

CANTONS.csv

ALGORITHMS.csv

These files collectively provide insights into the algorithmic actions of the students, demographic details, session logs, results, and more.

Usage & Ethics
In the spirit of open science, this dataset is made available to the public after meticulous anonymisation to ensure all participants' privacy and ethical treatment.
Initial authorisations were secured from school administrators, teachers, and parents.
Detailed communication regarding the study's nature, data handling, and objectives was transparently shared with all stakeholders.

REFERENCES

[1] A. Piatti, G. Adorni, L. El-Hamamsy, L. Negrini, D. Assaf, L. Gambardella & F. Mondada. (2022). The CT-cube: A framework for the design and the assessment of computational thinking activities. Computers in Human Behavior Reports, 5, 100166. https://doi.org/10.1016/j.chbr.2021.100166

[2] Adorni, G., Piatti, S., & Karpenko, V. (2023). virtual CAT: An app for algorithmic thinking assessment within Swiss compulsory education. Zenodo Software. https://doi.org/10.5281/zenodo.10027851 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-app/

[3] Adorni, G., & Karpenko, V. (2023). virtual CAT programming language interpreter. Zenodo Software. https://doi.org/10.5281/zenodo.10016535 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-programming-language-interpreter/

[4] Adorni, G., & Karpenko, V. (2023). virtual CAT data infrastructure. Zenodo Software. https://doi.org/10.5281/zenodo.10015011 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-data-infrastructure
Dataset on the Human Body as a Signal Propagation Medium
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
J. Ormanis; V. Medvedevs; V. Aristovs; V. Abolins; A. Sevcenko; A. Elsts (2024). Dataset on the Human Body as a Signal Propagation Medium [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8214496
Dataset updated
Jul 11, 2024
Dataset provided by
Institute of Electronics and Computer Science
Authors
J. Ormanis; V. Medvedevs; V. Aristovs; V. Abolins; A. Sevcenko; A. Elsts
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage alternating current sine-shaped signals. The signal frequencies range from 50 kHz to 20 MHz.

Applications: This dataset is intended to support investigation of the human body as a signal propagation medium, and to capture how the properties of the human body (age, sex, composition, etc.), the measurement locations, and the signal frequencies affect the signal loss over the human body.

Overview statistics:

Number of subjects: 30

Number of transmitter locations: 6

Number of receiver locations: 6

Number of measurement frequencies: 19

Input voltage: 1 V

Load resistance: 50 ohm and 1 megaohm

Measurement group statistics:

Height: 174.10 (7.15)

Weight: 72.85 (16.26)

BMI: 23.94 (4.70)

Body fat %: 21.53 (7.55)

Age group: 29.00 (11.25)

Male/female ratio: 50%

Included files:

experiment_protocol_description.docx - protocol used in the experiments

electrode_placement_schematic.png - schematic of placement locations

electrode_placement_photo.jpg - visualization of the experiment, on a volunteer subject

RawData - the full measurement results and experiment info sheets

all_measurements.csv - the most important results extracted to .csv

all_measurements_filtered.csv - same, but after z-score filtering

all_measurements_by_freq.csv - the most important results extracted to .csv, single frequency per row

all_measurements_by_freq_filtered.csv - same, but after z-score filtering

summary_of_subjects.csv - key statistics on the subjects from the experiment info sheets

process_json_files.py - script that creates .csv from the raw data

filter_results.py - outlier removal based on z-score

plot_sample_curves.py - visualization of a randomly selected measurement result subset

plot_measurement_group.py - visualization of the measurement group

CSV file columns:

subject_id - participant's random unique ID

experiment_id - measurement session's number for the participant

height - participant's height, cm

weight - participant's weight, kg

BMI - body mass index, computed from the values above

body_fat_% - body fat composition, as measured by bioimpedance scales

age_group - age rounded to 10 years, e.g. 20, 30, 40 etc.

male - 1 if male, 0 if female

tx_point - transmitter point number

rx_point - receiver point number

distance - distance, in relative units, between the tx and rx points. Not scaled in terms of participant's height and limb lengths!

tx_point_fat_level - transmitter point location's average fat content metric. Not scaled for each participant individually.

rx_point_fat_level - receiver point location's average fat content metric. Not scaled for each participant individually.

total_fat_level - sum of rx and tx fat levels

bias - constant term to simplify data analytics, always equal to 1.0

CSV file columns, frequency-specific:

tx_abs_Z_... - transmitter-side impedance, as computed by the process_json_files.py script from the voltage drop

rx_gain_50_f_... - experimentally measured gain on the receiver, in dB, using 50 ohm load impedance

rx_gain_1M_f_... - experimentally measured gain on the receiver, in dB, using 1 megaohm load impedance
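
When loading any of the CSV files, it can help to separate the shared columns from the frequency-specific ones. The sketch below groups column names by the three prefixes listed above. The function name `split_columns` is hypothetical, and because the suffixes are elided in the description, the code makes no assumption about their format.

```python
# Prefixes of the frequency-specific columns, as listed in the description.
FREQUENCY_PREFIXES = ("tx_abs_Z_", "rx_gain_50_f_", "rx_gain_1M_f_")


def split_columns(columns):
    """Split column names into shared columns and per-prefix groups.

    Hypothetical helper: the part of each name after the prefix is kept
    as-is, since the description does not spell out its format.
    """
    shared = []
    by_prefix = {prefix: [] for prefix in FREQUENCY_PREFIXES}
    for name in columns:
        for prefix in FREQUENCY_PREFIXES:
            if name.startswith(prefix):
                by_prefix[prefix].append(name)
                break
        else:
            # No frequency prefix matched: a per-measurement column.
            shared.append(name)
    return shared, by_prefix
```

With pandas, for example, the header of `all_measurements.csv` could be passed in as `list(df.columns)`.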

Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project “Body-Coupled Communication for Body Area Networks”, project No. lzp-2020/1-0358.

References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.

Contact information: info@edi.lv
Replication Data for: kluster: An Efficient Scalable Procedure for...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
+ more versions
Estiri, Hossein (2023). Replication Data for: kluster: An Efficient Scalable Procedure for Approximating the Number of Clusters in Unsupervised Learning [Dataset]. http://doi.org/10.7910/DVN/LLIOHM
Unique identifier
https://doi.org/10.7910/DVN/LLIOHM
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Estiri, Hossein
Description
182 simulated datasets in two sets (the first contains small datasets, the second large datasets) with different cluster compositions, i.e., different numbers of clusters and separation values, generated using the clusterGeneration package in R. Each set consists of 91 datasets in comma-separated values (csv) format (182 csv files in total) with 3-15 clusters and separation values from 0.1 to 0.7. Separation values can range over (−0.999, 0.999), where a higher value indicates a cluster structure with more separable clusters. The size of the dataset, the number of clusters, and the separation value are encoded in the file name, size_X_n_Y_sepval_Z.csv, where X is the size of the dataset, Y is the number of clusters, and Z is the separation value.
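The file-name convention size_X_n_Y_sepval_Z.csv can be decoded with a small parser. This is a minimal sketch: the names `SimulationInfo` and `parse_name` are hypothetical, and it assumes that X and Y are written as plain integers and Z as a decimal number, which the description does not state explicitly.

```python
import re
from typing import NamedTuple


class SimulationInfo(NamedTuple):
    size: int          # X: size of the dataset
    n_clusters: int    # Y: number of clusters
    sepval: float      # Z: separation value


# Assumed layout: integers for X and Y, an optionally signed decimal for Z.
_NAME_PATTERN = re.compile(
    r"size_(?P<size>\d+)_n_(?P<n>\d+)_sepval_(?P<sepval>-?\d+(?:\.\d+)?)\.csv"
)


def parse_name(filename: str) -> SimulationInfo:
    """Extract size, cluster count and separation value from a file name."""
    match = _NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SimulationInfo(
        size=int(match["size"]),
        n_clusters=int(match["n"]),
        sepval=float(match["sepval"]),
    )
```

A name that does not follow the assumed layout raises `ValueError` rather than being silently misread.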
Logs and Mined Sequential Patterns of Programming Processes from...
figshare.com
txt
Updated Jun 3, 2023
Minji Kong; Lori Pollock (2023). Logs and Mined Sequential Patterns of Programming Processes from "Semi-Automatically Mining Students' Common Scratch Programming Behaviors" [Dataset]. http://doi.org/10.6084/m9.figshare.12100797.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.12100797.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Minji Kong; Lori Pollock
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a ProgSnap2-based dataset containing anonymized logs of over 34,000 programming events exhibited by 81 programming students in Scratch, a visual programming environment, during our designed study as described in the paper "Semi-Automatically Mining Students' Common Scratch Programming Behaviors." We also include a list of approx. 3100 mined sequential patterns of programming processes that are performed by at least 10% of the 62 of the 81 students who are novice programmers, and represent maximal patterns generated by the MG-FSM algorithm while allowing a gap of one programming event.
README.txt - overview of the dataset and its properties
mainTable.csv - main event table of the dataset holding rows of programming events
codeState.csv - table holding XML representations of code snapshots at the time of each programming event
datasetMetadata.csv - describes features of the dataset
Scratch-SeqPatterns.txt - list of sequential patterns mined from the Main Event Table
Data from: Student grade prediction dataset
data.mendeley.com
Updated Jun 16, 2022
+ more versions
Nonso Nnamoko (2022). Student grade prediction dataset [Dataset]. http://doi.org/10.17632/wf8568hxb7.1
Unique identifier
https://doi.org/10.17632/wf8568hxb7.1
Dataset updated
Jun 16, 2022
Authors
Nonso Nnamoko
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK university. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information retrieved from the Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de-facto/de-jure standard open formats (CSV and JSON).

This data was collected and used in a research study undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the experiments and results reported, the data is provided in the exact training-validation-testing splits used in the experiments.
4. codebook all field studies CSV.csv
psycharchives.org
Updated Aug 5, 2022
(2022). 4. codebook all field studies CSV.csv [Dataset]. https://psycharchives.org/en/item/5bb80531-2812-4a0a-9b75-b396c8543d34
Dataset updated
Aug 5, 2022
License
https://doi.org/10.23668/psycharchives.4988
Description
Citizen Science (CS) projects play a crucial role in engaging citizens in conservation efforts. While implicitly mostly considered as an outcome of CS participation, citizens may also have a certain attitude toward engagement in CS when starting to participate in a CS project. Moreover, there is a lack of CS studies that consider changes over longer periods of time. Therefore, this research presents two-wave data from four field studies of a CS project about urban wildlife ecology using cross-lagged panel analyses. We investigated the influence of attitudes toward engagement in CS on self-related, ecology-related, and motivation-related outcomes. We found that positive attitudes toward engagement in CS at the beginning of the CS project had positive influences on participants’ psychological ownership and pride in their participation, their attitudes toward and enthusiasm about wildlife, and their internal and external motivation two months later. We discuss the implications for CS research and practice. Dataset for: Greving, H., Bruckermann, T., Schumann, A., Stillfried, M., Börner, K., Hagen, R., Kimmig, S. E., Brandt, M., & Kimmerle, J. (2023). Attitudes Toward Engagement in Citizen Science Increase Self-Related, Ecology-Related, and Motivation-Related Outcomes in an Urban Wildlife Project. BioScience, 73(3), 206–219. https://doi.org/10.1093/biosci/biad003: Codebook (CSV format) of the variables of all field studies
Trusted Research Environments: Analysis of Characteristics and Data...
researchdata.tuwien.ac.at
bin, csv
Updated Jun 25, 2024
Martin Weise; Andreas Rauber (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability [Dataset]. http://doi.org/10.48436/cv20m-sg117
Unique identifier
https://doi.org/10.48436/cv20m-sg117
Dataset updated
Jun 25, 2024
Dataset provided by
TU Wien
Authors
Martin Weise; Andreas Rauber
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.
Methodology
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:
Peer-reviewed articles where available,
TRE websites,
TRE metadata catalogs.
The goal for this literature study is to discover existing TREs, analyze their characteristics and data availability to give an overview on available infrastructure for sensitive data research as many European initiatives have been emerging in recent months.
Technical details
This dataset consists of five comma-separated values (.csv) files describing our inventory:
countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-A3 encoding, optional)
tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
inclusion.csv: Table of included TREs into the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).
Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed, properly modelling the schema for databases:
schema.sql: Schema definition file to create the tables and views used in the analysis.
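As an illustration of how the tables relate, the sketch below resolves the countryid column of tres.csv against the id column of countries.csv and counts TREs per country, using only the Python standard library. It assumes both files have a header row with exactly the column names listed above and use a comma delimiter. The function name `tres_per_country` and the sample rows are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import Counter


def tres_per_country(countries_file, tres_file):
    """Count TREs per country name by joining tres.countryid to countries.id."""
    # Map country id -> country name (ids are compared as strings).
    names = {row["id"]: row["name"] for row in csv.DictReader(countries_file)}
    counts = Counter()
    for row in csv.DictReader(tres_file):
        counts[names.get(row["countryid"], "unknown")] += 1
    return dict(counts)


# Hypothetical sample rows, only to show the expected file shape.
sample_countries = io.StringIO("id,name,code\n1,Austria,AUT\n2,Germany,DEU\n")
sample_tres = io.StringIO("id,name,countryid\n10,TRE A,1\n11,TRE B,1\n12,TRE C,2\n")
print(tres_per_country(sample_countries, sample_tres))
# {'Austria': 2, 'Germany': 1}
```

For the real files, open file objects (for example `open("countries.csv", newline="")`) can be passed in place of the `io.StringIO` samples.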
The analysis was done through Jupyter Notebook which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
[Dataset] Does Volunteer Engagement Pay Off? An Analysis of User...
zenodo.org
recerca.uoc.edu
+3more
zip
Updated Nov 28, 2022
Simon Krukowski; Ishari Amarasinghe; Nicolás Felipe Gutiérrez-Páez; H. Ulrich Hoppe (2022). [Dataset] Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects [Dataset]. http://doi.org/10.5281/zenodo.7357747
Unique identifier
https://doi.org/10.5281/zenodo.7357747
Dataset updated
Nov 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Simon Krukowski; Ishari Amarasinghe; Nicolás Felipe Gutiérrez-Páez; H. Ulrich Hoppe
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Description
Explanation/Overview:

Corresponding dataset for the analyses and results achieved in the CS Track project in the research line on participation analyses, which is also reported in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for the conference CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632). The usernames have been anonymised.

Purpose:

The purpose of this dataset is to provide the basis to reproduce the results reported in the associated deliverable, and in the above-mentioned publication. As such, it does not represent raw data, but rather files that already include certain analysis steps (like calculated degrees or other SNA-related measures), ready for analysis, visualisation and interpretation with R.

Relatedness:

The data of the different projects was derived from the forums of 7 Zooniverse projects based on similar discussion board features. The projects are: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.

Content:

In this Zenodo entry, several files can be found. The structure is as follows (files and folders and descriptions).

corresponding_calculations.html

Quarto-notebook to view in browser

corresponding_calculations.qmd

Quarto-notebook to view in RStudio

assets

data

annotations

annotations.csv

List of annotations made per day for each of the analysed projects

comments

comments.csv

Total list of comments with several data fields (i.e., comment id, text, reply_user_id)

rolechanges

478_rolechanges.csv

List of roles per user to determine number of role changes

1104_rolechanges.csv

...

...

totalnetworkdata

Edges

478_edges.csv

Network data (edge set) for the given projects (without time slices)

1104_edges.csv

...

...

Nodes

478_nodes.csv

Network data (node set) for the given projects (without time slices)

1104_nodes.csv

...

...

trajectories

Network data (edge and node sets) for the given projects and all time slices (Q1 2016 - Q4 2021)

478

Edges

edges_4782016_q1.csv

edges_4782016_q2.csv

edges_4782016_q3.csv

edges_4782016_q4.csv

...

Nodes

nodes_4782016_q1.csv

nodes_4782016_q4.csv

nodes_4782016_q3.csv

nodes_4782016_q2.csv

...

1104

Edges

...

Nodes

...

...

scripts

datavizfuncs.R

script for the data visualisation functions, automatically executed from within corresponding_calculations.qmd

import.R

script for the import of data, automatically executed from within corresponding_calculations.qmd

corresponding_calculations_files

files for the html/qmd view in the browser/RStudio

Grouping:

The data is grouped according to given criteria (e.g., project_title or time). Accordingly, the respective files can be found in the data structure

mexwell (2024). 👨‍🎓 Open University Learning Analytics [Dataset]. https://www.kaggle.com/datasets/mexwell/open-university-learning-analytics

👨‍🎓 Open University Learning Analytics

Anonymised Open University Learning Analytics Dataset (OULAD)

Available download formats: zip (44198573 bytes)

Dataset updated

Mar 5, 2024

Authors

mexwell

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset introduces the anonymised Open University Learning Analytics Dataset (OULAD). It contains data about courses, students and their interactions with Virtual Learning Environment (VLE) for seven selected courses (called modules). Presentations of courses start in February and October - they are marked by “B” and “J” respectively. The dataset consists of tables connected using unique identifiers. All tables are stored in the csv format.

Database schema

https://analyse.kmi.open.ac.uk/resources/images/model.png

courses.csv File contains the list of all available modules and their presentations. The columns are:

code_module – code name of the module, which serves as the identifier.
code_presentation – code name of the presentation. It consists of the year and “B” for the presentation starting in February and “J” for the presentation starting in October.
length – length of the module-presentation in days.

The structure of B and J presentations may differ, and therefore it is good practice to analyse the B and J presentations separately. Nevertheless, for some presentations the corresponding previous B/J presentation does not exist, and therefore the J presentation must be used to inform the B presentation or vice versa. In the dataset this is the case for the CCC, EEE and GGG modules.
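
To analyse B and J presentations separately, the presentation code first has to be split into its year and start month. The sketch below assumes the code is written as the year immediately followed by the letter (for example "2013J"). That layout, and the function name `parse_presentation`, are assumptions that are not spelled out in the description.

```python
# Start month of each presentation type, as stated in the description:
# "B" starts in February, "J" starts in October.
START_MONTH = {"B": 2, "J": 10}


def parse_presentation(code_presentation: str) -> tuple[int, int]:
    """Return (year, start_month) for a code such as "2013J".

    Assumes the layout <year><letter>; raises ValueError otherwise.
    """
    year_part, letter = code_presentation[:-1], code_presentation[-1:]
    if not year_part.isdigit() or letter not in START_MONTH:
        raise ValueError(f"unexpected presentation code: {code_presentation!r}")
    return int(year_part), START_MONTH[letter]
```

Grouping rows by the returned start month keeps the February and October presentations apart.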

assessments.csv This file contains information about assessments in module-presentations. Usually, every presentation has a number of assessments followed by the final exam. CSV contains columns:

code_module – identification code of the module, to which the assessment belongs.
code_presentation - identification code of the presentation, to which the assessment belongs.
id_assessment – identification number of the assessment.
assessment_type – type of assessment. Three types of assessments exist: Tutor Marked Assessment (TMA), Computer Marked Assessment (CMA) and Final Exam (Exam).
date – information about the final submission date of the assessment calculated as the number of days since the start of the module-presentation. The starting date of the presentation has number 0 (zero).
weight - weight of the assessment in %. Typically, Exams are treated separately and have the weight 100%; the sum of all other assessments is 100%. If the information about the final exam date is missing, it is at the end of the last presentation week.
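
The weight convention (exams sum to 100%, all other assessments sum to 100%) is described as typical rather than guaranteed, so it is worth checking per module-presentation. The following sketch totals the weights and reports the module-presentations that deviate. The function names and the sample rows are hypothetical; the rows are assumed to be dictionaries keyed by the column names above, such as those produced by `csv.DictReader`.

```python
from collections import defaultdict


def weight_totals(rows):
    """Sum exam and non-exam weights per (code_module, code_presentation)."""
    totals = defaultdict(lambda: {"exam": 0.0, "other": 0.0})
    for row in rows:
        key = (row["code_module"], row["code_presentation"])
        bucket = "exam" if row["assessment_type"] == "Exam" else "other"
        totals[key][bucket] += float(row["weight"])
    return dict(totals)


def atypical_presentations(totals, expected=100.0, tolerance=1e-9):
    """List module-presentations whose totals differ from the typical 100%."""
    return [
        key
        for key, total in totals.items()
        if abs(total["exam"] - expected) > tolerance
        or abs(total["other"] - expected) > tolerance
    ]
```

A deviation is not necessarily an error in the data; it only marks a module-presentation that does not follow the typical scheme.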

vle.csv The csv file contains information about the available materials in the VLE. Typically these are html pages, pdf files, etc. Students have access to these materials online and their interactions with the materials are recorded. The vle.csv file contains the following columns:

id_site – an identification number of the material.
code_module – an identification code for module.
code_presentation - the identification code of presentation.
activity_type – the role associated with the module material.
week_from – the week from which the material is planned to be used.
week_to – week until which the material is planned to be used.

studentInfo.csv This file contains demographic information about the students together with their results. File contains the following columns:

code_module – an identification code for a module on which the student is registered.
code_presentation - the identification code of the presentation during which the student is registered on the module.
id_student – a unique identification number for the student.
gender – the student’s gender.
region – identifies the geographic region, where the student lived while taking the module-presentation.
highest_education – highest student education level on entry to the module presentation.
imd_band – specifies the Index of Multiple Deprivation band of the place where the student lived during the module-presentation.
age_band – band of the student’s age.
num_of_prev_attempts – the number of times the student has attempted this module.
studied_credits – the total number of credits for the modules the student is currently studying.
disability – indicates whether the student has declared a disability.
final_result – student’s final result in the module-presentation.

studentRegistration.csv This file contains information about the time when the student registered for the module presentation. For students who unregistered the date of unregistration is also recorded. File contains five columns:

code_module – an identification code for a module.
code_presentation - the identification code of the presentation.
id_student – a unique identification number for the student.
date_registration – the date of student’s registration on the module presentation, this is the number of days measured relative to the start of the module-presentation (e.g. the negative value -30 means that the student registered to module presentation 30 days before it started).
date_unr...
