Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset introduces the anonymised Open University Learning Analytics Dataset (OULAD). It contains data about courses, students and their interactions with the Virtual Learning Environment (VLE) for seven selected courses (called modules). Presentations of courses start in February and October; they are marked by "B" and "J" respectively. The dataset consists of tables connected using unique identifiers. All tables are stored in CSV format.
A diagram of the dataset structure is available at https://analyse.kmi.open.ac.uk/resources/images/model.png.
courses.csv This file contains the list of all available modules and their presentations. The columns are:
- code_module – code name of the module, which serves as the identifier.
- code_presentation – code name of the presentation. It consists of the year and "B" for a presentation starting in February or "J" for one starting in October.
- length – length of the module-presentation in days.
The structure of B and J presentations may differ, so it is good practice to analyse the B and J presentations separately. Nevertheless, for some presentations the corresponding previous B/J presentation does not exist, and therefore the J presentation must be used to inform the B presentation or vice versa. In the dataset this is the case for the CCC, EEE and GGG modules.
assessments.csv This file contains information about assessments in module-presentations. Usually, every presentation has a number of assessments followed by the final exam. The CSV contains the following columns:
vle.csv This file contains information about the materials available in the VLE. Typically these are HTML pages, PDF files, etc. Students have access to these materials online, and their interactions with the materials are recorded. The vle.csv file contains the following columns:
studentInfo.csv This file contains demographic information about the students together with their results. The file contains the following columns:
studentRegistration.csv This file contains information about the time when the student registered for the module-presentation. For students who unregistered, the date of unregistration is also recorded. The file contains five columns:
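As a minimal sketch of how the tables connect, the following Python/pandas snippet loads the files listed above and joins registrations to course metadata. The file names and the code_module/code_presentation keys come from the description above; the presence of an id_student column is an assumption based on the standard OULAD schema and should be checked against the actual headers.

```python
import pandas as pd

# Sketch: load three OULAD tables and link registrations to their module-presentation.
# File names are as documented above; id_student is assumed from the standard schema.
courses = pd.read_csv("courses.csv")
student_info = pd.read_csv("studentInfo.csv")
registrations = pd.read_csv("studentRegistration.csv")

reg_with_course = registrations.merge(
    courses, on=["code_module", "code_presentation"], how="left"
)

# Analyse B and J presentations separately, as recommended above.
for label, group in reg_with_course.groupby(
    reg_with_course["code_presentation"].str[-1]
):
    print(label, "presentations:", len(group), "registrations")
```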
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of three .csv files: 1. Data_Structure.csv, 2. Introduction_to_Computers_and_Research.csv, 3. Irrelevant_Questions.csv.
Each file consists of questions asked by students of Independent University, Bangladesh, during the Summer 2023 semester in computer science courses.
The questions have been manually pre-processed and categorized according to their course and topics. They have also been scored using the six levels of Bloom's taxonomy: remember (5 points), understand (10 points), apply (15 points), analyze (20 points), evaluate (20 points), and create (30 points).
File-1 consists of the scored and categorized questions from the "Data Structure" course. File-2 consists of the scored and categorized questions from the "Introduction to Computers and Research" course. File-3 consists of the irrelevant questions which do not belong to the courses above but were asked by the students from those courses.
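For illustration, the sketch below maps the Bloom levels to the point values listed above and summarises one course file. The column names "question" and "bloom_level" are assumptions, since the CSV headers are not described here; adjust them to the actual files.

```python
import pandas as pd

# Illustrative sketch: score questions by Bloom level using the point scale above.
# "bloom_level" is an assumed column name; check the actual CSV headers.
BLOOM_POINTS = {
    "remember": 5, "understand": 10, "apply": 15,
    "analyze": 20, "evaluate": 20, "create": 30,
}

questions = pd.read_csv("Data_Structure.csv")
questions["score"] = questions["bloom_level"].str.lower().map(BLOOM_POINTS)
print(questions.groupby("bloom_level")["score"].agg(["count", "sum"]))
```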
Author's Note 2019/04/20: Revisiting this project, I recently discovered the incredibly comprehensive API produced by the Urban Institute. It achieves all of the goals laid out for this dataset in wonderful detail. I recommend that interested users pay a visit to their site.
This dataset is designed to bring together multiple facets of U.S. education data into one convenient CSV (states_all.csv).
states_all.csv:
The primary data file. Contains aggregates from all state-level sources in one CSV.
output_files/states_all_extended.csv:
The contents of states_all.csv with additional data related to race and gender.
PRIMARY_KEY: A combination of the year and state name.
YEAR
STATE
A breakdown of students enrolled in schools by school year:
GRADES_PK: Number of students in Pre-Kindergarten education.
GRADES_4: Number of students in fourth grade.
GRADES_8: Number of students in eighth grade.
GRADES_12: Number of students in twelfth grade.
GRADES_1_8: Number of students in the first through eighth grades.
GRADES_9_12: Number of students in the ninth through twelfth grades.
GRADES_ALL: The count of all students in the state. Comparable to ENROLL in the financial data (which is the U.S. Census Bureau's estimate for students in the state).
The extended version of states_all contains additional columns that break down enrollment by race and gender. For example:
G06_A_A: Total number of sixth grade students.
G06_AS_M: Number of sixth grade male students whose ethnicity was classified as "Asian".
G08_AS_A_READING: Average reading score of eighth grade students whose ethnicity was classified as "Asian".
The represented races include AM (American Indian or Alaska Native), AS (Asian), HI (Hispanic/Latino), BL (Black or African American), WH (White), HP (Hawaiian Native/Pacific Islander), and TR (Two or More Races). The represented genders include M (Male) and F (Female).
A breakdown of states by revenue and expenditure.
ENROLL: The U.S. Census Bureau's count for students in the state. Should be comparable to GRADES_ALL (which is the NCES's estimate for students in the state).
TOTAL_REVENUE: The total amount of revenue for the state.
FEDERAL_REVENUE
STATE_REVENUE
LOCAL_REVENUE
TOTAL_EXPENDITURE: The total expenditure for the state.
INSTRUCTION_EXPENDITURE
SUPPORT_SERVICES_EXPENDITURE
CAPITAL_OUTLAY_EXPENDITURE
OTHER_EXPENDITURE
A breakdown of student performance as assessed by the corresponding exams (math and reading, grades 4 and 8).
AVG_MATH_4_SCORE: The state's average score for fourth graders taking the NAEP math exam.
AVG_MATH_8_SCORE: The state's average score for eighth graders taking the NAEP math exam.
AVG_READING_4_SCORE: The state's average score for fourth graders taking the NAEP reading exam.
AVG_READING_8_SCORE: The state's average score for eighth graders taking the NAEP reading exam.
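As a quick starting point, here is a hedged pandas sketch that uses only the columns documented above (ENROLL, GRADES_ALL, YEAR, and the NAEP score columns) to compare the two enrollment figures and summarise scores by year. It assumes states_all.csv is in the working directory.

```python
import pandas as pd

# Sketch: compare Census (ENROLL) vs. NCES (GRADES_ALL) enrollment and summarise
# NAEP scores by year, using the column names documented above.
states = pd.read_csv("states_all.csv")

states["enroll_diff"] = states["ENROLL"] - states["GRADES_ALL"]
print(states.groupby("YEAR")["enroll_diff"].describe())

score_cols = ["AVG_MATH_4_SCORE", "AVG_MATH_8_SCORE",
              "AVG_READING_4_SCORE", "AVG_READING_8_SCORE"]
print(states.groupby("YEAR")[score_cols].mean())
```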
The original sources can be found here:
Enrollment: https://nces.ed.gov/ccd/stnfis.asp
Financials: https://www.census.gov/programs-surveys/school-finances/data/tables.html
Academic Achievement: https://www.nationsreportcard.gov/ndecore/xplore/NDE
Data was aggregated using a Python program I wrote. The code (as well as additional project information) can be found [here][1].
Spreadsheets for NCES enrollment data for 2014, 2011, 2010, and 2009 were modified to place key data on the same sheet, making scripting easier.
The column 'ENROLL' represents the U.S. Census Bureau data value (financial data), while the column 'GRADES_ALL' represents the NCES data value (demographic data). Though the two organizations correspond on this matter, these values (which are ostensibly the same) do vary. Their documentation chalks this up to differences in membership (i.e. what is and is not a fourth grade student).
Enrollment data from NCES has seen a number of changes across survey years. One of the more notable is that data on student gender does not appear to have been collected until 2009. The information in states_all_extended.csv reflects this.
NAEP test score data is only available for certain years.
The current version of this data is concerned with state-level patterns. It is the author's hope that future versions will allow for school district-level granularity.
Data is sourced from the U.S. Census Bureau and the National Center for Education Statistics (NCES).
The licensing of these datasets states that it must not be us...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the paper titled "Towards Increased Diversity in STEM Education: Five archetypes Derived through a Data-Driven Approach Examining a Computer Science Student Cohort".
In case of questions, feel free to contact the authors (anonymised; ORCID: https://orcid.org/*anonymised*; current affiliation and email: anonymised).
The raw survey data for the initial 2019 survey is available in the file survey2019_anon.csv. Note that the data is anonymised as free-text comments have been removed. Explanations on the variables and their levels are given in the files variables_survey2019.csv and values_survey2019.csv. The questionnaire for the 2019 survey is contained in survey2019_instrument.pdf.
The raw survey data for the 2020 survey is available in the file rdata_anon_survey2020.csv. Additional scripts are supplied to reproduce the exploratory factor analysis. The main entry is the file EFA.R, which imports the data. The file contains some comments on the process. The questionnaire for the 2020 survey is contained in survey2020_instrument.pdf.
The interview guide used for the five interviews is available in the file interview_instrument.pdf.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset created during a user study on the evaluation of explainability of artificial intelligence (AI) at the Jagiellonian University, as a collaborative work of computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore effective explanations of AI model patterns for diverse audiences.
The dataset contains material collected from 39 participants during the interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups that represented domain experts in the field of mycology (DE), students with data science and visualization background (IT) and students from social sciences and humanities (SSH). Each group was given an explanation of a machine learning model trained to predict edible and non-edible mushrooms and asked to interpret the explanations and answer various questions during the interview. The machine learning model and explanations for its decision were prepared by the computer science research team.
The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results from thematic analysis, and the original explanations with modifications suggested by the participants. The dataset is complemented with source code allowing one to reproduce the initial machine learning model and explanations.
The general structure of the dataset is described in the following table. Files whose names contain [RR]_[SS]_[NN] hold the individual results obtained from a particular participant. The meaning of the prefix is as follows:
| File | Description |
| --- | --- |
| SURVEY.csv | The results from a survey that was filled in by 149 participants, out of which 39 were selected to form the final group of participants. |
| CODEBOOK.csv | The codebook used in thematic analysis and MAXQDA coding |
| QUESTIONS.csv | List of questions that the participants were asked during interviews. |
| SLIDES.csv | List of slides used in the study with their interpretation and reference to MAXQDA themes and VISUAL_MODIFICATIONS tables. |
| MAXQDA_SUMMARY.csv | Summary of thematic analysis performed with codes used in CODEBOOK for each participant |
| PROBLEMS.csv | List of problems that participants were asked to solve during interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from explanations. |
| PROBLEMS_RESPONSES.csv | Each participant's responses to the problems listed in PROBLEMS.csv. |
| VISUALIZATION_MODIFICATIONS.csv | Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested. |
| ORIGINAL_VISUZALIZATIONS.pdf | The PDF file containing the visualization of explanations presented to the participants during the interviews |
| VISUALIZATION_MODIFICATIONS.zip | A ZIP archive containing the original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant. Each file is a PDF named with the participant ID, i.e. [RR]_[SS]_[NN].pdf. |
| TRANSCRIPTS.zip | The anonymized transcripts of interviews for each participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv. |
The detailed structure of the files presented in the table above is given in the Technical info section.
The source code used to train the ML model and to generate the explanations is available on GitLab.
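As a starting point for working with the archive, the sketch below loads SURVEY.csv and iterates over the per-participant transcripts in TRANSCRIPTS.zip; the file names come from the table above, but the internal column layout of the transcripts is not assumed, so the code only reports shapes.

```python
import zipfile
import pandas as pd

# Sketch: load the survey and walk the anonymised transcripts, using the file
# names documented above. No transcript column schema is assumed.
survey = pd.read_csv("SURVEY.csv")
print("survey respondents:", len(survey))

with zipfile.ZipFile("TRANSCRIPTS.zip") as archive:
    for name in archive.namelist():
        if not name.endswith(".csv"):
            continue
        with archive.open(name) as fh:
            transcript = pd.read_csv(fh)
        participant_id = name.rsplit("/", 1)[-1].removesuffix(".csv")
        print(participant_id, transcript.shape)
```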
Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
It's no secret that US university students often graduate with debt repayment obligations that far outstrip their employment and income prospects. While it's understood that students from elite colleges tend to earn more than graduates from less prestigious universities, the finer relationships between future income and university attendance are quite murky. In an effort to make educational investments less speculative, the US Department of Education has matched information from the student financial aid system with federal tax returns to create the College Scorecard dataset.
Kaggle is hosting the College Scorecard dataset in order to facilitate shared learning and collaboration. Insights from this dataset can help make the returns on higher education more transparent and, in turn, more fair.
Here's a script showing an exploratory overview of some of the data.
college-scorecard-release-*.zip contains a compressed version of the same data available through Kaggle Scripts.
It consists of three components:
New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains a CSV file with students' replies to the survey used to evaluate the Drop Project Student plugin for IntelliJ IDEA.
To support international readers, the question names (CSV headers) were translated to English and/or match the numbering that appears in the paper. However, the students' textual replies to the open-ended questions were left in their original language, Portuguese.
The datasets contain all the data for the number of AP CS A exams taken in each state from 1998 to 2013, and detailed data on pass rates, race, and gender from 2006 to 2013. The data was compiled from the data available at http://research.collegeboard.org/programs/ap/data. This data was originally gathered by the CSTA board, but Barb Ericson of Georgia Tech keeps adding to it each year.
historical.csv contains data on the number of AP CS A exams taken in each state from 1998 to 2013:
state: US states
1998-2013: one column per year with the number of exams taken
Pop: population
pass_06_13.csv contains exam pass rates, race and gender data from 2006 to 2013 for selected states.
pass_12_13.csv contains exam pass rates, race and gender information for every state for 2012 and 2013.
The original datasets can be found here and here.
Using the datasets, can you examine the temporal trends in the exam pass rates by race, gender, and geographical location?
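One way to start on that question is to reshape historical.csv from wide to long form. The sketch below assumes the year columns are named with the plain years "1998" through "2013", per the column list above; the pass-rate files are not touched because their headers are not described here.

```python
import pandas as pd

# Sketch: melt the wide per-year columns of historical.csv into long form so exam
# counts can be plotted over time. Year column names are assumed to be "1998".."2013".
historical = pd.read_csv("historical.csv")
year_cols = [c for c in historical.columns if c.isdigit()]

long_form = historical.melt(
    id_vars=["state"], value_vars=year_cols,
    var_name="year", value_name="exams_taken",
)
long_form["year"] = long_form["year"].astype(int)
print(long_form.groupby("year")["exams_taken"].sum())
```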
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
FANTASIA

This repository contains the data related to image descriptors and sound associated with a selection of frames of the films Fantasia and Fantasia 2000 produced by Disney.

About

This repository contains the data used in the article "Automatic composition of descriptive music: A case study of the relationship between image and sound", published in the 6th International Workshop on Computational Creativity, Concept Invention, and General Intelligence (C3GI). The data structure is explained in detail in the article.

Abstract

Human beings establish relationships with the environment mainly through sight and hearing. This work focuses on the concept of descriptive music, which makes use of sound resources to narrate a story. The Fantasia film, produced by Walt Disney, was used in the case study. One of its musical pieces is analyzed in order to obtain the relationship between image and music. This connection is subsequently used to create a descriptive musical composition from a new video. Naive Bayes, Support Vector Machine and Random Forest are the three classifiers studied for the model induction process. After an analysis of their performance, it was concluded that Random Forest provided the best solution; the produced musical composition had a considerably high descriptive quality.

Data

Nutcracker_data.arff: Image descriptors and the most important sound of each frame from the fragment "The Nutcracker Suite" in the film Fantasia. Data stored in ARFF format.
Firebird_data.arff: Image descriptors of each frame from the fragment "The Firebird" in the film Fantasia 2000. Data stored in ARFF format.
Firebird_midi_prediction.csv: Frame number of the fragment "The Firebird" in the film Fantasia 2000 and the sound predicted by the system, encoded in MIDI. Data stored in CSV format.
Firebird_prediction.mp3: Audio file synthesizing the prediction data for the fragment "The Firebird" of the film Fantasia 2000.

License

Data is available under the MIT License. To make use of the data, the article must be cited.
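A minimal sketch for reading the ARFF and CSV files listed above into Python, assuming the files are in the working directory; attribute names are taken from the ARFF headers at load time rather than being hard-coded here.

```python
import pandas as pd
from scipy.io import arff

# Sketch: load the ARFF training data and the CSV prediction file.
nutcracker_raw, nutcracker_meta = arff.loadarff("Nutcracker_data.arff")
nutcracker = pd.DataFrame(nutcracker_raw)
print(nutcracker_meta.names()[:10], nutcracker.shape)

firebird_pred = pd.read_csv("Firebird_midi_prediction.csv")
print(firebird_pred.head())
```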
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery
This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.
The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The column table names the relation we refer to; lhs and rhs reference two columns of that relation where, semantically, we found that lhs implies rhs.
The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value, or if the g3_prime value was too small.
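To illustrate the underlying idea, the sketch below computes the standard g3 error (the minimum fraction of tuples that must be removed so a candidate dependency holds exactly) for the AirportCode → AirportName example from claims.csv mentioned above. Note this is only an illustration of the concept, not necessarily the exact g3_prime variant used during annotation.

```python
import pandas as pd

# Sketch of the standard g3 error for a candidate FD lhs -> rhs:
# for every lhs value, keep only the most frequent rhs value; the remaining
# tuples are the ones that would have to be removed.
def g3_error(df: pd.DataFrame, lhs: str, rhs: str) -> float:
    kept = df.groupby(lhs)[rhs].agg(lambda s: s.value_counts().iloc[0]).sum()
    return 1.0 - kept / len(df)

claims = pd.read_csv("claims.csv")
print(g3_error(claims, "AirportCode", "AirportName"))
```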
Dataset References
Public Domain: https://creativecommons.org/public-domain
This dataset belongs to a paper about independent researchers submitted to the STI conference 2024 (https://sti2024.org/). It consists of several files described below. The data is from OpenAlex, collected through the InSySPo instance of the February snapshot of OpenAlex, hosted on Google Cloud. Since Topics are a new feature of OpenAlex data and therefore not part of the snapshot, this data, as well as some other data not available at the InSySPo instance at the time of collection, was collected through the OpenAlex API and incorporated in the files. Data from Scopus and Web of Science may be retrieved by using the search string in the appendix of the article.
Files all domains
240307_open_alex_works.tsv
contains all works retrieved with the search string for Independent researchers in OpenAlex in the article's appendix.
Files Social Sciences and/or Arts & Humanities
240312_open_alex_works_soc_sci_arts_2010.tsv
contains articles by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.
240312_open_alex_authors_soc_sci_arts_2010.tsv
contains authors who are Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.
240313_open_alex_authors_all_works_soc_sci_arts_2010.tsv
contains all works by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex. "All works" means that the researcher has at least once indicated independent status in an affiliation; the author's other works are also included.
author_distribution_domain1.csv
contains number of works per number of authors in the domain Social Sciences (includes Arts & Humanities).
author_distribution_field33.csv
contains number of works per number of authors in the field Social Sciences.
author_distribution_field12.csv
contains number of works per number of authors in the field Arts & Humanities.
all_ssh_oa.csv
contains data for analyzing open access patterns for the domain Social Sciences (includes Arts & Humanities).
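For loading, note that the works and authors files are tab-separated while the distribution files are comma-separated. A minimal sketch, assuming the files are local and making no assumptions about column names beyond what pandas reads from the headers:

```python
import pandas as pd

# Sketch: load one TSV and one CSV from the file list above and inspect them.
works = pd.read_csv("240307_open_alex_works.tsv", sep="\t")
print(works.shape, list(works.columns)[:10])

author_dist = pd.read_csv("author_distribution_domain1.csv")
print(author_dist.head())
```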
https://doi.org/10.23668/psycharchives.4988
Citizen Science (CS) projects play a crucial role in engaging citizens in conservation efforts. While implicitly mostly considered as an outcome of CS participation, citizens may also have a certain attitude toward engagement in CS when starting to participate in a CS project. Moreover, there is a lack of CS studies that consider changes over longer periods of time. Therefore, this research presents two-wave data from four field studies of a CS project about urban wildlife ecology using cross-lagged panel analyses. We investigated the influence of attitudes toward engagement in CS on self-related, ecology-related, and motivation-related outcomes. We found that positive attitudes toward engagement in CS at the beginning of the CS project had positive influences on participants’ psychological ownership and pride in their participation, their attitudes toward and enthusiasm about wildlife, and their internal and external motivation two months later. We discuss the implications for CS research and practice.

Dataset for: Greving, H., Bruckermann, T., Schumann, A., Stillfried, M., Börner, K., Hagen, R., Kimmig, S. E., Brandt, M., & Kimmerle, J. (2023). Attitudes Toward Engagement in Citizen Science Increase Self-Related, Ecology-Related, and Motivation-Related Outcomes in an Urban Wildlife Project. BioScience, 73(3), 206–219. https://doi.org/10.1093/biosci/biad003

Data (CSV format) collected for all field studies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset was collected during a main study that evaluated the virtual Cross Array Task (CAT) platform as an assessment tool for algorithmic thinking (AT) skills among K-12 students in Swiss compulsory education.
As algorithmic thinking becomes increasingly vital in our digital age, this study bridges the gap between traditional assessments and the needs of today's learners by introducing a digital platform. The virtual CAT, a digital adaptation of an unplugged assessment activity, offers scalable, automated assessments with reduced human intervention.
Study Context, Location and Participants
To comprehensively investigate algorithmic competencies within compulsory education, exploring their variations and determining the factors influencing them, in Spring 2023 we conducted an experimental study with the virtual CAT.
The sample comprises 129 students (65 girls and 64 boys), selected from nine classes across five public schools in Ticino and Solothurn cantons.
Data Collection
During the data collection process, session and participant details were manually recorded by the administrator.
Each session has been assigned a unique identifier, and specific details, such as the date, canton, school name and type, and the students’ HarmoS grade (HG) level, have been recorded.
Student information is limited to sex and date of birth, with birth dates used to calculate ages, a significant factor in our demographic analysis.
To protect student privacy, unique identifiers have been assigned to each participant, keeping the data anonymous and secure.
The assessment tool automatically tracked all user interaction within the platform.
All data collected have been pseudonymised, aligning with prevailing open science practices in Switzerland (SNSF, 2021).
Data collection was integrated into a validation module of the app.
Data Features
The dataset comprises the following files:
These files collectively provide insights into the algorithmic actions of the students, demographic details, session logs, results, and more.
Usage & Ethics
In the spirit of open science, this dataset is made available to the public after meticulous anonymisation to ensure all participants' privacy and ethical treatment.
Initial authorisations were secured from school administrators, teachers, and parents.
Detailed communication regarding the study's nature, data handling, and objectives was transparently shared with all stakeholders.
REFERENCES
[1] A. Piatti, G. Adorni, L. El-Hamamsy, L. Negrini, D. Assaf, L. Gambardella & F. Mondada. (2022). The CT-cube: A framework for the design and the assessment of computational thinking activities. Computers in Human Behavior Reports, 5, 100166. https://doi.org/10.1016/j.chbr.2021.100166
[2] Adorni, G., Piatti, S., & Karpenko, V. (2023). virtual CAT: An app for algorithmic thinking assessment within Swiss compulsory education. Zenodo Software. https://doi.org/10.5281/zenodo.10027851 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-app/
[3] Adorni, G., & Karpenko, V. (2023). virtual CAT programming language interpreter. Zenodo Software. https://doi.org/10.5281/zenodo.10016535 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-programming-language-interpreter/
[4] Adorni, G., & Karpenko, V. (2023). virtual CAT data infrastructure. Zenodo Software. https://doi.org/10.5281/zenodo.10015011 On GitHub: https://github.com/GiorgiaAuroraAdorni/virtual-CAT-data-infrastructure
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage, sine-shaped alternating-current signals. The signal frequencies range from 50 kHz to 20 MHz.
Applications: The intention of this dataset is to allow investigation of the human body as a signal propagation medium, and to capture information on how the properties of the human body (age, sex, composition, etc.), the measurement locations, and the signal frequencies impact the signal loss over the human body.
Overview statistics:
Number of subjects: 30
Number of transmitter locations: 6
Number of receiver locations: 6
Number of measurement frequencies: 19
Input voltage: 1 V
Load resistance: 50 ohm and 1 megaohm
Measurement group statistics:
Height: 174.10 (7.15)
Weight: 72.85 (16.26)
BMI: 23.94 (4.70)
Body fat %: 21.53 (7.55)
Age group: 29.00 (11.25)
Male/female ratio: 50%
Included files:
experiment_protocol_description.docx - protocol used in the experiments
electrode_placement_schematic.png - schematic of placement locations
electrode_placement_photo.jpg - visualization of the experiment on a volunteer subject
RawData - the full measurement results and experiment info sheets
all_measurements.csv - the most important results extracted to .csv
all_measurements_filtered.csv - same, but after z-score filtering
all_measurements_by_freq.csv - the most important results extracted to .csv, single frequency per row
all_measurements_by_freq_filtered.csv - same, but after z-score filtering
summary_of_subjects.csv - key statistics on the subjects from the experiment info sheets
process_json_files.py - script that creates .csv from the raw data
filter_results.py - outlier removal based on z-score
plot_sample_curves.py - visualization of a randomly selected measurement result subset
plot_measurement_group.py - visualization of the measurement group
CSV file columns:
subject_id - participant's random unique ID
experiment_id - measurement session's number for the participant
height - participant's height, cm
weight - participant's weight, kg
BMI - body mass index, computed from the values above
body_fat_% - body fat composition, as measured by bioimpedance scales
age_group - age rounded to 10 years, e.g. 20, 30, 40 etc.
male - 1 if male, 0 if female
tx_point - transmitter point number
rx_point - receiver point number
distance - distance, in relative units, between the tx and rx points. Not scaled in terms of participant's height and limb lengths!
tx_point_fat_level - transmitter point location's average fat content metric. Not scaled for each participant individually.
rx_point_fat_level - receiver point location's average fat content metric. Not scaled for each participant individually.
total_fat_level - sum of rx and tx fat levels
bias - constant term to simplify data analytics, always equal to 1.0
CSV file columns, frequency-specific:
tx_abs_Z_... - transmitter-side impedance, as computed by the process_json_files.py script from the voltage drop
rx_gain_50_f_... - experimentally measured gain on the receiver, in dB, using 50 ohm load impedance
rx_gain_1M_f_... - experimentally measured gain on the receiver, in dB, using 1 megaohm load impedance
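The included filter_results.py performs the z-score-based outlier removal; for readers who want to see how such a filter could look, here is an illustrative sketch (not the actual script). It assumes the gain columns follow the rx_gain_50_f_... / rx_gain_1M_f_... naming pattern described above and writes to a hypothetical output file name.

```python
import pandas as pd

# Illustrative z-score outlier filter (a sketch, not the actual filter_results.py):
# drop rows whose receiver gain deviates by more than 3 standard deviations within
# each transmitter/receiver point pair. Verify column prefixes against real headers.
df = pd.read_csv("all_measurements.csv")
gain_cols = [c for c in df.columns
             if c.startswith("rx_gain_50_f") or c.startswith("rx_gain_1M_f")]

z = df.groupby(["tx_point", "rx_point"])[gain_cols].transform(
    lambda s: (s - s.mean()) / s.std()
)
filtered = df[(z.abs() <= 3).all(axis=1)]
filtered.to_csv("all_measurements_filtered_sketch.csv", index=False)  # hypothetical name
```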
Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project “Body-Coupled Communication for Body Area Networks”, project No. lzp-2020/1-0358.
References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.
Contact information: info@edi.lv
182 simulated datasets (the first set contains small datasets and the second set contains large datasets) with different cluster compositions, i.e., different numbers of clusters and separation values, generated using the clusterGeneration package in R. Each set consists of 91 datasets in comma-separated values (csv) format (182 csv files in total), with 3-15 clusters and separation values from 0.1 to 0.7. Separation values can range between (−0.999, 0.999), where a higher separation value indicates a cluster structure with more separable clusters. The size of the dataset, the number of clusters, and the separation value of the clusters are encoded in the file name: size_X_n_Y_sepval_Z.csv, where X = size of the dataset, Y = number of clusters in the dataset, and Z = separation value of the clusters in the dataset.
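A small sketch for recovering X, Y and Z from the file names and loading the corresponding files; the exact numeric formatting inside the names (e.g., how the separation value is written) is an assumption and the regular expression may need adjusting.

```python
import re
from pathlib import Path

import pandas as pd

# Sketch: parse size, cluster count and separation value out of size_X_n_Y_sepval_Z.csv
# file names, then load each file.
PATTERN = re.compile(r"size_(?P<size>\d+)_n_(?P<n_clusters>\d+)_sepval_(?P<sepval>[\d.]+)\.csv")

for path in sorted(Path(".").glob("size_*_n_*_sepval_*.csv")):
    match = PATTERN.fullmatch(path.name)
    if match is None:
        continue
    info = match.groupdict()
    data = pd.read_csv(path)
    print(path.name, info, data.shape)
```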
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a ProgSnap2-based dataset containing anonymized logs of over 34,000 programming events exhibited by 81 programming students in Scratch, a visual programming environment, during our designed study as described in the paper "Semi-Automatically Mining Students' Common Scratch Programming Behaviors." We also include a list of approx. 3100 mined sequential patterns of programming processes that are performed by at least 10% of the 62 novice programmers among the 81 students, and represent maximal patterns generated by the MG-FSM algorithm while allowing a gap of one programming event.

README.txt — overview of the dataset and its properties
mainTable.csv — main event table of the dataset holding rows of programming events
codeState.csv — table holding XML representations of code snapshots at the time of each programming event
datasetMetadata.csv — describes features of the dataset
Scratch-SeqPatterns.txt — list of sequential patterns mined from the Main Event Table
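A minimal loading sketch: ProgSnap2 defines standard column names such as SubjectID and EventType, and the snippet below assumes mainTable.csv follows that convention; the actual headers should be confirmed against README.txt and datasetMetadata.csv.

```python
import pandas as pd

# Sketch: load the main event table and code snapshots. SubjectID/EventType are
# assumed from the ProgSnap2 convention; verify against README.txt before use.
events = pd.read_csv("mainTable.csv")
code_states = pd.read_csv("codeState.csv")

print("events:", len(events), "code states:", len(code_states))
print(events.groupby("SubjectID").size().describe())   # events per student
print(events["EventType"].value_counts().head(10))     # most common event types
```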
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides a collection of 160 instances belonging to two classes ('pass' = 136 and 'fail' = 24). The data is an anonymised, statistically sound and reliable representation of the original data collected from students studying computer science modules at a UK University. Each instance is made up of 19 features plus the class label. Eight of the features represent students' online behaviour, including bio information, retrieved from a Virtual Learning Environment. Eleven of the features represent students' neighbourhood influence, retrieved from the Office for Students database. The data has been compiled and made available in de-facto/de-jure standard open formats (CSV and JSON).
This data was collected and used in a research study undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. To encourage reproducibility of the reported experiments and results, the data is provided in the exact training-validation-testing splits used in the experiments.
https://doi.org/10.23668/psycharchives.4988
Citizen Science (CS) projects play a crucial role in engaging citizens in conservation efforts. While implicitly mostly considered as an outcome of CS participation, citizens may also have a certain attitude toward engagement in CS when starting to participate in a CS project. Moreover, there is a lack of CS studies that consider changes over longer periods of time. Therefore, this research presents two-wave data from four field studies of a CS project about urban wildlife ecology using cross-lagged panel analyses. We investigated the influence of attitudes toward engagement in CS on self-related, ecology-related, and motivation-related outcomes. We found that positive attitudes toward engagement in CS at the beginning of the CS project had positive influences on participants’ psychological ownership and pride in their participation, their attitudes toward and enthusiasm about wildlife, and their internal and external motivation two months later. We discuss the implications for CS research and practice.

Dataset for: Greving, H., Bruckermann, T., Schumann, A., Stillfried, M., Börner, K., Hagen, R., Kimmig, S. E., Brandt, M., & Kimmerle, J. (2023). Attitudes Toward Engagement in Citizen Science Increase Self-Related, Ecology-Related, and Motivation-Related Outcomes in an Urban Wildlife Project. BioScience, 73(3), 206–219. https://doi.org/10.1093/biosci/biad003

Codebook (CSV format) of the variables of all field studies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, descriptions of their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available the majority of the sensitive data records included in this study.
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:
The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.
This dataset consists of five comma-separated values (.csv) files describing our inventory:
Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed, properly modelling the schema for databases:
The analysis was done through Jupyter Notebook which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Explanation/Overview:
Corresponding dataset for the analyses and results achieved in the CS Track project in the research line on participation analyses, which are also reported in the publication "Does Volunteer Engagement Pay Off? An Analysis of User Participation in Online Citizen Science Projects", a conference paper for CollabTech 2022: Collaboration Technologies and Social Computing, published as part of the Lecture Notes in Computer Science book series (LNCS, volume 13632) here. The usernames have been anonymised.
Purpose:
The purpose of this dataset is to provide the basis to reproduce the results reported in the associated deliverable, and in the above-mentioned publication. As such, it does not represent raw data, but rather files that already include certain analysis steps (like calculated degrees or other SNA-related measures), ready for analysis, visualisation and interpretation with R.
Relatedness:
The data of the different projects was derived from the forums of 7 Zooniverse projects based on similar discussion board features. The projects are: 'Galaxy Zoo', 'Gravity Spy', 'Seabirdwatch', 'Snapshot Wisconsin', 'Wildwatch Kenya', 'Galaxy Nurseries', 'Penguin Watch'.
Content:
In this Zenodo entry, several files can be found. The structure is as follows (files, folders, and descriptions).
corresponding_calculations.html
corresponding_calculations.qmd
annotations.csv
comments.csv
478_rolechanges.csv
1104_rolechanges.csv
...
478_edges.csv
1104_edges.csv
...
478_nodes.csv
1104_nodes.csv
...
edges_4782016_q1.csv
edges_4782016_q2.csv
edges_4782016_q3.csv
edges_4782016_q4.csv
...
nodes_4782016_q1.csv
nodes_4782016_q2.csv
nodes_4782016_q3.csv
nodes_4782016_q4.csv
...
1104
Edges
...
Nodes
...
...
datavizfuncs.R
corresponding_calculations.qmd
import.R
corresponding_calculations.qmd

Grouping:
The data is grouped according to given criteria (e.g., project_title or time). Accordingly, the respective files can be found in the data structure above.
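For orientation, the sketch below rebuilds a discussion network from one edges/nodes pair and recomputes node degrees with networkx. The edge-list column names "source" and "target" are assumptions, since the CSV schema is not documented above; adjust them to the actual headers before use.

```python
import networkx as nx
import pandas as pd

# Sketch: load one project's edge and node lists and recompute degrees.
# "source"/"target" are assumed column names for the edge list.
edges = pd.read_csv("478_edges.csv")
nodes = pd.read_csv("478_nodes.csv")

graph = nx.from_pandas_edgelist(edges, source="source", target="target")
degrees = pd.Series(dict(graph.degree()), name="degree")
print(degrees.describe())
print("nodes file rows:", len(nodes))
```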