ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Dataset Overview: This dataset pertains to the examination results of students who participated in a series of academic assessments at a fictitious educational institution named "University of Exampleville." The assessments were administered across various courses and academic levels, with a focus on evaluating students' performance in general management and domain-specific topics.
Columns: The dataset comprises 12 columns, each representing specific attributes and performance indicators of the students. These columns encompass information such as the students' names (which have been anonymized), their respective universities, academic program names (including BBA and MBA), specializations, the semester of the assessment, the type of examination domain (general management or domain-specific), general management scores (out of 50), domain-specific scores (out of 50), total scores (out of 100), student ranks, and percentiles.
Data Collection: The examination data was collected during a standardized assessment process conducted by the University of Exampleville. The exams were designed to assess students' knowledge and skills in general management and their chosen domain-specific subjects. It involved students from both BBA and MBA programs who were in their final year of study.
Data Format: The dataset is available in a structured format, typically as a CSV file. Each row represents a unique student's performance in the examination, while columns contain specific information about their results and academic details.
Data Usage: This dataset is valuable for analyzing and gaining insights into the academic performance of students pursuing BBA and MBA degrees. It can be used for various purposes, including statistical analysis, performance trend identification, program assessment, and comparison of scores across domains and specializations. Furthermore, it can be employed in predictive modeling or decision-making related to curriculum development and student support.
Data Quality: The dataset has undergone preprocessing and anonymization to protect the privacy of individual students. Nevertheless, it is essential to use the data responsibly and in compliance with relevant data protection regulations when conducting any analysis or research.
Data Format: The exam data is typically provided in a structured format, commonly as a CSV (Comma-Separated Values) file. Each row in the dataset represents a unique student's examination performance, and each column contains specific attributes and scores related to the examination. The CSV format allows for easy import and analysis using various data analysis tools and programming languages like Python, R, or spreadsheet software like Microsoft Excel.
Here's a column-wise description of the dataset:
Name OF THE STUDENT: The full name of the student who took the exam. (Anonymized)
UNIVERSITY: The university where the student is enrolled.
PROGRAM NAME: The name of the academic program in which the student is enrolled (BBA or MBA).
Specialization: If applicable, the specific area of specialization or major that the student has chosen within their program.
Semester: The semester or academic term in which the student took the exam.
Domain: Indicates the examination domain for the scores in that row: general management or domain-specific (the exam was divided into these two parts).
GENERAL MANAGEMENT SCORE (OUT of 50): The score obtained by the student in the general management part of the exam, out of a maximum possible score of 50.
Domain-Specific Score (Out of 50): The score obtained by the student in the domain-specific part of the exam, also out of a maximum possible score of 50.
TOTAL SCORE (OUT of 100): The total score obtained by adding the scores from the general management and domain-specific parts, out of a maximum possible score of 100.
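Since the data ship as CSV, the columns above can be loaded and sanity-checked with standard tooling. The snippet below is a minimal sketch using Python's csv module; the header spellings follow the column descriptions above, and the sample row is invented for illustration:

```python
import csv
import io

# Invented sample row; header names follow the column descriptions above.
sample = io.StringIO(
    "Name OF THE STUDENT,UNIVERSITY,PROGRAM NAME,Specialization,Semester,"
    "Domain,GENERAL MANAGEMENT SCORE (OUT of 50),Domain-Specific Score (Out of 50),"
    "TOTAL SCORE (OUT of 100)\n"
    "Student_001,University of Exampleville,MBA,Finance,6,Domain-Specific,38,42,80\n"
)

for row in csv.DictReader(sample):
    gm = int(row["GENERAL MANAGEMENT SCORE (OUT of 50)"])
    ds = int(row["Domain-Specific Score (Out of 50)"])
    total = int(row["TOTAL SCORE (OUT of 100)"])
    # The total score should equal the sum of the two part scores.
    assert total == gm + ds, f"inconsistent total for {row['Name OF THE STUDENT']}"
```

The same check extends naturally to verifying score ranges (0-50 per part, 0-100 total) before any downstream analysis.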
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In "Sample Student Data", there are 6 sheets. Three sheets contain sample datasets, one for each of the three exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, three sheets contain sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pair is from a different subject.
· CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute of rest between throws. Data were collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol.
· Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90-second rest period, for a total of three exercise bouts. The NIRS monitor was placed on the left gastrocnemius muscle. Here again, data were collected telemetrically by the NIRS device and then downloaded after the subject had completed the protocol.
· Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. In this case, data were collected by a student observing the SmO2 values displayed on a secondary device, specifically a smartphone running the IPSensorMan app.
The recording student observed and recorded the data in an Excel spreadsheet, marking the times that exercise began and ended on the spreadsheet.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This zip file contains data files for 3 activities described in the accompanying PPT slides:
1. An Excel spreadsheet for analysing gain scores in a 2-group, 2-time data array. This activity requires access to https://campbellcollaboration.org/research-resources/effect-size-calculator.html to calculate effect size.
2. An AMOS path model and SPSS data set for an autoregressive, bivariate path model with cross-lagging. This activity is related to the following article: Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.563277
3. An AMOS latent curve model and SPSS data set for a 3-time latent factor model with an interaction mixed model that uses GPA as a predictor of the LCM start and slope or change factors. This activity makes use of data reported previously and a published data analysis case: Peterson, E. R., Brown, G. T. L., & Jun, M. C. (2015). Achievement emotions in higher education: A diary study exploring emotions across an assessment event. Contemporary Educational Psychology, 42, 82-96. doi:10.1016/j.cedpsych.2015.05.002 and Brown, G. T. L., & Peterson, E. R. (2018). Evaluating repeated diary study responses: Latent curve modeling. In SAGE Research Methods Cases Part 2. Retrieved from http://methods.sagepub.com/case/evaluating-repeated-diary-study-responses-latent-curve-modeling doi:10.4135/9781526431592
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
We used the standard and validated Pittsburgh Sleep Quality Index (PSQI), which was developed by researchers at the University of Pittsburgh in 1988. The questionnaire included baseline variables such as age, sex, and academic year, along with the PSQI questions addressing participants’ sleep habits and quality. The PSQI assesses sleep quality during the previous month and contains 19 self-rated questions that yield seven components: subjective sleep quality, sleep latency, sleep duration, habitual sleep efficiency, sleep disturbances, use of sleep medication, and daytime dysfunction. Each component is assigned a score ranging from zero to three, yielding a total PSQI score in the range 0 to 21. A total score of 0 to 4 is considered normal sleep quality, whereas scores greater than 4 are categorized as poor sleep quality. Data collected from students through Google Forms were extracted to Google Sheets, cleaned in Excel, and then imported and analyzed using STATA 15. Simple descriptive analysis was performed to examine the responses for every PSQI variable, and scores were then calculated following the PSQI administration instructions.
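The global scoring rule described above can be sketched in Python. This is an illustrative helper, not part of the study's STATA workflow, and it assumes the seven component scores (each 0-3) have already been derived per the administration instructions:

```python
def psqi_global_score(components):
    """Sum seven PSQI component scores (each 0-3) into a 0-21 global score."""
    if len(components) != 7:
        raise ValueError("PSQI has exactly seven components")
    if any(not 0 <= c <= 3 for c in components):
        raise ValueError("each component score must be between 0 and 3")
    return sum(components)

def sleep_quality_category(score):
    """Apply the study's cutoff: 0-4 is normal, greater than 4 is poor."""
    return "normal" if score <= 4 else "poor"

# Example: components summing to 3 fall in the normal range.
print(sleep_quality_category(psqi_global_score([1, 0, 1, 0, 1, 0, 0])))
```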
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The dataset provides simulated insights into student engagement and performance within the THM platform. It outlines mathematical representations of student learning profiles, detailing behaviors ranging from high achievers to inconsistent performers. Additionally, the dataset includes key performance indicators, offering metrics like room completion, points earned, and time spent to gauge student progress and interaction within the platform's modules. Here are definitions of the learning profiles, along with mathematical representations of their behaviors:
High Achiever: These are students who consistently perform well across all modules. Their performance can be described as a normal distribution centered at a high mean value. Their performance P in a given module can be modelled as: P = N(90, 5) where N is the normal distribution function, 90 is the mean, and 5 is the standard deviation. Average Performer: These are students who typically perform at the average level across all modules. Their performance can be described as a normal distribution centered at a medium mean value: P = N(70, 10), where 70 is the mean, and 10 is the standard deviation. Late Bloomer: These are students whose performance improves as they progress through the modules. Their performance can be modelled as: P = N(50 + i*10, 10), where i is the module index and shows an increasing trend. Specialized Talent: These are students who have average performance in most modules but excel in a particular module (e.g., module5). Their performance can be described as: P = N(90, 5) if the module is module 5, else P = N(70, 10). Inconsistent Performer: These are students whose performance varies significantly across modules. Their performance can be described as a normal distribution with a high standard deviation: P = N(70, 30), where 70 is the mean, and 30 is the high standard deviation, reflecting inconsistency. Note that the actual performances are bounded between 0 and 100 using the function max(0, min(100, performance)) to ensure valid percentages. In these formulas, the np.random.normal function is used to simulate the variability in student performance around the mean values. The first argument to this function is the mean, and the second argument is the standard deviation, reflecting the level of variability around the mean. The function returns a number drawn from the normal distribution described by these parameters. Note that the proposed method is experimental and has not been validated.
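The profile formulas above can be sketched in plain Python. This uses random.gauss in place of np.random.normal (same mean and standard-deviation parameters) and applies the max(0, min(100, ...)) bound described above; the profile names and module count are illustrative:

```python
import random

random.seed(42)  # reproducible illustration

def bounded(p):
    """Clamp a raw draw to a valid percentage, as described above."""
    return max(0.0, min(100.0, p))

def performance(profile, module_index):
    """Draw one module performance score for a given learning profile."""
    if profile == "high_achiever":
        return bounded(random.gauss(90, 5))
    if profile == "average":
        return bounded(random.gauss(70, 10))
    if profile == "late_bloomer":
        # Mean increases with the module index: 50 + i*10.
        return bounded(random.gauss(50 + module_index * 10, 10))
    if profile == "specialized_talent":
        # Excels only in module 5, average elsewhere.
        mu, sigma = (90, 5) if module_index == 5 else (70, 10)
        return bounded(random.gauss(mu, sigma))
    if profile == "inconsistent":
        return bounded(random.gauss(70, 30))
    raise ValueError(f"unknown profile: {profile}")

scores = [performance("late_bloomer", i) for i in range(6)]
assert all(0 <= s <= 100 for s in scores)
```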
List of Key Performance Indicators (KPIs) for Student Engagement and Progress within the Platform:
Room Name: The unique identifier or name of a specific room (or module). Think of each room as a separate module or lesson within an educational platform, for example Room1, Room2, etc.
Total rooms completed: Indicates the cumulative number of rooms that a student has fully completed. Completion is typically determined by meeting certain criteria, like answering all questions or achieving a certain score.
Rooms registered in: Represents the number of rooms a student has registered or enrolled in. This can differ from the total number of rooms they have completed.
Ratio of questions completed per room: Gives insight into a student's progress in a particular room. For instance, a ratio of 7/10 suggests the student has completed 7 out of 10 available questions in that room.
Room completed (yes/no): Indicates whether a student has fully completed a specific room. This could be determined by the percentage of material covered, questions answered, or a certain score achieved.
Room last deploy (count of days): Refers to the number of days since the last update or deployment of that room. It can give an idea of the recency of the student's effort.
Points in room used for the leaderboard (range 0-560): Each room assigns points based on student performance, and these points contribute to leaderboards. The range means a student can earn anywhere from 0 to 560 points in a particular room.
Last answered question in a room (e.g. 27th Jan 2023): Indicates the date when a student last answered a question in a specific room. It can provide insight into a student's recent activity and engagement.
Total points in all rooms (range 0-560): The cumulative score a student has achieved across all rooms.
Path percentage completed (range 0-100): Indicates the percentage of the overall learning path that the student has completed. A path can consist of multiple modules or rooms.
Module percentage completed (range 0-100): Represents how much of a specific module (which can have multiple lessons or topics) a student has completed.
Room percentage completed (range 0-100): Shows the percentage of a specific room that the student has completed.
Time spent on the platform (seconds): Provides an aggregate of the total time a student has spent on the entire educational platform.
Time spent on each room (seconds): Represents the amount of time a student has dedicated to a specific room. This can show which rooms or modules are the most time-consuming or engaging for students.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
METHODS
Topic determination
The project was developed as a team science exercise during a course on Nutrient Biology (New Mexico Institute of Mining and Technology, New Mexico, USA; BIOL 4089/5089). Students were all women pursuing degrees in Biology and Earth Science, with extensive internet search acumen developed through coursework and personal experience. We (students and professor) devoted ~5 hours to discussing women’s health topics prior to searching, defining search criteria, and developing a scoring system. These discussions led to a list of 12 non-cancer health topics particular to women’s health and associated with human cis-gender female biology. Considerations of transgender health were discussed, with the consensus decision that those issues are scientifically relevant but deserve a separate analysis not included here.
Search protocol
After agreeing on search terms, we experimented with settings in the Advanced Search feature in Google (www.google.com) and collectively agreed to the following settings: Language (English); search terms appearing in the “text” of the page; ANY of the terms “woman”, “women”, “female”; ALL terms when using a single topic from the list above with the addition of the word “nutrient”. Figure 1 shows a screenshot of how a search was conducted, using endometriosis as an example. To standardize data collection among investigators, all results from the first 5 pages of results were collected. Search result URLs were followed, and a suite of data was gathered (variables in Table 2) and entered into a shared database (Appendix 1). Definitions for each variable (Table 2) were articulated following a 1-week trial period and further group discussion. Variables were defined to minimize subjectivity across investigators, clarify the reporting of results, and standardize data collection.
Scoring metric
The scoring metric was developed so that the mean and variation (standard deviation, SD; standard error, SE) could be calculated for each topic and compared among topics, answering how much variation in quality is likely to be encountered across categories of women’s health issues. We report both variation metrics because SD encompasses the variation of the data set, while SE scales for sample-size variation among categorical variables. When searching topics using the same criteria:
Are some topics more likely to return pages with scientifically verifiable information?
Does the variation in quality differ between topics?
Peer-reviewed journal articles were included in the database if encountered in the searches but were removed before statistical analysis. The justification for removing those sources was that it is possible the Google algorithm included those sources disproportionately for our group of college students and a professor who regularly searches for academic articles. We also assume those sources are consulted less frequently by lay audiences searching for health information.
Scores were based on six binary (presence/absence) attributes of each web page evaluated. These were: Author (name present/absent), author credentials given, reviewer, reviewer credentials, sources listed, peer-reviewed sources listed. A score of 1 was given if the attribute was present, and 0 if absent. The total number of references cited on a webpage, as well as the number of those that were peer-reviewed (Table 2) were recorded, but for scoring purposes, a 1 or 0 was assigned if there were or were not references and peer-reviewed references, respectively. Potential scores thus ranged from 0 to 6.
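The 0-6 scoring described above can be sketched as a small function. The attribute names are paraphrased from the text and the example page is invented; the reference counts are reduced to presence/absence exactly as described:

```python
# Four directly binary attributes; reference counts are handled separately.
ATTRIBUTES = [
    "author_named", "author_credentials",
    "reviewer_named", "reviewer_credentials",
]

def score_page(page):
    """Score a web page 0-6 from six binary attributes.

    `page` is a dict with boolean attribute flags plus raw counts of
    references and peer-reviewed references, which are reduced to
    presence/absence (1/0) for scoring, as described in the text.
    """
    score = sum(1 for a in ATTRIBUTES if page.get(a, False))
    score += 1 if page.get("n_references", 0) > 0 else 0
    score += 1 if page.get("n_peer_reviewed_references", 0) > 0 else 0
    return score

page = {"author_named": True, "author_credentials": False,
        "reviewer_named": True, "reviewer_credentials": True,
        "n_references": 12, "n_peer_reviewed_references": 0}
assert score_page(page) == 4  # three attribute flags plus sources present
```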
We performed a simple validation experiment via anonymous surveys sent to students at our institution (New Mexico Tech), a predominantly STEM-focused public university. Using the final scores from the search result webpages, a single website from each score was selected at random using the RAND() function in Microsoft Excel to assign a random variable as an identifier to each URL, then sorting by that variable and selecting the first article in a given score category. Webpages with scores of 0 or 6 were excluded from the validation experiment. Following institutional review, a survey was sent to the “all student” email list, and recipients were directed to a web survey that asked participants to give a score of 1-5 to each of the 5 random (but previously scored) web pages, without repeating a score. Participants were given minimal information about the project and had no indication the pages had already been assigned scores. Survey results were collected anonymously by having responses routed to a spreadsheet, and no personally identifiable data were collected from participants.
Statistical analysis
Differences in mean scores within each health topic and in the mean number of sources per evaluated webpage were evaluated by calculating Bayes Factors; response variables (mean score, number of sources) for each topic were compared to a null model of no difference across topics (y ~ category + error). Equal prior weight was given to each potential model. Variance inequality was tested via Levene’s test, and normality was assessed using quantile-quantile plots. Correlation analysis was used to test the strength of the association between individual scores per website and the number of sources cited per website. Because only the presence or absence of sources was considered in the score calculation, the number of sources is independent of the score, which justifies correlation analysis. Statistical analyses were conducted in the open-source software package JASP version 0.19.2 (JASP, 2024).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This data set is for the research entitled "Group Discussions in Secondary School Chemistry: Unveiling Pedagogical Alchemy for Academic Advancement".
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
The three sheets in the Excel file correspond to Fig. 5, Fig. 6 and Fig. 10, respectively, in the paper. The output data are the PSM (policy similarity measure) scores (sheets: rule set 4 test, rule set 8 test) generated by our proposed algorithm and by an existing algorithm (sheet: set 4 test in Appendix).
Creation of data: 07/2015
Type of data: processed data
Software used: Eclipse
Source: Experiment
Number of samples: 120
Total size of samples: 20 KB
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains test results from a digital intervention study of the CALLIDUS Project in a high school in Berlin. 13 students were randomly sampled into two groups and completed various linguistic tasks. The focus of the study was to find out whether learning Latin vocabulary in authentic contexts leads to higher lexical competence, compared to memorizing traditional vocabulary lists.
The data are available in JSON format as provided by the H5P implementation of xAPI. File names indicate the time of test completion, in the concatenated form of “year-month-day-hour-minute-second-millisecond”. This allows us to trace the development of single learners who were fast enough to perform the test twice in a row.
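Assuming the stated pattern, a file name can be parsed with Python's datetime.strptime; the example file name below is invented, and the %f directive reads the trailing millisecond field as fractional seconds:

```python
from datetime import datetime

def completion_time(filename):
    """Parse a test-completion timestamp from a result file name.

    Assumes the stated pattern year-month-day-hour-minute-second-millisecond,
    e.g. "2020-06-15-09-41-07-123.json" (invented example). strptime's %f
    right-pads the millisecond digits, so "123" is read as 0.123 s.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the file extension
    return datetime.strptime(stem, "%Y-%m-%d-%H-%M-%S-%f")

t = completion_time("2020-06-15-09-41-07-123.json")
print(t.isoformat())
```

Sorting files by this parsed timestamp recovers the order in which tests were completed, which is what makes tracing repeat test-takers possible.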
Changelog:
Version 2.0: Each exercise now has a unique ID that is consistent in the whole dataset, so evaluation/visualization can refer to specific exercises more easily.
Version 3.0: A simplified Excel Spreadsheet has been added to enhance the reusability of the dataset. It contains a slightly reduced overview of the data, but the core information (user ID, task statement, correct solution, given answer, score, duration) is still present.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
This dataset is a Scottish Fuel Poverty Index created in the summer of 2023 by EDINA at the University of Edinburgh as part of their student internship programme. The user guide provides descriptions of each data variable used in creating the index. The basic rationale was to replicate for Scotland work that had previously been conducted only for England and Wales. The two indices are not strictly comparable due to differences in data availability and spatial granularity, but they provide standalone snapshots of relative fuel poverty across Great Britain. The Scottish Index is fully open source, and for the purposes of transparency and repeatability this guide provides an open methodology and is accompanied by the underlying data. Data are provided in good faith 'as is' and are the sole product of student effort as part of mentoring activities conducted by EDINA at the University.
Each variable used in the Index was normalised relative to the individual values for that variable, which means the values presented in the underlying FPI data table do not represent the actual numbers for each local authority, merely the percentage relative to the other local authorities in Scotland. A separate file, 'Fuel-poverty-index-raw-data-with-calc.csv', is available which contains the raw percentages used for the index along with a table containing the calculations used to obtain the final score and the main FPI data table.
Fuel Poverty Index Excel: This file contains each Scottish local authority's ability-to-pay score, demand score, and final score, all of which were obtained from several different variables. The raw data for these variables can be found in the Raw Data file, and an explanation of each variable can be found in the User Guide document. The scores range from 1 to 100 and are normalised relative to each other, so the final scores do not represent the actual physical values for each area.
Fuel Poverty Index csv: This file contains the normalised processed data that makes up the Scottish fuel poverty index, with variables in the range of 1 to 100. Some variables have been weighted depending on how important they are to the index. The final scores rating each Scottish local authority from 1 to 100 are also included.
Raw data: This file contains the raw unprocessed data from which the index was created, for all Scottish local authorities.
User Guide: This file documents the process used to create the index and describes what each column in the Fuel Poverty Index csv file contains. It also provides some examples of the visualisations created from the index.
Fuel Poverty Index Shapefile: This folder contains the .shp file comprising all the data from the Fuel Poverty Index csv, along with the geospatial polygons associated with each local authority boundary. For the best viewing, the British National Grid EPSG 27700 coordinate system should be used.
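The exact normalisation is documented in the User Guide; one common way to rescale a variable to the stated 1-100 range relative to the other local authorities is a min-max transform, sketched here with invented values:

```python
def rescale_1_to_100(values):
    """Min-max rescale a list of raw values to the range 1-100.

    An illustrative transform consistent with the description above;
    the User Guide documents the method actually used for the index.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [50.0] * len(values)  # degenerate case: all values equal
    return [1 + 99 * (v - lo) / (hi - lo) for v in values]

raw = [12.4, 30.1, 18.9, 25.0]  # invented raw percentages
scaled = rescale_1_to_100(raw)
assert min(scaled) == 1.0 and max(scaled) == 100.0
```

Because each value is scaled relative to the others, the outputs describe relative position among Scottish local authorities rather than the physical quantities themselves, matching the caveat above.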
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Dataset description of the “ICDAR2023 Competition on Detection and Recognition of Greek Letters on Papyri”
Prof. Dr. Isabelle Marthot-Santaniello, Dr. Olga Serbaeva
2024.09.16
Introduction
The present dataset stems from the ICDAR2023 Competition on Detection and Recognition of Greek Letters on Papyri (original links to the competition are provided in the file “1b.CompetitionLinks.”)
The aim of this competition was to investigate the performance of glyph detection and recognition in a very challenging type of historical document: Greek papyri. The detection and recognition of Greek letters on papyri is a preliminary step for computational analysis of handwriting that can lead to major steps forward in our understanding of this important source of information on Antiquity. Such detection and recognition can be done manually by trained papyrologists; it is, however, a time-consuming task that needs automating.
We provide here the documents related to two different tasks: localisation and classification. The document images are provided by several institutions and are representative of the diversity of book hands on papyri (a millennium time span, various script styles, provenance, states of preservation, means of digitization and resolution).
How the dataset was constructed
In the frame of the D-Scribes project, led by Prof. Dr. Isabelle Marthot-Santaniello (2018-2023), around 150 papyri fragments containing the Iliad were manually annotated at letter level in READ.
The editions were taken, for the most part, from papyri.info and simplified, i.e. the accents, editorial marks, and other additional information were removed so as to be as close as possible to what is found on the papyri. When a text was not available on papyri.info, the relevant passage was extracted from the Perseus edition of Homer's Iliad.
From these 150-plus papyri fragments, 185 surfaces (sides of fragments) belonging to 136 different manuscripts, identified by their Trismegistos numbers (hereafter TMs), were selected as material for the competition. These 185 surfaces were separated into the “training set” and the “test set” provided for the competition as a set of images and corresponding data in JSON format.
Details on the competition are summarised in "ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri" by Mathias Seuret, Isabelle Marthot-Santaniello, Stephen A. White, Olga Serbaeva Saraogi, Selaudin Agolli, Guillaume Carrière, Dalia Rodriguez-Salas, and Vincent Christlein; in G. A. Fink et al. (Eds.): ICDAR 2023, LNCS 14188, pp. 498-507, 2023. https://doi.org/10.1007/978-3-031-41679-8_29
After the competition ended, the decision was taken to release the manually annotated data for the “test set” as well. Please find the description of each included document below.
Dataset Structure
“1. CompetitionOverview.xlsx” contains the metadata of the images used, state 2024.09.19. The structure of the Excel file is as follows:
Excel column | Name | Content | Notes
A | TM | Trismegistos number, internationally used for papyri identification | With READ item name in ().
B | Papyri.info link | link |
C | Fragments' Owning Institution (from papyri.info) | Institution’s name | Institution that physically stores the papyri.
D | Availability (of metadata, papyri.info) | link | Metadata reuse clarification.
E | text ID (READ) | Number from the READ SQL database that was used to link the images and the editions | Serves to locate the attached images and understand the JSON structure.
F | Test/Training | Whether the image was originally included in the training or the test set of the dataset |
G | Image Name (for orientation) | As in READ |
H | Cedopal link | link | Contains additional metadata and includes links to all available online images.
I | License from the Institution webpage | Either license or usage summary | If no precise licence has been given, a summary of the reuse rights is provided with a link to the regulations in column K.
J | Image URL | link | Not all images are available online; please contact the owning institution directly if an image is not available.
K | Information on image usage from the institution | link | In case of any doubt, please contact the owning institution directly.
L | Notes | Items with special problems (i.e. images not online or missing links) are marked in red for easy overview. |
2a. “Training file” (containing 150 papyri images separated into 108 texts, plus HomerCompTraining.json). The images are of papyri containing Homer's Iliad in JPG format. These were processed in READ: each visible letter on a given papyrus was linked to the edition of the Iliad, and through this process each linked letter of the edition was linked to its coordinates in pixels on the HTML surface of the image. All that information is provided in the JSON file.
The JSON file contains “annotations” (bounding boxes of each letter/sign), “categories” (Greek letters), “images” (image IDs), and “licenses”. The link between an image and its bounding boxes is defined via the “id” in the “images” part (for example, "id": 6109); the same id is encoded as "image_id": 6109 in the “annotations”. Alternatively, the “text_id”, which can be found in the “images” URL and in the file names of the image folders provided here, can be used for data linking.
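The image-to-annotation linking just described can be sketched in Python. This is a minimal illustrative sample, not the real file: only the field names follow the description above; the values (ids, coordinates, dimensions) are invented.

```python
import json
from collections import defaultdict

# Minimal sample mirroring the JSON structure described above
# (real data lives in HomerCompTraining.json; values here are invented).
sample = json.loads("""
{
  "images": [{"id": 6109,
              "file_name": "./images/homer2/txt1/P.Corn.Inv.MSS.A.101.XIII.jpg",
              "height": 2000, "width": 1500, "license": 1}],
  "annotations": [{"id": 1, "image_id": 6109, "category_id": 24,
                   "bbox": [120, 340, 18, 22]}],
  "categories": [{"id": 24, "name": "\\u03c7", "supercategory": "Greek"}]
}
""")

# Index the letter bounding boxes by the image they sit on:
# annotation["image_id"] matches image["id"].
boxes_by_image = defaultdict(list)
for ann in sample["annotations"]:
    boxes_by_image[ann["image_id"]].append(ann)

# Map category ids to Greek letter names.
letter_of = {c["id"]: c["name"] for c in sample["categories"]}

# Resolve each box on each image to its Greek letter.
for img in sample["images"]:
    for ann in boxes_by_image[img["id"]]:
        letter = letter_of[ann["category_id"]]
```

The same join works on the real training file, since "image_id" in every annotation references an "id" in the "images" list.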
Let us now describe the content of each part of the JSON file. Each “annotation” contains: an “area” characterised as a “bbox” with coordinates; a “category_id”, which identifies the Greek letter in “categories” represented by the number; an “id”, a unique number for the cliplet (i.e. the area); an “image_id”, which links the cliplet to the surface of the image with the same id; “iscrowd” and “seg_id”, which are useful for finding the information in the READ database; and, finally, “tags”.
In the tags, “BaseType” was used to annotate preservation quality as described below. “FootMarkType” (ft1, etc.) was used for clustering tests but played no role in the competition. “BaseType” or bt-tags were assigned to letters to mark the quality of preservation:
bt-1: a well-preserved letter that allows easy identification by both the human eye and computer vision.
bt-2: a partially preserved letter that may have some background damage (holes, additional ink, etc.) but remains readable and has one interpretation.
bt-3: a letter damaged to such an extent that it cannot be identified without reading an edition; these are treated as traces of ink.
bt-4: a letter with damage of a kind that makes multiple interpretations possible; for example, a missing or defaced horizontal stroke makes an alpha indistinguishable from a damaged delta or lambda.
Each “category” contains: an “id”, a number also referenced in “annotations” that identifies which Greek letter was in the bbox; a “name”, for example “χ”; and a “supercategory”, i.e. “Greek”.
Each “image” contains the following subfields: “bln_id”, an internal READ number of the HTML surface; "date_captured": null, another READ field; “file_name”, e.g. "./images/homer2/txt1/P.Corn.Inv.MSS.A.101.XIII.jpg", which makes it easy to link image and text (for the image in question, the JPG is in the folder called “txt1”) and is very similar in structure and function to "img_url": "./images/homer2/txt1/P.Corn.Inv.MSS.A.101.XIII.jpg". Each image has a “height” and “width” expressed in pixels, and an “id” that is referenced in the “annotations” under “image_id”. Finally, each image contains a link to a “license”, expressed as a number.
Each “license” entry lists a license as it was found at the time of the competition, i.e. in February 2023.
2b. “Test file”: 34 papyri image sides separated into 31 TMs, plus HomerCompTesting.json. The JSON file here only connects the images with the “categories”, “images”, and “licenses”, without the “annotations”. The structure and logic are otherwise the same as in the “Training” JSON.
2c. “Answers file”: contains the “annotations” and other information for the 34 papyri of the “Testing” dataset. The structure and logic are the same as in the “Training” JSON.
“Additional files”: lists of duplicate segment IDs (multiple possible readings or tags); respectively 6 items for “Training”, 17 for “Testing”, and 15 for “Answers”.
“Dataset Description”: this same description, included for completeness.
References
The dataset has been reused or mentioned in a number of publications (as of September 2024):
Mohammed, H., Jampour, M. (2024). "From Detection to Modelling: An End-to-End Paleographic System for Analysing Historical Handwriting Styles". In: Sfikas, G., Retsinas, G. (eds) Document Analysis Systems. DAS 2024. Lecture Notes in Computer Science, vol 14994. Springer, Cham, pp. 363–376. https://doi.org/10.1007/978-3-031-70442-0_22
De Gregorio, G., Perrin, S., Pena, R.C.G., Marthot-Santaniello, I., Mouchère, H. (2024). "NeuroPapyri: A Deep Attention Embedding Network for Handwritten Papyri Retrieval". In: Mouchère, H., Zhu, A. (eds) Document Analysis and Recognition – ICDAR 2024 Workshops. ICDAR 2024. Lecture Notes in Computer Science, vol 14936. Springer, Cham, pp. 71–86. https://doi.org/10.1007/978-3-031-70642-4_5
Vu, M. T., Beurton-Aimar, M. "PapyTwin net: a Twin network for Greek letters detection on ancient Papyri". HIP '23: 7th International Workshop on Historical Document Imaging and Processing, San Jose, CA, USA, August
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: This study investigates the Euclidean geometry learning opportunities presented in Further Education and Training (FET) mathematics textbooks. Specifically, it examines the alignment of textbook content with the Curriculum and Assessment Policy Statement (CAPS) curriculum, the levels of geometric thinking promoted, representational forms, contextual features, and expected responses.
Methodology: The research analyzed three FET mathematics textbook series to identify strengths and weaknesses in Euclidean geometry content. The study adopted the interpretivist paradigm, a qualitative research approach, and a case study research design. Purposive sampling techniques were used to select the textbooks currently used for teaching. Textbook analysis was the data collection method, and deductive content analysis was the data analysis strategy. Interrater reliability was used to preserve the quality of data coding and reporting among coders, measured as a percentage of agreement between three coders (Belur et al., 2021).
Data collection: This study employed various textbook analysis instruments designed within its framework, including a content coverage instrument, a mathematical activity instrument, a geometric thinking levels instrument, a representation forms instrument, a contextual features instrument, and an answer forms instrument.
1.1.1 Content coverage instrument: The study employed a content coverage instrument as a data collection tool, with a focus on textbook topics and subtopics. This instrument, in the form of a checklist, listed all the topics and subtopics of Euclidean geometry in the Grade 10-12 curriculum and assessed whether each content item was covered in the respective textbooks at their corresponding grade levels.
The aim was to provide a comprehensive assessment of the extensive range of content knowledge that students are required to acquire at each school level (Grades 10-12), using a rubric. The rubric emphasised the extent of Euclidean geometry content coverage and provided a space to indicate whether a subtopic was covered (a tick) or not covered (-).
A checklist form was used to gather data from the textbook tasks by indicating the topics and subtopics covered in each textbook series. The checklist was developed from the CAPS guideline document for Grades 10-12 and was used to examine the selected textbooks' content coverage, to determine the extent to which they align with the CAPS Mathematics guideline document. The instrument divided the Euclidean geometry content into three categories (Grade 10, Grade 11, and Grade 12), as stipulated in the CAPS Mathematics guideline document for FET-level mathematics. To bolster the objectivity of the results, all CAPS checklist items were quantified using dichotomous (yes/no) responses, summarised by scoring rubrics to justify different responses.
A mathematical activity form was developed to collect data on the nature of the mathematical activities in both worked examples and exercise tasks within each textbook. The form was designed as a rubric based on Gracin's (2018) mathematical activity framework: representation and modelling, calculation and operation, interpretation, and argumentation and reasoning. The rubric consists of five major sections, with the first focusing on the nature of the mathematical activities required to successfully engage with geometry questions. For each geometry task, the rubric was broken down into four categories to explore the nature of the task more clearly.
The categories of mathematical activities focused on representation and modelling, calculation and operation, interpretation, and argumentation and reasoning.
As this study intended to investigate the students' OTL afforded by textbooks, an evaluation form was used to gather data. The form contained the four Euclidean geometry task types and was used to examine the nature of each Euclidean geometry task. It listed the characteristics of each mathematical activity required to carry out the geometry tasks: Representation and modelling (R), Calculation and operation (C), Interpretation (I), and Argumentation and reasoning (A). This form serves as a classification template, categorising tasks according to the competence they demand of the students. Table 4.5 presents exemplary geometric tasks, categorised by skill, alongside the corresponding evaluation indicators used to assess mathematical proficiency.
A representation form instrument was utilised to collect data on the type of representation used in presenting the geometry ideas in each textbook series (see section 3.3). A rubric was used to capture the representation format of each task, with a designated space for each. To make the captured data clear, we divided the rubric into four distinct sections: pure mathematics, verbal, visual, and combined forms of problem presentation.
Data analysis: This study used a qualitative deductive content analysis (QDCA) approach to analyse the collected data. In a QDCA, research findings are allowed to emerge from the textbooks examined (Pertiwi & Wahidin, 2020).
A deductive approach was appropriate because the codes and categories were drawn from theoretical considerations, not from the text itself (Islam & Asadullah, 2018).
The researcher created nine Excel files, each with a four-column table, as shown in the figure below. Each column represents a mathematical activity category: Representation (R), Calculation (C), Interpretation (I), and Argumentation (A). Based on Gracin's (2018) framework, the researcher and two scorers read every worked example and exercise task in each textbook examined, extracted the mathematical activity required to complete the task successfully, and recorded it in the corresponding Excel file. If a task required more than one activity, the researcher considered the one dominantly required by the task author. The figure below shows the Excel sheet used to score the mathematical tasks.
To examine the geometric thinking embedded in textbook tasks, a rubric was used to categorise tasks according to their corresponding geometric thinking levels, from Level 0 to Level 4. For instance, tasks requiring students to define properties of a geometric figure were classified as informal deduction, whereas tasks demanding formal proofs were coded as formal deduction. The analysis began with a careful review of worked examples and exercise tasks to identify the embedded level of geometric thinking; Excel tables were then used to record the geometry levels present in Euclidean geometry tasks, and their frequencies were calculated. The results, which highlighted the predominant levels in each textbook series, were then analysed in depth. This study classified each task based on the dimensions of Zhu and Fan's (2006) answer forms and subsequently coded the problems as depicted in Figure 4.13.
In this study, the researcher classified the tasks by answer form, reading the task questions and coding them as either open-ended or closed-ended problems. The researcher also examined the Euclidean geometry tasks in terms of their representation form and contextual features, using Zhu and Fan's (2006) framework to classify and code them: pure mathematical (R1), verbal (R2), visual (R3), and combined form (R4). Each task was analysed against these representation-type categories in each textbook, and an Excel table, as shown in the figure above, recorded the analysis of the representation forms.
To investigate the contextual features of mathematical tasks, the researcher systematically collected tasks from each textbook and created an Excel sheet to score the type of context presented in each problem. Zhu and Fan's (2006) framework again provided the basis for categorising and coding. Tasks were classified into two distinct categories: application problems (C1), tasks presented in real-life situations that illustrate practical applications of mathematical concepts, and non-application problems (C2), tasks that lack context and concentrate solely on mathematical procedures and calculations. Tasks presented in situations mirroring real-life scenarios were coded as application tasks, and tasks lacking context as non-application tasks. The coded data were counted, and the frequencies were recorded in tables using Microsoft Excel, as depicted in Figure 4.13. This systematic analysis facilitated a nuanced understanding of the contextual features of mathematical tasks across the examined textbooks.
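The per-series tallying of coded categories described above can be sketched with Python's standard library. The series names and task codes below are invented for illustration; only the four Gracin (2018) category labels (R, C, I, A) come from the text.

```python
from collections import Counter

# Hypothetical coding sheet: one (textbook series, activity code) pair per
# task, mirroring the Excel score sheets described above.
codes = [
    ("Series1", "R"), ("Series1", "A"), ("Series1", "C"),
    ("Series2", "A"), ("Series2", "A"),
]

# Tally category frequencies per series, as done before the in-depth analysis.
freq = Counter(codes)
series2_argumentation = freq[("Series2", "A")]
```

The same counting pattern applies equally to the geometric thinking levels, representation forms, and context codes (C1/C2).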
This study used the CAPS Mathematics guidelines as the foundation for developing an OTL analytical tool to classify the mathematical content. The CAPS Mathematics analytical tool encompasses the content areas that students should master in all grades. Next, I outlined the OTL categories, offering comprehensive details on the interpretation and analysis of the data. To analyse the data, I used a rubric for each textbook series. The researchers conducted a thorough review of each textbook task, utilising the CAPS Mathematics document as a benchmark to
US Test Preparation Market Size 2025-2029
The US test preparation market is forecast to grow by USD 18.4 billion, at a CAGR of 7.9%, between 2024 and 2029.
The US test preparation market is experiencing significant growth, driven by the increasing emphasis on online test preparation and technological advances in test preparation services. With the shift towards digital learning, test preparation companies are leveraging technology to offer flexible, accessible, and personalized solutions to students. This trend is particularly prominent in regions with a high prevalence of online education and remote learning. However, regulatory hurdles impact the adoption of test preparation services in certain markets, as governments and educational institutions grapple with issues of standardization and quality control. Additionally, the availability of open-source test preparation materials poses a challenge to market players, as students increasingly turn to free resources to prepare for exams. Educational institutions use cloud-based Learning Management Systems (LMS) such as TalentLMS to deliver test prep courses.
To capitalize on market opportunities and navigate these challenges effectively, test preparation companies must focus on offering differentiated value propositions, such as personalized coaching, interactive learning tools, and comprehensive study resources. By staying abreast of regulatory developments and collaborating with educational institutions, these companies can build trust and credibility, ensuring long-term growth in the Test Preparation Market.
What will be the size of the US Test Preparation Market during the forecast period?
The US test preparation market is dynamic and diverse, encompassing various elements to help students excel in assessments. Content creation plays a pivotal role, with test preparation forums, blogs, and workshops serving as valuable resources for learners. Curriculum development in test preparation focuses on enhancing reading comprehension, mathematics skills, writing skills, cognitive skills, time management, test analysis, critical thinking, and language proficiency. Test preparation research and innovation continue to shape the industry, with an emphasis on addressing the unique needs of different learning styles. Test preparation events, seminars, and webinars offer opportunities for score improvement and stress management. The application process and study skills are also crucial aspects of test preparation, ensuring test-takers are well equipped to navigate the challenges of various assessments.
Science skills are gaining importance in test preparation, as they are essential for success in many fields. Test preparation communities encourage collaboration and knowledge sharing among learners, creating a supportive environment for test-takers.
How is this market segmented?
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
End-user
Higher education
K-12
Product
University exams
Certifications exams
High school exams
Elementary exams
Others
Learning Method
Blended
Online
Geography
North America
US
By End-user Insights
The higher education segment is estimated to witness significant growth during the forecast period. The test preparation market experiences continuous growth due to the increasing number of students aiming for entrance exams and professional certifications. In the higher education sector, there is a surging demand for specialized courses, leading companies to offer certifications in fields like medicine, nursing, law, and wealth management. Test preparation institutes employ innovative technologies such as artificial intelligence and virtual reality to enhance learning experiences. Online test preparation resources, including practice tests, diagnostic testing, and adaptive learning, are increasingly popular.
Test prep professionals provide personalized learning strategies and test-taking techniques to students. Test prep apps, courses, and software are accessible on mobile learning platforms for convenience. Education technology and data analytics are integral to improving student performance and test scores. Test prep websites and online platforms offer test prep advice and resources to students dealing with test anxiety. Test prep services and providers cater to the diverse needs of students, ensuring they are well-prepared for entrance exams and college admissions. The test preparation industry is evolving, integrating technologies and learning strategies to keep a balance between effectiveness and affordability.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This spreadsheet contains anonymised raw differences scores (pre- to post-intervention) obtained from 17 UK college students, via three tests of cognitive function (Simon Task, Stroop Test and Task Switching Paradigm) and a mood questionnaire (Activation-Deactivation Adjective Check List; AD ACL), to illustrate pre-to-post-intervention changes in mood and executive function. A figure to illustrate changes in mood across time, relative to baseline, is also included in the Mood sheet.
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Hypothetically, a student could attend a class, listen to lectures, and pass the class without knowing or interacting with other students. What happens to the network when the classroom expectations change? For example, there is a coursework expectation that students exchange contact information, or the instructor uses collaborative learning practices. Or what if the principal investigator (PI) of a scientific team goes on sabbatical? This study uses the framework of classrooms because of their relatability across science. We asked how different instructor coursework expectations change network structures within a classroom or other learning environments. A social network survey was administered at the start and end of the semester (pre- and post-test) in six university sociology classrooms to explore how expectations impacted the communication and learning networks. We found that practical changes in course expectations impact the communication and learning networks, suggesting that instructors, facilitators, and others could be the archintor™ (architect + instructor + facilitator) of the network. Understanding that expectations can impact a network's structure marks a paradigm shift in educational assessment approaches. If the archintor™ has identified the "optimal" network structure, then their task is to design expectations that result in specific interactions that ultimately improve student achievement and success. This work provides recommendations for classroom archintors™ to create the most impactful classroom networks. Future research should extend beyond education and classroom networks and identify the best or desired networks in other areas like public policy, urban planning, and more. If these "optimal" networks were identified, an archintor™ could design a social network to solve wicked problems, manage a crisis, and create social change.
Methods: Data were collected on hardcopy surveys, input into Microsoft Excel, and saved as CSV files.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research assessing the Mental Health Problems (MHPs), e.g., stress, anxiety, and depression, of university students has attracted much interest worldwide over the last decade. This article provides a large and comprehensive dataset concerning the MHPs of 2028 students from 15 top-ranked universities in Bangladesh, including 9 government/public universities and 6 private universities. To collect the data, the GAD-7 (for anxiety), PSS-10 (for stress), and PHQ-9 (for depression) models were adopted and adjusted to reflect equivalent academic perspectives. Additionally, student sociodemographic data were collected. The adoption of these three models was carried out by a team of five professors and a student psychologist to best capture the academic and sociodemographic factors that influence MHPs among university students. To conduct the survey, a Google Form was developed and circulated among 15 faculty representatives from the participating universities, who further circulated and conducted the survey with the students. The collected data were evaluated to ensure the sufficiency of the sample size and the internal consistency and reliability of the responses. Furthermore, the levels of anxiety, stress, and depression were calculated from the data to demonstrate its applicability. This dataset can be used to measure the trajectory of students' mental and psychosocial stressors, to plan required mental health and counselling services, and to conduct data-intensive Machine Learning (ML) model development for predictive MHP assessment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Worked Example. Worked example in Excel for the group featured in Fig. 3. The rawdata tab contains the raw score (out of six) awarded by each individual to each other individual (individuals' names are replaced by majuscule Roman or Greek letters) in blue-tinted cells. These are converted to a modified score (orange-tinted cells) by scaling by the donor's overall mean donated score (his or her 'generosity'). The processed tab converts these to a final score out of 100 for each student. (XLSX 43 kb)
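The generosity scaling described above can be sketched as follows. The raw scores are invented for illustration, and only the modified-score step is reproduced; the workbook's exact rescaling to a final mark out of 100 is not shown here.

```python
# raw[donor][recipient] = score out of six (invented example values).
raw = {
    "A": {"B": 5, "C": 3},
    "B": {"A": 6, "C": 6},
    "C": {"A": 4, "B": 2},
}

def generosity(donor):
    """Donor's overall mean donated score, as in the rawdata tab."""
    scores = raw[donor].values()
    return sum(scores) / len(scores)

# Modified score: each raw score divided by the donor's generosity,
# so a lenient and a strict marker contribute on a comparable scale.
modified = {
    donor: {recipient: score / generosity(donor)
            for recipient, score in given.items()}
    for donor, given in raw.items()
}
```

For example, donor "A" has generosity 4.0, so the raw 5 awarded to "B" becomes a modified 1.25.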
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Excel workbook with data collected and collated from a speech-in-noise test, and associated statistical analyses.
Background music is cited as one of the four main problems that listeners complain about with regards to foreground speech audibility in broadcast audio. Yet broadcasters, such as the BBC, only provide limited guidance to content producers regarding background music. For example, turn the background music down by 3 dB when co-present with foreground speech, and avoid lyrics and heavily percussive beats.
This quantitative, subjective listening experiment investigated whether or not background music arrangement and tempo have any effect on foreground speech intelligibility, such that additional broadcasting guidelines can be written if there are any genuine effects.
Full details of the listening experiment, results and analyses are reported in the PhD thesis by P. Demonte (2022).
KEY
5 x background music pieces (created with Apple Loops in GarageBand; M5, the speech-shaped-noise control, is listed separately below):
- M1: legato string quartet
- M2: solo cello; single note in a bowed, staccato style
- M3: cello + lightly percussive instrumentation
- M4: cello + heavily percussive instrumentation
3 x tempi:
- T1: 60 beats per minute (BPM)
- T2: 100 BPM
- T3: 140 BPM
Control condition: M5_T0 - purely energetic masking of speech with speech-shaped noise, for comparison against the music arrangement effects.
This speech-in-noise test used the R-SPIN speech corpus, which contains end-of-sentence target words in two semantic levels:
2 x spoken-sentence semantic levels:
- HP: high predictability, e.g. "His plan meant taking a big RISK."
- LP: low predictability, e.g. "He wants to talk about the RISK."
PID = (anonymised) participant ID #
Spreadsheet pages
total_CWS: Split into several tables, including:
Music_Tempo_Pred: word recognition percentages by participant and combination of the independent variables (music arrangement, tempo, and semantic level of sentence predictability), excluding the control conditions M5_T0_HP and M5_T0_LP. Statistical analyses were conducted using IBM's SPSS, including:
- checks of the criteria for using a 3-way repeated-measures ANOVA
Since not all of the criteria for a 3-way RM ANOVA were fulfilled, and the outcomes of the non-parametric testing were not useful, attempts were also made to transform the data (square root, squared, and arcsine transformations) and statistically re-analyse them. See spreadsheet pages:
- SQRT_Transformed_MTP
- ^2_Transformed_MTP
- Arcsine_Transformed_MTP
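The three transformations named above can be sketched in Python. The word-recognition percentages are invented for illustration; the arcsine transform is applied in its usual variance-stabilising form for proportion data, asin(sqrt(p)), which is an assumption about what the spreadsheet does.

```python
import math

# Invented word-recognition percentages for one participant.
percent_correct = [62.5, 80.0, 45.0]

# Work on proportions p in [0, 1].
p = [x / 100 for x in percent_correct]

# The three re-analysis transformations tried in the spreadsheet pages:
sqrt_t = [math.sqrt(v) for v in p]                 # SQRT_Transformed_MTP
sq_t = [v ** 2 for v in p]                         # ^2_Transformed_MTP
arcsine_t = [math.asin(math.sqrt(v)) for v in p]   # Arcsine_Transformed_MTP
```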
The spreadsheet pages thereafter group the data in different ways for 2-way and 1-way RM ANOVA statistical analyses:
- Tempo-Pred: summation across all background music pieces
- Music: summation across all tempi and semantic levels
- Tempo: summation across all music pieces and semantic levels
- SentencePredictability: summation across all music pieces and tempi
The final page in this Excel workbook, 'Deleted_Test', contains data that were collected in an initial version of the listening experiment but not used in the thesis. A quality check revealed that although all participants had completed the same total number of trials, there had been an imbalance in the number of trials per combination of independent variables. The problem was rectified so that the listening experiment could then be conducted correctly. These 'Deleted_Test' data have nevertheless been retained in the workbook so that a researcher with more in-depth knowledge of other statistical methods may one day be able to analyse them for comparison.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MISTO scoring template. After obtaining MISTO responses either through the online survey or the MISTO video scoring workbook, this Excel template can be used to calculate MISTO and MISTO subcategory scores for each perspective measured. Note: This template is designed for use with the MISTO question set (not the full MIST question set). (XLSX 2499 kb)
ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Data Quality: The dataset has undergone preprocessing and anonymization to protect the privacy of individual students. Nevertheless, it is essential to use the data responsibly and in compliance with relevant data protection regulations when conducting any analysis or research.
Data Format: The exam data is typically provided in a structured format, commonly as a CSV (Comma-Separated Values) file. Each row in the dataset represents a unique student's examination performance, and each column contains specific attributes and scores related to the examination. The CSV format allows for easy import and analysis using various data analysis tools and programming languages like Python, R, or spreadsheet software like Microsoft Excel.
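Loading such a CSV can be sketched with Python's standard library. The two sample rows are invented, and the header spellings follow the column descriptions in this overview; the released file's exact headers may differ.

```python
import csv
import io

# Invented two-row sample mimicking the described CSV layout.
sample_csv = io.StringIO(
    "Name OF THE STUDENT,UNIVERSITY,PROGRAM NAME,"
    "GENERAL MANAGEMENT SCORE (OUT of 50),Domain-Specific Score (Out of 50)\n"
    "Student_001,University of Exampleville,MBA,38,41\n"
    "Student_002,University of Exampleville,BBA,29,33\n"
)

# One dict per student row, keyed by the header names.
rows = list(csv.DictReader(sample_csv))
gm = int(rows[0]["GENERAL MANAGEMENT SCORE (OUT of 50)"])
```

With pandas, the equivalent would be a single `read_csv` call on the file path.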
Here's a column-wise description of the dataset:
Name OF THE STUDENT: The full name of the student who took the exam. (Anonymized)
UNIVERSITY: The university where the student is enrolled.
PROGRAM NAME: The name of the academic program in which the student is enrolled (BBA or MBA).
Specialization: If applicable, the specific area of specialization or major that the student has chosen within their program.
Semester: The semester or academic term in which the student took the exam.
Domain: Indicates which part of the exam the row refers to: general management or domain-specific.
GENERAL MANAGEMENT SCORE (OUT of 50): The score obtained by the student in the general management part of the exam, out of a maximum possible score of 50.
Domain-Specific Score (Out of 50): The score obtained by the student in the domain-specific part of the exam, also out of a maximum possible score of 50.
TOTAL SCORE (OUT of 100): The total score obtained by adding the scores from the general management and domain-specific parts, out of a maximum possible score of 100.
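The total, rank, and percentile columns described above can be sketched as follows. The totals are invented, and the percentile convention used (percentage of students scoring at or below a given total) is an assumption; the dataset's own convention may differ.

```python
# Invented total scores (general management + domain-specific, out of 100).
totals = [79, 62, 91, 79, 55]

def percentile(score, scores):
    """Percent of students scoring at or below `score` (assumed convention)."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100.0 * at_or_below / len(scores)

# Rank 1 corresponds to the highest total score.
ranked = sorted(totals, reverse=True)
top_percentile = percentile(max(totals), totals)
```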