Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Initial data analysis checklist for data screening in longitudinal studies.
As part of the planning for stimulation of the Newberry Volcano Enhanced Geothermal Systems (EGS) Demonstration project in Oregon, a high-resolution borehole televiewer (BHTV) log was acquired with the ALT ABI85 BHTV tool in the slightly deviated NWG 55-29 well. The image log reveals an extensive network of fractures in a conjugate set striking approximately N-S and dipping 50 deg that are well oriented for normal slip and are consistent with surface-breaking regional normal faults in the vicinity. Similarly, breakouts indicate a consistent minimum horizontal stress (Shmin) azimuth of 092.3 +/- 17.3 deg. In conjunction with a suite of geophysical logs, a model of the stress magnitudes, constrained by the width of breakouts at depth and by a model of rock strength, independently indicates a predominantly normal faulting stress regime.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study is mixed-methods research. Quantitative data: datasets of sociodemographic data of women accessing cervical cancer screening at a women's clinic. The datasets and do-files can be opened in the analytic software Stata. Qualitative data: preliminary analysis tables and reflective notes from in-depth interviews with female patients and healthcare providers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the paper "Are data papers cited as research data? Preliminary analysis on interdisciplinary data paper citations" submitted to iConference 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Story or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is a grade greater than or equal to 80, M is at least 65 and below 80, and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
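The grade banding in column F can be written out as a simple rule. A minimal sketch; the function name is ours, not part of the dataset:

```python
def grade_category(points):
    """Map an exam grade (0-100) to the bands used in column F:
    H if grade >= 80, M if 65 <= grade < 80, L otherwise."""
    if points >= 80:
        return "H"
    if points >= 65:
        return "M"
    return "L"
```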
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus of this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
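The two ratios defined above translate directly into code. A minimal sketch; function and argument names are ours:

```python
def correctness(al, wr, so, om):
    """Correctness: aligned classes over all tagged situations,
    AL / (AL + OM + SO + WR)."""
    return al / (al + om + so + wr)

def completeness(al, wr, om):
    """Completeness: expert classes represented (correctly or not)
    over all expert classes, (AL + WR) / (AL + WR + OM)."""
    return (al + wr) / (al + wr + om)
```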
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation: UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case: SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained: well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
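The effect size reported at the bottom of these comparison sheets can be reproduced with the standard Hedges' g formula. A sketch only; the online tool cited above may apply a slightly different small-sample correction:

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance

def hedges_g(x, y):
    """Hedges' g: Cohen's d with the small-sample correction
    J = 1 - 3 / (4*(n1 + n2) - 9)."""
    n1, n2 = len(x), len(y)
    # pooled standard deviation from the two sample variances
    sp = sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y))
              / (n1 + n2 - 2))
    d = (mean(x) - mean(y)) / sp
    return d * (1 - 3 / (4 * (n1 + n2) - 9))
```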
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BACKGROUND: The Health Insurance Institute of Slovenia (ZZZS) began publishing service-related data in May 2023, following a directive from the Ministry of Health (MoH). The ZZZS website provides easily accessible information about the services provided by individual doctors, including their names. The user is provided with relevant information about the doctor's employer, including whether it is a public or private institution. These data are useful for studying the public system's operations and for identifying errors or anomalies.
METHODS: The data for services provided in May 2023 were downloaded and analysed. The published data were cross-referenced, using the provider's RIZDDZ number, with the daily updated data on ambulatory workload from June 9, 2023, published by ZZZS. These data were initially found to be inaccurate and were corrected using alerts from the zdravniki.sledilnik.org portal; they now provide an accurate representation of the current situation. The total number of services provided by each provider in a given month was determined by summing the individual services and assigning them to the corresponding provider.
RESULTS: A pivot table was created to identify 307 unique operators, with 15 operators not appearing in both lists. There are 66 public providers, which make up about 72% of the contractual programme in the public system, and 241 private providers, accounting for about 28%. In May 2023, public providers accounted for 69% (n=646,236) of services in the family medicine system, while private providers contributed 31% (n=291,660); in total, 937,896 services were provided. Three linear correlations were analysed. The initial analysis of the entire sample yielded a high R-squared value of .998 (adjusted R-squared = .996) and a significance level below 0.001. The second analysis, of the data from private providers, showed a high R-squared value of .904 (adjusted R-squared = .886), indicating a strong correlation between the variables, with a significance level < 0.001. The third analysis, of the data from public providers, showed a strong level of explanatory power, with an R-squared value of 1.000 (adjusted R-squared = 1.000), again with p < 0.001.
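R-squared and adjusted R-squared values of this kind come from an ordinary least-squares fit of services rendered against contracted programme size. A minimal sketch with illustrative numbers, not the study's data:

```python
def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x,
    computed as 1 - SS_res / SS_tot."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k=1):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```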
CONCLUSION: Our analysis shows a strong linear correlation between the size of the contracted programme and the number of services rendered by family medicine providers. A stronger linear correlation is observed among providers in the public system than among those in the private system. Our study found that private providers generally offer more services than public providers. However, it is important to acknowledge that the evaluation framework for assessing services may have inherent flaws: issuing a prescription and resuscitating a patient are both counted as one service. It is crucial to closely monitor trends and to identify comparable databases for pairing at the secondary and tertiary levels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Preliminary results from an ongoing analysis of citation practices in quantitative studies that analyze data from Altmetric.com (supporting data for our submission to http://altmetrics.org/altmetrics20/).
Data sources: https://web.archive.org/web/20200929163109/https://www.altmetric.com/blog/altmetrics-research-2019/ and https://web.archive.org/web/20200929163146/https://www.altmetric.com/blog/altmetric-supported-research-2018-in-review/
The dataset shows that only 32% of quantitative studies that build upon Altmetric's attention score mention the day on which the data was collected, and that 50% mention no version information at all.
First satellite images of areas affected by flooding in El Salvador in November 2009. Using Formosat-2 satellite images*, this preliminary analysis identified the following populated areas as affected: San Vicente, Verapaz, Tepetitan and Guadalupe. Floods were detected along Quebrada Seca, Quebrada la Quebradona, Quebrada Pozo Caliente, Quebrada el Amante Blanco, Rio Acahuapa, Quebrada Paso Hondo, and Quebrada Ticuisa. Formosat images © 2010 Dr. Cheng-Chien Liu, National Cheng Kung University; Dr. An-Ming Wu, National Space Organization, Taiwan; Global Earth Observation and Data Analysis Center (GEODAC), Taiwan.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains article metadata and information about Open Science Indicators for approximately 139,000 research articles published in PLOS journals from 1 January 2018 to 30 March 2025, and a set of approximately 28,000 comparator articles published in non-PLOS journals. This is the tenth release of this dataset, which will be updated with new versions on an annual basis. This version of the Open Science Indicators dataset shares the indicators seen in the previous versions as well as fully operationalised protocols and study registration indicators, which were previously only shared in preliminary forms. The v10 dataset focuses on detection of five Open Science practices by analysing the XML of published research articles:
- sharing of research data, in particular data shared in data repositories
- sharing of code
- posting of preprints
- sharing of protocols
- sharing of study registrations
The dataset provides data and code generation and sharing rates, and the location of shared data and code (whether in Supporting Information or in an online repository). It also provides preprint, protocol, and study registration sharing rates, as well as details of the shared output, such as publication date, URL/DOI/Registration Identifier, and platform used. Additional data fields are also provided for each article analysed. This release was run using an updated preprint detection method (see OSI-Methods-Statement_v10_Jul25.pdf for details). Further information on the methods used to collect and analyse the data can be found in the Documentation. Further information on the principles and requirements for developing Open Science Indicators is available at https://doi.org/10.6084/m9.figshare.21640889.
Data folders/files:
Data Files folder: contains the main OSI dataset files, PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv, which contain descriptive metadata (e.g. article title, publication date, author countries) taken from the article .xml files, plus additional information around the Open Science Indicators derived algorithmically. The OSI-Summary-statistics_v10_Jul25.xlsx file contains the summary data for both PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv.
Documentation folder: contains documentation related to the main data files. OSI-Methods-Statement_v10_Jul25.pdf describes the methods underlying the data collection and analysis. OSI-Column-Descriptions_v10_Jul25.pdf describes the fields used in PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv. OSI-Repository-List_v1_Dec22.xlsx lists the repositories, and their characteristics, used to identify specific repositories in the repository fields of PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv. The folder also contains documentation originally shared alongside the preliminary versions of the protocols and study registration indicators, giving fuller details of their detection methods.
Contact details for further information:
Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS, ihrynaszkiewicz@plos.org / plos@plos.org
Lauren Cadwallader, Open Research Manager, PLOS, lcadwallader@plos.org / plos@plos.org
Acknowledgements: Thanks to Allegra Pearce, Tim Vines, Asura Enkhbayar, Scott Kerr and parth sarin of DataSeer for contributing to data acquisition and supporting information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A supplementary report for the paper "A Preliminary Analysis on the Effect of Randomness in a CEGAR Framework" by Ákos Hajdu and Zoltán Micskei, presented at the 25th PhD Mini-Symposium (2018), organized by the Department of Measurement and Information Systems at the Budapest University of Technology and Economics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Preliminary Parcels’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/86b4e1d0-2e40-4042-8e91-be7efca1015f on 27 January 2022.
--- Dataset description provided by original source is as follows ---
--- Original source retains full ownership of the source dataset ---
This project delves into the workflow and results of regression models on monthly and daily utility data (meter readings of electricity consumption), outlining a process for screening and gathering useful results from inverse models. Energy modeling predictions created in Building Energy Optimization software (BEopt) Version 2.0.0.3 (BEopt 2013) are used to infer causes of differences among similar homes. This simple data analysis is useful for the purposes of targeting audits and maximizing the accuracy of energy savings predictions with minimal costs. The data for this project are from two adjacent military housing communities of 1,166 houses in the southeastern United States. One community was built in the 1970s, and the other was built in the mid-2000s. Both communities are all electric; the houses in the older community were retrofitted with ground source heat pumps in the early 1990s, and the newer community was built to an early version of ENERGY STAR with air source heat pumps. The houses in the older community will receive phased retrofits (approximately 10 per month) in the coming years. All houses have had daily electricity metering readings since early 2011. This project explores a dataset at a simple level and describes applications of a utility data normalization. There are far more sophisticated ways to analyze a dataset of dynamic, high resolution data; however, this report focuses on simple processes to create big-picture overviews of building portfolios as an initial step in a community-scale analysis. TO4 9.1.2: Comm. Scale Military Housing Upgrades
The results of the class separation of Kentucky and Colorado shale oils indicate that the separation scheme developed is effective in separating whole shale oils into their component saturate, olefinic, aromatic, and polar fractions. The effectiveness of the separation is indicated by the proton NMR and FTIR analysis of the fractions, while the reproducibility is given by the agreement of duplicate runs. The types of problems encountered, such as column channeling, solvent schemes, elution rates, elution volumes, sample transfer, and adsorbent preparation, were corrected and/or modified to provide maximum separation while at the same time minimizing separation time. Nuclear magnetic resonance has proved valuable in analyzing the distribution of hydrogen in the individual fractions. This will allow optimization of operating conditions to yield the desired products. Carbon-13 (13C) NMR of the fractions will provide additional structural information on the fractions, such as average carbon chain lengths, branched/straight chain and aliphatic/aromatic ratios, which will complement the information gained through proton NMR. 3 refs., 4 figs., 3 tabs.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the "critical pair," which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
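Several of the summary statistics compared above are one-line computations per spectrum. A minimal sketch; PRE is approximated here as the Shannon entropy of the normalized intensities, which may differ in detail from the published PRE definition:

```python
from math import log2, sqrt

def summary_stats(signal):
    """Per-signal summary statistics of the kind compared in the paper:
    mean, sample STD, 1-norm, range, sum of squares, and an
    entropy-style statistic standing in for PRE."""
    n = len(signal)
    mean = sum(signal) / n
    std = sqrt(sum((v - mean) ** 2 for v in signal) / (n - 1))
    norm1 = sum(abs(v) for v in signal)
    rng = max(signal) - min(signal)
    ssq = sum(v * v for v in signal)
    # normalize absolute intensities to a distribution, then take entropy
    p = [abs(v) / norm1 for v in signal]
    pre = -sum(pi * log2(pi) for pi in p if pi > 0)
    return {"mean": mean, "std": std, "1-norm": norm1,
            "range": rng, "ssq": ssq, "pre": pre}
```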
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Diagram of the process: the survey data files were collated and pre-processed for analysis, as shown with an image of a white baby with respondent text as well as the demographic details; a snapshot shows how these were analysed using a Miro board with lines and sticky notes; this analysis was then visualised as data portraits, data quilts, and quilted bar charts.
https://paper.erudition.co.in/terms
Question paper solutions for the chapter "Data Collection and Data Pre-Processing" of Data Analytics Skills for Managers, 5th Semester, Bachelor in Business Administration, 2020-2021.