Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file performs a statistical test of whether two ROC curves differ from each other based on the area under the curve (AUC). You will need the coefficient from the table presented in the following article to enter the correct value for the comparison: Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148:839-843.
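For orientation, the calculation behind such a spreadsheet can be sketched as follows; this is a minimal Python analogue assuming the standard Hanley-McNeil formulation, where r is the correlation coefficient read from the article's table (the function name and example numbers are illustrative, not part of the original file).

```python
# Minimal Python analogue of the spreadsheet's calculation (assumed Hanley-McNeil formulation).
from math import sqrt
from scipy.stats import norm

def compare_correlated_aucs(auc1, se1, auc2, se2, r):
    """Two-sided z-test for a difference between two correlated AUCs.

    r is the correlation coefficient read from the table in Hanley & McNeil (1983).
    """
    z = (auc1 - auc2) / sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)
    p = 2 * norm.sf(abs(z))  # two-sided p-value
    return z, p

z, p = compare_correlated_aucs(0.85, 0.03, 0.80, 0.035, r=0.45)  # illustrative numbers
print(f"z = {z:.3f}, p = {p:.4f}")
```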
https://creativecommons.org/publicdomain/zero/1.0/
By City of Baltimore [source]
This Baltimore City Child and Family Health Indicators dataset provides crucial information that can support the health and well-being of Baltimore City residents. It contains 13 indicators, such as low birth weight, prenatal visits, and teen births. The data are sourced from the Maryland Department of Health & Mental Hygiene (DHMH), Baltimore Substance Abuse Systems (BSAS), the Baltimore City Health Department, and the US Census Bureau. Through this dataset we can gain a better understanding of how Baltimore City residents' health compares to other areas and how it has changed over time. Investigating this dataset gives us an opportunity to develop strategies for providing better care for our community. With discoveries from these indicators, together as a city we can bring about lasting change in protecting public health within Baltimore.
This dataset provides valuable information about the health and wellbeing of children and families in Baltimore City in 2010. The data is organized by CSA (Census Statistical Area) and includes stats on term births, low birth weight births, prenatal visits, teen births, and lead testing. This dataset can be used to analyze trends in children's health over time as well as identify potential areas that need more attention or resources.
To use this dataset:
- Read through the data dictionary to understand what each column represents.
- Choose which columns you would like to explore further.
- Filter or subset the data as you see fit then visualize it with graphs or maps to better understand how conditions vary across neighborhoods in Baltimore City.
- Consider comparing the data from this year with prior years if available for deeper analysis of changes over time.
- Look for correlations among columns that could help explain disparities between neighborhoods, and create strategies for improving outcomes through policy interventions or other programs designed specifically for those areas' needs.
- Mapping health disparities in high-risk areas to target public health interventions.
- Identifying neighborhoods in need of additional resources for prenatal care, infant care, and lead testing and create specific programs to address these needs.
- Creating an online dashboard that displays real time data on Baltimore City’s population health indicators such as birth weight, teenage pregnancies, and lead poisoning for the public to access easily
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: BNIA_Child_Fam_Health_2010.csv

| Column name | Description |
|:------------|:------------|
| the_geom | Geometry of the Census Statistical Area (CSA) (Geometry) |
| CSA2010 | Census Statistical Area (CSA) (String) |
| termbir10 | Total number of term births in 2010 (Integer) |
| birthwt10 | Total number of low birth weight births in 2010 (Integer) |
| prenatal10 | Total number of prenatal visits in 2010 (Integer) |
| teenbir10 | Total number of teen births in 2010 (Integer) |
| leadtest10 | Total number of lead tests conducted in 2010 (Integer) |
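A minimal quick-start sketch in Python, assuming the CSV file name and columns listed in the table above (the derived ratio is purely illustrative):

```python
import pandas as pd

df = pd.read_csv("BNIA_Child_Fam_Health_2010.csv")

# Illustrative derived measure: low birth weight births relative to term births per CSA.
df["lbw_per_term_birth"] = df["birthwt10"] / df["termbir10"]

print(df.sort_values("lbw_per_term_birth", ascending=False)
        [["CSA2010", "termbir10", "birthwt10", "lbw_per_term_birth"]]
        .head())
```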
If you use this dataset in your research, please credit the original authors, the City of Baltimore.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was primarily designed for the Helsinki Tomography Challenge 2022 (HTC2022), but it can be used for generic algorithm research and development in 2D CT reconstruction.
The dataset contains 2D tomographic measurements, i.e., sinograms and the affiliated metadata containing measurement geometry and other specifications. The sinograms have already been pre-processed with background and flat-field corrections, and compensated for a slightly misaligned center of rotation in the cone-beam computed tomography scanner. The log-transforms from intensity measurements to attenuation data have also been already computed. The data has been stored as MATLAB structs and saved in .mat file format.
The purpose of HTC2022 was to develop algorithms for limited angle tomography. The challenge data consists of tomographic measurements of two sets of plastic phantoms with a diameter of 7 cm and with holes of differing shapes cut into them. The first set is the teaching data, containing five training phantoms. The second set consists of 21 test phantoms used in the challenge to test algorithm performance. The test phantom data was released after the competition period ended.
The training phantoms were designed to facilitate algorithm development and benchmarking for the challenge itself. Four of the training phantoms contain holes. These are labeled ta, tb, tc, and td. A fifth training phantom is a solid disc with no holes. We encourage subsampling these datasets to create limited data sinograms and comparing the reconstruction results to the ground truth obtainable from the full-data sinograms. Note that the phantoms are not all identically centered.
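As a starting point, subsampling can be as simple as selecting a contiguous arc of projection angles. The sketch below assumes the file name and the MATLAB struct/field names ("CtDataFull", "sinogram", "parameters", "angles"); check the included MATLAB example script for the authoritative layout.

```python
import numpy as np
from scipy.io import loadmat

data = loadmat("htc2022_ta_full.mat", simplify_cells=True)   # file name assumed
ct = data["CtDataFull"]
sino = np.asarray(ct["sinogram"])                 # rows: projection angles, cols: detector bins
angles = np.asarray(ct["parameters"]["angles"])   # degrees

start_deg, span_deg = 30.0, 90.0                  # a 90-degree limited-angle arc
keep = (angles >= start_deg) & (angles < start_deg + span_deg)
sino_limited, angles_limited = sino[keep, :], angles[keep]
print(sino_limited.shape, angles_limited.min(), angles_limited.max())
```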
The teaching data includes the following files for each phantom:
Also included in the teaching dataset is a MATLAB example script for how to work with the CT data.
The challenge test data is arranged into seven difficulty levels, labeled 1-7, with each level containing three different phantoms, labeled A-C. As the difficulty level increases, the number of holes increases and their shapes become increasingly complex. Furthermore, the view angle is reduced as the difficulty level increases, starting with a 90 degree field of view at level 1 and reducing by 10 degrees at each increasing level of difficulty (so level n spans 90 - 10*(n - 1) degrees). The view angles in the challenge data do not all begin from 0 degrees.
The test data includes the following files for each phantom:
Also included in the test dataset is a collage in .PNG format, showing all the ground truth segmentation images and the photographs of the phantoms together.
As the orientation of CT reconstructions can depend on the tools used, we have included the example reconstructions for each of the phantoms to demonstrate how the reconstructions obtained from the sinograms and the specified geometry should be oriented. The reconstructions have been computed using the filtered back-projection algorithm (FBP) provided by the ASTRA Toolbox.
We have also included segmentation examples of the reconstructions to demonstrate the desired format for the final competition entries. The segmentation images were obtained by the following steps (a Python sketch follows the list):
1) Set all negative pixel values in the reconstruction to zero.
2) Determine a threshold level using Otsu's method.
3) Globally threshold the image using the threshold level.
4) Perform a morphological closing on the image using a disc with a radius of 3 pixels.
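The four steps above map directly onto standard image-processing primitives; here is one possible rendering with scikit-image (the choice of library is ours, not the organizers'):

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_closing, disk

def segment_reconstruction(recon: np.ndarray) -> np.ndarray:
    img = np.clip(recon, 0, None)            # 1) set negative pixel values to zero
    level = threshold_otsu(img)              # 2) determine a threshold with Otsu's method
    binary = img > level                     # 3) global thresholding
    return binary_closing(binary, disk(3))   # 4) morphological closing, disc of radius 3 px
```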
The competitors were not obliged to follow the above procedure, and were encouraged to explore various segmentation techniques for the limited angle reconstructions.
For getting started with the data, we recommend the following MATLAB toolboxes:
HelTomo - Helsinki Tomography Toolbox
https://github.com/Diagonalizable/HelTomo/
The ASTRA Toolbox
https://www.astra-toolbox.com/
Spot – A Linear-Operator Toolbox
https://www.cs.ubc.ca/labs/scl/spot/
Using the above toolboxes for the Challenge was by no means compulsory: the metadata for each dataset contains a full specification of the measurement geometry, and the competitors were free to use any and all computational tools they wanted in computing the reconstructions and segmentations.
All measurements were conducted at the Industrial Mathematics Computed Tomography Laboratory at the University of Helsinki.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The purpose of the data environment is to provide multi-modal data and contextual information (weather and incidents) that can be used to research and develop Intelligent Transportation System applications. This data set contains the following data for the two months of September and October 2011 in Pasadena, California: highway network data, demand data, sample mobile sightings for a two-hour period provided by AirSage (see note 1 below), network performance data (measured and forecast), work zone data, weather data, and changeable message sign data.
This legacy dataset was created before data.transportation.gov and is only currently available via the attached file(s). Please contact the dataset owner if there is a need for users to work with this data using the data.transportation.gov analysis features (online viewing, API, graphing, etc.) and the USDOT will consider modifying the dataset to fully integrate in data.transportation.gov.
Studies utilizing Global Positioning System (GPS) telemetry rarely result in 100% fix success rates (FSR). Many assessments of wildlife resource use do not account for missing data, either assuming data loss is random or because of a lack of practical treatments for systematic data loss. Several studies have explored how the environment, technological features, and animal behavior influence rates of missing data in GPS telemetry, but previous spatially explicit models developed to correct for sampling bias have been specified to small study areas, to a small range of data loss, or to be species-specific, limiting their general utility. Here we explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use. We also evaluate patterns in missing data that relate to potential animal activities that change the orientation of the antennae, and characterize home-range probability of GPS detection for four focal species: cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni), and mule deer (Odocoileus hemionus).

Part 1, Positive Openness Raster (raster dataset): Openness is an angular measure of the relationship between surface relief and horizontal distance. For angles less than 90 degrees it is equivalent to the internal angle of a cone with its apex at a DEM location, constrained by neighboring elevations within a specified radial distance. A 480 meter search radius was used for this calculation of positive openness. Openness incorporates the terrain line-of-sight, or viewshed, concept and is calculated from multiple zenith and nadir angles, here along eight azimuths. Positive openness measures openness above the surface, with high values for convex forms and low values for concave forms (Yokoyama et al. 2002). We calculated positive openness using a custom Python script, following the methods of Yokoyama et al. (2002), with a USGS National Elevation Dataset as input.

Part 2, Northern Arizona GPS Test Collar (csv): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model and fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. The model training data are provided here for fix attempts by hour. This table can be linked with the site location shapefile using the site field.

Part 3, Probability Raster (raster dataset): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model and fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US. We evaluated GPS telemetry datasets by comparing the mean probability of a successful GPS fix across study animals' home-ranges to the observed FSR of GPS-downloaded collars deployed on cougars (Puma concolor), desert bighorn sheep (Ovis canadensis nelsoni), Rocky Mountain elk (Cervus elaphus ssp. nelsoni), and mule deer (Odocoileus hemionus). Comparing the mean probability of acquisition within study animals' home-ranges and observed FSRs of GPS-downloaded collars resulted in an approximately 1:1 linear relationship with an r-squared of 0.68.

Part 4, GPS Test Collar Sites (shapefile): Bias correction in GPS telemetry datasets requires a strong understanding of the mechanisms that result in missing data. We tested wildlife GPS collars in a variety of environmental conditions to derive a predictive model of fix acquisition. We found terrain exposure and tall overstory vegetation are the primary environmental features that affect GPS performance. Model evaluation showed a strong correlation (0.924) between observed and predicted fix success rates (FSR) and little bias in predictions. The model's predictive ability was evaluated using two independent datasets from stationary test collars of different make/model and fix interval programming, placed at different study sites. No statistically significant differences (95% CI) between predicted and observed FSRs suggest that changes in technological factors have minor influence on the model's ability to predict FSR in new study areas in the southwestern US.

Part 5, Cougar Home Ranges (shapefile): Cougar home-ranges were calculated to compare the mean probability of a GPS fix acquisition across the home-range to the actual fix success rate (FSR) of the collar, as a means of evaluating whether characteristics of an animal's home-range have an effect on observed FSR. We estimated home-ranges using the Local Convex Hull (LoCoH) method with the 90th isopleth. Only data obtained from GPS download of retrieved units were used. Satellite-delivered data were omitted from the analysis for animals whose collar was lost or damaged, because satellite delivery tends to lose an additional 10% of data. Comparisons with home-range mean probability of fix were also used as a reference for assessing whether the frequency with which animals use areas of low GPS acquisition rates may play a role in observed FSRs.

Part 6, Cougar Fix Success Rate by Hour (csv): Cougar GPS collar fix success varied by hour of day, suggesting circadian rhythms with bouts of rest during daylight hours may change the orientation of the GPS receiver, affecting the ability to acquire fixes. Raw data of overall fix success rates (FSR) and FSR by hour were used to predict relative reductions in FSR. The data include only direct GPS download datasets. Satellite-delivered data were omitted from the analysis for animals whose collar was lost or damaged, because satellite delivery tends to lose approximately an additional 10% of data.

Part 7, Openness Python Script version 2.0: This Python script was used to calculate positive openness using a 30 meter digital elevation model for a large geographic area in Arizona, California, Nevada, and Utah. A scientific research project used the script to explore environmental effects on GPS fix acquisition rates across a wide range of environmental conditions and detection rates for bias correction of terrestrial GPS-derived, large mammal habitat use.
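For readers who prefer to prototype the openness computation themselves, the following is a simplified, unoptimized sketch of positive openness after Yokoyama et al. (2002); it is not the production script distributed with the dataset and assumes the search radius is small relative to the DEM extent.

```python
import numpy as np

def _shift(arr, dy, dx):
    """Return an array whose (i, j) entry is arr[i + dy, j + dx], NaN outside the grid."""
    out = np.full(arr.shape, np.nan)
    h, w = arr.shape
    out[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)] = \
        arr[max(dy, 0):h - max(-dy, 0), max(dx, 0):w - max(-dx, 0)]
    return out

def positive_openness(dem, cellsize, radius=480.0):
    """Mean over eight azimuths of the zenith angle (90 deg minus the maximum
    elevation angle to terrain within `radius` metres)."""
    directions = [(-1, 0), (1, 0), (0, -1), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    zenith_sum = np.zeros(dem.shape)
    for dy, dx in directions:
        step = cellsize * float(np.hypot(dy, dx))
        max_elev = np.full(dem.shape, -90.0)
        for k in range(1, int(radius // step) + 1):
            neighbour = _shift(dem, dy * k, dx * k)
            elev = np.degrees(np.arctan((neighbour - dem) / (k * step)))
            max_elev = np.fmax(max_elev, elev)       # fmax ignores NaNs at the border
        zenith_sum += 90.0 - max_elev
    return zenith_sum / len(directions)

# e.g. openness = positive_openness(dem_array, cellsize=30.0, radius=480.0)
```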
The dataset was created by this notebook: https://www.kaggle.com/douglaskgaraujo/sentence-complexity-comparison-dataset
This data is a pairwise comparison of sentences, together with information about their relative complexity. The original dataset is from the CommonLit Readability Prize competition, and interested readers are referred there (especially the competition's discussion forums) for more information on the data itself.
Important notice! As per that competition's rules, the license is as follows:
A. Data Access and Use. Competition Use and Non-Commercial & Academic Research:
- You may access and use the Competition Data for non-commercial purposes only, including for participating in the Competition and on Kaggle.com forums, and for academic research and education.
- The Competition Sponsor reserves the right to disqualify any participant who uses the Competition Data other than as permitted by the Competition Website and these Rules.
B. Data Security. You agree to use reasonable and suitable measures to prevent persons who have not formally agreed to these Rules from gaining access to the Competition Data. You agree not to transmit, duplicate, publish, redistribute or otherwise provide or make available the Competition Data to any party not participating in the Competition. You agree to notify Kaggle immediately upon learning of any possible unauthorized transmission of or unauthorized access to the Competition Data and agree to work with Kaggle to rectify any unauthorized transmission or access.
C. External Data. You may use data other than the Competition Data (“External Data”) to develop and test your Submissions. However, you will ensure the External Data is publicly available and equally accessible to use by all participants of the Competition for purposes of the competition at no cost to the other participants. The ability to use External Data under this Section 7.C (External Data) does not limit your other obligations under these Competition Rules, including but not limited to Section 11 (Winners Obligations).
This dataset is a pairwise comparison of each sentence in the CommonLit competition with 500 other randomly matched sentences. Sentences are divided into training and validation datasets before being matched randomly. The relative complexity of each sentence is measured, and the dataset includes features such as the difference between the two sentences' scores and a column indicating whether or not the first sentence's readability score is greater than or equal to the score of the second sentence.
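The construction can be sketched roughly as follows; this is illustrative only (the notebook linked above is the authoritative recipe, works at the sentence level, and applies the train/validation split first), and the column names "excerpt" and "target" follow the CommonLit competition data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.read_csv("train.csv")        # CommonLit readability data
n_partners = 500

rows = []
for i, row in df.iterrows():
    partners = rng.choice(df.index[df.index != i], size=n_partners, replace=False)
    for j in partners:
        rows.append({
            "text_1": row["excerpt"],
            "text_2": df.at[j, "excerpt"],
            "score_diff": row["target"] - df.at[j, "target"],
            "first_ge_second": int(row["target"] >= df.at[j, "target"]),
        })
pairs = pd.DataFrame(rows)
```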
Thank you to the organisers of this competition for providing this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistical comparison of multiple time series in their underlying frequency patterns has many real applications. However, existing methods are only applicable to a small number of mutually independent time series, and empirical results for dependent time series are only limited to comparing two time series. We propose scalable methods based on a new algorithm that enables us to compare the spectral density of a large number of time series. The new algorithm helps us efficiently obtain all pairwise feature differences in frequency patterns between M time series, which plays an essential role in our methods. When all M time series are independent of each other, we derive the joint asymptotic distribution of their pairwise feature differences. The asymptotic dependence structure between the feature differences motivates our proposed test for multiple mutually independent time series. We then adapt this test to the case of multiple dependent time series by partially accounting for the underlying dependence structure. Additionally, we introduce a global test to further enhance the approach. To examine the finite sample performance of our proposed methods, we conduct simulation studies. The new approaches demonstrate the ability to compare a large number of time series, whether independent or dependent, while exhibiting competitive power. Finally, we apply our methods to compare multiple mechanical vibrational time series.
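As a loose illustration of the kind of pairwise frequency-domain features involved (not the authors' test statistic), one can form all pairwise differences of log periodograms across M series:

```python
import numpy as np
from itertools import combinations
from scipy.signal import periodogram

def pairwise_spectral_differences(series, fs=1.0):
    """series: array of shape (M, T). Returns frequencies and {(i, j): difference}."""
    freqs, pxx = periodogram(series, fs=fs, axis=-1)
    log_pxx = np.log(pxx[:, 1:])                     # drop the zero-frequency bin
    diffs = {(i, j): log_pxx[i] - log_pxx[j]
             for i, j in combinations(range(series.shape[0]), 2)}
    return freqs[1:], diffs
```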
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Tablets can be used to facilitate systematic testing of academic skills. Yet, when using validated paper tests on tablet, comparability between the mediums must be established. In this dataset, comparability between a tablet and a paper version of a basic math skills test (HRT: Heidelberger Rechen Test 1–4) was investigated. Four of the five samples included in the current study covered a broad spectrum of schools regarding student achievement in mathematics, proportion of non-native students, parental educational levels, and diversity of ethnic background. The fifth sample, the intervention sample in the Apps-project, presented with similar characteristics except on mathematical achievement, where they showed lower results. To examine the test-retest reliability of the tablet versions of HRT and the Math Battery, several samples were tested twice on each measure in various contexts. To test the correlation between the paper and tablet versions of HRT, the participants were tested on both paper and tablet versions of HRT using a counterbalanced design to avoid potential order effects. This sample is referred to as the Different formats sample. Finally, norms were collected for HRT, the Math Battery, and the mathematical word problem-solving measure. This sample (called the Normative sample) was also used to investigate the correlation, or convergent validity, between HRT and the Math Battery (third hypothesis). See the article "Tablets instead of paper-based tests for young children? Comparability between paper and tablet versions of the mathematical Heidelberger Rechen Test 1-4" by Hassler Hallstedt (2018) for further information. The dataset was originally published in DiVA and moved to SND in 2024.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem — (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual disk access for only less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.
NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods, given the lack of public data sets for network-based IDSs.
Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. Consequently, evaluation results of different research works will be consistent and comparable.
Data files
KDDTrain+.ARFF: The full NSL-KDD train set with binary labels in ARFF format
KDDTrain+.TXT: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format
KDDTrain+_20Percent.ARFF: A 20% subset of the KDDTrain+.arff file
KDDTrain+_20Percent.TXT: A 20% subset of the KDDTrain+.txt file
KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format
KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format
KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21
KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21
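A hedged loading sketch in Python: the .TXT files are header-less CSV, with each record carrying the 41 KDD'99 features followed by the attack-type label and the difficulty level.

```python
import pandas as pd

train = pd.read_csv("KDDTrain+.TXT", header=None)
train = train.rename(columns={train.columns[-2]: "label",
                              train.columns[-1]: "difficulty"})

print(train.shape)
print(train["label"].value_counts().head())   # 'normal' vs. individual attack types
```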
Contact: cic@unb.ca.
https://www.icpsr.umich.edu/web/ICPSR/studies/26561/terms
The Long Beach Longitudinal Study (LBLS) was created in 1978 to obtain normative data for the Schaie-Thurston Adult Mental Abilities Test (STAMAT). From 1994 to 2003 it was extended under the guiding principle that cognitive aging is a largely contextual phenomenon. Individual differences in abilities, and change in those abilities over adulthood, are associated not only with cognitive mechanisms, but with sociodemographic phenomena such as birth cohort or gender, and within-individual characteristics, including health, affect, self-efficacy, personality, and other variables that impact health. This principle is reflected in the testing measures added to the original panel. Besides the original ability measures used by Schaie, the Life Complexity Inventory has been included in all testing. Because these measures were included in the later generations of testing, independent and direct comparisons can be made with the Seattle Longitudinal Study (ICPSR 00158) to replicate findings and to generalize longitudinal samples.

Panel 1: The initial panel was sampled in 1978 and consisted of 65 adults aged 28-33 and 518 adults aged 55-84. This sample was tested using the STAMAT, as well as a 20-item list of common English nouns for testing free recall and a brief essay to test text recall. In 1981, 264 participants from this sample were retested, 106 were again retested from 1994-1995, and 42 in 1997. Finally, 15 participants of the original sample were tested from 2000-2002 using additional tests adopted for the creation of a second panel, described below, as well as a test for measuring executive function.

Panel 2: In 1994, a second panel of 630 participants aged 30-97, a third of whom were over 80, was added to the study. The testing for this sample included multiple indices of list recall, text recall, working memory, perceptual speed, and vocabulary for structural equation modeling. Assessments of language, autobiographical memory, personality, depression, health, health behaviors, and other measures were also incorporated into the study. In 1997, 352 members of this second panel were retested. From 2000-2002, 179 participants of this second panel completed the 1994-1995 measures, as well as several tests extending the battery to indices of executive function. In 2003, 133 participants were retested.

Panel 3: A third sample was recruited during the 2000-2002 time frame consisting of 911 participants aged 30-98, again approximately a third of whom were over the age of 80. In 2003, 513 members of this third panel were retested.

Datasets: The data are provided in 6 datasets.
- Panel 1 and 2 1978-2003 Longitudinal File (Dataset 1): a longitudinal file of data from Panel 1 for tests performed in 1978, 1981, 1994, 1997, and 2000-2002, and data from Panel 2 for tests performed in 1994, 1997, 2000-2002, and 2003.
- Panels 1 and 2 1994 STAMAT File (Dataset 2): contains the STAMAT test variables for Panels 1 and 2.
- Panel 1 and 2 1994-2000 Master Data Longitudinal File (Dataset 3): a second longitudinal file containing the complete catalog of variables from Panels 1 and 2 for tests performed in 1994, 1997, and 2000.
- Panel 2 Wave 1 1994 Cross File (Dataset 4): contains variables for the first wave of Panel 2, which took place in 1994.
- Panel 2 Wave 2 1997 Cross File (Dataset 5): contains variables for the second wave of Panel 2, which took place in 1997.
- Panel 3 Wave 1 2000 Master File (Dataset 6): contains variables from the first wave of Panel 3, which took place in 2000.
This dataset consists of growth and yield data for each season when sunflower (Helianthus annuus L.) was grown for seed at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). In each season, sunflower was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The square fields are themselves arranged in a larger square with four fields in four adjacent quadrants of the larger square. Fields and lysimeters within each field are thus designated northeast (NE), southeast (SE), northwest (NW), and southwest (SW). Sunflower was grown in the NE and SE fields. Irrigation was by linear move sprinkler system. Irrigation protocols described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation protocols described as deficit typically involved irrigations to establish the crop early in the season, followed by reduced or absent irrigations later in the season (typically in the later winter and spring). The growth and yield data include plant population density, height, plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head mass (when present), kernel number, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on sunflower ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield and have been used for testing and calibrating models of ET that use satellite and/or weather data.

Resources in this dataset:

Resource Title: 2009 Bushland, TX, east sunflower growth and yield data. File Name: 2009_East_Sunflower_Growth_and_Yield.xlsx. Resource Description: This dataset consists of growth and yield data for the 2009 season when sunflower (Helianthus annuus L.) was grown at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Sunflower was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two square fields were themselves arranged with one directly north of and contiguous with the other. Fields and lysimeters within each field were designated northeast (NE) and southeast (SE). Irrigation was by linear move sprinkler system. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation management resulted in the crop being well watered and meeting reference “tall crop” conditions during periods before harvests. The growth and yield data include plant height, plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. There is a single spreadsheet for the east (NE and SE) lysimeters and fields. The spreadsheet contains tabs for data and corresponding tabs for data dictionaries. There are separate data tabs and corresponding dictionaries for plant growth during the season, manual harvest from replicate plots in each field and from lysimeter surfaces, and machine (combine) harvest. An Introduction tab explains the tab names and contents, lists the authors, explains conventions, and lists some relevant references.

Resource Title: 2011 Bushland, TX, east sunflower growth and yield data. File Name: 2011_East_Sunflower_Growth_and_Yield.xlsx. Resource Description: This dataset consists of growth and yield data for the 2011 season when sunflower (Helianthus annuus L.) was grown at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Sunflower was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two square fields were themselves arranged with one directly north of and contiguous with the other. Fields and lysimeters within each field were designated northeast (NE) and southeast (SE). Irrigation was by linear move sprinkler system. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation management resulted in the crop being well watered and meeting reference “tall crop” conditions during periods before harvests. The growth and yield data include plant height, plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. There is a single spreadsheet for the east (NE and SE) lysimeters and fields. The spreadsheet contains tabs for data and corresponding tabs for data dictionaries. There are separate data tabs and corresponding dictionaries for plant growth during the season, manual harvest from replicate plots in each field and from lysimeter surfaces, and machine (combine) harvest. An Introduction tab explains the tab names and contents, lists the authors, explains conventions, and lists some relevant references.
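A minimal sketch for exploring one of the spreadsheets described above (the tab names are documented in each file's Introduction tab and are not reproduced here):

```python
import pandas as pd

xlsx = pd.ExcelFile("2009_East_Sunflower_Growth_and_Yield.xlsx")
print(xlsx.sheet_names)                    # data tabs and their data-dictionary tabs

growth = xlsx.parse(xlsx.sheet_names[1])   # pick a tab after inspecting the names
print(growth.head())
```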
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This dataset comprises five sets of data collected throughout Alev Sönmez's PhD thesis project: Sönmez, A. (2024). Dancing the Vibe: Designerly Exploration of Group Mood in Work Settings. (Doctoral dissertation in review). Delft University of Technology, Delft, the Netherlands.
This thesis aims to contribute to the granular understanding of group mood by achieving three objectives, each representing a key research question in the project: (1) to develop a descriptive overview of nuanced group moods, (2) to develop knowledge and tools to effectively communicate nuanced group moods, and (3) to develop knowledge and insights to facilitate reflection on group mood. The research was guided by the following research questions: (1) What types of group moods are experienced in small work groups? (2) How can nuanced group moods be effectively communicated? (3) How can group mood reflection be facilitated?
This research was supported by VICI grant number 453-16-009 from The Netherlands Organization for Scientific Research (NWO), Division for the Social and Behavioral Sciences, awarded to Pieter M. A. Desmet.
The data is organized into folders corresponding to the chapters of the thesis. Each folder contains a README file with specific information about the dataset.
Capter_2_PhenomenologicalStudy: This dataset consists of anonymized transcriptions of co-inquiry sessions where 5 small project groups described the group moods they experienced in their eight most recent meetings. Additionally, we share the observation notes we collected in those meetings, the maps filled in during the co-inquiry sessions, the materials used to collect data, and the coding scheme used to analyze the group mood descriptions.
Chapter_3_ImageEvaluationStudy: This dataset consists of anonymized scores from 38 participants indicating the strength of the association between eight group mood–expressing images and 36 group mood qualities, along with their free descriptions of the group moods perceived in those images. Additionally, we share the questionnaire design, the eight images, and the data processing files (t-test, correspondence analysis outputs, free description coding, heat map).
Chapter_4_VideoEvaluationStudy: This dataset consists of anonymized scores from 40 participants indicating the strength of the association between eight group mood–expressing videos and 36 group mood qualities, along with their free descriptions of the group moods perceived in those videos. Additionally, we share the questionnaire design, the data processing files (t-test, correspondence analysis outputs, free description coding, heat map), and the data processing files used to compare the image and video sets (PCA output and image-video HIT rate comparison table).
Chapter_5_CardsetInterventionStudy: This dataset consists of anonymized written responses from each of the 12 project teams, along with notes taken during a plenary session with these teams, evaluating the efficacy of the intervention on their group mood management.
Chapter_6_WorkshopEvaluationStudy: This dataset consists of anonymized transcriptions of five small work teams reflecting on their lived group mood experiences following the steps of an embodiment workshop we designed, including their takeaways from the workshop and discussions evaluating the workshop's efficacy in stimulating reflection and the overall experience of the workshop.
All the data is anonymized by removing the names of individuals and institutions. However, the interviews contain details where participants shared personal information about themselves, colleagues, and company dynamics. Therefore, the data should be handled with extra care to ensure that participant privacy is not put in danger. Contact N.A.Romero@tudelft.nl (Natalia Romero Herrera) to request access to the dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The two datasets provided here were used to provide inter-rater reliability statistics for the application of a metaphor identification procedure to texts written in English. Three experienced metaphor researchers applied the Metaphor Identification Procedure Vrije Universiteit (MIPVU) to approximately 1500 words of text from two English-language newspaper articles. The dataset Eng1 contains each researcher’s independent analysis of the lexical demarcation and metaphorical status of each word in the sample. The dataset Eng2 contains a second analysis of the same texts by the same three researchers, carried out after a comparison of our responses in Eng1 and a troubleshooting session where we discussed our differences. The accompanying R-code was used to produce the three-way and pairwise inter-rater reliability data reported in Section 3.2 of the chapter: How do I determine what comprises a lexical unit? The headings in both datasets are identical, although the order of the columns differs in the two files. In both datasets, each line corresponds to one orthographic word from the newspaper texts.

Chapter Abstract: The first part of this chapter discusses various ‘nitty-gritty’ practical aspects about the original MIPVU intended for the English language. Our focus in these first three sections is on common pitfalls for novice MIPVU users that we have encountered when teaching the procedure. First, we discuss how to determine what comprises a lexical unit (section 3.2). We then move on to how to determine a more basic meaning of a lexical unit (section 3.3), and subsequently discuss how to compare and contrast contextual and basic senses (section 3.4). We illustrate our points with actual examples taken from some of our teaching sessions, as well as with our own study into inter-rater reliability, conducted for the purposes of this new volume about MIPVU in multiple languages. Section 3.5 shifts to another topic that new MIPVU users ask about – namely, which practical tools they can use to annotate their data in an efficient way. Here we discuss some tools that we find useful, illustrating how we utilized them in our inter-rater reliability study. We close this part with section 3.6, a brief discussion about reliability testing. The second part of this chapter adopts more of a bird’s-eye view. Here we leave behind the more technical questions of how to operationalize MIPVU and its steps, and instead respond more directly to the question posed above: Do we really have to identify every metaphor in every bit of our data? We discuss possible approaches for research projects involving metaphor identification, by exploring a number of important questions that all researchers need to ask themselves (preferably before they embark on a major piece of research). Section 3.7 weighs some of the differences between quantitative and qualitative approaches in metaphor research projects, while section 3.8 talks about considerations when it comes to choosing which texts to investigate, as well as possible research areas where metaphor identification can play a useful role. We close this chapter in section 3.9 with a recap of our ‘take-away’ points – that is, a summary of the highlights from our entire discussion.
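The reliability analysis shipped with the dataset is written in R; the following is a rough Python analogue of the same kind of pairwise and three-way agreement figures. The column names rater1, rater2, and rater3 are placeholders, not the actual headers in Eng1/Eng2.

```python
from itertools import combinations
import pandas as pd
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

eng1 = pd.read_csv("Eng1.csv")                # one row per orthographic word
raters = ["rater1", "rater2", "rater3"]       # placeholder column names

for a, b in combinations(raters, 2):          # pairwise agreement (Cohen's kappa)
    print(a, b, cohen_kappa_score(eng1[a], eng1[b]))

table, _ = aggregate_raters(eng1[raters].to_numpy())
print("Fleiss' kappa:", fleiss_kappa(table))  # three-way agreement
```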
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.
We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al [3].
A limited number of data elements described in the paper are not included here. The following elements are excluded:
The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.
The free-text comments written by raters during the ratings process.
Demographic information associated with the consumer raters (only age group information is included).
Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).
Singhal, K., Azizi, S., Tu, T. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2
Omiye, J.A., Lester, J.C., Spichak, S. et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z
Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.
Abacha, Asma Ben, et al. "Bridging the gap between consumers’ medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25-29.
Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Ratings were missing for five instances in MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.
Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded in terms of two binary columns corresponding to which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as for ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FBRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.
Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated; instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.
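A short tabulation sketch for the independent ratings file, using the file and column names described above (the per-dataset grouping column name is an assumption):

```python
import pandas as pd

ratings = pd.read_csv("ratings_independent.csv")
print(ratings["bias_presence"].value_counts())    # No bias / Minor bias / Severe bias

if "dataset" in ratings.columns:                  # hypothetical grouping column
    any_bias = ratings["bias_presence"] != "No bias"
    print(ratings.assign(any_bias=any_bias).groupby("dataset")["any_bias"].mean())
```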
Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].
Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.
Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.
TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.
Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.
Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.
HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].
Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.
Omiye et al. [other datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
Version 2: Updated to include ratings and generated model outputs. Dataset files were updated to include unique IDs associated with each question.
Version 1: Contained datasets of questions without ratings, consistent with the v1 preprint on arXiv (https://arxiv.org/abs/2403.12025).
WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.
NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.
DPH note about change from 7-day to 14-day metrics: As of 10/15/2020, this dataset is no longer being updated. Starting on 10/15/2020, these metrics will be calculated using a 14-day average rather than a 7-day average. The new dataset using 14-day averages can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
This dataset includes a weekly count and weekly rate per 100,000 population for COVID-19 cases, a weekly count of COVID-19 PCR diagnostic tests, and a weekly percent positivity rate for tests among people living in community settings. Dates are based on date of specimen collection (cases and positivity).
A person is considered a new case only upon their first positive COVID-19 test result, because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric, but they are not considered another case.
These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.
These data are updated weekly; the previous week period for each dataset is the previous Sunday-Saturday, known as an MMWR week (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf). The date listed is the date the dataset was last updated and corresponds to a reporting period of the previous MMWR week. For instance, the data for 8/20/2020 corresponds to a reporting period of 8/9/2020-8/15/2020.
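For reference, the two published metrics reduce to simple calculations; the sketch below is illustrative only, since the dataset already contains the computed values.

```python
def weekly_case_rate_per_100k(weekly_cases, town_population):
    return 100_000 * weekly_cases / town_population

def weekly_percent_positivity(positive_pcr_tests, total_pcr_tests):
    return 100 * positive_pcr_tests / total_pcr_tests

print(weekly_case_rate_per_100k(42, 18_500))      # e.g. a town of 18,500 residents
print(weekly_percent_positivity(42, 1_200))
```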
Notes: 9/25/2020: Data for Mansfield and Middletown for the week of Sept 13-19 were unavailable at the time of reporting due to delays in lab reporting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this article, three sets of real European test feeders are proposed: an industrial network integrating medium and low voltage, a rural low voltage network, and an urban low voltage network, built using real GIS data and smart meter readings obtained from a distribution company. The authors provide the real mathematical OpenDSS models of the three feeders as master files, together with their corresponding real smart meter readings and topological data in databases.
Each of the three proposed networks has different network features. The industrial network comprises both medium and low voltage areas with 2888 nodes, 777 buses, and 556 lines. It serves 165 low voltage and 26 medium voltage industrial customers through 22 distribution transformers. The rural network has 18599 nodes, 4650 buses, and 4291 lines to supply 2731 end customers, while the urban network contains 26951 nodes, 6738 buses, and 5905 lines and electrifies 35297 low voltage customers. The rural and urban networks each use 68 distribution transformers to serve their customers with both single and three phase systems.
The movement toward decarbonization is bringing several advanced and smart devices into the electricity sector and increasing the use of demand response systems. In particular, the deployment of flexible devices such as EVs, heat pumps, and distributed generation takes the existing distribution system to a higher level of complexity. This drives the distribution grid into a revolutionary transition, the digitalization of the system, to enable real-time management of a distribution system undergoing such complexity. These systems require a real mathematical model of the distribution network; therefore these three test cases were developed. The majority of existing test systems are synthetic and do not represent the real European network. In addition to being limited in number and tailored to specific problems, there is a lack of an integrated real European test case that incorporates both the low voltage and medium voltage networks. To fill this gap, the authors developed test feeders that address industrial, rural, and urban areas, which is significantly important for researchers. The corresponding OpenDSS models and demand profiles extracted from smart meters for each feeder are archived in their 'Master' and 'PQ_csv' folders, respectively. In addition, the topological data are provided in the associated databases. A detailed description of the dataset and its development is contained in a paper with the same title as the dataset, which is under review and will be linked to this dataset.
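The master files can be driven from Python as well as from OpenDSS itself; the following is a hedged sketch using the OpenDSSDirect.py package (our choice of driver, not prescribed by the dataset; the path is a placeholder for one of the provided 'Master' files).

```python
import opendssdirect as dss

dss.Text.Command('Redirect "Master.dss"')   # compile the feeder model (placeholder path)
dss.Solution.Solve()                         # snapshot power flow

print("Buses:", len(dss.Circuit.AllBusNames()))
print("Converged:", dss.Solution.Converged())
```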
Background: Prevention and treatment of liver fibrosis at an early stage is of great prognostic importance, whereas changes in liver stiffness are often overlooked in patients before the onset of obvious clinical symptoms. Recognition of liver fibrosis at an early stage is therefore essential.
Objective: An XGBoost machine learning model was constructed to predict participants' liver stiffness measures (LSM) from general characteristic information, blood test metrics, and insulin resistance-related indexes, and to compare the fit efficacy of different datasets for LSM.
Methods: All data were obtained from the National Health and Nutrition Examination Survey (NHANES) for the time interval January 2017 to March 2020. Participants' general characteristics, Liver Ultrasound Transient Elastography (LUTE) information, blood test indicators, and insulin resistance-related indexes were collected, including the homeostasis model assessment of insulin resistance (HOMA-IR) and the metabolic score for insulin resistance (METS-IR). Three datasets were generated from this information: dataset A (without the insulin resistance-related indexes as predictor variables), dataset B (with METS-IR as a predictor variable), and dataset C (with HOMA-IR as a predictor variable). XGBoost regression was used on the three datasets to construct machine learning models predicting participants' LSM. A random split divided all included participants into training and validation cohorts in a 3:1 ratio; models were developed in the training cohort and validated in the validation cohort.
Results: A total of 3,564 participants were included in this study, 2,376 in the training cohort and 1,188 in the validation cohort, with no statistically significant differences between the two cohorts (p > 0.05). In the training cohort, datasets A and B both had better predictive efficacy than dataset C for participants' LSM, with dataset B fitting best [±1.96 standard error (SD), (-1.49, 1.48) kPa]; this was similarly validated in the validation cohort [±1.96 SD, (-1.56, 1.56) kPa].
Conclusions: XGBoost machine learning models built from general characteristic information and clinically accessible blood test indicators are practicable for predicting LSM in participants, and a dataset that includes METS-IR as a predictor variable improves the accuracy and stability of the models.
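To make the modeling setup concrete, here is a minimal sketch of the approach described above: an XGBoost regressor predicting LSM from tabular predictors with a random 3:1 train/validation split, evaluated via the spread of the validation residuals. The file name and column names are illustrative placeholders, not the actual NHANES variable names used by the authors.

```python
# Minimal sketch of the abstract's approach; all names below are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("nhanes_2017_2020.csv")  # assumed file name
predictors = ["age", "bmi", "alt", "ast", "ggt", "platelets", "mets_ir"]  # dataset-B-style columns
X_train, X_val, y_train, y_val = train_test_split(
    df[predictors], df["lsm_kpa"], test_size=0.25, random_state=42  # 3:1 split
)

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

# Limits of agreement (±1.96 SD of validation residuals), the kind of fit metric reported above
residuals = y_val - model.predict(X_val)
print("±1.96 SD of residuals [kPa]:", 1.96 * residuals.std())
```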
This is the dataset for the Style Change Detection task of PAN 2022.
Task: The goal of the style change detection task is to identify text positions within a given multi-author document at which the author switches. Hence, a fundamental question is the following: if multiple authors have written a text together, can we find evidence for this fact; i.e., do we have a means to detect variations in the writing style? Answering this question is among the most difficult and most interesting challenges in author identification: style change detection is the only means to detect plagiarism in a document if no comparison texts are given; likewise, style change detection can help to uncover gift authorships, to verify a claimed authorship, or to develop new technology for writing support. Previous editions of the Style Change Detection task aimed at, e.g., detecting whether a document is single- or multi-authored (2018), the actual number of authors within a document (2019), whether there was a style change between two consecutive paragraphs (2020, 2021), and where the actual style changes were located (2021). Based on the progress made towards this goal in previous years, we again extend the set of challenges to entice both novices and experts. Given a document, we ask participants to solve the following three tasks: [Task 1] Style Change Basic: for a text written by two authors that contains a single style change only, find the position of this change (i.e., cut the text into the two authors' texts at the paragraph level); [Task 2] Style Change Advanced: for a text written by two or more authors, find all positions of writing style change (i.e., assign all paragraphs of the text uniquely to some author out of the number of authors assumed for the multi-author document); [Task 3] Style Change Real-World: for a text written by two or more authors, find all positions of writing style change, where style changes now occur not only between paragraphs but at the sentence level. All documents are provided in English and may contain an arbitrary number of style changes, resulting from at most five different authors.
Data: To develop and then test your algorithms, three datasets including ground truth information are provided (dataset1 for task 1, dataset2 for task 2, and dataset3 for task 3). Each dataset is split into three parts:
- training set: contains 70% of the whole dataset and includes ground truth data. Use this set to develop and train your models.
- validation set: contains 15% of the whole dataset and includes ground truth data. Use this set to evaluate and optimize your models.
- test set: contains 15% of the whole dataset; no ground truth data is given. This set is used for evaluation (see later).
You are free to use additional external data for training your models. However, we ask you to make the additional data utilized freely available under a suitable license.
Input Format: The datasets are based on user posts from various sites of the StackExchange network, covering different topics. We refer to each input problem (i.e., the document for which to detect style changes) by an ID, which is subsequently also used to identify the submitted solution to this input problem. We provide one folder for train, validation, and test data for each dataset, respectively. For each problem instance X (i.e., each input document), two files are provided: problem-X.txt contains the actual text, where paragraph boundaries are marked for tasks 1 and 2; for task 3, we provide one sentence per paragraph (split in the same way).
truth-problem-X.json contains the ground truth, i.e., the correct solution in JSON format. An example file is listed in the following (note that we list keys for all three tasks here): { "authors": NUMBER_OF_AUTHORS, "site": SOURCE_SITE, "changes": RESULT_ARRAY_TASK1 or RESULT_ARRAY_TASK3, "paragraph-authors": RESULT_ARRAY_TASK2 } The result for task 1 (key "changes") is represented as an array holding a binary value for each pair of consecutive paragraphs within the document (0 if there was no style change, 1 if there was a style change). For task 2 (key "paragraph-authors"), the result is the order of authors contained in the document (e.g., [1, 2, 1] for a two-author document), where the first author is "1", the second author appearing in the document is referred to as "2", etc. Furthermore, we provide the total number of authors and the Stackoverflow site the texts were extracted from (i.e., the topic). The result for task 3 (key "changes") is structured similarly to the results array for task 1; however, for task 3 the changes array holds a binary value for each pair of consecutive sentences, and there may be multiple style changes in the document. An example of a multi-author document with a style change between the third and fourth paragraph (or sentence for task 3) could be described as follows (we only list the relevant key/value pairs here): { "changes": [0,0,1,...], "paragraph-authors": [1,1,1,2,...] } Output Format To...
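As a small illustration of this ground-truth format, the snippet below parses one truth-problem-X.json file and recovers the change positions from the binary "changes" array. The concrete file path is a placeholder; only the keys described above ("authors", "site", "changes", "paragraph-authors") are assumed.

```python
# Minimal sketch: read one ground-truth file and list where style changes occur.
import json

with open("dataset1/train/truth-problem-1.json") as f:  # placeholder path
    truth = json.load(f)

changes = truth["changes"]                # 0/1 per consecutive paragraph (or sentence, task 3) pair
authors = truth.get("paragraph-authors")  # author index per paragraph (task 2)

# Entry i refers to the boundary between unit i+1 and unit i+2 (1-based units).
change_positions = [i + 1 for i, c in enumerate(changes) if c == 1]
print("Number of authors:", truth["authors"])
print("Style change after units:", change_positions)
print("Paragraph authors:", authors)
```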