Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The shorthand symbol for the stimulus starts with C, S, or K (for contrast, skew, kurtosis) and is followed by −, −−, +, or ++ (small negative, large negative, small positive, and large positive magnitude, respectively); therefore, C+, C++, S−−, S−, S+, S++, K−−, K−, K+. Parameters denoted in bold in the table were varied in each of the three stimulus categories.
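For illustration, the shorthand can be decoded mechanically (a hypothetical helper; ASCII '-' and '+' stand in for the sign symbols above):

```python
# Hedged sketch: decoding the stimulus shorthand described above.
def parse_stimulus(symbol):
    stats = {"C": "contrast", "S": "skew", "K": "kurtosis"}
    signs = {"+": ("positive", "small"), "++": ("positive", "large"),
             "-": ("negative", "small"), "--": ("negative", "large")}
    stat = stats[symbol[0]]
    sign, magnitude = signs[symbol[1:]]
    return stat, sign, magnitude

print(parse_stimulus("S--"))  # ('skew', 'negative', 'large')
print(parse_stimulus("C+"))   # ('contrast', 'positive', 'small')
```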
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in the state-of-the-art, physically accurate driving simulator BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.
Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of regression testing. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigm.
This dataset builds on our previous work in this area, including work on
test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021),
test selection: the SDC-Scissor tool and related work,
test prioritization: our work on automated test case prioritization for SDCs.
Dataset Overview
The TRAVEL dataset is available under the data folder and is organized as a set of experiment folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
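As a hedged sketch (folder layout and file names are taken from the description above; CSV column names are not assumed), the experiment folders could be loaded like this:

```python
# Sketch: walking the data/ folder layout described above and loading
# each experiment's CSV files generically with the standard library.
import csv
import pathlib

def load_experiments(data_dir="data"):
    experiments = {}
    for exp in sorted(pathlib.Path(data_dir).iterdir()):
        if not exp.is_dir():
            continue
        files = {}
        for name in ("experiment_description", "generation_stats", "oob_stats"):
            path = exp / f"{name}.csv"
            if path.exists():
                with path.open() as fh:
                    files[name] = list(csv.DictReader(fh))
        experiments[exp.name] = files
    return experiments
```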
The following sections describe what each of those files contains.
Experiment Description
The experiment_description.csv contains the settings used to generate the data, including:
Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The side length, in meters, of the square map that bounds the virtual roads.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
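The oracle in the last bullet can be sketched as a simple threshold check (our reading of the tolerance semantics; the exact boundary behavior is an assumption, not taken from the dataset's implementation):

```python
# Minimal sketch of the OOB oracle described above. Whether the boundary
# comparison is strict is our assumption.
def oob_triggers_failure(oob_fraction, oob_tolerance):
    """oob_fraction: fraction of the ego-car outside the lane (0.0-1.0).
    With tolerance 0.0 any excursion fails; with tolerance near 1.0 only
    an (almost) fully departed car fails."""
    return oob_fraction > oob_tolerance

print(oob_triggers_failure(0.3, 0.0))   # True
print(oob_triggers_failure(0.3, 0.95))  # False
```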
Experiment Statistics
The generation_stats.csv contains statistics about the test generation, including:
Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.
Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.
Test Cases and Executions
Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.
The data about the test case definition include:
The road points. The list of points in 2D space that define the center of the virtual road, and their interpolation using cubic splines (interpolated_points).
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self-intersects).
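The interpolation step can be sketched with SciPy's cubic splines (a hedged illustration; the dataset's own pipeline may parameterize or sample the splines differently):

```python
# Sketch: interpolating road_points into interpolated_points with cubic
# splines over a uniform parameterization of [0, 1]. Assumes SciPy is
# available; the actual test generator may differ.
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_road(road_points, num_nodes=50):
    pts = np.asarray(road_points, dtype=float)
    t = np.linspace(0.0, 1.0, len(pts))   # uniform parameterization
    cs_x = CubicSpline(t, pts[:, 0])
    cs_y = CubicSpline(t, pts[:, 1])
    ts = np.linspace(0.0, 1.0, num_nodes)
    return list(zip(cs_x(ts), cs_y(ts)))

center = interpolate_road([(0, 0), (10, 5), (20, 0), (30, 10)], num_nodes=5)
```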
The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "is_valid": { "type": "boolean" },
    "validation_message": { "type": "string" },
    "road_points": {
      "type": "array",
      "items": { "$ref": "schemas/pair" }
    },
    "interpolated_points": {
      "type": "array",
      "items": { "$ref": "schemas/pair" }
    },
    "test_outcome": { "type": "string" },
    "description": { "type": "string" },
    "execution_data": {
      "type": "array",
      "items": { "$ref": "schemas/simulationdata" }
    }
  },
  "required": ["id", "is_valid", "validation_message", "road_points", "interpolated_points"]
}
Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).
The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.
{
  "$id": "schemas/simulationdata",
  "type": "object",
  "properties": {
    "timer": { "type": "number" },
    "pos": { "type": "array", "items": { "$ref": "schemas/triple" } },
    "vel": { "type": "array", "items": { "$ref": "schemas/triple" } },
    "vel_kmh": { "type": "number" },
    "steering": { "type": "number" },
    "brake": { "type": "number" },
    "throttle": { "type": "number" },
    "is_oob": { "type": "number" },
    "oob_percentage": { "type": "number" }
  },
  "required": ["timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage"]
}
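As a small illustration, a test file that follows the schemas above can be read with the standard json module (a hedged sketch; the field names come from the schema, but the validation helper is ours):

```python
# Sketch: loading one test-case JSON and checking the required keys
# listed in the RoadTest schema above.
import json

REQUIRED = ["id", "is_valid", "validation_message",
            "road_points", "interpolated_points"]

def load_road_test(text):
    """Parse a test-case JSON string and verify the schema's required keys."""
    test = json.loads(text)
    missing = [k for k in REQUIRED if k not in test]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return test

sample = ('{"id": 1, "is_valid": false, "validation_message": "sharp turn", '
          '"road_points": [[0, 0], [10, 5]], '
          '"interpolated_points": [[0, 0], [5, 2.5], [10, 5]]}')
test = load_road_test(sample)
print(test["is_valid"])  # False
```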
Dataset Content
The TRAVEL dataset is a living initiative, so its content is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (the SDC-Scissor work and tool) and test prioritization (our automated test case prioritization work for SDCs).
SBST CPS Tool Competition Data
The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., a conservative driver).
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
DEFAULT | 200 × 200 | 120 | 5 (real time) | 0.95 | BeamNG.AI - 0.7
SBST | 200 × 200 | 70 | 2 (real time) | 0.5 | BeamNG.AI - 0.7
Specifically, the TRAVEL dataset contains 8 repetitions of each of the above configurations for each test generator, totaling 64 experiments.
SDC Scissor
With SDC-Scissor we collected data based on the Frenetic test generator. The data are stored inside data/sdc-scissor.tar.gz. The following table summarizes the parameters used.
Name | Map Size (m x m) | Max Speed (Km/h) | Budget (h) | OOB Tolerance (%) | Test Subject
SDC-SCISSOR | 200 × 200 | 120 | 16 (real time) | 0.5 | BeamNG.AI - 1.5
The dataset contains 9 experiments with the above configuration. To generate your own data with SDC-Scissor, follow the instructions in its repository.
Dataset Statistics
Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators, grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95); we also ran the test generators with a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.5). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours using Frenetic as the test generator and defining a more realistic OOB tolerance (i.e., 0.50).
Generating new Data
Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.
Extensive instructions on how to install both tools are provided in the SBST CPS Tool Competition pipeline documentation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Story or Use Case
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the student explained the derivation process: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent the
legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
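The two ratios can be written out directly (hypothetical helper names; the counts AL, WR, SO, OM follow the tagging scheme above):

```python
# Sketch of the correctness and completeness formulas defined above.
def correctness(al, wr, so, om):
    """AL / (AL + WR + SO + OM): fraction of tagged classes fully aligned."""
    return al / (al + wr + so + om)

def completeness(al, wr, om):
    """(AL + WR) / (AL + WR + OM): fraction of expert classes represented
    at all, whether correctly or incorrectly."""
    return (al + wr) / (al + wr + om)

print(correctness(al=6, wr=2, so=1, om=3))  # 0.5
print(completeness(al=6, wr=2, om=3))       # 8/11 ≈ 0.727
```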
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and moderating variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values Low, Medium, and High.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In scientific research, assessing the impact and influence of authors is crucial for evaluating their scholarly contributions. The literature offers numerous parameters to quantify the productivity and significance of researchers, including publication count, citation count, the well-known h index, and its extensions and variations. With such a plethora of available assessment metrics, however, it is vital to identify and prioritize the most effective ones. To address the complexity of this task, we employ a powerful deep learning technique, the Multi-Layer Perceptron (MLP) classifier, for classification and ranking. By leveraging the MLP's capacity to discern patterns within datasets, we assign importance scores to each parameter using the proposed modified recursive elimination technique, and we rank the parameters by these scores. Furthermore, we present a comprehensive statistical analysis of the top-ranked author assessment parameters, encompassing 64 distinct metrics. This analysis yields valuable insights into the correlations and dependencies among these parameters that may affect assessment outcomes. In the statistical analysis, we combined these parameters using seven well-known statistical methods, such as the arithmetic, harmonic, and geometric means. After combining the parameters, we sorted the list for each pair of parameters and analyzed the top 10, 50, and 100 records, counting the occurrences of award winners. For experimental purposes, data were collected from the field of Mathematics. The dataset consists of 525 individuals who are yet to receive their awards, along with 525 individuals who have been recognized as potential award winners by certain well-known and prestigious scientific societies in the field of mathematics over the last three decades.
The results of this study revealed that, in the ranking of author assessment parameters, the normalized h index achieved the highest importance score compared to the remaining sixty-three parameters. Furthermore, the statistical analysis revealed that the Trigonometric Mean (TM) outperformed the other six statistical models. Moreover, the analysis of the parameters, specifically the M Quotient and FG index, shows that combining these parameters with any other parameter using various statistical models consistently produces excellent results in terms of the percentage score for returned awardees.
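The recursive-elimination idea can be sketched generically (a simplified, hypothetical stand-in: the study scores parameter subsets via an MLP's importance scores, whereas here an arbitrary score function is injected):

```python
# Hedged sketch of recursive elimination: repeatedly drop the feature
# whose removal least hurts the score, then rank features by how long
# they survived (eliminated last = most important).
def recursive_elimination(features, score):
    """features: list of names; score(subset) -> float (higher is better).
    Returns features ranked most important first."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > 1:
        # drop the feature whose removal leaves the highest score
        drop = max(remaining,
                   key=lambda f: score([g for g in remaining if g != f]))
        remaining.remove(drop)
        eliminated.append(drop)
    eliminated.append(remaining[0])
    return eliminated[::-1]

# Toy example with hypothetical metric names and additive importances.
weights = {"h_norm": 3, "citations": 2, "pubs": 1}
ranked = recursive_elimination(list(weights),
                               lambda subset: sum(weights[f] for f in subset))
print(ranked)  # ['h_norm', 'citations', 'pubs']
```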
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This release presents experimental statistics from the Mental Health Services Data Set (MHSDS), which replaces the Mental Health and Learning Disabilities Dataset (MHLDDS). As well as analysis of waiting times, this release includes elements of the reports that were previously included in monthly reports produced from final MHLDDS submissions. It also includes some new measures, noted in the accompanying metadata file. The changes incorporate requirements in support of Children and Young People's Improving Access to Psychological Therapies (CYP IAPT), elements of the Learning Disabilities Census (LDC) and elements of the Assuring Transformation (AT) Information Standard. Information provided in this release therefore covers mental health, learning disability and autism services for all ages ('services'). From January 2016 the release includes information on people in children's and young people's mental health services, including CAMHS, for the first time. Learning disabilities and autism services have been included since September 2014. The expansion in the scope of the dataset means that many of the basic measures in this release now cover a wider set of services. We have introduced service level breakdowns for some measures to provide new information to users, but also, importantly, to provide comparability with key measures that were part of the previous monthly release. Full details of the measures included in this publication can be found in the Further information about this publication section of the executive summary. Because of the scope of the changes to the dataset, it will take time to re-introduce all possible measures that were previously part of the MHLDS Monthly Reports. Additional measures will be added to this report in the coming months. Because the dataset is new, these measures are currently experimental statistics.
We will release the reports as experimental statistics until the characteristics of data flowed using the new data standard are understood. The MHSDS Monthly Data File was updated on 14 February 2017 with a correction to provider level figures for uses of the Mental Health Act, Out of Area Treatment and perinatal mental health activity. The measures affected are: AMH09a, LDA08, LDA09, LDA10, MH08, MH08a, MH09, MH09a, MH09b, MH09c, MH10, MH10a, MH11, MHS08, MHS08a, MHS09, MHS10, MHS11, AMH34a, AMH35a, MHS23a and MHS29a. Full details for these measures can be found in the metadata file which accompanies this publication. A correction has been made to this publication on 10 September 2018. This amendment relates to statistics in the monthly CSV data file; the specific measures affected are listed in the "Corrected Measures" CSV. All listed measures have now been corrected. NHS Digital apologises for any inconvenience caused.
Since the 1940s, hydrologists have used aquifer tests to estimate the hydrogeologic properties near test wells. Results from these tests are recorded in various files, databases, reports and scientific publications. The U.S. Geological Survey (USGS), Lower Mississippi-Gulf Water Science Center (LMG) is aggregating all aquifer test results from Alabama, Arkansas, Louisiana, Mississippi and Tennessee into a single dataset that is publicly available in a machine-readable format. This dataset contains information and results from 2,245 aquifer tests compiled in the LMG-Hydrogeologic Aquifer Test Dataset - December 2020. Descriptive statistics for the December 2020 dataset are presented in Table 1 (below) and in the Summary_Readme.pdf. Additionally, this dataset contains 6 attribute tables (.txt files) with additional information for various fields, a zip file containing the geospatial data, and the companion attribute table as a .txt file. THE LMG-HYDROGEOLOGIC AQUIFER TEST DATASET – DECEMBER 2020 IS AVAILABLE IN TWO FORMATS: 1) a tab-delimited, UTF-8 text (.txt) file and 2) an ESRI GIS point shapefile. FIELDS INCLUDED IN THE LMG-HYDROGEOLOGIC AQUIFER TEST DATASET – DECEMBER 2020: [a complete list of field names, their definitions and units is given in the Summary_Readme.pdf file] Location data: USGS site identification number, local identification name, Public Land Survey System number, latitude, longitude, State and county. Well construction data: construction date, well depth, diameter of well, diameter of casing, depth to top of opening (screen) interval, depth to bottom of opening interval and length of the open interval. Aquifer data: local aquifer name and code, national aquifer name and code, top of aquifer (altitude), bottom of aquifer, and thickness of aquifer.
Groundwater test data: test date, yield/discharge, length of time associated with yield, static water level in feet below land surface, production water level in feet below land surface associated with yield, drawdown associated with yield. Hydrogeologic data: specific capacity, transmissivity, horizontal conductivity, vertical conductivity, permeability and storage coefficient. Ancillary data: method of test analysis and data source reference. DESCRIPTIONS OF ATTACHED FILES: Summary_Readme.pdf: a Portable Document Format (PDF) file with field names, definitions and units for the aquifer test dataset and the associated attribute tables. This file also contains summary statistics for aquifer tests compiled through December 2020. LMG-HydrogeologicAqfrTestDataset_Dec2020.txt: a tab-delimited, UTF-8 text file of the attribute table associated with the LMG-HydrogeologicTestData_Dec2020 geospatial dataset. AtbtTbl_AqfrCd_Readme.txt: a UTF-8 text file containing information from the National Water Information System: Help System web page about USGS groundwater codes. (accessed December 4, 2019 at https://help.waterdata.usgs.gov/codes-and-parameters) AtbtTbl_FipsGeographyCodes.txt: a tab-delimited, UTF-8 text file of FIPS (Federal Information Processing Standards) codes, uniquely identifying States, counties and county equivalents in the United States. Note: to reduce the size of this file, city codes were removed. (accessed January 8, 2020 at https://www.census.gov/geographies/reference-files/2017/demo/popest/2017-fips.html). AtbtTbl_LocalAqfrCodes.txt: a tab-delimited, UTF-8 text file of eight-character strings identifying local aquifers. Codes are defined by the "Catalog of Aquifer Names and Geologic Unit Codes" used by the USGS.
(accessed December 4, 2019 at https://help.waterdata.usgs.gov/aqfr_cd) AtbtTbl_NatAqfrCodes.txt: a tab-delimited, UTF-8 text file of ten-character strings identifying a National aquifer, or principal aquifer of the United States, that are defined as regionally extensive aquifers or aquifer systems that have the potential to be used as a source of potable water. (accessed December 4, 2019 at https://water.usgs.gov/ogw/NatlAqCode-reflist.html) AtbtTbl_TstMthdCodes.txt: a tab-delimited, UTF-8 text file of codes identifying the aquifer test analysis method when reported in the associated reference. AtbtTbl_DataRefNo.txt: a tab-delimited, UTF-8 text file of references for the source of the associated aquifer test result. CAVEAT: Some hydrogeologic test results reported in this dataset have not been through the USGS data review and approval process to receive the Director’s approval. Any such data are considered PROVISIONAL and subject to revision. PROVISIONAL data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. NOTE: -- If you have data you would like added to this dataset or have found an error, please contact the USGS so we may incorporate them into the next version of the LMG- Hydrogeologic Aquifer Test dataset. Table 1. Summary-descriptive statistics for the LMG-HYDROGEOLOGIC AQUIFER TEST DATASET – December 2020. [USGS, U.S. 
Geological Survey; NWIS, National Water Information System; n, number of wells. Columns: aquifer name and code, n, maximum, minimum, mean, median, standard deviation]

Specific capacity (gallons per minute per foot)
All well data | 1733 | 15000 | 0.0025 | 84 | 8.7 | 552
Alluvial aquifers (N100ALLUVL) | 21 | 723 | 0.98 | 57 | 12 | 161
Mississippi River Valley alluvial aquifer (N100MSRVVL) | 185 | 10000 | 0.06 | 265 | 72 | 864
Other aquifers (N9999OTHER) | 3 | 50 | 1.20 | 18 | 2.1 | 28
Coastal lowlands aquifer system (S100CSLLWD) | 913 | 15000 | 0.05 | 93 | 12 | 645
Mississippi embayment aquifer system (S100MSEMBM) | 429 | 641 | 0.01 | 13 | 4 | 44
Southeastern Coastal Plain aquifer system (S100SECSLP) | 99 | 71 | 0.10 | 6.2 | 3.7 | 8.7
Ozark Plateaus aquifer system (S400OZRKPL) | 30 | 16 | 0.16 | 3.6 | 1.7 | 4.2
Edwards-Trinity aquifer system (S500EDRTRN) | 0 | -- | -- | -- | -- | --
Unknown National aquifer | 53 | 972 | 0.0025 | 59 | 10 | 151

Transmissivity (square feet per day)
All well data | 1549 | 260678 | 1.3 | 12366 | 5080 | 20711
Alluvial aquifers (N100ALLUVL) | 26 | 41700 | 450 | 9294 | 8422 | 8420
Mississippi River Valley alluvial aquifer (N100MSRVVL) | 146 | 171800 | 236 | 31934 | 24431 | 28074
Other aquifers (N9999OTHER) | 4 | 26000 | 24 | 8506 | 4000 | 11822
Coastal lowlands aquifer system (S100CSLLWD) | 703 | 260678 | 1.5 | 15585 | 8000 | 23875
Mississippi embayment aquifer system (S100MSEMBM) | 456 | 36000 | 1.3 | 4618 | 2406 | 6006
Southeastern Coastal Plain aquifer system (S100SECSLP) | 114 | 80000 | 5.00 | 3652 | 1340 | 8838
Ozark Plateaus aquifer system (S400OZRKPL) | 36 | 4983 | 42 | 1056 | 534 | 1262
Edwards-Trinity aquifer system (S500EDRTRN) | 1 | 161 | 161 | 161 | 161 | --
Unknown National aquifer | 63 | 84486 | 5.9 | 11103 | 4345 | 16908

Horizontal hydraulic conductivity (feet per day)
All well data | 749 | 1077 | 0.01 | 72 | 50 | 82
Alluvial aquifers (N100ALLUVL) | 6 | 321 | 39.88 | 160 | 176 | 106
Mississippi River Valley alluvial aquifer (N100MSRVVL) | 46 | 400 | 6.88 | 182 | 190 | 134
Other aquifers (N9999OTHER) | 4 | 269 | 92.00 | 183 | 185 | 95
Coastal lowlands aquifer system (S100CSLLWD) | 268 | 1077 | 1.00 | 93 | 81 | 85
Mississippi embayment aquifer system (S100MSEMBM) | 271 | 370 | 0.02 | 54 | 43 | 52
Southeastern Coastal Plain aquifer system (S100SECSLP) | 109 | 230 | 0.30 | 31 | 14 | 36
Ozark Plateaus aquifer system (S400OZRKPL) | 33 | 1.9 | 0.01 | 0.54 | 0.31 | 0.58
Edwards-Trinity aquifer system (S500EDRTRN) | 0 | -- | -- | -- | -- | --
Unknown National aquifer | 12 | 267 | 16.00 | 104 | 54 | 99

Permeability (gallons per day per square foot)
All well data | 497 | 8375 | 0.12 | 736 | 400 | 947
Alluvial aquifers (N100ALLUVL) | 12 | 2400 | 328 | 1307 | 1270 | 602
Mississippi River Valley alluvial aquifer (N100MSRVVL) | 43 | 7891 | 110 | 1926 | 1785 | 1174
Other aquifers (N9999OTHER) | 0 | -- | -- | -- | -- | --
Coastal lowlands aquifer system (S100CSLLWD) | 263 | 8375 | 11 | 796 | 636 | 973
Mississippi embayment aquifer system (S100MSEMBM) | 165 | 1300 | 0.12 | 235 | 177 | 237
Southeastern Coastal Plain aquifer system (S100SECSLP) | 0 | -- | -- | -- | -- | --
Ozark Plateaus aquifer system (S400OZRKPL) | 0 | -- | -- | -- | -- | --
Edwards-Trinity aquifer system (S500EDRTRN) | 0 | -- | -- | -- | -- | --
Unknown National aquifer | 14 | 4158 | 201 | 1390 | 1204 | 963

Storage coefficient (dimensionless)
All well data | 490 | 1.62 | 6.30E-10 | 0.0083 | 0.00051 | 0.081
Alluvial aquifers (N100ALLUVL) | 21 | 0.08 | 0.0002 | 0.0053 | 0.00054 | 0.017
Mississippi River Valley alluvial aquifer (N100MSRVVL) | 82 | 0.09 | 0.0001 | 0.0081 | 0.0013 | 0.016
Other aquifers (N9999OTHER) | 1 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | --
Coastal lowlands aquifer system (S100CSLLWD) | 233 | 0.72 | 6.30E-10 | 0.0054 | 0.0005 | 0.048
Mississippi embayment aquifer system (S100MSEMBM) | 100 | 1.62 | 0.000012 | 0.0180 | 0.00027 | 0.16
Southeastern Coastal Plain aquifer system (S100SECSLP) | 16 | 0.006 | 0.00003 | 0.0005 | 0.0002 | 0.0015
Ozark Plateaus aquifer system (S400OZRKPL) | 0 | -- | -- | -- | -- | --
Edwards-Trinity aquifer system (S500EDRTRN) | 0 | -- | -- | -- | -- | --
Unknown National aquifer | 37 | 0.05 | 0.000078 | 0.0062 | 0.00067 | 0.014

This dataset was developed as part of the U.S. Geological Survey, Mississippi Alluvial Plain Regional Water-Availability Study.
Each day, Backblaze takes a snapshot of each operational hard drive that includes basic hard drive information (e.g., capacity, failure) and S.M.A.R.T. statistics reported by each drive. This dataset contains data from the first two quarters in 2016.
This dataset contains basic hard drive information and 90 columns of raw and normalized values for 45 different S.M.A.R.T. statistics. Each row represents a daily snapshot of one hard drive.
date: Date in yyyy-mm-dd format
serial_number: Manufacturer-assigned serial number of the drive
model: Manufacturer-assigned model number of the drive
capacity_bytes: Drive capacity in bytes
failure: Contains a “0” if the drive is OK. Contains a “1” if this is the last day the drive was operational before failing.
90 variables that begin with 'smart': Raw and Normalized values for 45 different SMART stats as reported by the given drive
Some items to keep in mind as you process the data:
S.M.A.R.T. statistics can vary in meaning based on the manufacturer and model. It may be more informative to compare drives of the same model and manufacturer
Some S.M.A.R.T. columns can have out-of-bound values
When a drive fails, the 'failure' column is set to 1 on the day of failure, and starting the day after, the drive is removed from the dataset. Each day, new drives are also added. This means that the total number of drives each day may vary.
S.M.A.R.T. 9 is the number of hours a drive has been in service. To calculate a drive's age in days, divide this number by 24.
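A minimal sketch of working with these columns (column names as listed above; the aggregation logic is ours, not Backblaze's):

```python
# Sketch: from daily snapshot rows, keep each drive's last snapshot and
# derive its age in days (SMART 9 raw hours / 24) and whether it failed.
def summarize_drives(rows):
    """rows: iterable of dicts with 'date' (yyyy-mm-dd), 'serial_number',
    'failure' (0/1) and 'smart_9_raw' (power-on hours).
    Returns {serial_number: (age_days, failed)}."""
    last = {}
    for r in rows:
        s = r["serial_number"]
        # ISO dates compare correctly as strings
        if s not in last or r["date"] > last[s]["date"]:
            last[s] = r
    return {s: (r["smart_9_raw"] / 24, r["failure"] == 1)
            for s, r in last.items()}

rows = [
    {"date": "2016-01-01", "serial_number": "A1", "failure": 0, "smart_9_raw": 2400},
    {"date": "2016-01-02", "serial_number": "A1", "failure": 1, "smart_9_raw": 2424},
]
print(summarize_drives(rows))  # {'A1': (101.0, True)}
```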
Given the hints above, below are a couple of questions to help you explore the dataset:
What is the median survival time of a hard drive? How does this differ by model/manufacturer?
Can you calculate the probability that a hard drive will fail given the hard drive information and statistics in the dataset?
The original collection of data can be found here. When using this data, Backblaze asks that you cite Backblaze as the source; you accept that you are solely responsible for how you use the data; and you do not sell this data to anyone.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The Health Index is an Experimental Statistic to measure a broad definition of health, in a way that can be tracked over time and compared between different areas. These data are the provisional results of the Health Index for upper-tier local authorities in England, 2015 to 2018, to illustrate the type of analysis the Health Index can enable.
This release is for quarters 1 to 4 of 2019 to 2020.
Local authority commissioners and health professionals can use these resources to track how many pregnant women, children and families in their local area have received health promoting reviews at particular points during pregnancy and childhood.
The data and commentaries also show variation at a local, regional and national level. This can help with planning, commissioning and improving local services.
The metrics cover health reviews for pregnant women, children and their families at several stages during pregnancy and childhood.
Public Health England (PHE) collects the data, which is submitted by local authorities on a voluntary basis.
See health visitor service delivery metrics in the child and maternal health statistics collection to access data for previous years.
Find guidance on using these statistics and other intelligence resources to help you make decisions about the planning and provision of child and maternal health services.
See health visitor service metrics and outcomes definitions from Community Services Dataset (CSDS).
Since publication in November 2020, Lewisham and Leicestershire councils have identified errors in the 'new birth visits within 14 days' data they submitted to Public Health England (PHE) for 2019 to 2020. This error caused a statistically significant change in the health visiting data for 2019 to 2020, so the Office for Health Improvement and Disparities (OHID) has updated and reissued the data in OHID’s Fingertips tool.
A correction notice has been added to the 2019 to 2020 annual statistical release and statistical commentary but the data has not been altered.
Please consult OHID’s Fingertips tool for corrected data for Lewisham and Leicestershire, the London and East Midlands region, and England.
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The National Transit Map - Routes dataset was compiled on June 02, 2025 from the Bureau of Transportation Statistics (BTS) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The National Transit Map (NTM) is a nationwide catalog of fixed-guideway and fixed-route transit service in America, compiled using General Transit Feed Specification (GTFS) Schedule data. The NTM Routes dataset shows transit routes; a route is a group of trips that is displayed to riders as a single service. To display the route alignment and trips for each route, this dataset combines the following GTFS files: routes.txt, trips.txt, and shapes.txt. The GTFS Schedule documentation is available at https://gtfs.org/schedule/. To improve the spatial accuracy of the NTM Routes, BTS adjusts transit routes using context from the submitted GTFS source data and/or from other publicly available information about the transit service.
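The combination of routes.txt, trips.txt, and shapes.txt described above is essentially a pair of relational joins on route_id and shape_id. A minimal pandas sketch, using made-up feed rows rather than any real GTFS feed, might look like:

```python
import pandas as pd

# Minimal hypothetical GTFS tables (real feeds carry many more columns).
routes = pd.DataFrame({"route_id": ["R1"], "route_long_name": ["Main St"]})
trips = pd.DataFrame({"trip_id": ["T1", "T2"],
                      "route_id": ["R1", "R1"],
                      "shape_id": ["S1", "S1"]})
shapes = pd.DataFrame({"shape_id": ["S1", "S1"],
                       "shape_pt_lat": [40.0, 40.1],
                       "shape_pt_lon": [-75.0, -75.1],
                       "shape_pt_sequence": [1, 2]})

# Join each trip to its route, then attach the ordered shape points,
# mirroring how the NTM Routes layer combines the three GTFS files.
route_shapes = (trips.merge(routes, on="route_id")
                     .merge(shapes, on="shape_id")
                     .sort_values(["trip_id", "shape_pt_sequence"]))
print(route_shapes[["route_id", "trip_id", "shape_pt_lat"]])
```

Ordering by shape_pt_sequence is what lets the joined points be rendered as a continuous route alignment.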
This is a monthly report on publicly funded community services for children, young people and adults, using data from the Community Services Data Set (CSDS) reported in England. The CSDS is a patient-level dataset developed to help achieve better outcomes for children, young people and adults. It provides data that will be used to commission services in a way that improves health, reduces inequalities, and supports service improvement and clinical quality. These services can include NHS trusts, health centres, schools, mental health trusts, and local authorities. The data collected in CSDS includes personal and demographic information, diagnoses (including long-term conditions and disabilities), care events, and screening activities. These statistics are classified as experimental and should be used with caution. Experimental statistics are new official statistics undergoing evaluation. They are published in order to involve users and stakeholders in their development and as a means to build in quality at an early stage. More information about experimental statistics can be found on the UK Statistics Authority website. We hope this information is helpful and would be grateful if you could spare a couple of minutes to complete a short customer satisfaction survey. Please use the survey in the related links to provide us with any feedback or suggestions for improving the report.
Enterprises by means of services supply and NACE rev.2 activity aggregate - experimental statistics
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In scientific research, assessing the impact and influence of authors is crucial for evaluating their scholarly contributions. In the literature, numerous parameters have been developed to quantify the productivity and significance of researchers, including the publication count, citation count, and the well-known h index with its extensions and variations. With a plethora of available assessment metrics, however, it is vital to identify and prioritize the most effective ones. To address the complexity of this task, we employ a deep learning technique, the Multi-Layer Perceptron (MLP) classifier, for classification and ranking. By leveraging the MLP’s capacity to discern patterns within datasets, we assign importance scores to each parameter using the proposed modified recursive elimination technique and rank the parameters accordingly. Furthermore, we present a comprehensive statistical analysis of the top-ranked author assessment parameters, encompassing 64 distinct metrics. This analysis provides valuable insight into the relationships between these parameters, shedding light on potential correlations and dependencies that may affect assessment outcomes. In the statistical analysis, we combined pairs of parameters using seven well-known statistical methods, such as the arithmetic, harmonic, and geometric means. After combining the parameters, we sorted the list for each pair of parameters and analyzed the top 10, 50, and 100 records, counting the occurrences of award winners. For experimental purposes, data were collected from the field of Mathematics. The dataset consists of 525 individuals who are yet to receive their awards, along with 525 individuals who have been recognized as potential award winners by well-known and prestigious scientific societies in the field of mathematics over the last three decades.
The results of this study revealed that, in the ranking of author assessment parameters, the normalized h index achieved the highest importance score of the sixty-four parameters. Furthermore, the statistical analysis revealed that the Trigonometric Mean (TM) outperformed the other six statistical models. Moreover, based on the analysis of the parameters, specifically the M Quotient and FG index, combining these parameters with any other parameter using the various statistical models consistently produces excellent results in terms of the percentage score for returning awardees.
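For illustration, three of the classical means used to combine parameter pairs are available directly in Python's standard library. The two metric values below are made-up placeholders, not values from the study's dataset:

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Two hypothetical normalized author-assessment metric values.
a, b = 0.8, 0.2

am = fmean([a, b])           # arithmetic mean
gm = geometric_mean([a, b])  # geometric mean
hm = harmonic_mean([a, b])   # harmonic mean
print(am, gm, hm)
```

Note how the harmonic mean is pulled toward the smaller value and the arithmetic mean toward the larger, which is why the choice of combining method can change which author pairs rank highest.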
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract: This dissertation challenges the traditional idealized model of allomorphy by confronting it with comprehensive data on 15 Russian aspectual prefixes (RAZ-, RAS-, RAZO-, S-, SO-, PERE-, PRE-, VZ-, VOZ-, O-, OB-, OBO-, U-, VY-, IZ-) collected from corpus and linguistic experiments. The traditional definition narrows allomorphy down to a mere variation of form where the meaning remains constant and variants are distributed complementarily. My findings show that submorphemic semantic differences and distributional overlap are not uncommon properties of morpheme variants. I suggest that allomorphy is a broader phenomenon that goes beyond the axioms of complementary distribution and identical meaning. I examine non-trivial cases of prefix polysemy and multifactorial conditioning of prefix distribution that make it difficult to assess the traditional criteria for allomorphy. Moreover, I present studies of semantic dissimilation of allomorphs and overlap in distribution that violate the absolute criteria for allomorphic relationship. I take the perspective of Cognitive Linguistics and propose an alternative usage-based model of allomorphy that is flexible enough to capture both standard exemplars and non-standard deviations. This model offers detailed applications of several advanced statistical models that optimize the criteria of both semantic “sameness” and distributional complementarity. According to this model, allomorphy is a scalar relationship between morpheme variants – a relationship that can vary in terms of closeness and regularity. Statistical modeling turns the concept of allomorphy into a measurable and verifiable correspondence of form-meaning variation. This makes it possible to measure semantic similarity and divergence and distinguish robust patterns of distribution from random effects. The set of files includes tagged databases, their versions used in statistical analyses and R codes for the statistical analyses described in the dissertation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experiment 1 means and statistics for age and baseline assessments indicating experimental conditions did not differ.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains raw and pre-processed EEG data from a mobile EEG study investigating the additive effects of task load, motor demand, and environmental complexity on attention. More details will be provided once the manuscript has passed peer-review.
All preprocessing and analysis code is deposited in the code directory. The entire MATLAB pipeline can be reproduced by executing the run_pipeline.m script. In order to run these scripts, you will need the required MATLAB toolboxes and R packages on your system, and you will need to adapt def_local.m to specify local paths to MATLAB and EEGLAB. Descriptive statistics and mixed-effects models can be reproduced in R by running the stat_analysis.R script.
See below for software details.
For more information, see the dataset_description.json file.
Dataset is formatted according to the EEG-BIDS extension (Pernet et al., 2019) and the BIDS extension proposal for common electrophysiological derivatives (BEP021) v0.0.1, which can be found here:
Note that BEP021 is still a work in progress as of 2021-03-01.
Generally, you can find data in the .tsv files and descriptions in the accompanying .json files.
An important BIDS definition to consider is the "Inheritance Principle" (see 3.5 in the BIDS specification: http://bids.neuroimaging.io/bids_spec.pdf), which states:
Any metadata file (.json, .bvec, .tsv, etc.) may be defined at any directory level. The values from the top level are inherited by all lower levels unless they are overridden by a file at the lower level.
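The Inheritance Principle quoted above can be sketched as a walk from the dataset root down to the data file's directory, merging same-named JSON sidecars so that deeper levels override higher ones. This is a simplified illustration (real BIDS inheritance also matches entity labels in filenames); the file names and keys below are made up for the demo:

```python
import json
import tempfile
from pathlib import Path

def resolve_metadata(dataset_root, rel_dirs, sidecar_name):
    # Merge same-named JSON sidecars from the root down to the file's
    # directory; keys from deeper levels override higher ones.
    merged = {}
    level = Path(dataset_root)
    for part in [""] + list(rel_dirs):
        level = level / part if part else level
        sidecar = level / sidecar_name
        if sidecar.exists():
            merged.update(json.loads(sidecar.read_text()))
    return merged

# Demo on a throwaway directory tree.
root = Path(tempfile.mkdtemp())
(root / "sub-01" / "eeg").mkdir(parents=True)
(root / "task-oddball_eeg.json").write_text(
    json.dumps({"TaskName": "oddball", "SamplingFrequency": 500}))
(root / "sub-01" / "eeg" / "task-oddball_eeg.json").write_text(
    json.dumps({"SamplingFrequency": 1000}))  # overrides the top level

meta = resolve_metadata(root, ["sub-01", "eeg"], "task-oddball_eeg.json")
print(meta)
```

In the demo, TaskName is inherited from the top-level sidecar while SamplingFrequency is overridden by the subject-level one, exactly the behavior the principle describes.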
Forty-four healthy adults aged 18-40 performed an oddball task involving complex tone (piano and horn) stimuli in three settings: (1) sitting in a quiet room in the lab (LAB); (2) walking around a sports field (FIELD); (3) navigating a route through a university campus (CAMPUS).
Participants performed each environmental condition twice: once while attending to oddball stimuli (i.e. counting the number of presented deviant tones; COUNT), and once while disregarding or ignoring the tone stimuli (IGNORE).
EEG signals were recorded from 32 active electrodes using a Brain Vision LiveAmp 32 amplifier. See manuscript for further details.
MATLAB Version: 9.7.0.1319299 (R2019b) Update 5
MATLAB License Number: 678256
Operating System: Microsoft Windows 10 Enterprise Version 10.0 (Build 18363)
Java Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
The following toolboxes/helper functions were also used:
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: _LC_COLLATE=English_Australia.1252_, _LC_CTYPE=English_Australia.1252_, _LC_MONETARY=English_Australia.1252_, _LC_NUMERIC=C_ and _LC_TIME=English_Australia.1252_
As a subset of the Japanese 55-year Reanalysis (JRA-55) project, an experiment using the JRA-55 global atmospheric model was conducted by the Meteorological Research Institute of the Japan Meteorological Agency. The experiment, named JRA-55AMIP, was carried out by prescribing the same boundary conditions and radiative forcing as JRA-55 (including the historical observed sea surface temperature, sea ice concentration, and greenhouse gases), with no use of atmospheric observational data. This project is intended to assess systematic errors of the model.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents information from 2016 at the household level: the percentage of households within each Index of Household Advantage and Disadvantage (IHAD) quartile for Statistical Area Level 1 (SA1) 2016 boundaries. The IHAD is an experimental analytical index developed by the Australian Bureau of Statistics (ABS) that provides a summary measure of relative socio-economic advantage and disadvantage for households, using information from the 2016 Census of Population and Housing. IHAD quartiles: all households are ordered from lowest to highest disadvantage; the lowest 25% of households are assigned quartile 1, the next 25% quartile 2, and so on, up to the highest 25%, which are assigned quartile 4. Households are thus divided into four groups according to their score. This data is ABS data (catalogue number: 4198.0) used with permission from the Australian Bureau of Statistics. For more information please visit the Australian Bureau of Statistics. Please note: AURIN has spatially enabled the original data.
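The quartile construction described above (order by score, then split into four equal-sized groups) is the standard quantile-binning operation. A minimal pandas sketch, using invented household scores rather than the ABS scoring method, might look like:

```python
import pandas as pd

# Hypothetical household disadvantage scores (illustrative only; the
# actual IHAD scores come from the 2016 Census-based ABS index).
scores = pd.Series([12, 45, 78, 33, 91, 5, 60, 27])

# qcut orders the values and splits them into four equal-sized groups,
# labelling the lowest 25% as quartile 1 up to the highest 25% as 4,
# mirroring the IHAD quartile construction described above.
quartile = pd.qcut(scores, q=4, labels=[1, 2, 3, 4])
print(quartile.value_counts().sort_index())
```

With eight households, each quartile receives exactly two; on real data, ties at bin edges can make group sizes slightly unequal.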
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental results for the MLP, SVM, RF, and XGBoost, based on the NHANES dataset, with ICA feature extraction (best results are highlighted in bold; # means the number of features).