We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review" conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and to allow other researchers to use these data in their own work.
The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets, with the aim of gathering information on how the topic of high-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, including the indicators used in them, the involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research Library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the results to papers in which these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were examined in depth. Each study was independently examined by at least two authors.
To attain the objective of our study, we developed a protocol in which the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure: Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol. Spreadsheet #1 provides the filled protocol for the relevant studies. Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e. before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper: {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in the Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR, etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation why these data are not shared?
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?
Quality- and relevance-related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions CC-BY
For more info, see README.txt
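As a quick-start illustration, the spreadsheets can be loaded with pandas; the file names below are assumptions, so substitute the actual spreadsheet names shipped with the dataset.

import pandas as pd

# Hypothetical file names -- replace with the names used in the published dataset.
filled_protocol = pd.read_excel("Spreadsheet_1_filled_protocol.xls")   # filled protocol for the 9 relevant studies (.xls needs the xlrd engine)
search_results = pd.read_csv("Spreadsheet_2_search_results.csv")       # raw search results before filtering

# e.g. count studies per relevance level (field 9 of the protocol)
print(filled_protocol["Relevance for this study"].value_counts())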
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Abstract
Four documents describe the specifications, methods and scripts of the Impact and Risk Analysis Databases developed for the Bioregional Assessments Programme. They are:
Bioregional Assessment Impact and Risk Databases Installation Advice (IMIA Database Installation Advice v1.docx).
Naming Convention of the Bioregional Assessment Impact and Risk Databases (IMIA Project Naming Convention v39.docx).
Data treatments for the Bioregional Assessment Impact and Risk Databases (IMIA Project Data Treatments v02.docx).
Quality Assurance of the Bioregional Assessment Impact and Risk Databases (IMIA Project Quality Assurance Protocol v17.docx).
This dataset also includes the Materialised View Information Manager (MatInfoManager.zip). This Microsoft Access database is used to manage the overlay definitions of materialized views of the Impact and Risk Analysis Databases. For more information about this tool, refer to the Data Treatments document.
The documentation supports all five Impact and Risk Analysis Databases developed for the assessment areas:
Maranoa-Balonne-Condamine: http://data.bioregionalassessments.gov.au/dataset/69075f3e-67ba-405b-8640-96e6cb2a189a
Gloucester: http://data.bioregionalassessments.gov.au/dataset/d78c474c-5177-42c2-873c-64c7fe2b178c
Hunter: http://data.bioregionalassessments.gov.au/dataset/7c170d60-ff09-4982-bd89-dd3998a88a47
Namoi: http://data.bioregionalassessments.gov.au/dataset/1549c88d-927b-4cb5-b531-1d584d59be58
Galilee: http://data.bioregionalassessments.gov.au/dataset/3dbb5380-2956-4f40-a535-cbdcda129045
Purpose
These documents describe end-to-end treatments of scientific data for the Impact and Risk Analysis Databases, developed and published by the Bioregional Assessment Programme. The applied approach to data quality assurance is also described. These documents are intended for people with an advanced knowledge of geospatial analysis and database administration, who seek to understand, restore or utilise the Analysis Databases and their underlying methods of analysis.
Dataset History
The Impact and Risk Analysis Database Documentation was created for and by the Information Modelling and Impact Assessment Project (IMIA Project).
Dataset Citation
Bioregional Assessment Programme (2018) Impact and Risk Analysis Database Documentation. Bioregional Assessment Source Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/05e851cf-57a5-4127-948a-1b41732d538c.
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Normalized variances calculated using the method described in the article, based on experimental data. Data are stored using Xarray, specifically in the NetCDF format, and can be easily accessed with the Xarray Python library by calling xarray.open_dataset().

The dataset is structured as follows:

two N-dimensional DataArrays, one for calculations with time displacements (labeled as time) and one for calculations with phase displacements with the time centroid already picked (labeled as final)

each DataArray has 5 dimensions: SNR, eps (separation), ph_disp/disp (displacement), sample/sample_time (bootstrapped sample), supersample (ensemble of bootstrapped samples)

coordinates label the parameters along each dimension

Usage examples

Opening the dataset

import numpy as np
import xarray as xr

variances = xr.open_dataset("coherent.nc")

Obtaining parameter estimates

def get_centroid_indices(variances):
    return np.bincount(
        variances.argmin(
            dim="disp" if "disp" in variances.dims else "ph_disp"
        ).values.flatten()
    )

def get_centroid_index(variances):
    return np.argmax(get_centroid_indices(variances))

def epsilon_estimator(var):
    return 4 * np.sqrt(np.clip(var, 0, None))

time_centroid_estimates = variances["time"].idxmin(dim="disp")
phase_centroid_estimates = variances["final"].idxmin(dim="ph_disp")
epsilon_estimates = epsilon_estimator(
    variances["final"].isel(ph_disp=get_centroid_index(variances["final"]))
)

Calculating and plotting precision

def plot(estimates):
    # variance of the estimator across bootstrapped samples
    estimator_variances = estimates.var(
        dim="sample" if "sample" in estimates.dims else "sample_time"
    )
    precision = (
        1.0
        / estimator_variances.snr
        / variances.attrs["SAMPLE_SIZE"]
        / estimator_variances
    )
    precision = precision.where(xr.apply_ufunc(np.isfinite, precision), other=0)
    mean_precision = precision.mean(dim="supersample")
    mean_precision = mean_precision.where(np.isfinite(mean_precision), 0)
    precision_error = 2 * precision.std(dim="supersample").fillna(0)
    g = mean_precision.plot.scatter(
        x="eps",
        col="snr",
        col_wrap=2,
        sharex=True,
        sharey=True,
    )
    for ax, snr in zip(g.axs.flat, variances.snr.values):
        ax.errorbar(
            precision.eps.values,
            mean_precision.sel(snr=snr),
            yerr=precision_error.sel(snr=snr),
            fmt="o",
        )

plot(time_centroid_estimates)
plot(phase_centroid_estimates)
plot(epsilon_estimates)
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains a variety of publicly available real-life event logs. We derived two types of Petri nets for each event log with two state-of-the-art process miners: Inductive Miner (IM) and Split Miner (SM). Each event log-Petri net pair is intended for evaluating the scalability of existing conformance checking techniques. We used this dataset to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both the process models and the event logs. In addition, it includes the results in terms of time performance, measured in milliseconds, for several approaches for both multi-threaded and single-threaded executions. Last, the dataset contains a cost comparison of different approaches and reports on the degree of over-approximation of the S-Component approach. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/1910.09767.
Update: The dataset has been extended with the event logs of BPIC18 and BPIC19. BPIC19 is actually a collection of four different processes and was thus split into four event logs. For each of the additional five event logs, again, two process models have been mined with Inductive Miner and Split Miner. We used the extended dataset to test the scalability of our tandem repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/2004.01781.
Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log that are tested against parallel blocks in the process model. Tandem repeats are used with various trace reductions and extensions to define repetitive patterns in the log that are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments weighted by the trace counts for which the patterns have been observed. The dataset now includes the time results and a breakdown of Generalization values for the dataset.
Overview
This dataset of medical misinformation was collected and is published by Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URL and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).
The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.
Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides the full-texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.
The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).
The accompanying GitHub repository provides a small static sample of the dataset and the dataset's descriptive analysis in the form of Jupyter notebooks.
Options to access the dataset
There are two ways to access the dataset: a full static dump or the REST API.
In order to obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.
References
If you use this dataset in any publication, project, tool or in any other form, please cite the following papers:
@inproceedings{SrbaMonantPlatform, author = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria}, booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)}, pages = {1--7}, title = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior}, year = {2019} }
@inproceedings{SrbaMonantMedicalDataset, author = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria}, booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)}, numpages = {11}, title = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims}, year = {2022}, doi = {10.1145/3477495.3531726}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3477495.3531726}, }
Dataset creation process
In order to create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (from RSS feeds, Wordpress sites, Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data is stored in a unified format in a central data storage.
Ethical considerations
The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.
The main identified ethical issue related to the presented dataset lies in the risk of mislabelling of an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require an agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.
As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.
Lastly, the dataset also contains automatically predicted labels of claim presence and article stance using our baselines described in the next section. These methods have their limitations and work with a certain accuracy, as reported in the paper. This should be taken into account when interpreting them.
Reporting mistakes in the dataset: The means to report considerable mistakes in the raw collected data or in the manual annotations is to create a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.
Dataset structure
Raw data
First, the dataset contains so-called raw data (i.e., data extracted by the web monitoring module of the Monant platform and stored in exactly the same form as they appear on the original websites). Raw data consist of articles from news sites and blogs (e.g. naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g. snopes.com). In addition, the dataset contains feedback (number of likes, shares, comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.
Raw data are contained in these CSV files (and corresponding REST API endpoints):
sources.csv
articles.csv
article_media.csv
article_authors.csv
discussion_posts.csv
discussion_post_authors.csv
fact_checking_articles.csv
fact_checking_article_media.csv
claims.csv
feedback_facebook.csv
Note: Personal information about discussion posts' authors (name, website, gravatar) are anonymised.
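As an illustration only, the articles can be joined to their sources with pandas; the foreign-key column names used below (source_id, id) are assumptions, so check the actual CSV headers or the REST API documentation first.

import pandas as pd

articles = pd.read_csv("articles.csv")
sources = pd.read_csv("sources.csv")

# Attach source metadata to each article; key column names are assumed, not confirmed.
articles_with_sources = articles.merge(
    sources, left_on="source_id", right_on="id", suffixes=("", "_source")
)
print(articles_with_sources.head())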
Annotations
Second, the dataset contains so-called annotations. Entity annotations describe individual raw data entities (e.g., article, source). Relation annotations describe a relation between two such entities.
Each annotation is described by the following attributes:
category of annotation (annotation_category). Possible values: label (annotation corresponds to ground truth, determined by human experts) and prediction (annotation was created by means of an AI method).
type of annotation (annotation_type_id). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from the enumeration in annotation_types.csv.
method which created the annotation (method_id). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.
its value (value). The value is stored in JSON format and its structure differs according to the particular annotation type.
At the same time, annotations are associated with a particular object identified by:
entity type (parameter entity_type in case of entity annotations, or source_entity_type and target_entity_type in case of relation annotations). Possible values: sources, articles, fact-checking-articles.
entity id (parameter entity_id in case of entity annotations, or source_entity_id and target_entity_id in case of relation annotations).
The dataset provides specifically these entity annotations:
Source reliability (binary). Determines the validity of a source (website) on a binary scale with two options: reliable source and unreliable source.
Article veracity. Aggregated information about veracity from article-claim pairs.
The dataset provides specifically these relation annotations:
Fact-checking article to claim mapping. Determines mapping between fact-checking article and claim.
Claim presence. Determines presence of claim in article.
Claim stance. Determines stance of an article to a claim.
Annotations are contained in these CSV files (and corresponding REST API endpoints):
entity_annotations.csv
relation_annotations.csv
Note: The identification of human annotators (the email provided in the annotation app) is anonymised.
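A minimal sketch, assuming the column names listed above, for reading the entity annotations and parsing the JSON-encoded value field:

import json
import pandas as pd

entity_annotations = pd.read_csv("entity_annotations.csv")

# The value column is JSON whose structure depends on the annotation type.
entity_annotations["value_parsed"] = entity_annotations["value"].apply(json.loads)

# e.g. keep only ground-truth labels (as opposed to model predictions)
labels = entity_annotations[entity_annotations["annotation_category"] == "label"]
print(labels["annotation_type_id"].value_counts())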
In the article (Ahmer, M., Sandin, F., Marklund, P. et al., 2022), we have investigated the effective use of sensors in a bearing ring grinder for failure classification in the condition-based maintenance context. The proposed methodology combines domain knowledge of process monitoring and condition monitoring to successfully achieve failure mode prediction with high accuracy using only a few key sensors. This enables manufacturing equipment to take advantage of advanced data processing and machine learning techniques.
The grinding machine is of type SGB55 from Lidköping Machine Tools and is used to produce the functional raceway surface of inner rings of the SKF-6210 deep groove ball bearing. Additional sensors for vibration, acoustic emission, force, and temperature are installed to monitor the machine condition while producing bearing components under different operating conditions. Data are sampled from the sensors as well as from the machine's numerical controller during operation. Selected parts are measured for the produced quality.
Ahmer, M., Sandin, F., Marklund, P., Gustafsson, M., & Berglund, K. (2022). Failure mode classification for condition-based maintenance in a bearing ring grinding machine. In The International Journal of Advanced Manufacturing Technology (Vol. 122, pp. 1479–1495). https://doi.org/10.1007/s00170-022-09930-6
The files are of three categories and are grouped in zipped folders. The pdf file named "readme_data_description.pdf" describes the content of the files in the folders. The "lib" folder includes information on the libraries needed to read the .tdms data files in Matlab or Python.
The raw time-domain sensor signal data are grouped in seven main folders named after each test run, e.g. "test_1" ... "test_7". Each test includes seven dressing cycles named e.g. "dresscyc_1" ... "dresscyc_7". Each dressing cycle includes .tdms files for fifteen rings, one for each individual grinding cycle. The column descriptions for both the "Analogue" and "Digital" channels are given in the "readme_data_description.pdf" file. The machine and process parameters used for the tests, as sampled from the machine's control system (numerical controller), are compiled for all test runs in a single file "process_data.csv" in the folder "proc_param"; the column description is available in "readme_data_description.pdf" under "Process Parameters". The measured quality data (nine quality parameters, normalized) of the selected produced parts are recorded in the file "measured_quality_param.csv" under the folder "quality"; the description of the quality parameters is available in "readme_data_description.pdf". The quality parameter disposition based on their actual acceptance tolerances for the process step is presented in the file "quality_disposition.csv" under the folder "quality".
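In Python, the .tdms files can be read with, for example, the npTDMS library; a minimal sketch follows, where the file path and group name are assumptions based on the folder layout and channel description above.

from nptdms import TdmsFile  # pip install npTDMS

# Hypothetical path following the described folder layout.
tdms_file = TdmsFile.read("test_1/dresscyc_1/ring_01.tdms")

# Inspect the available groups and channels, then convert one group to a
# DataFrame; see readme_data_description.pdf for the column meanings.
for group in tdms_file.groups():
    print(group.name, [channel.name for channel in group.channels()])

analogue = tdms_file["Analogue"].as_dataframe()  # group name assumed from the description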
This data release contains three different datasets that were used in the Scientific Investigations Report (SIR): Spatial and Temporal Distribution of Bacterial Indicators and Microbial Source Tracking within Tumacácori National Historical Park and the Upper Santa Cruz River, Arizona, 2015-16. These datasets contain regression model data, estimated discharge data, and calculated flux and yields data.
Regression Model Data: This dataset contains data used in the regression model development in the SIR. The period of data ranged from May 25, 1994 to May 19, 2017. Data from 2015 to 2017 were collected by the U.S. Geological Survey. Data prior to 2015 were provided by various agencies. Listed below are the different data contained within this dataset:
- Season represented as an indicator variable (Fall, Spring, Summer, and Winter)
- Hydrologic Condition represented as an indicator variable (rising limb, recession limb, peak, or unable to classify)
- Flood (binary variable indicating if the sample was collected during a flood event or not)
- Decimal Date (DT) represented as a continuous variable
- Sine of DT represented as a continuous variable for a periodic function to describe seasonal variation
- Cosine of DT represented as a continuous variable for a periodic function to describe seasonal variation
Estimated Discharge: This dataset contains estimated discharge at four different sites between 03/02/2015 and 12/14/2016. The discharge was estimated using nearby streamgage relations; methods are described in detail in the SIR. The sites where discharge was estimated are listed below:
- NW8; 312551110573901; Nogales Wash at Ruby Road
- SC3; 312654110573201; Santa Cruz River abv Nogales Wash
- SC10; 313343110024701; Santa Cruz River at Santa Gertrudis Lane
- SC14; 09481740; Santa Cruz River at Tubac, AZ
Calculated Flux and Yields: This dataset contains calculated flux and yields for E. coli and suspended sediment concentrations. Mean daily flux was calculated when mean daily discharge was available at a corresponding streamgage. Instantaneous flux was calculated when instantaneous discharge (at 15-minute intervals) was available at a corresponding streamgage, or from a measured or estimated discharge value. The yields were calculated using the calculated flux values and the area of the different watersheds. Methods and equations are described in detail in the SIR. Listed below are the data contained within this dataset:
- Mean daily E. coli flux, in most probable number per day
- Mean daily suspended sediment flux, in tons per day
- Instantaneous E. coli flux, in most probable number per second
- Instantaneous suspended sediment flux, in tons per second
- E. coli, in most probable number per square mile
- Suspended sediment, in tons per square mile
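For illustration only (not the USGS processing code), the periodic seasonal terms, sine and cosine of the decimal date (DT), can be reproduced along these lines:

import numpy as np
import pandas as pd

sample_dates = pd.to_datetime(["1994-05-25", "2015-07-04", "2017-05-19"])

# Approximate decimal date (year plus fraction of the year), then
# annual-period sine and cosine terms for the seasonal component.
decimal_date = sample_dates.year + (sample_dates.dayofyear - 1) / 365.25
sin_dt = np.sin(2 * np.pi * decimal_date)
cos_dt = np.cos(2 * np.pi * decimal_date)
print(pd.DataFrame({"DT": decimal_date, "sin_DT": sin_dt, "cos_DT": cos_dt}))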
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Various climate variable summaries for all 15 subregions based on the Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:
Time series mean annual BAWAP rainfall from 1900 - 2012.
Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month
Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, we calculated the: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).
As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
There are 4 csv files here:
BAWAP_P_annual_BA_SYB_GLO.csv
Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.
Source data: annual BILO rainfall
P_PET_monthly_BA_SYB_GLO.csv
Desc: Long term average BAWAP rainfall and Penman PET from 198101 - 201212 for each month
Climatology_Trend_BA_SYB_GLO.csv
Desc: Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend
Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv
Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
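For illustration only (not the programme's actual workflow), the summary statistics (a)-(g) listed above can be computed for one time period from a monthly series along these lines:

import numpy as np
import pandas as pd

# Hypothetical monthly BAWAP rainfall series for one subregion, 1981-2012.
months = pd.date_range("1981-01-01", "2012-12-01", freq="MS")
rain = pd.Series(np.random.gamma(2.0, 30.0, len(months)), index=months)

annual = rain.resample("A").sum()  # the "annual" time period; seasons and months are analogous
stats = {
    "average": annual.mean(),
    "maximum": annual.max(),
    "minimum": annual.min(),
    "avg_plus_stddev": annual.mean() + annual.std(),
    "avg_minus_stddev": annual.mean() - annual.std(),
    "stddev": annual.std(),
    "trend": np.polyfit(annual.index.year, annual.values, 1)[0],  # least-squares slope per year
}
print(stats)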
The dataset was created from various BAWAP source data, including monthly BAWAP rainfall, Tmax, Tmin, VPD, etc., and other source data including monthly Penman PET and correlation coefficient data. Data were extracted from national datasets for the GLO subregion.
Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Bioregional Assessment areas v03
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
———————————————————————————————— ORIGINAL PAPERS ———————————————————————————————— Mano, Marsel, Anatole Lécuyer, Elise Bannier, Lorraine Perronnet, Saman Noorzadeh, and Christian Barillot. 2017. “How to Build a Hybrid Neurofeedback Platform Combining EEG and FMRI.” Frontiers in Neuroscience 11 (140). https://doi.org/10.3389/fnins.2017.00140 Perronnet, Lorraine, Anatole Lécuyer, Marsel Mano, Elise Bannier, Maureen Clerc, Christian Barillot, et al. 2017. “Unimodal Versus Bimodal EEG-FMRI Neurofeedback of a Motor Imagery Task.” Frontiers in Human Neuroscience 11 (193). https://doi.org/10.3389/fnhum.2017.00193.
This dataset, named XP1, can be pulled together with the dataset XP2 (DOI: 10.18112/openneuro.ds002338.v1.0.0). Data acquisition methods have been described in Perronnet et al. (2017, Frontiers in Human Neuroscience). Simultaneous 64-channel EEG and fMRI during right-hand motor imagery and neurofeedback (NF) were acquired in this study (as well as in XP2). For this study, 10 subjects performed three types of NF runs (bimodal EEG-fMRI NF, unimodal EEG-NF and fMRI-NF).
————————————————————————————————
EXPERIMENTAL PARADIGM
————————————————————————————————
Subjects were instructed to perform a kinaesthetic motor imagery of the right hand and to find their own strategy to control and bring the ball to the target.
The experimental protocol consisted of 6 EEG-fMRI runs with a 20 s block design alternating rest and task:
motor localizer run (task-motorloc) - 8 blocks x (20 s rest + 20 s task)
motor imagery run without NF (task-MIpre) - 5 blocks x (20 s rest + 20 s task)
three NF runs with different NF conditions (task-eegNF, task-fmriNF, task-eegfmriNF) occurring in random order - 10 blocks x (20 s rest + 20 s task)
motor imagery run without NF (task-MIpost) - 5 blocks x (20 s rest + 20 s task)
———————————————————————————————— EEG DATA ———————————————————————————————— EEG data was recorded using a 64-channel MR compatible solution from Brain Products (Brain Products GmbH, Gilching, Germany).
RAW EEG DATA
EEG was sampled at 5 kHz with FCz as the reference electrode and AFz as the ground electrode, and a resolution of 0.5 microV. Following the BIDS arborescence, the raw EEG data for each task can be found for each subject in
XP1/sub-xp1*/eeg
in Brain Vision Recorder format (File Version 1.0). Each raw EEG recording includes three files: the data file (*.eeg), the header file (*.vhdr) and the marker file (*.vmrk). The header file contains information about the acquisition parameters and amplifier setup. For each electrode, the impedance at the beginning of the recording is also specified. For all subjects, channel 32 is the ECG channel. The 63 other channels are EEG channels.
The marker file contains the list of markers assigned to the EEG recordings and their properties (marker type, marker ID and position in data points). Three types of markers are relevant for the EEG processing:
R128 (Response): is the fMRI volume marker to correct for the gradient artifact
S 99 (Stimulus): is the protocol marker indicating the start of the Rest block
S 2 (Stimulus): is the protocol marker indicating the start of the Task (Motor Execution, Motor Imagery or Neurofeedback)
Warning: in a few EEG recordings, the first S 99 marker might be missing, but it can easily be “added” 20 s before the first S 2.
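A minimal MNE-Python sketch for reading a raw recording and adding the missing S 99 marker 20 s before the first S 2; the file name is hypothetical and the exact annotation strings should be checked against the .vmrk file.

import mne

raw = mne.io.read_raw_brainvision(
    "XP1/sub-xp101/eeg/sub-xp101_task-eegNF_eeg.vhdr", preload=False
)

# BrainVision markers are exposed as annotations, e.g. "Stimulus/S 99".
descriptions = [d.replace("  ", " ") for d in raw.annotations.description]
if not any(d.endswith("S 99") for d in descriptions):
    first_task_onset = min(
        onset
        for onset, d in zip(raw.annotations.onset, descriptions)
        if d.endswith("S 2")
    )
    raw.annotations.append(
        onset=first_task_onset - 20.0, duration=0.0, description="Stimulus/S 99"
    )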
PREPROCESSED EEG DATA
Following the BIDS arborescence, the processed EEG data for each task and subject can be found in the pre-processed data folder:
XP1/derivatives/sub-xp1*/eeg_pp/*eeg_pp.*
following the BrainVision Analyzer format. Each processed EEG recording includes three files: the data file (*.dat), the header file (*.vhdr) and the marker file (*.vmrk), containing information similar to those described for the raw data. In the header file of the preprocessed data, the channel locations are also specified. In the marker file, the locations in data points of the identified heart pulses (R markers) are specified as well.
EEG data were pre-processed using BrainVision Analyzer II Software, with the following steps:
Automatic gradient artifact correction using the artifact template subtraction method (sliding average calculation with 21 intervals for the sliding average and all channels enabled for correction).
Downsampling with factor 25 (200 Hz).
Low-pass FIR filter: cut-off frequency 50 Hz.
Ballistocardiogram (pulse) artifact correction using a semiautomatic procedure (pulse template searched between 40 s and 240 s in the ECG channel with the following parameters: Coherence Trigger = 0.5, Minimal Amplitude = 0.5, Maximal Amplitude = 1.3). The identified pulses were marked with R.
Segmentation relative to the first block marker (S 99) for the whole length of the training protocol (last S 2 + 20 s).
EEG NF SCORES
Neurofeedback scores can be found in the .mat structures in
XP1/derivatives/sub-xp1*/NF_eeg/d_sub*NFeeg_scores.mat
Structures named NF_eeg are composed of the following subfields:
ID: subject ID, for example sub-xp101
lapC3_ERD: a 1x1280 vector of neurofeedback scores (4 scores per second, for the whole session)
eeg: a 64x80200 matrix with the pre-processed EEG signals obtained with the steps described above, filtered between 8 and 30 Hz
lapC3_bandpower_8Hz_30Hz: a 1x1280 vector; bandpower of the filtered signal with a Laplacian centred on C3, used to estimate lapC3_ERD
lapC3_filter: a 1x64 vector; Laplacian filter centred on the C3 channel
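A minimal sketch for loading these structures in Python with SciPy; the subject and file name are hypothetical, so adjust them to the actual d_sub*NFeeg_scores.mat file.

from scipy.io import loadmat

mat = loadmat(
    "XP1/derivatives/sub-xp101/NF_eeg/d_sub-xp101NFeeg_scores.mat",
    squeeze_me=True,
    struct_as_record=False,
)
nf_eeg = mat["NF_eeg"]

print(nf_eeg.ID)                   # subject ID, e.g. sub-xp101
print(nf_eeg.lapC3_ERD.shape)      # (1280,) -> 4 NF scores per second
print(nf_eeg.eeg.shape)            # (64, 80200) pre-processed, 8-30 Hz filtered EEG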
———————————————————————————————— BOLD fMRI DATA ———————————————————————————————— All DICOM files were converted to NIfTI-1 and then to BIDS format (version 2.1.4) using the software dcm2niix (version v1.0.20190720 GVV7.4.0).
fMRI acquisitions were performed using echo-planar imaging (EPI) covering the entire brain, with the following parameters:
3T Siemens Verio, EPI sequence, TR = 2 s, TE = 23 ms, resolution 2x2x4 mm3, FOV = 210×210 mm2, number of slices: 32, no slice gap.
As specified by the onsets in the task event files (XP1\*events.tsv), the scanner began the EPI pulse sequence two seconds prior to the start of the protocol (first rest block), so the first two TRs should be discarded. The useful TRs for the runs are therefore:
task-motorloc: 320 s (2 to 322)
task-MIpre and task-MIpost: 200 s (2 to 202)
task-eegNF, task-fmriNF, task-eegfmriNF: 400 s (2 to 402)
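For example, the first two volumes can be dropped with nibabel before any analysis (a sketch; the file name is hypothetical):

import nibabel as nib

img = nib.load("XP1/sub-xp101/func/sub-xp101_task-motorloc_bold.nii.gz")
data = img.get_fdata()

# TR = 2 s and the sequence started 2 s before the protocol, so discard the
# first two volumes to align the time series with the first rest block.
useful = data[..., 2:]
print(data.shape[-1], "->", useful.shape[-1])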
In the task event files for the different tasks, each column represents:
Following the BIDS arborescence, the functional data and related metadata can be found for each subject in the following directory
XP1/sub-xp1*/func
BOLD-NF SCORES
For each subject and NF session, a matlab structure with BOLD-NF features can be found in
XP1/derivatives/sub-xp1*/NF_bold/
In view of BOLD-NF score computation, fMRI data were preprocessed using AutoMRI, a software based on SPM8, with the following steps: slice-time correction, spatial realignment and coregistration with the anatomical scan, spatial smoothing with a 6 mm Gaussian kernel, and normalization to the Montreal Neurological Institute template. For each session, a first-level general linear model analysis was then performed. The resulting activation maps (voxel-wise family-wise error corrected at p < 0.05) were used to define two ROIs (9x9x3 voxels) around the maximum of activation in the ipsilesional primary motor area (M1) and the supplementary motor area (SMA), respectively.
The BOLD-NF scores were calculated as the difference between the percentage signal change in the two ROIs (SMA and M1) and in a large deep background region (slice 3 out of 16) whose activity is not correlated with the NF task. A smoothed version of the NF scores over the preceding three volumes was also computed.
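A schematic numpy/pandas version of this computation; the baseline definition and the smoothing window follow the description above, so treat it as an illustration rather than the AutoMRI implementation.

import numpy as np
import pandas as pd

def percent_signal_change(ts):
    # Percentage signal change relative to the mean of the time series
    # (the actual baseline definition in the pipeline may differ).
    baseline = ts.mean()
    return 100.0 * (ts - baseline) / baseline

def bold_nf_scores(roi_ts, background_ts):
    # Difference of percentage signal change between an ROI (M1 or SMA)
    # and the deep background region, per volume.
    return percent_signal_change(roi_ts) - percent_signal_change(background_ts)

def smooth_nf(scores, n_previous=3):
    # Running mean over the current volume and the preceding three volumes.
    return pd.Series(scores).rolling(window=n_previous + 1, min_periods=1).mean().to_numpy()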
The NF_bold structure is organised as follows:
NF_bold
→ .m1 → .nf
→ .smoothnf
→ .roimean (averaged BOLD signal in the ROI)
→ .bgmean (averaged BOLD signal in the background slice)
→ .method
NFscores.fmri
→ .sma→ .nf
→ .smoothnf
→ .roimean (averaged BOLD signal in the ROI)
→ .bgmean (averaged BOLD signal in the background slice)
→ .method
Where the subfield method contains information about the ROI size (.roisize), the background mask (.bgmask) and ROI mask (.roimask).
More details about signal processing and NF calculation can be found in Perronnet et al. 2017 and Perronnet et al. 2018.
———————————————————————————————— ANATOMICAL MRI DATA ———————————————————————————————— As a structural reference for the fMRI analysis, a high-resolution 3D T1 MPRAGE sequence was acquired with the following parameters:
3T Siemens Verio, 3D T1 MPRAGE, TR = 1.9 s, TE = 22.6
CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.
This dataset is a preprocessed version of the CVEfixes dataset provided at the following link: https://zenodo.org/record/7029359
This dataset consists of two files:
- CVEFixes.csv: the preprocessed dataset.
- LICENSE.txt: the license information of this dataset.
In CVEFixes.csv, there are three columns:
- code: the source code of the data point.
- language: the programming language of the source code (c, java, php, etc.).
- safety: whether the code is vulnerable or safe.
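A minimal loading sketch with pandas, using the three columns described above:

import pandas as pd

df = pd.read_csv("CVEFixes.csv")

print(df["language"].value_counts())   # distribution of programming languages
print(df["safety"].value_counts())     # vulnerable vs. safe samples

# e.g. keep only the C samples for a language-specific experiment
c_samples = df[df["language"] == "c"]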
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Motivation: Entity matching is the task of determining which records from different data sources describe the same real-world entity. It is an important task for data integration and has been the focus of many research works. A large number of entity matching/record linkage tasks have been made available for evaluating entity matching methods. However, the lack of fixed development and test splits as well as of correspondence sets including both matching and non-matching record pairs hinders the reproducibility and comparability of benchmark experiments. In an effort to enhance the reproducibility and comparability of the experiments, we complement existing entity matching benchmark tasks with fixed sets of non-matching pairs as well as fixed development and test splits. Dataset Description: An augmented version of the wdc phones dataset for benchmarking entity matching/record linkage methods found at: http://webdatacommons.org/productcorpus/index.html#toc4 The augmented version adds fixed splits for training, validation and testing as well as their corresponding feature vectors. The feature vectors are built using data type specific similarity metrics. The dataset contains 447 records describing products deriving from 17 e-shops which are matched against a product catalog of 50 products. The gold standards have manual annotations for 258 matching and 22,092 non-matching pairs. The total number of attributes used to describe the product records is 26, while the attribute density is 0.25. The augmented dataset enhances the reproducibility of matching methods and the comparability of matching results. The dataset is part of the CompERBench repository which provides 21 complete benchmark tasks for entity matching for public download: http://data.dws.informatik.uni-mannheim.de/benchmarkmatchingtasks/index.html
Abstract
This dataset was derived by the Bioregional Assessment Programme. The parent datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset was created in order to calculate the steady-state drawdown extent for potential CSG developments in the Cooper subregion. The calculations are made using the de Glee method, as described in Krusemann (1994), in accordance with the SA Far North Water Allocation Plan (2009) explanation document. The data were created as part of the conceptual modelling for causal pathways to assess the potential for CSG development-related impacts to propagate via groundwater drawdown.
Dataset History
The dataset assumes water production rates for CSG in the Cooper subregion based on production rates in other eastern Australian Permian CSG fields, reported in Onshore co-produced water: extent and management (http://data.bioregionalassessments.gov.au/dataset/6b3d8096-f09d-40a2-b5ee-09c9f8b9bdfc) and Fell (2013) Discussion paper for Office of NSW Chief Scientist and Engineer: Water treatment and coal seam gas (http://data.bioregionalassessments.gov.au/dataset/714e35df-76bb-4a5d-b5d8-0fcc65329dfe). Distance-drawdown was calculated using the de Glee method for steady-state drawdown described in Krusemann (1994) Analysis and Evaluation of Pumping Test Data (http://data.bioregionalassessments.gov.au/dataset/c66b744e-e9bb-4d88-a82c-a6593efe91d2).
Dataset Citation
Bioregional Assessment Programme (2015) Cooper Basin Drawdown Calculations - De Glee method. Bioregional Assessment Derived Dataset. Viewed 27 November 2017, http://data.bioregionalassessments.gov.au/dataset/51323ab4-3613-47eb-acf3-3d89e8b9c062.
Dataset Ancestors
Derived From Discussion paper for Office of NSW Chief Scientist and Engineer: Water treatment and coal seam gas.
Derived From Analysis and Evaluation of Pumping Test Data.
Derived From Onshore co-produced water: extent and management
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset was updated April, 2024. This ownership dataset was generated primarily from CPAD data, which already tracks the majority of ownership information in California. CPAD is utilized without any snapping or clipping to FRA/SRA/LRA. CPAD has some important data gaps, so additional data sources are used to supplement the CPAD data. Currently this includes the most currently available data from BIA, DOD, and FWS. Additional sources may be added in subsequent versions. Decision rules were developed to identify priority layers in areas of overlap.
Starting in 2022, the ownership dataset was compiled using a new methodology. Previous versions attempted to match federal ownership boundaries to the FRA footprint, and used a manual process for checking and tracking Federal ownership changes within the FRA, with CPAD ownership information only being used for SRA and LRA lands. The manual portion of that process was proving difficult to maintain, and the new method (described below) was developed in order to decrease the manual workload, and increase accountability by using an automated process by which any final ownership designation could be traced back to a specific dataset.
The current process for compiling the data sources includes:
* Clipping input datasets to the California boundary
* Filtering the FWS data on the Primary Interest field to exclude lands that are managed by but not owned by FWS (ex: Leases, Easements, etc)
* Supplementing the BIA Pacific Region Surface Trust lands data with the Western Region portion of the LAR dataset which extends into California.
* Filtering the BIA data on the Trust Status field to exclude areas that represent mineral rights only.
* Filtering the CPAD data on the Ownership Level field to exclude areas that are Privately owned (ex: HOAs)
* In the case of overlap, sources were prioritized as follows: FWS > BIA > CPAD > DOD
* As an exception to the above, DOD lands on FRA which overlapped with CPAD lands that were incorrectly coded as non-Federal were treated as an override, such that the DOD designation could win out over CPAD.
In addition to this ownership dataset, a supplemental _source dataset is available which designates the source that was used to determine the ownership in this dataset.
Data Sources:
* GreenInfo Network's California Protected Areas Database (CPAD2023a). https://www.calands.org/cpad/; https://www.calands.org/wp-content/uploads/2023/06/CPAD-2023a-Database-Manual.pdf
* US Fish and Wildlife Service FWSInterest dataset (updated December, 2023). https://gis-fws.opendata.arcgis.com/datasets/9c49bd03b8dc4b9188a8c84062792cff_0/explore
* Department of Defense Military Bases dataset (updated September 2023) https://catalog.data.gov/dataset/military-bases
* Bureau of Indian Affairs, Pacific Region, Surface Trust and Pacific Region Office (PRO) land boundaries data (2023) via John Mosley John.Mosley@bia.gov
* Bureau of Indian Affairs, Land Area Representations (LAR) and BIA Regions datasets (updated Oct 2019) https://biamaps.doi.gov/bogs/datadownload.html
Data Gaps & Changes:
Known gaps include several BOR, ACE and Navy lands which were not included in CPAD nor the DOD MIRTA dataset. Our hope for future versions is to refine the process by pulling in additional data sources to fill in some of those data gaps. Additionally, any feedback received about missing or inaccurate data can be taken back to the appropriate source data where appropriate, so fixes can occur in the source data, instead of just in this dataset.
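A simplified geopandas sketch of the priority rule for overlaps; the file names, layer contents and CRS handling are assumptions, and the DOD-override exception described above is not handled here.

import geopandas as gpd
import pandas as pd

# Hypothetical, already clipped and filtered inputs, ordered by priority.
layers = {
    "FWS": gpd.read_file("fws_interest_filtered.gpkg"),
    "BIA": gpd.read_file("bia_trust_filtered.gpkg"),
    "CPAD": gpd.read_file("cpad_nonprivate.gpkg"),
    "DOD": gpd.read_file("dod_bases.gpkg"),
}

merged = None
for name, gdf in layers.items():  # dict order encodes FWS > BIA > CPAD > DOD
    gdf = gdf.assign(source=name)
    if merged is None:
        merged = gdf
    else:
        # keep only the parts of the lower-priority layer that do not
        # overlap anything already accepted
        remainder = gpd.overlay(gdf, merged[["geometry"]], how="difference")
        merged = pd.concat([merged, remainder], ignore_index=True)

merged.to_file("ownership_draft.gpkg", driver="GPKG")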
24_1: Input datasets this year included numerous changes since the previous version, particularly the CPAD and DOD inputs. Of particular note was the re-addition of Camp Pendleton to the DOD input dataset, which is reflected in this version of the ownership dataset. We were unable to obtain an updated input for tribal data, so the previous input was used for this version.
23_1: A few discrepancies were discovered between data changes that occurred in CPAD when compared with parcel data. These issues will be taken to CPAD for clarification for future updates, but ownership23_1 reflects the data as it was coded in CPAD at the time. In addition, there was a change in the DOD input data between last year and this year, with the removal of Camp Pendleton. An inquiry was sent for clarification on this change, but ownership23_1 reflects the data per the DOD input dataset.
22_1: Represents an initial version of ownership with a new methodology which was developed under a short timeframe. A comparison with previous versions of ownership highlighted some data gaps in the current version. Some of these known gaps include several BOR, ACE and Navy lands which were not included in CPAD nor the DOD MIRTA dataset. Our hope for future versions is to refine the process by pulling in additional data sources to fill in some of those data gaps. In addition, any topological errors (like overlaps or gaps) that exist in the input datasets may carry over to the ownership dataset. Ideally, any feedback received about missing or inaccurate data can be taken back to the relevant source data where appropriate, so fixes can occur in the source data, instead of just in this dataset.
Attribution 2.5 (CC BY 2.5)https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
Abstract The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. This dataset contains all the scripts used to carry out the uncertainty analysis for the maximum drawdown and time to maximum drawdown at the groundwater receptors in the Hunter bioregion, and all the resulting posterior predictions. This is described in product 2.6.2 Groundwater numerical modelling (Herron et al. 2016). See History for a detailed explanation of the dataset contents.

References: Herron N, Crosbie R, Peeters L, Marvanek S, Ramage A and Wilkins A (2016) Groundwater numerical modelling for the Hunter subregion. Product 2.6.2 for the Hunter subregion from the Northern Sydney Basin Bioregional Assessment. Department of the Environment, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia.

Dataset History This dataset uses the results of the design of experiment runs of the groundwater model of the Hunter subregion to train emulators to (a) constrain the prior parameter ensembles into the posterior parameter ensembles and (b) generate the predictive posterior ensembles of maximum drawdown and time to maximum drawdown. This is described in product 2.6.2 Groundwater numerical modelling (Herron et al. 2016). A flow chart of the way the various files and scripts interact is provided in HUN_GW_UA_Flowchart.png (editable version in HUN_GW_UA_Flowchart.gliffy).

R-script HUN_DoE_Parameters.R creates the set of parameters for the design of experiment in HUN_DoE_Parameters.csv. Each of these parameter combinations is evaluated with the groundwater model (dataset HUN GW Model v01). Associated with this spreadsheet is file HUN_GW_Parameters.csv. This file contains, for each parameter, whether it is included in the sensitivity analysis, whether it is tied to another parameter, the initial value and range, the transformation, and the type of prior distribution with its mean and covariance structure.

The results of the design of experiment model runs are summarised in files HUN_GW_dmax_DoE_Predictions.csv, HUN_GW_tmax_DoE_Predictions.csv, HUN_GW_DoE_Observations.csv and HUN_GW_DoE_mean_BL_BF_hist.csv, which contain the maximum additional drawdown, the time to maximum additional drawdown for each receptor, and the simulated equivalents to observed groundwater levels and SW-GW fluxes, respectively. These are generated with post-processing scripts in dataset HUN GW Model v01 from the output (as exemplified in dataset HUN GW Model simulate ua999 pawsey v01). Spreadsheets HUN_GW_dmax_Predictions.csv and HUN_GW_tmax_Predictions.csv capture additional information on each prediction: the name of the prediction, the transformation, the min, max and median of the design of experiment, a boolean to indicate whether the prediction is to be included in the uncertainty analysis, the layer it is assigned to, and which objective function to use to constrain the prediction.
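As an illustration of the emulator idea described above, the following minimal sketch (not the actual R workflow used in the assessment) fits a regression emulator that maps the design-of-experiment parameter combinations to the simulated maximum drawdown of a single receptor, so that the full groundwater model does not have to be re-run during the uncertainty analysis. The column names and the choice of a Gaussian-process emulator are assumptions made for this example.

```python
# Illustrative emulator sketch; column names and emulator choice are assumed.
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

params = pd.read_csv("HUN_DoE_Parameters.csv")          # DoE parameter sets
dmax = pd.read_csv("HUN_GW_dmax_DoE_Predictions.csv")   # simulated dmax per receptor

X = params.drop(columns=["run_id"]).to_numpy()          # "run_id" column assumed
y = dmax["receptor_001"].to_numpy()                     # receptor column name assumed

emulator = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
emulator.fit(X, y)

# The fitted emulator can now predict dmax (with an uncertainty estimate) for
# any new parameter combination drawn from the prior or posterior.
mean, std = emulator.predict(X[:5], return_std=True)
print(mean, std)
```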
Spreadsheet HUN_GW_Observations.csv has additional information on each observation: the name of the observation, a boolean to indicate whether to use the observation, the min and max of the design of experiment, a metadata statement describing the observation, the spatial coordinates, the observed value, and the number of observations at this location (from dataset HUN bores v01). Further, it has the distance of each bore to the nearest blue line network and the distance to each prediction (both in km). Spreadsheet HUN_GW_mean_BL_BF_hist.csv has similar information, but on the SW-GW flux; the observed values are from dataset HUN Groundwater Flowrate Time Series v01.

These files are used in script HUN_GW_SI.py to generate sensitivity indices (based on the Plischke et al. (2013) method) for each group of observations and predictions. These indices are saved in spreadsheets HUN_GW_dmax_SI.csv, HUN_GW_tmax_SI.csv, HUN_GW_hobs_SI.py and HUN_GW_mean_BF_hist_SI.csv.

Script HUN_GW_dmax_ObjFun.py calculates the objective function values for the design of experiment runs. Each prediction has a tailored objective function, which is a weighted sum of the residuals between observations and predictions, with weights based on the distance between observation and prediction. In addition to that, there is an objective function for the baseflow rates. The results are stored in HUN_GW_DoE_ObjFun.csv and HUN_GW_ObjFun.csv.

The latter files are used in script HUN_GW_dmax_CreatePosteriorParameters.R to carry out the Monte Carlo sampling of the prior parameter distributions with the Approximate Bayesian Computation methodology, as described in Herron et al. (2016), by generating and applying emulators for each objective function. The scripts use the scripts in dataset R-scripts for uncertainty analysis v01 and are run on the high performance computation cluster machines with batch file HUN_GW_dmax_CreatePosterior.slurm. These scripts result in posterior parameter combinations for each objective function, stored in directory PosteriorParameters, with filename convention HUN_GW_dmax_Posterior_Parameters_OO_$OFName$.csv, where $OFName$ is the name of the objective function. Python script HUN_GW_PosteriorParameters_Percentiles.py summarises these posterior parameter combinations and stores the results in HUN_GW_PosteriorParameters_Percentiles.csv. The same set of spreadsheets is used to test convergence of the emulator performance with script HUN_GW_emulator_convergence.R and batch file HUN_GW_emulator_convergence.slurm, producing spreadsheet HUN_GW_convergence_objfun_BF.csv.

The posterior parameter distributions are sampled with script HUN_GW_dmax_tmax_MCsampler.R and the associated .slurm batch file. The script creates and applies an emulator for each prediction. The emulators and results are stored in directory Emulators. This directory is not part of this dataset but can be regenerated by running the scripts on the high performance computation clusters; a single emulator and associated output is included for illustrative purposes. Script HUN_GW_collate_predictions.csv collates all posterior predictive distributions in spreadsheets HUN_GW_dmax_PosteriorPredictions.csv and HUN_GW_tmax_PosteriorPredictions.csv. These files are further summarised in spreadsheet HUN_GW_dmax_tmax_excprob.csv with script HUN_GW_exc_prob.
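For readers unfamiliar with the construction, the sketch below illustrates a distance-weighted objective function of the kind described above: a weighted sum of residuals between observed and simulated values, with weights that decrease with the distance between each observation bore and the prediction of interest. The inverse-distance form used here is an assumption; the exact weighting scheme of HUN_GW_dmax_ObjFun.py is not reproduced.

```python
# Illustrative distance-weighted objective function (assumed weighting form).
import numpy as np

def objective_function(observed, simulated, distances_km, eps=0.1):
    """Weighted sum of absolute residuals; nearer observations weigh more."""
    weights = 1.0 / (distances_km + eps)      # assumed inverse-distance weights
    weights = weights / weights.sum()
    return float(np.sum(weights * np.abs(observed - simulated)))

# Example with made-up numbers: three bores at 2, 10 and 45 km from a receptor.
obs = np.array([120.3, 98.7, 101.2])
sim = np.array([121.0, 97.9, 104.0])
dist = np.array([2.0, 10.0, 45.0])
print(objective_function(obs, sim, dist))
```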
This spreadsheet contains, for all predictions, the coordinates, layer, number of samples in the posterior parameter distribution, the 5th, 50th and 95th percentile of dmax and tmax, the probability of exceeding 1 cm and 20 cm drawdown, the maximum dmax value from the design of experiment, the threshold of the objective function, and the acceptance rate. The script HUN_GW_dmax_tmax_MCsampler.R is also used to evaluate parameter distributions HUN_GW_dmax_Posterior_Parameters_HUN_OF_probe439.csv and HUN_GW_dmax_Posterior_Parameters_Mackie_OF_probe439.csv. These are, for one prediction, two different parameter distributions, of which the latter represents local information. The corresponding dmax values are stored in HUN_GW_dmax_probe439_HUN.csv and HUN_GW_dmax_probe439_Mackie.csv.

Dataset Citation Bioregional Assessment Programme (XXXX) HUN GW Uncertainty Analysis v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/c25db039-5082-4dd6-bb9d-de7c37f6949a.

Dataset Ancestors
Derived From HUN GW Model code v01
Derived From Hydstra Groundwater Measurement Update - NSW Office of Water, Nov2013
Derived From Groundwater Economic Elements Hunter NSW 20150520 PersRem v02
Derived From NSW Office of Water - National Groundwater Information System 20140701
Derived From Travelling Stock Route Conservation Values
Derived From HUN GW Model v01
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From Climate Change Corridors for Nandewar and New England Tablelands
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Fauna Corridors for North East NSW
Derived From R-scripts for uncertainty analysis v01
Derived From Asset database for the Hunter subregion on 27 August 2015
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From Estuarine Macrophytes of Hunter Subregion NSW DPI Hunter 2004
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Camerons Gorge Grassy White Box Endangered Ecological Community (EEC) 2008
Derived From Asset database for the Hunter subregion on 16 June 2015
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From Gippsland Project boundary
Derived From Bioregional Assessment areas v04
Derived From Asset database for the Hunter subregion on 24 February 2016
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Gosford Council Endangered Ecological Communities (Umina woodlands) EEC3906
Derived From NSW Office of Water Surface Water Offtakes - Hunter v1 24102013
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From Bioregional Assessment areas v03
Derived From HUN groundwater flow rate time series v01
Derived From Asset list for Hunter - CURRENT
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From Species Profile and Threats Database (SPRAT) - Australia - Species of National Environmental Significance Database (BA subset - RESTRICTED - Metadata only)
Derived From HUN GW Model simulate ua999 pawsey v01
Derived From Northern Rivers CMA GDEs (DRAFT DPI
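As a side note on the exceedance summaries described above (the 5th, 50th and 95th percentiles of dmax and the probabilities of exceeding 1 cm and 20 cm of drawdown), these quantities can be derived from posterior predictive samples in a few lines. The sketch below assumes a column name and that dmax is stored in metres; it is illustrative only and not part of the dataset's scripts.

```python
# Illustrative summary of posterior predictive dmax samples (column name assumed).
import numpy as np
import pandas as pd

post = pd.read_csv("HUN_GW_dmax_PosteriorPredictions.csv")
samples = post["probe439"].to_numpy()      # posterior dmax samples for one receptor

p5, p50, p95 = np.percentile(samples, [5, 50, 95])
prob_gt_1cm = float(np.mean(samples > 0.01))   # dmax assumed to be in metres
prob_gt_20cm = float(np.mean(samples > 0.20))
print(p5, p50, p95, prob_gt_1cm, prob_gt_20cm)
```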
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme. In this dataset we describe the application of Impact Modes and Effects Analysis (IMEA) to the hazards associated with coal seam gas and coal mining operations in the Gloucester subregion. Attention is restricted to water-mediated hazards, i.e. hazards that might lead directly or indirectly to impacts on groundwater or surface water, and the assets that depend on them. All other hazards, for example effects on air quality, are explicitly excluded. Full details of the hazard analysis process are described in "M11: Systematic analysis of water-related hazards associated with coal resource development", available at http://bioregionalassessments.gov.au/methods/submethodologies.

Dataset History Full details of the hazard analysis process are described in M11: Systematic analysis of water-related hazards associated with coal resource development, available at http://bioregionalassessments.gov.au/methods/submethodologies.

Dataset Citation Bioregional Assessment Programme (2015) Impact Modes and Effects Analysis for the GLO subregion. Bioregional Assessment Source Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/52adbf75-b695-49fe-9d7b-b34ded9feb3a.
This version 2.0 MUSICA IASI / RemoTeC TROPOMI fused methane data set contains total (ground – top of atmosphere), tropospheric (ground – about 6 km a.s.l.), and UTLS (upper tropospheric/lower stratospheric, about 6 – 20 km a.s.l.) column-averaged dry-air mole fractions of methane (CH4). The data are obtained by combining the level 2 CH4 profiles and XCH4 total columns (generated from the IASI TIR spectra and the TROPOMI NIR/SWIR spectra, respectively). The level 2 CH4 profiles were generated by the MUSICA processor (version 3.3.0) and the level 2 XCH4 total columns by the RemoTeC processor (operational processing algorithm version 2.3.1; this version includes data over ocean in glint mode). The combination is realized by means of a Kalman filter that uses the MUSICA IASI data as the background and the TROPOMI data as the new observation. Details of the combination method, the IASI and TROPOMI collocation requirements, and the data quality are described in Schneider et al. (2022, https://doi.org/10.5194/amt-15-4339-2022). The data cover an example period for northern hemispheric winter and summer conditions (01 January – 30 January 2020 and 01 July – 30 July 2020, respectively). The only difference between this version 2.0 and version 1.0 of the fused MUSICA IASI / RemoTeC TROPOMI data set (accessible at https://doi.org/10.35097/689) is the use of TROPOMI RemoTeC operational processing version 2.3.1 (instead of version 2.2.0), which among other things offers additional data availability over ocean. The underlying instruments are Metop IASI and Sentinel-5 Precursor TROPOMI. The data fusion method is described in detail in Schneider et al. (2022, https://doi.org/10.5194/amt-15-4339-2022). The RemoTeC TROPOMI XCH4 data used (operational processing algorithm version 2.3.1) are described in Lorente et al. (2022, https://doi.org/10.5194/amt-2022-197). The MUSICA IASI CH4 profile data used (processing version 3.3.0) are described in Schneider et al. (2022, https://doi.org/10.5194/essd-14-709-2022).
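A minimal sketch of the Kalman-filter combination described above is given below, with the MUSICA IASI CH4 state as the background and a collocated TROPOMI XCH4 value as the new observation. The state dimension, covariances and observation operator are toy values chosen purely for illustration and do not reflect the MUSICA/RemoTeC production setup; see Schneider et al. (2022) for the actual method.

```python
# Toy Kalman-filter update illustrating the background/observation fusion;
# all numbers below are invented for illustration only.
import numpy as np

x_b = np.array([1850.0, 1900.0])        # background: [tropospheric, UTLS] CH4 in ppb (toy)
B = np.diag([30.0**2, 40.0**2])         # background error covariance (assumed)

y = np.array([1880.0])                  # TROPOMI XCH4 total-column observation (toy)
R = np.array([[15.0**2]])               # observation error covariance (assumed)
H = np.array([[0.6, 0.4]])              # observation operator: column-averaging weights (assumed)

# Kalman update: x_a = x_b + K (y - H x_b),  K = B H^T (H B H^T + R)^-1
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
x_a = x_b + K @ (y - H @ x_b)
A = (np.eye(2) - K @ H) @ B             # analysis error covariance

print(x_a, np.sqrt(np.diag(A)))
```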
Data for "Image-based Backbone Reconstruction for Non-Slender Soft Robots" This dataset provides the data for the forthcoming paper "Image-based Backbone Reconstruction for Non-Slender Soft Robots". The backbone reconstruction method used is based on the method described in Hoffmann et al. 1. The modifications to this method to support the non-slender soft robot in this dataset are described in the forthcoming paper mentioned above. This dataset holds raw images of pressurized and elongated soft robots and the corresponding reconstructed backbones.
Dataset The dataset is split into two subsets with similar structure. The first subset is contained in dataset_01. The second subset is contained in dataset_02.
Each subset consists of five folders and one schedule file. The schedule file schedule.csv contains the index of the schedule entry, the angle $\alpha$ in degrees, the pressure of each chamber $p_1$ to $p_3$ in bar, and whether the pressurization is active (a small loading sketch is given after the folder descriptions below). Furthermore, the five folders of each subset can be described as follows:
raw: Contains the raw cropped images. The filenames are formatted as CROPPED_C{CAMERA_INDEX}_E{SCHEDULE_ENTRY}.png with the camera index CAMERA_INDEX and the schedule entry SCHEDULE_ENTRY.
constant_curvature_slender, constant_curvature_volumetric, cubic_curvature_slender and cubic_curvature_volumetric: These folders contain the reconstructed backbones based on the raw data from the raw folder. A different reconstruction approach was used in each of these folders:
constant_curvature_slender - A constant curvature backbone kinematic based on the slender model,
constant_curvature_volumetric - A constant curvature backbone kinematic based on the volumetric model,
cubic_curvature_slender - A cubic curvature backbone kinematic based on the slender model,
cubic_curvature_volumetric - A cubic curvature backbone kinematic based on the volumetric model.
Each of these folders contains a data and a figures folder. The data folder consists of PARAMETER_E{SCHEDULE_ENTRY}.json files listing the optimization parameters for each schedule entry SCHEDULE_ENTRY in JSON format. The figures folder contains annotated images of the reconstructed backbone overlaid on the cropped raw images. The filenames are structured as ANNOTATED_E{SCHEDULE_ENTRY}_C{CAMERA_INDEX}_EPOCH{EPOCH}.png with the schedule entry SCHEDULE_ENTRY, the camera index CAMERA_INDEX and the epoch EPOCH of the optimization algorithm.
The optimization parameters include the base position base_position of the reconstructed backbone in world coordinates, the coefficients for the curvature polynomials ux and uy, and the constant coefficient for the elongation polynomial la.
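The following minimal sketch shows how one subset of this dataset could be loaded: the schedule file and the optimization parameters of a single schedule entry. The folder layout and JSON keys follow the description above; the CSV column layout is an assumption.

```python
# Minimal loading sketch; folder layout and JSON keys follow the description
# above, CSV column names are assumed.
import json
import pandas as pd

subset = "dataset_01"
entry = 3  # SCHEDULE_ENTRY

# Schedule: entry index, angle alpha [deg], chamber pressures p1..p3 [bar], active flag
schedule = pd.read_csv(f"{subset}/schedule.csv")
print(schedule.head())

# Reconstructed backbone parameters for one approach and one schedule entry
with open(f"{subset}/cubic_curvature_volumetric/data/PARAMETER_E{entry}.json") as f:
    params = json.load(f)

base_position = params["base_position"]   # backbone base in world coordinates
ux, uy = params["ux"], params["uy"]       # curvature polynomial coefficients
la = params["la"]                         # constant elongation coefficient
print(base_position, ux, uy, la)
```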
Calibration Data The calibration data is located in the calibration folder and consists of multiple .npy files in the numpy format. The corresponding camera index for the calibrated camera is abbreviated with CAMERA_INDEX in the following:
C{CAMERA_INDEX}.npy - Stores the reprojection error, camera matrix, distortion coefficients, rotation, and translation vectors as returned by the cv2.calibrateCamera [2] method.
C{CAMERA_INDEX}_camera_matrix.npy - Stores the camera_matrix as returned by the cv2.calibrateCamera [2] method.
C{CAMERA_INDEX}_distortion_coefficients.npy - Stores the distortion coefficients as returned by the cv2.calibrateCamera [2] method.
C{CAMERA_INDEX}_projection_matrix.npy - Stores the projection matrix from world space to pixel space based on the stereo camera calibration.
STEREO.npy - Stores the reprojection error, R, T, E, F as returned by the cv2.stereoCalibrate [2] method as an object datatype.
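As a usage example, the sketch below loads one camera's calibration files and maps a 3D world point (for instance a reconstructed backbone point) to pixel coordinates via the stored projection matrix. The world point is a made-up example.

```python
# Sketch for using the calibration files; the world point is invented.
import numpy as np

cam = 0  # CAMERA_INDEX
P = np.load(f"calibration/C{cam}_projection_matrix.npy")         # 3x4 world-to-pixel matrix
K = np.load(f"calibration/C{cam}_camera_matrix.npy")             # intrinsics
dist = np.load(f"calibration/C{cam}_distortion_coefficients.npy")
stereo = np.load("calibration/STEREO.npy", allow_pickle=True)    # object datatype

X_world = np.array([0.05, 0.02, 0.30, 1.0])   # homogeneous world point [m] (example)
uvw = P @ X_world
u, v = uvw[:2] / uvw[2]                        # pixel coordinates
print(u, v)
```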
Acknowledgement Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 501861263 – SPP2353
References
[1] M. K. Hoffmann, J. Mühlenhoff, Z. Ding, T. Sattel and K. Flaßkamp. An iterative closest point algorithm for marker-free 3D shape registration of continuum robots. arXiv. https://arxiv.org/abs/2405.15336
[2] OpenCV. Camera Calibration and 3D Reconstruction. OpenCV Documentation. https://docs.opencv.org/4.x/d9/d0c/group_calib3d.html, accessed May 27, 2024.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically