7 datasets found

XBT and CTD pairs dataset Version 2
researchdata.edu.au
data.csiro.au
datadownload
Updated Oct 16, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Susan Wijffels; Franco Reseghetti; Zanna Chase; Mark Rosenberg; Steve Rintoul; Rebecca Cowley (2014). XBT and CTD pairs dataset Version 2 [Dataset]. https://researchdata.edu.au/3377826
Explore at:
datadownloadAvailable download formats
Dataset updated
Oct 16, 2014
Dataset provided by
CSIROhttp://www.csiro.au/
Authors
Susan Wijffels; Franco Reseghetti; Zanna Chase; Mark Rosenberg; Steve Rintoul; Rebecca Cowley
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Time period covered
Jan 1, 1984 - Aug 30, 2013
Area covered

Description
The XBT/CTD pairs dataset (Version 2) contains additional datasets and updated datasets from the Version 1 data. Version 1 data was used to update the calculation of historical XBT fall rate and temperature corrections presented in Cowley, R., Wijffels, S., Cheng, L., Boyer, T., and Kizu, S. (2013). Biases in Expendable Bathythermograph Data: A New View Based on Historical Side-by-Side Comparisons. Journal of Atmospheric and Oceanic Technology, 30, 1195–1225, doi:10.1175/JTECH-D-12-00127.1. http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-12-00127.1 Version 2 contains 1,188 pairs from seven datasets that add to Version 1 which contains 4,115 pairs from 114 datasets. There are also 10 updated datasets included in Version 2. The updates apply to the CTD depth data in the Quality Controlled version of the 10 datasets. The 10 updated Version 2 datasets should be used in preference to the copies in Version 1. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets. Each dataset contains the scientifically quality controlled version and (where available) the originator's data. The XBT/CTD pairs are identified in the document 'XBT_CTDpairs_metadata_V2.csv'. Although the XBT data in the additional datasets was collected after 2008, much of the probes in the ss2012t01 dataset were manufactured during the mid-1980s. Lineage: Data is sourced from CSIRO Oceans and Atmosphere Flagship, Australian Antarctic Division and Italian National Agency for New Technologies, Energy and Sustainable Economic Development. Original and raw data files are included where available. Quality controlled datasets follow the procedure of Bailey, R., Gronell, A., Phillips, H., Tanner, E., and Meyers, G. (1994). Quality control cookbook for XBT data, Version 1.1. CSIRO Marine Laboratories Reports, 221. Quality controlled data is in the 'MQNC' format used at CSIRO Marine and Atmospheric Research. The MQNC format is described in the document 'XBT_CTDpairs_descriptionV2.pdf'. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets.
Z
Data from: Multi-Source Distributed System Data for AI-powered Analytics
data.niaid.nih.gov
zenodo.org
Updated Nov 10, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jorge Cardoso (2022). Multi-Source Distributed System Data for AI-powered Analytics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3484800
Explore at:
Dataset updated
Nov 10, 2022
Dataset provided by
Odej Kao
Soeren Becker
Jorge Cardoso
Jasmin Bogatinovski
Sasho Nedelkoski
Ajay Kumar Mandapati
Description
Abstract:

In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field utilizes monitoring data from IT systems, big data platforms, and machine learning to automate various operations and maintenance (O&M) tasks for distributed systems. The major contributions have been materialized in the form of novel algorithms. Typically, researchers took the challenge of exploring one specific type of observability data sources, such as application logs, metrics, and distributed traces, to create new algorithms. Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms that have better performance.
Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics. This limits the possibilities for greater advances in AIOps research. Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment, statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.

General Information:

This repository contains the simple scripts for data statistics, and link to the multi-source distributed system dataset.

You may find details of this dataset from the original paper:

Sasho Nedelkoski, Ajay Kumar Mandapati, Jasmin Bogatinovski, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics". [link very soon]

If you use the data, implementation, or any details of the paper, please cite!

The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced from running a complex distributed system (Openstack). In addition, we also provide the workload and fault scripts together with the Rally report which can serve as ground truth (all at the Zenodo link below). We provide two datasets, which differ on how the workload is executed. The openstack_multimodal_sequential_actions is generated via executing workload of sequential user requests. The openstack_multimodal_concurrent_actions is generated via executing workload of concurrent user requests.

The difference of the concurrent dataset is that:

Due to the heavy load on the control node, the metric data for wally113 (control node) is not representative and we excluded it.

Three rally actions are executed in parallel: boot_and_delete, create_and_delete_networks, create_and_delete_image, whereas for the sequential there were 5 actions executed.

The raw logs in both datasets contain the same files. If the user wants the logs filetered by time with respect to the two datasets, should refer to the timestamps at the metrics (they provide the time window). In addition, we suggest to use the provided aggregated time ranged logs for both datasets in CSV format.

Important: The logs and the metrics are synchronized with respect time and they are both recorded on CEST (central european standard time). The traces are on UTC (Coordinated Universal Time -2 hours). They should be synchronized if the user develops multimodal methods.

Our GitHub repository can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/
O
Open Source Big Data Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AMA Research & Media LLP (2025). Open Source Big Data Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-big-data-tools-58978
Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Mar 15, 2025
Dataset provided by
AMA Research & Media LLP
License
https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The open-source big data tools market is experiencing robust growth, driven by the increasing need for scalable, cost-effective data management and analysis solutions across diverse sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This expansion is fueled by several key factors. Firstly, the rising volume and velocity of data generated across industries, from banking and finance to manufacturing and government, necessitate powerful and adaptable tools. Secondly, the cost-effectiveness and flexibility of open-source solutions compared to proprietary alternatives are major drawcards, especially for smaller organizations and startups. The ease of customization and community support further enhance their appeal. Growth is also being propelled by technological advancements such as the development of more sophisticated data analytics tools, improved cloud integration, and increased adoption of containerization technologies like Docker and Kubernetes for deployment and management. The market's segmentation across application (banking, manufacturing, etc.) and tool type (data collection, storage, analysis) reflects the diverse range of uses and specialized tools available. Key restraints to market growth include the complexity associated with implementing and managing open-source solutions, requiring skilled personnel and ongoing maintenance. Security concerns and the need for robust data governance frameworks also pose challenges. However, the growing maturity of the open-source ecosystem, coupled with the emergence of managed services providers offering support and expertise, is mitigating these limitations. The continued advancements in artificial intelligence (AI) and machine learning (ML) are further integrating with open-source big data tools, creating synergistic opportunities for growth in predictive analytics and advanced data processing. This integration, alongside the ever-increasing volume of data needing analysis, will undoubtedly drive continued market expansion over the forecast period.
e
Nature management plan, differences Design and views Management types map
data.europa.eu
Updated Apr 23, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Nature management plan, differences Design and views Management types map [Dataset]. https://data.europa.eu/data/datasets/56651e92-2e30-404b-bdb9-68477b7b2166?locale=en
Explore at:
Dataset updated
Apr 23, 2024
Description
The Nature Management Plan includes a map with Nature Management Area (Management Type Map). Changes to this card can take place in 2 ways. [1] With a draft decision, a consultation period, followed by a final decision by way of views. [2] With a Card Adaptation Decision that is immediately final. This map shows differences from [1] Design and views. The April version of this map shows the differences of the Draft Decision, the September version shows the differences of the Draft Decision including views. The differences are compared to the previous definitive version of the Management Type Card.
f
Comparison in terms of the Wilcoxon rank-sum statistical test with 5% among...
figshare.com
xls
Updated Jul 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ying Li; Yanyu Geng; Huankun Sheng (2024). Comparison in terms of the Wilcoxon rank-sum statistical test with 5% among the BIMGO and the 8 comparison algorithms. [Dataset]. http://doi.org/10.1371/journal.pone.0307288.t012
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0307288.t012
Dataset updated
Jul 16, 2024
Dataset provided by
PLOS ONE
Authors
Ying Li; Yanyu Geng; Huankun Sheng
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison in terms of the Wilcoxon rank-sum statistical test with 5% among the BIMGO and the 8 comparison algorithms.
Map of articles about "Teaching Open Science"
zenodo.org
data.niaid.nih.gov
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Isabel Steinhardt; Isabel Steinhardt (2020). Map of articles about "Teaching Open Science" [Dataset]. http://doi.org/10.5281/zenodo.3371415
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3371415
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Isabel Steinhardt; Isabel Steinhardt
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This description is part of the blog post "Systematic Literature Review of teaching Open Science" https://sozmethode.hypotheses.org/839

According to my opinion, we do not pay enough attention to teaching Open Science in higher education. Therefore, I designed a seminar to teach students the practices of Open Science by doing qualitative research.About this seminar, I wrote the article ”Teaching Open Science and qualitative methods“. For the article ”Teaching Open Science and qualitative methods“, I started to review the literature on ”Teaching Open Science“. The result of my literature review is that certain aspects of Open Science are used for teaching. However, Open Science with all its aspects (Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools) is not an issue in publications about teaching.

Based on this insight, I have started a systematic literature review. I realized quickly that I need help to analyse and interpret the articles and to evaluate my preliminary findings. Especially different disciplinary cultures of teaching different aspects of Open Science are challenging, as I myself, as a social scientist, do not have enough insight to be able to interpret the results correctly. Therefore, I would like to invite you to participate in this research project!

I am now looking for people who would like to join a collaborative process to further explore and write the systematic literature review on “Teaching Open Science“. Because I want to turn this project into a Massive Open Online Paper (MOOP). According to the 10 rules of Tennant et al (2019) on MOOPs, it is crucial to find a core group that is enthusiastic about the topic. Therefore, I am looking for people who are interested in creating the structure of the paper and writing the paper together with me. I am also looking for people who want to search for and review literature or evaluate the literature I have already found. Together with the interested persons I would then define, the rules for the project (cf. Tennant et al. 2019). So if you are interested to contribute to the further search for articles and / or to enhance the interpretation and writing of results, please get in touch. For everyone interested to contribute, the list of articles collected so far is freely accessible at Zotero: https://www.zotero.org/groups/2359061/teaching_open_science. The figure shown below provides a first overview of my ongoing work. I created the figure with the free software yEd and uploaded the file to zenodo, so everyone can download and work with it:

To make transparent what I have done so far, I will first introduce what a systematic literature review is. Secondly, I describe the decisions I made to start with the systematic literature review. Third, I present the preliminary results.

Systematic literature review – an Introduction

Systematic literature reviews “are a method of mapping out areas of uncertainty, and identifying where little or no relevant research has been done.” (Petticrew/Roberts 2008: 2). Fink defines the systematic literature review as a “systemic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners.” (Fink 2019: 6). The aim of a systematic literature reviews is to surpass the subjectivity of a researchers’ search for literature. However, there can never be an objective selection of articles. This is because the researcher has for example already made a preselection by deciding about search strings, for example “Teaching Open Science”. In this respect, transparency is the core criteria for a high-quality review.

In order to achieve high quality and transparency, Fink (2019: 6-7) proposes the following seven steps:

Selecting a research question.

Selecting the bibliographic database.

Choosing the search terms.

Applying practical screening criteria.

Applying methodological screening criteria.

Doing the review.

Synthesizing the results.

I have adapted these steps for the “Teaching Open Science” systematic literature review. In the following, I will present the decisions I have made.

Systematic literature review – decisions I made

Research question: I am interested in the following research questions: How is Open Science taught in higher education? Is Open Science taught in its full range with all aspects like Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools? Which aspects are taught? Are there disciplinary differences as to which aspects are taught and, if so, why are there such differences?

Databases: I started my search at the Directory of Open Science (DOAJ). “DOAJ is a community-curated online directory that indexes and provides access to high quality, open access, peer-reviewed journals.” (https://doaj.org/) Secondly, I used the Bielefeld Academic Search Engine (base). Base is operated by Bielefeld University Library and “one of the world’s most voluminous search engines especially for academic web resources” (base-search.net). Both platforms are non-commercial and focus on Open Access publications and thus differ from the commercial publication databases, such as Web of Science and Scopus. For this project, I deliberately decided against commercial providers and the restriction of search in indexed journals. Thus, because my explicit aim was to find articles that are open in the context of Open Science.

Search terms: To identify articles about teaching Open Science I used the following search strings: “teaching open science” OR teaching “open science” OR teach „open science“. The topic search looked for the search strings in title, abstract and keywords of articles. Since these are very narrow search terms, I decided to broaden the method. I searched in the reference lists of all articles that appear from this search for further relevant literature. Using Google Scholar I checked which other authors cited the articles in the sample. If the so checked articles met my methodological criteria, I included them in the sample and looked through the reference lists and citations at Google Scholar. This process has not yet been completed.

Practical screening criteria: I have included English and German articles in the sample, as I speak these languages (articles in other languages are very welcome, if there are people who can interpret them!). In the sample only journal articles, articles in edited volumes, working papers and conference papers from proceedings were included. I checked whether the journals were predatory journals – such articles were not included. I did not include blogposts, books or articles from newspapers. I only included articles that fulltexts are accessible via my institution (University of Kassel). As a result, recently published articles at Elsevier could not be included because of the special situation in Germany regarding the Project DEAL (https://www.projekt-deal.de/about-deal/). For articles that are not freely accessible, I have checked whether there is an accessible version in a repository or whether preprint is available. If this was not the case, the article was not included. I started the analysis in May 2019.

Methodological criteria: The method described above to check the reference lists has the problem of subjectivity. Therefore, I hope that other people will be interested in this project and evaluate my decisions. I have used the following criteria as the basis for my decisions: First, the articles must focus on teaching. For example, this means that articles must describe how a course was designed and carried out. Second, at least one aspect of Open Science has to be addressed. The aspects can be very diverse (FOSS, repositories, wiki, data management, etc.) but have to comply with the principles of openness. This means, for example, I included an article when it deals with the use of FOSS in class and addresses the aspects of openness of FOSS. I did not include articles when the authors describe the use of a particular free and open source software for teaching but did not address the principles of openness or re-use.

Doing the review: Due to the methodical approach of going through the reference lists, it is possible to create a map of how the articles relate to each other. This results in thematic clusters and connections between clusters. The starting point for the map were four articles (Cook et al. 2018; Marsden, Thompson, and Plonsky 2017; Petras et al. 2015; Toelch and Ostwald 2018) that I found using the databases and criteria described above. I used yEd to generate the network. „yEd is a powerful desktop application that can be used to quickly and effectively generate high-quality diagrams.” (https://www.yworks.com/products/yed) In the network, arrows show, which articles are cited in an article and which articles are cited by others as well. In addition, I made an initial rough classification of the content using colours. This classification is based on the contents mentioned in the articles’ title and abstract. This rough content classification requires a more exact, i.e., content-based subdivision and
f
Comparison in terms of the selected features.
plos.figshare.com
xls
Updated Jul 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ying Li; Yanyu Geng; Huankun Sheng (2024). Comparison in terms of the selected features. [Dataset]. http://doi.org/10.1371/journal.pone.0307288.t011
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0307288.t011
Dataset updated
Jul 16, 2024
Dataset provided by
PLOS ONE
Authors
Ying Li; Yanyu Geng; Huankun Sheng
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Feature selection is an important solution for dealing with high-dimensional data in the fields of machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the newly proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using iterative chaotic map with infinite collapses (ICMIC) mapping, which increases the diversity of the population. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is used for the optimal individuals to enhance the exploitation and convergence capabilities of the algorithm. The superior ability of the IMGO algorithm to solve continuous problems is demonstrated on 23 benchmark datasets. Then, BIMGO is evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of the fitness value, number of selected features and sensitivity. In addition, the statistical results of the experiments demonstrate the significantly superior ability of BIMGO to select the most effective features in medical datasets.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Susan Wijffels; Franco Reseghetti; Zanna Chase; Mark Rosenberg; Steve Rintoul; Rebecca Cowley (2014). XBT and CTD pairs dataset Version 2 [Dataset]. https://researchdata.edu.au/3377826

XBT and CTD pairs dataset Version 2

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

datadownloadAvailable download formats

Dataset updated

Oct 16, 2014

Dataset provided by

CSIROhttp://www.csiro.au/

Authors

Susan Wijffels; Franco Reseghetti; Zanna Chase; Mark Rosenberg; Steve Rintoul; Rebecca Cowley

License

Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically

Time period covered

Jan 1, 1984 - Aug 30, 2013

Area covered

Description

The XBT/CTD pairs dataset (Version 2) contains additional datasets and updated datasets from the Version 1 data. Version 1 data was used to update the calculation of historical XBT fall rate and temperature corrections presented in Cowley, R., Wijffels, S., Cheng, L., Boyer, T., and Kizu, S. (2013). Biases in Expendable Bathythermograph Data: A New View Based on Historical Side-by-Side Comparisons. Journal of Atmospheric and Oceanic Technology, 30, 1195–1225, doi:10.1175/JTECH-D-12-00127.1. http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-12-00127.1 Version 2 contains 1,188 pairs from seven datasets that add to Version 1 which contains 4,115 pairs from 114 datasets. There are also 10 updated datasets included in Version 2. The updates apply to the CTD depth data in the Quality Controlled version of the 10 datasets. The 10 updated Version 2 datasets should be used in preference to the copies in Version 1. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets. Each dataset contains the scientifically quality controlled version and (where available) the originator's data. The XBT/CTD pairs are identified in the document 'XBT_CTDpairs_metadata_V2.csv'. Although the XBT data in the additional datasets was collected after 2008, much of the probes in the ss2012t01 dataset were manufactured during the mid-1980s. Lineage: Data is sourced from CSIRO Oceans and Atmosphere Flagship, Australian Antarctic Division and Italian National Agency for New Technologies, Energy and Sustainable Economic Development. Original and raw data files are included where available. Quality controlled datasets follow the procedure of Bailey, R., Gronell, A., Phillips, H., Tanner, E., and Meyers, G. (1994). Quality control cookbook for XBT data, Version 1.1. CSIRO Marine Laboratories Reports, 221. Quality controlled data is in the 'MQNC' format used at CSIRO Marine and Atmospheric Research. The MQNC format is described in the document 'XBT_CTDpairs_descriptionV2.pdf'. Note that future versions of the XBT/CTD pairs database may supersede this version. Please check more recent versions for updates to individual datasets.

Clear search

Close search

Google apps

Main menu

XBT and CTD pairs dataset Version 2

Data from: Multi-Source Distributed System Data for AI-powered Analytics

Open Source Big Data Tools Report

Nature management plan, differences Design and views Management types map

Comparison in terms of the Wilcoxon rank-sum statistical test with 5% among...

Map of articles about "Teaching Open Science"

Comparison in terms of the selected features.

XBT and CTD pairs dataset Version 2See More Versions

XBT and CTD pairs dataset Version 2