The Data Visualization Workshop II: Data Wrangling was a web-based event held on October 18, 2017. This workshop report summarizes the individual perspectives of a group of visualization experts from the public, private, and academic sectors who met online to discuss how to improve the creation and use of high-quality visualizations. The specific focus of this workshop was on the complexities of "data wrangling". Data wrangling includes finding the appropriate data sources that are both accessible and usable and then shaping and combining that data to facilitate the most accurate and meaningful analysis possible. The workshop was organized as a 3-hour web event and moderated by members of the Human Computer Interaction and Information Management Task Force of the Networking and Information Technology Research and Development Program's Big Data Interagency Working Group. Report prepared by the Human Computer Interaction and Information Management Task Force, Big Data Interagency Working Group, Networking & Information Technology Research & Development Subcommittee, Committee on Technology of the National Science & Technology Council...
Advancements in cyberinfrastructure (CI) to support cloud-based tools and services for the water science community have changed how researchers conduct, share, and publish scientific workflows. These advancements have had a transformative impact on how our community addresses the challenges associated with interdisciplinary collaboration, reproducing scientific findings, and developing real-world educational modules. The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) facilitates discussion of these topics with the water science community to better identify the shortcomings of current CI approaches and define the requirements for the next generation of cloud services. The purpose of this workshop is to introduce and solicit feedback on the current suite of CUAHSI community computational tools that have been designed to improve the way water science research and education is conducted in the cloud. The workshop will cover several technologies that are actively being developed for working with Earth surface data. Our goal is to demonstrate how these compute environments can be used in educational applications, workshops, reproducing published work, and conducting research. Participants will be presented with several approaches for working with their data within the CUAHSI ecosystem of tools. The workshop will focus heavily on interactive examples and will feature several programming languages, including Python, R, and MATLAB. Participants are not required to be proficient in these languages but should bring a laptop computer, be ready to work through live examples, and be willing to provide constructive feedback.
Report of Workshop - E3 Workshop, PELS Working Group Meeting & PALS Project Steering Committee Meeting, February 2015. Held in Canberra from 23 to 27 February 2015.
This workshop will introduce OpenRefine, a powerful open source tool for exploring, cleaning and manipulating "messy" data. Through hands-on activities, using a variety of datasets, participants will learn how to: Explore and identify patterns in data; Normalize data using facets and clusters; Manipulate and generate new textual and numeric data; Transform and reshape datasets; Use the General Refine Expression Language (GREL) to undertake manipulations, such as concatenating strings.
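For instance, a GREL expression for concatenating two columns into a new one might look like the following (the column names here are hypothetical, not part of any workshop dataset):

cells["given_name"].value + " " + cells["family_name"].value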
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
These files are intended for use with the Data Carpentry Genomics curriculum (https://datacarpentry.org/genomics-workshop/). Files will be useful for instructors teaching this curriculum in a workshop setting, as well as individuals working through these materials on their own.
This curriculum is normally taught using Amazon Web Services (AWS). Data Carpentry maintains an AWS image that includes all of the data files needed to use these lesson materials. For information on how to set up an AWS instance from that image, see https://datacarpentry.org/genomics-workshop/setup.html. Learners and instructors who would prefer to teach on a different remote computing system can access all required files from this FigShare dataset.
This curriculum uses data from a long term evolution experiment published in 2016: Tempo and mode of genome evolution in a 50,000-generation experiment (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4988878/) by Tenaillon O, Barrick JE, Ribeck N, Deatherage DE, Blanchard JL, Dasgupta A, Wu GC, Wielgoss S, Cruveiller S, Médigue C, Schneider D, and Lenski RE. (doi: 10.1038/nature18959). All sequencing data sets are available in the NCBI BioProject database under accession number PRJNA294072 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA294072).
backup.tar.gz: contains original fastq files, reference genome, and subsampled fastq files. Directions for obtaining these files from public databases are given during the lesson (https://datacarpentry.org/wrangling-genomics/02-quality-control/index.html). On the AWS image, these files are stored in the ~/.backup directory. 1.3 GB in size.
Ecoli_metadata.xlsx: an example Excel file to be loaded during the R lesson.
shell_data.tar.gz: contains the files used as input to the Introduction to the Command Line for Genomics lesson (https://datacarpentry.org/shell-genomics/).
sub.tar.gz: contains subsampled fastq files that are used as input to the Data Wrangling and Processing for Genomics lesson (https://datacarpentry.org/wrangling-genomics/). 109 MB in size.
solutions: contains the output files of the Shell Genomics and Wrangling Genomics lessons, including FastQC output and SAM, BAM, BCF, and VCF files.
vcf_clean_script.R: converts the VCF output in ./solutions/wrangling_solutions/variant_calling_auto to a single tidy data frame.
combined_tidy_vcf.csv: output of vcf_clean_script.R
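For a quick look at the combined table, a minimal Python sketch (the column layout is not documented here, so this only inspects the file):

import pandas as pd

# Load the tidy variant table produced by vcf_clean_script.R
variants = pd.read_csv('combined_tidy_vcf.csv')
print(variants.shape)
print(variants.head())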
The high performance computing (HPC) and big data (BD) communities traditionally have pursued independent trajectories in the world of computational science. HPC has been synonymous with modeling and simulation, and BD with ingesting and analyzing data from diverse sources, including from simulations. However, both communities are evolving in response to changing user needs and technological landscapes. Researchers are increasingly using machine learning (ML) not only for data analytics but also for modeling and simulation; science-based simulations are increasingly relying on embedded ML models not only to interpret results from massive data outputs but also to steer computations. Science-based models are being combined with data-driven models to represent complex systems and phenomena. There is also an increasing need for real-time data analytics, which requires large-scale computations to be performed closer to the data and requires data infrastructures to adapt to HPC-like modes of operation. These new use cases create a vital need for HPC and BD systems to deal with simulations and data analytics in a more unified fashion. To explore this need, the NITRD Big Data and High-End Computing R&D Interagency Working Groups held a workshop, The Convergence of High-Performance Computing, Big Data, and Machine Learning, on October 29-30, 2018, in Bethesda, Maryland. The purposes of the workshop were to bring together representatives from the public, private, and academic sectors to share their knowledge and insights on integrating HPC, BD, and ML systems and approaches and to identify key research challenges and opportunities. The 58 workshop participants represented a balanced cross-section of stakeholders involved in or impacted by this area of research. Additional workshop information, including a webcast, is available at https://www.nitrd.gov/nitrdgroups/index.php?title=HPC-BD-Convergence.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset was created as part of the "Urban Climate Resilience" research project, conducted at the Department of Environmental Sciences, University of Example. The project aims to study temperature and humidity variations in metropolitan areas to better understand the impacts of urban heat islands.
The dataset serves as the primary data source for the statistical analysis of microclimate conditions across three city districts, collected over the summer of 2024. Data was gathered using IoT-based environmental sensors deployed at 30 locations. Each sensor recorded temperature, humidity, and air pressure at 5-minute intervals.
The dataset is organized into three main folders, one for each district ("District_A", "District_B", and "District_C"). Each folder contains daily CSV files named in the format YYYY-MM-DD_sensorID.csv. A README file at the root level explains the folder structure, file naming convention, and column definitions.
The CSV files can be opened with any standard spreadsheet software (e.g., Excel, LibreOffice) or programmatically using tools such as Python (pandas) or R. A Jupyter Notebook is included to demonstrate basic data loading and visualization.
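As a rough sketch of that loading step (the included notebook is authoritative; the sensor ID and column names below are assumptions, so consult the README for the actual definitions):

import pandas as pd

# Load one daily file; the name follows the documented YYYY-MM-DD_sensorID.csv
# pattern, but this particular sensor ID is hypothetical.
df = pd.read_csv(
    'District_A/2024-07-15_sensor01.csv',
    parse_dates=['timestamp'],   # assumed column name
    index_col='timestamp',
)

# Resample the 5-minute readings to hourly means for a quick overview
# (the three measurement column names are assumptions).
hourly = df[['temperature', 'humidity', 'pressure']].resample('h').mean()
print(hourly.head())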
Additional documentation and source code for the data collection scripts and analysis pipeline are available on the project's GitHub repository: https://github.com/example/urban-climate-resilience
Further details
Please note that while sensor calibration was performed prior to deployment, occasional anomalies may occur due to weather interference or battery fluctuations. Users are advised to apply the provided quality control script (quality_check.py) before analysis.
We encourage reuse and welcome collaboration. If you use this dataset in your work, please cite it using the provided DOI.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of the USGS’s R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
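The abstract does not name the two packages, but as a hedged illustration of NWIS retrieval in Python, the USGS dataretrieval package (the community Python port of the R dataRetrieval package) can be used along these lines; the site number and date range are illustrative only:

from dataretrieval import nwis

# Fetch daily-values (dv) records for one USGS gage;
# the site number and dates here are examples, not values from the workshop.
df = nwis.get_record(sites='03339000', service='dv',
                     start='2024-01-01', end='2024-01-31')
print(df.head())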
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset contains pre-processed NPLinker (v2.0.0-alpha.8) results for two Paired Omics Data Platform (PoDP) entries, facilitating the efficient exploration of genomic-metabolomic connections in natural product discovery workflows.
Two pickle (PKL) files are provided:
npl_podp.pkl: Generated from PoDP entry ID 4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.
npl_b78b5817.pkl: Generated from PoDP entry ID b78b5817-86e2-4e5e-a087-a6b0d9710fce.
Usage example:
import pickle
# Load the saved data
with open('output/npl_xxx.pkl', 'rb') as f:
data = pickle.load(f)
# Unpack the tuple components
bgcs, gcfs, spectra, mfs, strains, links = data
# Verify the loaded data components
print(f"BGCs: {type(bgcs)}, Count: {len(bgcs)}")
print(f"GCFs: {type(gcfs)}, Count: {len(gcfs)}")
print(f"Spectra: {type(spectra)}, Count: {len(spectra)}")
print(f"Molecular Families: {type(mfs)}, Count: {len(mfs)}")
print(f"Strains: {type(strains)}, Count: {len(strains)}")
print(f"Links: {type(links)}, Count: {len(links) if links else 0}")
Note: This dataset was prepared for the eScience Workshop 2025.
Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
A two-day waste management workshop was conducted in Tulagi, the provincial capital of Central Province, from Tuesday 30 July to Wednesday 31 July 2019. The workshop was facilitated by the Ministry of Environment Climate Change Disaster Management & Meteorology with supporting partners representing the Honiara City Council and the Japan International Cooperation Agency through the J-PRISM II Project, in collaboration with the Central Islands Provincial Government. The workshop was co-funded by the J-PRISM II Project and the Solomon Islands Government through the Ministry of Environment Climate Change Disaster Management & Meteorology. The goal of the workshop was to enhance the knowledge of stakeholder participants from Central Province on waste management techniques and skills, with lessons learnt shared from experiences in Honiara and other case studies. During the workshop, participants from the province had the chance to discuss waste management issues faced in Tulagi and in Central Province overall, and to identify strategies to assist in the development of a Waste Management Plan for Tulagi. This is in line with the current National Waste Management & Pollution Control Strategy 2017-2026, which supports the Provinces in waste management. Mr David Manetiva, Premier of Central Islands Province, remarked during a courtesy call visit that his Government is serious about moving forward on addressing waste management in the province. In closing the final day of the workshop, Mr Christian Siale, Provincial Secretary of the Central Islands Provincial Government, stated, "Waste Management is one of the three areas to be addressed. As a result of the workshop, a working paper for the plan is in progress." He added that the Province is committed and will allocate budget for areas such as beautification and the promotion of composting, will procure required equipment such as a loader, and highlighted the need for an incinerator for Tulagi Hospital. Following the workshop, a waste characterization study (waste audit) was conducted by the team from Honiara and Central Province in Tulagi. This survey was conducted specifically to gather baseline data that will be important for the province in its planning and decision making on how to manage waste issues in the provincial capital. The workshop training and waste survey are part of the government's 100-day program to improve and build capacity for proper waste management in the country.
The circum-Arctic coastal margin is about 200,000 km long and is the interface through which land-shelf exchanges are mediated. Sediment input to the Arctic shelf resulting from erosion of ice-rich, permafrost-dominated coastlines may be equal to or greater than input from river discharge. In addition, climate change in the Arctic is predicted to be more rapid and more intense than at lower latitudes. Determining sediment sources and transport rates along high latitude coasts and inner shelves is critical for interpreting the geological history of the shelves and for predictions of future behavior of these coasts in response to climatic and sea level changes. The fourth IASC-sponsored ACD workshop was held in St. Petersburg, Russia, on November 10-13, 2003. Participants from Canada (7), Germany (7), Great Britain (2), the Netherlands (1), Norway (1), Russia (32), Ukraine (1) and the United States (8) attended. During the first part of the workshop, 63 papers dealing with regional and/or circum-Arctic coastal dynamics were presented. Based on the material presented, five thematic working groups were identified: (1) GIS working group to develop a circum-Arctic coastal GIS system, (2) coastal permafrost working group to discuss processes involved in the transition of onshore to offshore permafrost, (3) biogeochemistry working group with a focus on transport and fate of eroded material, (4) biodiversity working group to initiate planning of an Arctic Coastal Biodiversity research agenda, (5) environmental data working group to discuss coastal dynamics as a function of environmental forcing. Finally, the results of the workshop and the next steps were discussed in the ACD Steering Committee meeting. The present report summarizes the program of the workshop and the main results.
BSD 3-Clause License (https://opensource.org/licenses/BSD-3-Clause)
This dataset contains pre-processed microphone array time signals from the FAN-01 benchmark, derotated using the mode-time domain Virtual Rotating Array (VRA-M) method. The VRA-M method compensates for the fan's rotational motion, enabling spatial separation of individual source contributions using frequency-domain microphone array methods. The resulting derotated time signals are stored in HDF5 format. Details on the measurement setup and the evaluation using the Acoular framework are given in preprint.pdf.
How to Use:
The data supplements the presentation and publication An Interactive Tutorial on Advanced Microphone Array Methods for Acoustic Source Mapping, presented at DAS | DAGA 2025 in Copenhagen.
One can use the Acoular Python framework to load and process the dataset. Example code for loading the data is provided below:
import acoular as ac

# Load the derotated microphone array time signals from the HDF5 file
vra_time_data = ac.TimeSamples(file='2015-08-12_16-55-10_n1ug_no_grid_1_4_derotated.h5')

# Load the 64-microphone ring array geometry
mics = ac.MicGeom(file='ring64.xml')
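From there, a conventional frequency-domain evaluation could proceed along these lines (a sketch only; the block size, grid extents, and frequency band are illustrative assumptions, not values from the publication):

# Cross-spectral matrix of the derotated signals
ps = ac.PowerSpectra(source=vra_time_data, block_size=1024, window='Hanning')

# Map plane and steering vectors (geometry values are assumptions)
grid = ac.RectGrid(x_min=-0.5, x_max=0.5, y_min=-0.5, y_max=0.5, z=1.0, increment=0.02)
st = ac.SteeringVector(grid=grid, mics=mics)

# Conventional beamforming; source map for the 2 kHz third-octave band
bb = ac.BeamformerBase(freq_data=ps, steer=st)
result = bb.synthetic(2000, 3)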