14 datasets found

Petre_Slide_CategoricalScatterplotFigShare.pptx
figshare.com
pptx
Updated Sep 19, 2016
Cite
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1
Available download formats: pptx
Unique identifier
https://doi.org/10.6084/m9.figshare.3840102.v1
Dataset updated
Sep 19, 2016
Dataset provided by
figshare
Authors
Benj Petre; Aurore Coince; Sophien Kamoun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates; in the example, it gives the month and year in which each replicate was performed. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Resize the window as needed and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
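
The file layout from Step 1 can be sketched in code. The following is a minimal illustration in Python rather than R, since only the .csv layout matters here. The column names come from the protocol; the function names and the example rows are hypothetical placeholders, not values from the slide.

```python
import csv
import io

# Column names come from the protocol; the rows below are made-up placeholders.
COLUMNS = ["Replicate", "Condition", "Value"]


def write_dataset(rows, handle):
    """Write (replicate, condition, value) tuples as the three-column .csv."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(COLUMNS)
    for replicate, condition, value in rows:
        writer.writerow([replicate, condition, float(value)])


def check_dataset(handle):
    """Return the parsed rows, raising ValueError if the layout is wrong."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"expected columns {COLUMNS}, got {reader.fieldnames}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            row["Value"] = float(row["Value"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: Value is not numeric") from None
        rows.append(row)
    return rows


# Hypothetical example: one wild type and two mutants over two replicates.
example_rows = [
    ("Jan 2016", "WT", 12.5),
    ("Jan 2016", "Mutant A", 7.1),
    ("Feb 2016", "Mutant B", 3.4),
]
buffer = io.StringIO()
write_dataset(example_rows, buffer)
buffer.seek(0)
parsed = check_dataset(buffer)
```

A file that passes check_dataset has the three-column layout that the R script is described as reading; the check itself is an addition for illustration and is not part of the published protocol.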

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, go to Packages & Data -> Package Installer, enter ‘ggplot2’ in the Package Search field and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

#7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.

https://cran.r-project.org/

http://ggplot2.org/
Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...
zenodo.org
application/gzip
Updated Jul 15, 2024
Cite
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi (2024). Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries in Python [Dataset]. http://doi.org/10.5281/zenodo.11584961
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.11584961
Dataset updated
Jul 15, 2024
Dataset provided by
Zenodo
Authors
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package

This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

Requirements

We recommend the following requirements to replicate our study:

Internet access

At least 100GB of space

Docker installed

Git installed

Package Structure

We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

data-analysis, an R-based Container we used to run our data analysis.

data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.

database, a Postgres Container we used to store clients' data, obtained from Grotov et al.

storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared by both containers.

docker-compose.yml, the Docker file that configures all containers used in the package.

In the remainder of this document, we describe how to set up each container properly.

Using VSCode to Set Up the Package

We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both data-analysis and data-collection containers. This way you can directly access and run each container inside it without any specific configuration.

You first need to set up the containers:

$ cd /replication/package/folder
$ docker-compose build
$ docker-compose up
# Wait for Docker to create and start all containers

Then, you can open them in Visual Studio Code:

Open VSCode in project root folder

Access the command palette and select "Dev Container: Reopen in Container"

Select either Data Collection or Data Analysis.

Start working

If you want/need a more customized organization, the remainder of this file describes it in detail.

Longest Road: Manual Package Setup

Database Setup

The database container will automatically restore the dump in dump_matroskin.tar in its first launch. To set up and run the container, you should:

Build an image:

$ cd ./database
$ docker build --tag 'dabc-database' .
$ docker image ls
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
dabc-database   latest   b6f8af99c90d   50 minutes ago   18.5GB

Create and enter inside the container:

$ docker run -it --name dabc-database-1 dabc-database
$ docker exec -it dabc-database-1 /bin/bash
root# psql -U postgres -h localhost -d jupyter-notebooks
jupyter-notebooks=# \dt
               List of relations
 Schema |       Name        | Type  | Owner
--------+-------------------+-------+-------
 public | Cell              | table | root
 public | Code_cell         | table | root
 public | Md_cell           | table | root
 public | Notebook          | table | root
 public | Notebook_features | table | root
 public | Notebook_metadata | table | root
 public | repository        | table | root

If you get the table list shown above, your database is properly set up.

It is important to mention that this database is extended from the one provided by Grotov et al. Basically, we added three columns to the table Notebook_features (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.

Data Collection Setup

This container is responsible for collecting the data to answer our research questions. It has the following structure:

dabcs.py, extracts DABCs from Scikit Learn source code and exports them to a CSV file.

dabcs-clients.py, extracts function calls from clients and exports them to a CSV file. We rely on a modified version of Matroskin to collect the function calls. You can find the tool's source code in the matroskin directory.

Makefile, commands to set up and run both dabcs.py and dabcs-clients.py

matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.

storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.

requirements.txt, Python dependencies adopted in this module.

Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

$ cd ./data-collection
$ docker build --tag "data-collection" .
$ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
$ docker exec -it data-collection-1 /bin/bash
$ ls
Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py

If you see project files, it means the container is configured accordingly.

Data Analysis Setup

We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

dependencies.R, an R script containing the dependencies used in our data analysis.

data-analysis.Rmd, the R notebook we used to perform our data analysis

datasets, a docker volume pointing to the storage directory.

Execute the following commands to run this container:

$ cd ./data-analysis
$ docker build --tag "data-analysis" .
$ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
$ docker exec -it data-analysis-1 /bin/bash
$ ls
data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile

If you see project files, it means the container is configured accordingly.

A note on storage shared folder

As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before you start working on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.

$ make unzip # extract files
$ ls
clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
$ make zip # compress files
$ ls
csv-files.tar.gz Makefile
Data and Code for High Throughput FTIR Analysis of Macro and Microplastics...
data.niaid.nih.gov
zenodo.org
Updated Nov 14, 2023
Cite
Lukas Gehrke (2023). Data and Code for High Throughput FTIR Analysis of Macro and Microplastics with Plate Readers [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7772571
Dataset updated
Nov 14, 2023
Dataset provided by
Hannah Jebens
Benjamin D Maurer
Win Cowger
Ali Chamas
Sebastian Primpke
Lisa Roscher
Gunnar Gerdts
Lukas Gehrke
Description
Data and source code for reproducing the analysis conducted in "High Throughput FTIR Analysis of Macro and Microplastics with Plate Readers". All materials are licensed for noncommercial purposes: https://creativecommons.org/licenses/by-nc/4.0/

Explanatory_Videos.zip has videos showing data collection methods.

HIDA_Publication.R has source code for doing data cleanup and analysis on data in database.zip.

databasedata.zip holds all raw and analyzed data:

- ATR, Reflectance, and Transmission folders have all data used in the manuscript, in a raw (.0) and combined (export.csv) format for each of the plates analyzed (folder numbers).
- Plots folder has images of each spectrum.
- cell_information.csv has the raw ids and comments made at the time the particles were assessed.
- classes_reference_2.csv has the transformations used to standardize Open Specy's terms to polymer classes.
- CleanedSpectra_raw.csv has the total cleaned-up database of all spectral intensities in long format.
- joined_cell_metadata.csv has the metadata for each plate well analyzed.
- library_metadata.csv has metadata for each spectrum in raw form for each particle id.
- Lisa_Plate_6.csv has the metadata from Lisa Roscher used in this study.
- Metadata_raw.csv has the conformed metadata that can be paired with the CleanedSpectra_raw.csv file.
- OpenSpecy_Classification_Baseline.csv has the particle metadata combined with Open Specy's classes identified after baseline correcting and smoothing the spectra with the standard Open Specy routine.
- OpenSpecy_Classification_Raw.csv has the particle metadata combined with Open Specy's identified classes if using the raw spectra.
- particle_spectrum_match.csv converts particle ids to their reference in the Polymer_Material_Database_AWI_V2_Win.xlsx file.
- Polymer_Material_Database_AWI_V2_Win.xlsx has metadata on materials from Primpke's database.
- polymer_metadata_2.csv can be used to crosswalk polymer categories to more or less specific terminology.
- spread_os.csv is the reference database used in CleanedSpectra_raw.csv that has been spread to wide format.
- Top Correlation Data20221201-125621.csv is a download of results from Open Specy's beta tool that provides the top ids from the reference database.
Test data and model for the FlowCam data processing pipeline
zenodo.org
csv, zip
Updated Jan 27, 2025
Cite
Katerina Symiakaki; Tim Walles; Cassidy J. Park; Jens Nejstgaard; Stella A. Berger (2025). Test data and model for the FlowCam data processing pipeline [Dataset]. http://doi.org/10.5281/zenodo.14732560
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.14732560
Dataset updated
Jan 27, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katerina Symiakaki; Tim Walles; Cassidy J. Park; Jens Nejstgaard; Stella A. Berger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Testing data for the processing pipeline for FlowCam data

The data are fully processed but can be used to test each pipeline component. You can download the scripts at

Pipeline scripts

To use the model, unzip the freshwater_phytoplankton_model.zip and place the folder in the respective model folder in the services.

|-- services
    |-- ProcessData.py
    |-- config.py
    |-- classification
        |-- ObjectClassification
            |-- models
                |--

Once you unzip the data.zip file, each folder corresponds to the data export of a FlowCam run. You have the TIF collage files, a CSV file with the sample name containing all the parameters measured by the FlowCam, and a LabelChecker_

You can run the preprocessing.py script directly on the files by including the -R (reprocess) argument. Otherwise you can do it by removing the LabelChecker CSV from the folders. The PreprocessingTrue column will remain the same.

When running the classification.py script you can get new predictions on the data. In this case, only the LabelPredicted column will be updated and the validated labels (LabelTrue column) will not be lost.

You could also use these files to try out the train_model.ipynb, although the resulting model will not be very good with so little data. We recommend trying it with your own data.

LabelChecker

These files can be used to test LabelChecker. You can open them one by one or all together and try all functionalities. We provide a label_file.csv but you can also make your own.
QA/QC-ed Groundwater Level Time Series in PLM-1 and PLM-6 Monitoring Wells,...
dataone.org
Updated Feb 8, 2024
Cite
Boris Faybishenko; Roelof Versteeg; Kenneth Williams; Rosemary Carroll; Wenming Dong; Tetsu Tokunaga; Dylan O'Ryan (2024). QA/QC-ed Groundwater Level Time Series in PLM-1 and PLM-6 Monitoring Wells, East River, Colorado (2016-2022) [Dataset]. http://doi.org/10.15485/1866836
Unique identifier
https://doi.org/10.15485/1866836
Dataset updated
Feb 8, 2024
Dataset provided by
ESS-DIVE
Authors
Boris Faybishenko; Roelof Versteeg; Kenneth Williams; Rosemary Carroll; Wenming Dong; Tetsu Tokunaga; Dylan O'Ryan
Time period covered
Nov 30, 2016 - Oct 13, 2022
Description
This data set contains QA/QC-ed (Quality Assurance and Quality Control) water level data for the PLM1 and PLM6 wells. PLM1 and PLM6 are location identifiers used by the Watershed Function SFA project for two groundwater monitoring wells along an elevation gradient located along the lower montane life zone of a hillslope near the Pumphouse location at the East River Watershed, Colorado, USA. These wells are used to monitor subsurface water and carbon inventories and fluxes, and to determine the seasonally dependent flow of groundwater under the PLM hillslope. The downslope flow of groundwater in combination with data on groundwater chemistry (see related references) can be used to estimate rates of solute export from the hillslope to the floodplain and river.

QA/QC analysis of measured groundwater levels in monitoring wells PLM-1 and PLM-6 included identification and flagging of duplicated timestamp values, gap filling of missing timestamps and water levels, and removal of abnormal/bad values and outliers from the measured water levels. The QA/QC analysis also tested the application of different QA/QC methods and the development of regular (5-minute, 1-hour, and 1-day) time series datasets, which can serve as a benchmark for testing other QA/QC techniques, and will be applicable for ecohydrological modeling.

The package includes a Readme file, one R code file used to perform QA/QC, a series of 8 data csv files (six QA/QC-ed regular time series datasets of varying intervals (5-min, 1-hr, 1-day) and two files with QA/QC flagging of original data), and three files for the reporting format adoption of this dataset (InstallationMethods, file level metadata (flmd), and data dictionary (dd) files). QA/QC-ed data herein were derived from the original/raw data publication available at Williams et al., 2020 (DOI: 10.15485/1818367). For more information about running the R code file (10.15485_1866836_QAQC_PLM1_PLM6.R) to reproduce QA/QC output files, see the README (QAQC_PLM_readme.docx).

This dataset replaces the previously published raw data time series, and is the final groundwater data product for the PLM wells in the East River. Complete metadata information on the PLM1 and PLM6 wells is available in a related dataset on ESS-DIVE: Varadharajan C, et al (2022). https://doi.org/10.15485/1660962. These data products are part of the Watershed Function Scientific Focus Area collection effort to further scientific understanding of biogeochemical dynamics from genome to watershed scales.

2022/09/09 Update: Converted data files using ESS-DIVE’s Hydrological Monitoring Reporting Format. With the adoption of this reporting format, three new files (v1_20220909_flmd.csv, V1_20220909_dd.csv, and InstallationMethods.csv) were added. The file-level metadata file (v1_20220909_flmd.csv) contains information specific to the files contained within the dataset. The data dictionary file (v1_20220909_dd.csv) contains definitions of column headers and other terms across the dataset. The installation methods file (InstallationMethods.csv) contains a description of methods associated with installation and deployment at PLM1 and PLM6 wells. Additionally, eight data files were re-formatted to follow the reporting format guidance (er_plm1_waterlevel_2016-2020.csv, er_plm1_waterlevel_1-hour_2016-2020.csv, er_plm1_waterlevel_daily_2016-2020.csv, QA_PLM1_Flagging.csv, er_plm6_waterlevel_2016-2020.csv, er_plm6_waterlevel_1-hour_2016-2020.csv, er_plm6_waterlevel_daily_2016-2020.csv, QA_PLM6_Flagging.csv). The major changes to the data files include the addition of header_rows above the data containing metadata about the particular well, units, and sensor description.

2023/01/18 Update: Dataset updated to include additional QA/QC-ed water level data up until 2022-10-12 for ER-PLM1 and 2022-10-13 for ER-PLM6. Reporting format specific files (v2_20230118_flmd.csv, v2_20230118_dd.csv, v2_20230118_InstallationMethods.csv) were updated to reflect the additional data. R code file (QAQC_PLM1_PLM6.R) was added to replace the previously uploaded HTML files to enable execution of the associated code. R code file (QAQC_PLM1_PLM6.R) and ReadMe file (QAQC_PLM_readme.docx) were revised to clarify where original data was retrieved from and to remove local file paths.
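
Two of the QA/QC steps named in this description, flagging duplicated timestamps and building a regular time series with gaps made explicit, can be sketched as follows. This is a hedged Python illustration, not the dataset's R code (QAQC_PLM1_PLM6.R): the function names, the example readings, and the assumption that timestamps fall exactly on the 5-minute grid are all hypothetical, and missing water levels are only marked as None rather than filled.

```python
from datetime import datetime, timedelta


def flag_duplicates(records):
    """Return (timestamp, level, is_duplicate), flagging repeats of a timestamp."""
    seen = set()
    flagged = []
    for timestamp, level in records:
        flagged.append((timestamp, level, timestamp in seen))
        seen.add(timestamp)
    return flagged


def regular_series(records, step):
    """Put the first reading per timestamp on a regular grid; gaps become None."""
    first = {}
    for timestamp, level in records:
        first.setdefault(timestamp, level)
    current, end = min(first), max(first)
    series = []
    while current <= end:
        series.append((current, first.get(current)))
        current += step
    return series


# Made-up readings: one duplicated timestamp and one missing 5-minute slot.
t0 = datetime(2016, 11, 30, 0, 0)
five_min = timedelta(minutes=5)
raw = [
    (t0, 2.10),
    (t0 + five_min, 2.12),
    (t0 + five_min, 2.12),
    (t0 + 3 * five_min, 2.15),
]
flags = flag_duplicates(raw)
grid = regular_series(raw, five_min)
```

In this example the third record is flagged as a duplicate, and the slot at t0 + 10 minutes appears in grid with a level of None, ready for whichever gap-filling method is chosen.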
Data supporting the Master thesis "Monitoring von Open Data Praktiken -...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 21, 2024
Cite
Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14196539
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katharina Zinke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

This ZIP file contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).

## Data sources

Folder 01_SourceData/

- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

## Automatic classification

Folder 02_AutomaticClassification/

- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

- oddpub_results_wDOIs.csv (results file of the ODDPub classification)

- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

## Manual coding

Folder 03_ManualCheck/

- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

- ManualCheck_2023-06-08.csv (Manual coding results file)

- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

## Explorative analysis for the discoverability of open data

Folder 04_FurtherAnalyses/

- Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

## R-Script

Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
2007-08 V3 CEAMARC-CASO Bathymetry Plots Over Time During Events
cmr.earthdata.nasa.gov
researchdata.edu.au
Updated Sep 5, 2017
Cite
(2017). 2007-08 V3 CEAMARC-CASO Bathymetry Plots Over Time During Events [Dataset]. http://doi.org/10.4225/15/59ae2f5b239c2
Unique identifier
https://doi.org/10.4225/15/59ae2f5b239c2
Dataset updated
Sep 5, 2017
Time period covered
Dec 17, 2007 - Jan 26, 2008
Description
A routine was developed in R ('bathy_plots.R') to plot bathymetry data over time during individual CEAMARC events. This is so we can analyse benthic data in relation to habitat, i.e. whether we trawled over a slope or the sea floor was relatively flat. Note that the depth range in the plots is autoscaled to the data, so a small range in depths appears as a scattering of points. As long as you check the depth scale, though, interpretation will be fine.

The R files need a file of bathymetry data, '200708V3_one_minute.csv', which contains a data export from the underway PostgreSQL ship database, and 'events.csv', which is a stripped-down version of the events export from the shipboard events database. If you wish to run the code again you may need to change the pathnames in the R script to relevant locations. If you have opened the csv files in Excel at any stage and the R script gets an error, you may need to format the date/time columns as yyyy-mm-dd hh:mm:ss, save and close the file as csv without opening it again, and then run the R script.

However, all output files are here for every CEAMARC event. Filenames contain a reference to CEAMARC event id. Files are in eps format and can be viewed using Ghostview which is available as a free download on the internet.
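
The date/time repair mentioned above can also be done outside the spreadsheet. The sketch below is a Python illustration rather than part of 'bathy_plots.R'; it assumes the layout the script expects is yyyy-mm-dd hh:mm:ss, and the list of alternative formats is a hypothetical guess at what a spreadsheet might have written, to be adjusted to the actual file.

```python
from datetime import datetime

TARGET = "%Y-%m-%d %H:%M:%S"
# Hypothetical formats a spreadsheet might have written; adjust to the actual file.
CANDIDATES = [TARGET, "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M", "%Y-%m-%d %H:%M"]


def normalise_timestamp(text):
    """Return text rewritten as yyyy-mm-dd hh:mm:ss, or raise ValueError."""
    for fmt in CANDIDATES:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(TARGET)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date/time: {text!r}")


# Made-up values within the voyage period, before and after a spreadsheet round trip.
already_ok = normalise_timestamp("2007-12-17 14:05:30")
repaired = normalise_timestamp("17/12/2007 14:05")
```

Values already in the target layout pass through unchanged, so the function can be applied to every row of the date/time column before the csv is handed back to the R script.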
Data from: Indoor air quality in California homes with code-required...
datadryad.org
data.niaid.nih.gov
zip
Updated Apr 22, 2020
Cite
Wanyu Chan; Yang-Seon Kim; William Delp; Iain Walker; Brett Singer (2020). Indoor air quality in California homes with code-required mechanical ventilation [Dataset]. http://doi.org/10.7941/D1ZS7X
Available download formats: zip
Unique identifier
https://doi.org/10.7941/D1ZS7X
Dataset updated
Apr 22, 2020
Dataset provided by
Dryad
Authors
Wanyu Chan; Yang-Seon Kim; William Delp; Iain Walker; Brett Singer
Time period covered
Feb 7, 2020
Area covered
California
Description
Time Series Data Handling and Quality Assurance Review

Most instruments had internal logging and special software to download data from the field instruments as binary files or ascii/csv files. The instruments for which files downloaded as binary provide software to view the data or export the data to csv files.

One-minute resolution time-series data files were created for each house using an R script that pulled data from the csv files, aligned data by time, executed unit conversions, and translated from instruments with longer or different data intervals (e.g. 30 min formaldehyde data and 1.5 min for anemometer data). Visual review was conducted on the compiled files (and primary csv or binary files were consulted as needed) to check for translation or writing errors (especially from terminal emulator), indications of instrument malfunction, mislabeled units or unit conversion errors, mislabeled location, and time stamp errors.
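
The alignment step described above, bringing instruments with longer logging intervals onto a common one-minute grid, might look like the following. This is a Python sketch, not the authors' R script, and the rule used here (each minute takes the most recent reading at or before it) is an assumption; the description does not say how longer-interval data were translated. The readings are invented.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def to_one_minute(readings, start, end):
    """Resample (timestamp, value) readings onto a one-minute grid.

    Each minute takes the most recent reading at or before it,
    or None if no reading has been logged yet.
    """
    readings = sorted(readings)
    times = [t for t, _ in readings]
    grid = []
    current = start
    while current <= end:
        index = bisect_right(times, current) - 1
        grid.append((current, readings[index][1] if index >= 0 else None))
        current += timedelta(minutes=1)
    return grid


# Made-up 30-minute formaldehyde readings expanded to one-minute resolution.
start = datetime(2020, 2, 7, 12, 0)
formaldehyde = [(start, 18.0), (start + timedelta(minutes=30), 21.0)]
minute_grid = to_one_minute(formaldehyde, start, start + timedelta(minutes=31))
```

Carrying the last reading forward keeps the compiled file free of interpolated values, which makes the later visual review for translation errors more straightforward; other rules, such as interpolation, would need a different function.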

The draft final set of time-series data...
ESG rating of general stock indices
narcis.nl
data.mendeley.com
Updated Oct 22, 2021
Cite
Erhart, S (via Mendeley Data) (2021). ESG rating of general stock indices [Dataset]. http://doi.org/10.17632/58mwkj5pf8.1
Unique identifier
https://doi.org/10.17632/58mwkj5pf8.1
Dataset updated
Oct 22, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Erhart, S (via Mendeley Data)
Description
# THE FILES HAVE BEEN CREATED BY SZILÁRD ERHART FOR A RESEARCH: ERHART (2021): ESG RATINGS OF GENERAL STOCK EXCHANGE INDICES, INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS
# USERS OF THE FILES AGREE TO QUOTE THE ABOVE PAPER
# THE PYTHON SCRIPT (PYTHONESG_ERHART.TXT) HELPS USERS TO GET TICKERS BY STOCK EXCHANGES AND EXTRACT ESG SCORES FOR THE UNDERLYING STOCKS FROM YAHOO FINANCE.
# THE R SCRIPT (ESG_UA.TXT) HELPS TO REPLICATE THE MONTE CARLO EXPERIMENT DETAILED IN THE STUDY.
# THE EXPORT_ALL CSV CONTAINS THE DOWNLOADED ESG DATA (SCORES, CONTROVERSIES, ETC) ORGANIZED BY STOCKS AND EXCHANGES.

# DISCLAIMER
# The author takes no responsibility for the timeliness, accuracy, completeness or quality of the information provided.
# The author is in no event liable for damages of any kind incurred or suffered as a result of the use or non-use of the
# information presented or the use of defective or incomplete information.
# The contents are subject to confirmation and not binding.
# The author expressly reserves the right to alter, amend, whole and in part,
# without prior notice or to discontinue publication for a period of time or even completely.

# READ ME
# BEFORE USING THE MONTE CARLO SIMULATIONS SCRIPT:
# (1) COPY THE goascores.csv and goalscores_alt.csv FILES ONTO YOUR OWN COMPUTER DRIVE. THE TWO FILES ARE IDENTICAL.
# (2) SET THE EXACT FILE LOCATION INFORMATION IN THE 'Read in data' SECTION OF THE MONTE CARLO SCRIPT AND FOR THE OUTPUT FILES AT THE END OF THE SCRIPT
# (3) LOAD MISC TOOLS AND MATRIXSTATS IN YOUR R APPLICATION
# (4) RUN THE CODE.
Data from: # Replication code and data for: Tracking green space along...
data.niaid.nih.gov
zenodo.org
Updated Oct 11, 2024
Cite
Falchetta, Giacomo (2024). # Replication code and data for: Tracking green space along streets of world cities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8001676
Dataset updated
Oct 11, 2024
Dataset provided by
Hammad, T. Ahmed
Falchetta, Giacomo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication code and data for: Tracking green space along streets of world cities
By Giacomo Falchetta and Ahmed T. Hammad
Preprint: https://doi.org/10.21203/rs.3.rs-3916891/v1

To replicate the analysis, the results, and the figures of the paper:

Download the input data from this Zenodo repository and the code from GitHub: https://github.com/giacfalk/urban_green_space_mapping_and_tracking

Optional data extraction steps (processed output data are already available in the Zenodo repository):

Adjust your working directory

Run [lines 4-11] of workflow/sourcer.R

Run the JavaScript scripts written by the string_generator_training.R and string_generator_prediction.R files in Google Earth Engine (https://code.earthengine.google.com) and complete the export-to-Drive tasks to generate the output .csv files

Run workflow/sourcer.R [lines 15-46] to train the ML model and make predictions (including figures and tables replication)
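
The steps above ask for specific line ranges of workflow/sourcer.R to be run. One way to do that from the R console is sketched below. The helper name run_lines is hypothetical and is not part of the authors' repository. It reads the script, keeps the requested lines, and evaluates them, on the assumption that the chosen range contains only complete R expressions.

```r
# Hypothetical helper (not part of the authors' repository): evaluate only
# lines `from` to `to` of an R script, assuming that range parses on its own.
run_lines <- function(path, from, to, envir = globalenv()) {
  script <- readLines(path)
  stopifnot(from >= 1, from <= to, to <= length(script))
  # parse() turns the selected text into expressions; eval() runs them in order.
  invisible(eval(parse(text = script[from:to]), envir = envir))
}

# Usage, after setting the working directory to the cloned repository:
# run_lines("workflow/sourcer.R", 4, 11)
# run_lines("workflow/sourcer.R", 15, 46)
```

Selecting the same lines in an editor such as RStudio and running the selection would have the same effect. The helper is only convenient when the steps are run non-interactively.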
S1 Supporting information -
plos.figshare.com
zip
Updated Oct 28, 2024
Jens Winther Johannsen; Julian Laabs; Magdalena M. E. Bunbury; Morten Fischer Mortensen (2024). S1 Supporting information - [Dataset]. http://doi.org/10.1371/journal.pone.0301938.s001
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0301938.s001
Dataset updated
Oct 28, 2024
Dataset provided by
PLOS ONE
Authors
Jens Winther Johannsen; Julian Laabs; Magdalena M. E. Bunbury; Morten Fischer Mortensen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
S1 File. SI_C01_SPD_KDE_models. R script for analysing radiocarbon dates. The code computes over-regional and regional SPD and KDE models and exports them to CSV files (Rmd).
S2 File. SI_C02_aoristic_dating. R script for exporting aoristic time series derived from typochronologically dated archaeological material as CSV files (Rmd).
S3 File. SI_C03_vegetation_openness_score_example. R script that computes a vegetation openness score from pollen records and exports the generated time series as a CSV file (Rmd).
S4 File. SI_C04_data_preparation. Jupyter Notebook that imports and transforms the relevant data to visualize the plots exhibited in the paper (ipynb).
S5 File. SI_C05_figures_extra. Jupyter Notebook visualizing the plots exhibited in the paper (ipynb).
S1 Data. SI_D01_reg_data_no_dups. Spreadsheet holding radiocarbon dates, with the laboratory identification, site name, geographical coordinates, site type, material, source and regional affiliation (csv).
S2 Data. SI_D02_reg_axe_dagger_graves. Spreadsheet holding entries of axes and daggers, with the context, site, parish, artefact identification, type, subtype, absolute dating, typochronological dating, references, geographical coordinates and regional affiliations (csv).
S3 Data. SI_D03_pollen_example. Spreadsheet holding sample entries of the pollen records from Krageholm (neotoma Site ID 3204) and Bjäresjöholmsjön (neotoma Site ID 3017) for an example run of S3 File. The records can be accessed via the neotoma explorer (https://apps.neotomadb.org/explorer/) with their given IDs. Each entry holds the record type, regional affiliation, absolute BP and BCE dating, as well as the counts of given plant taxa (csv).
S4 Data. SI_D04_PAP_303600_TOC_LOI. Table holding sample entries of TOC content, LOI and SST reconstruction of sediment core PAP_303600 for correlations of population development with Baltic sea surface temperature. Available via 10.1594/PANGAEA.883292 (tab).
S5 Data. SI_D05_vos_[…]. Spreadsheets holding the vegetation openness score time series of lake Belau, Vinge, Northern Jutland and Zealand (csv).
(ZIP)
Data from: Commercial harvest and export of snapping turtles (Chelydra...
datadryad.org
search.dataone.org
zip
Updated Nov 17, 2017
Benjamin C. Colteaux; Derek M. Johnson (2017). Commercial harvest and export of snapping turtles (Chelydra serpentina) in the United States: trends and the efficacy of size limits at reducing harvest [Dataset]. http://doi.org/10.5061/dryad.j5v05
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.j5v05
Dataset updated
Nov 17, 2017
Dataset provided by
Dryad
Authors
Benjamin C. Colteaux; Derek M. Johnson
Time period covered
Nov 16, 2016
Area covered
United States
Description
State Harvest Data (csv)
Commercial snapping turtle harvest data (in individuals) for eleven states from 1998 to 2013. States reporting are Arkansas, Delaware, Iowa, Maryland, Massachusetts, Michigan, Minnesota, New Jersey, North Carolina, Pennsylvania, and Virginia.
StateHarvestData.csv

Input and execution code for Colteaux_Johnson_2016
Attached R file includes the code described in the listed publication. The companion JAGS (Just Another Gibbs Sampler) code is also stored in this repository under separate cover.
ColteauxJohnsonNatureConservation.R

JAGS model code for Colteaux_Johnson_2016
Attached R file includes the JAGS (Just Another Gibbs Sampler) code described in the listed publication. The companion input and execution code is also stored in this repository under separate cover.
ColteauxJohnsonNatureConservationJAGS.R
Surface Water Disinfection Byproducts and Organic Matter Characterization...
data.nceas.ucsb.edu
search.dataone.org
+1more
Updated Aug 20, 2024
Laura T. Leonard; Curtis A. Beutler; Rosalie Chu; Robert E. Danczak; Brieanne Forbes; Vanessa A. Garayburu-Caruso; Amy E. Goldman; Stephanie S. Lau; Sophia A. McKever; William A. Mitch; Alexander W. Newman; Lupita Renteria; Jason G. Toyoda; James C. Stegen; Gary F. Vanzin; Kenneth H. Williams; Jonathon O. Sharp (2024). Surface Water Disinfection Byproducts and Organic Matter Characterization Data Associated with: “Disinfection byproducts formed during drinking water treatment reveal an export control point for dissolved organic matter in a subalpine headwater stream” [Dataset]. http://doi.org/10.15485/1969118
Explore at:
Unique identifier
https://doi.org/10.15485/1969118
Dataset updated
Aug 20, 2024
Dataset provided by
ESS-DIVE
Authors
Laura T. Leonard; Curtis A. Beutler; Rosalie Chu; Robert E. Danczak; Brieanne Forbes; Vanessa A. Garayburu-Caruso; Amy E. Goldman; Stephanie S. Lau; Sophia A. McKever; William A. Mitch; Alexander W. Newman; Lupita Renteria; Jason G. Toyoda; James C. Stegen; Gary F. Vanzin; Kenneth H. Williams; Jonathon O. Sharp
Time period covered
Jul 30, 2020 - Jul 28, 2021
Area covered

Description
This dataset is associated with the publication “Disinfection byproducts formed during drinking water treatment reveal an export control point for dissolved organic matter in a subalpine headwater stream” published in Water Research X (Leonard et al. 2022; https://doi.org/10.1016/j.wroa.2022.100144). The associated study analyzed temporal trends from the Town of Crested Butte water treatment facility and synoptic sampling at Coal Creek in Crested Butte, Colorado, US. This work demonstrates how drinking water quality archives combined with synoptic sampling and targeted analyses can be used to identify and understand export control points for dissolved organic matter. This dataset comprises one main data folder containing (1) file-level metadata; (2) data dictionary; (3) metadata and international geo-sample number (IGSN) mapping file; (4) dissolved organic carbon (DOC), ultraviolet absorbance at 254 nanometers (UV254), total nitrogen (TN), and specific ultraviolet absorbance (SUVA) data; (5) disinfection byproduct formation potential (DBP-FP) data; (6) readme; (7) methods codes; (8) water collection protocol; (9) folder of high resolution characterization of organic matter via 12 Tesla Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) through the Environmental Molecular Sciences Laboratory (EMSL; https://www.pnnl.gov/environmental-molecular-sciences-laboratory); and (10) folder of excitation emissions matrix (EEM) spectra. The FTICR folder contains a file of DOC (measured as non-purgeable organic carbon; NPOC) used for FTICR sample preparation. The FTICR folder also contains three subfolders: one containing the raw .xml data files, one containing the processed data, and one containing instructions for using Formularity (https://omics.pnl.gov/software/formularity) and an R script to process the data based on the user's specific needs. All files are .csv, .pdf, .dat, .R, .ref, or .xml.
Data and script for the GenABEL paper
zenodo.org
data.niaid.nih.gov
bin, csv
Updated Jan 24, 2020
Lennart C. Karssen; Cornelia M. Van Duijn; Yurii S. Aulchenko (2020). Data and script for the GenABEL paper [Dataset]. http://doi.org/10.5281/zenodo.51008
Explore at:
csv, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.51008
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Lennart C. Karssen; Cornelia M. Van Duijn; Yurii S. Aulchenko
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains the automatically collected data used for the overview paper about the GenABEL Project (Karssen et al, 2016, DOI:10.12688/f1000research.8733.1). Some data used for the paper was collected manually and is therefore not included in this dataset.

The file "tracker_report-2016-04-16.csv" is an export of the bug reports from the GenABEL R-forge bug tracker on the date listed in the file name.

The file "Analytics www.genabel.org Locatie Lennart 20150428-20160428.csv" is a custom export of the Google Analytics data for visits to the GenABEL website (www.genabel.org) in the period marked by the dates listed in the file name. The columns contain the ISO code of the country, city, number of sessions, number of new viewers, bounce percentage, pages per session and average session duration, respectively.

The file analysis_GenABELpaper.org contains the source code used for the automated data extraction for this paper in Emacs Org mode literate programming format (http://orgmode.org, Schulte 2012, doi:10.18637/jss.v046.i03).
Benj Petre; Aurore Coince; Sophien Kamoun (2016). Petre_Slide_CategoricalScatterplotFigShare.pptx [Dataset]. http://doi.org/10.6084/m9.figshare.3840102.v1

Petre_Slide_CategoricalScatterplotFigShare.pptx

Explore at:

pptxAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.3840102.v1

Dataset updated

Sep 19, 2016

Dataset provided by

figshare

Authors

Benj Petre; Aurore Coince; Sophien Kamoun

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Categorical scatterplots with R for biologists: a step-by-step guide

Benjamin Petre1, Aurore Coince2, Sophien Kamoun1

1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK

Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.

Protocol

• Step 1: format the data set as a .csv file. Store the data in a three-column excel file as shown in Powerpoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed is indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import in R.

• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in Powerpoint slide and paste it in the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.

• Step 3: save the graph as a .pdf file. Shape the window at your convenience and save the graph as a .pdf file (File -> Save as). See Powerpoint slide for an example.

Notes

• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it, Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search space and click on ‘Get List’. Select ‘ggplot2’ in the Package column and click on ‘Install Selected’. Install all dependencies as well.

• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.

#7 Display the graph in a separate window. Dot colors indicate replicates
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
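
The protocol's full script is only available in the PowerPoint slide, so the following is a minimal sketch of what such a script could look like, not the authors' code. It assumes the three-column layout from Step 1 (Replicate, Condition, Value) and reuses the geom_boxplot, geom_jitter, scale_y_log10 and theme_bw calls from the command above. The function name categorical_scatterplot, the log_y argument and the example values are made up for illustration.

```r
library(ggplot2)

# Made-up example data in the Step 1 layout; in practice the data would come
# from the .csv file, e.g. read.csv(file.choose()) to pick it in a dialog box.
example_data <- data.frame(
  Replicate = rep(c("Jan2016", "Feb2016", "Mar2016"), each = 3),
  Condition = rep(c("WT", "MutantA", "MutantB"), times = 3),
  Value     = c(10.2, 4.1, 7.5, 11.0, 3.8, 8.1, 9.7, 4.5, 7.9)
)

# Hypothetical wrapper: boxplots per condition with jittered points on top,
# coloured by replicate. log_y = TRUE applies the log scale from Note 2.
categorical_scatterplot <- function(data, log_y = FALSE) {
  graph <- ggplot(data, aes(x = Condition, y = Value))
  p <- graph +
    geom_boxplot(outlier.colour = "black", colour = "black") +
    geom_jitter(aes(colour = Replicate)) +
    theme_bw()
  if (log_y) {
    p <- p + scale_y_log10()
  }
  p
}

p <- categorical_scatterplot(example_data)
p_log <- categorical_scatterplot(example_data, log_y = TRUE)
```

Printing p at the console displays the graph. As an alternative to File -> Save as in Step 3, ggsave("categorical_scatterplot.pdf", plot = p) would write the graph to a PDF file; the file name here is arbitrary.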

References

Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.

Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128

https://cran.r-project.org/

http://ggplot2.org/
