
Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A...

zenodo.org

application/gzip, bin +2

Updated Aug 2, 2024


Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem - the dataset [Dataset]. http://doi.org/10.5281/zenodo.1297925

Explore at:

Available download formats: text/x-python, zip, bin, application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.1297925

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Marat Valiev; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on the level of detail (see Step 2 for more details):
- up to 2 TB of disk space (see the levels of detail in Step 2)
- at least 16 GB of RAM (64 GB preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   cd ghd
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/__init__.py`, comment out everything except GitHub support
 in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up
the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step
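
Option 2.a can also be scripted. The following is a minimal sketch, assuming the archive holds a member whose file name is `survival_data.csv`. The function name `extract_survival_data` and the destination folder are hypothetical and not part of the replication pack.

```python
import zipfile
from pathlib import Path


def extract_survival_data(archive_path, dest_dir="."):
    """Extract the first archive member named survival_data.csv.

    Assumption: the guide does not state where the file sits inside
    the archive, so the member list is searched by file name.
    """
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if Path(member).name == "survival_data.csv":
                # ZipFile.extract returns the path of the extracted file
                return Path(archive.extract(member, path=dest_dir))
    raise FileNotFoundError("survival_data.csv not found in " + str(archive_path))
```

A call such as `extract_survival_data("dataset_minimal_Jan_2018.zip", "data")` would place the file under `data/`, keeping whatever subfolder the archive uses.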

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15 to 30 minutes.

- create a folder `

R-LOADEST files to produce results in the Heart River Basin, North Dakota,...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
U.S. Geological Survey (2024). R-LOADEST files to produce results in the Heart River Basin, North Dakota, 1970-2020 [Dataset]. https://catalog.data.gov/dataset/r-loadest-files-to-produce-results-in-the-heart-river-basin-north-dakota-1970-2020
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Heart River, North Dakota
Description
This child page contains a zipped folder with all of the items necessary to run load estimation using R-LOADEST and to produce the results published in U.S. Geological Survey Scientific Investigations Report 2021-XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p].

The folder contains an allsiteinfo.table.csv file, a "datain" folder, and a "scripts" folder. The allsiteinfo.table.csv file can be used to cross reference the sites with the main report (Tatge and others, 2021). The "datain" folder contains all the input data necessary to reproduce the load estimation results. The naming convention in the "datain" folder is site_MI_rloadest for the major ion loads and site_NUT_rloadest for the nutrient loads. The .Rdata files are used in the scripts to run the estimations, and the .csv files can be used to look at the data. The "scripts" folder contains the R scripts that produce the load estimation results from the main report.

R-LOADEST is a software package for analyzing loads in streams, and an accompanying report (Runkel and others, 2004) serves as the formal documentation for R-LOADEST. The package is a collection of functions written in R (R Development Core Team, 2019), an open source language and a general environment for statistical computing and graphics.

The following system requirements are necessary for producing results:
- Windows 10 operating system
- R (version 3.4 or later; 64-bit recommended)
- RStudio (version 1.1.456 or later)
- R-LOADEST program (available at https://github.com/USGS-R/rloadest)

References
Runkel, R.L., Crawford, C.G., and Cohn, T.A., 2004, Load Estimator (LOADEST): A FORTRAN Program for Estimating Constituent Loads in Streams and Rivers: U.S. Geological Survey Techniques and Methods Book 4, Chapter A5, 69 p. [Also available at https://pubs.usgs.gov/tm/2005/tm4A5/pdf/508final.pdf.]
R Development Core Team, 2019, R: A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
Data from: A dataset to model Levantine landcover and land-use change...
zenodo.org
data.niaid.nih.gov
zip
Updated Dec 16, 2023
Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10396148
Dataset updated
Dec 16, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael Kempf
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 16, 2023
Area covered
Levant
Description
Overview

This dataset is the repository for the following paper submitted to Data in Brief:

Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

The Data in Brief article contains the supplement information and is the related data paper to:

Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

Description/abstract

The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

Folder structure

After download, the main folder contains all data. The following subfolders are stored in it as zipped files:

“code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

“MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

“yield_productivity” contains .csv files of yield information for all countries listed above.

“population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders that contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

Code structure

1_MODIS_NDVI_hdf_file_extraction.R

This is the first code chunk, which extracts the MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download the MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially different time series and merge them later. Note that the time series are temporally consistent.

2_MERGE_MODIS_tiles.R

In this code, we load and merge the three different stacks to produce large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in numeric order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
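
Plain alphabetical sorting places file 10 before file 2, which is why numeric ordering matters here. The repository does this in R with gtools. As an illustration only, the same idea can be sketched in Python; `natural_key` and the example file names are hypothetical.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so that
    'NDVI_10.tif' sorts after 'NDVI_2.tif'."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


files = ["NDVI_10.tif", "NDVI_2.tif", "NDVI_1.tif"]
ordered = sorted(files, key=natural_key)
print(ordered)  # ['NDVI_1.tif', 'NDVI_2.tif', 'NDVI_10.tif']
```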

3_CROP_MODIS_merged_tiles.R

Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. When saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
The repository provides the already clipped and merged NDVI datasets.

4_TREND_analysis_NDVI.R

Now, we want to perform trend analysis from the derived data. The data we load is tricky as it contains 16-days return period across a year for the period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with high confidence level (0.05). Using the ggplot2 package and the melt function from reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
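
The slope and z-score calculations described above can be sketched as follows. This is a minimal Python illustration on a plain list of annual values, not the repository's R code. It uses the population standard deviation, which is an assumption, since the description does not say which estimator was used. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Deviation of each value from the series mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD (assumption; sample SD is also common)
    if sigma == 0:
        raise ValueError("constant series has no defined z-scores")
    return [(v - mu) / sigma for v in values]


def ols_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ...

    Needs at least two values.
    """
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den
```

Applied per pixel to the annual growing-season sums, `ols_slope` gives the trend and `z_scores` gives the normalized series. The significance test (p-values) is not covered by this sketch.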

5_BUILT_UP_change_raster.R

Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3rd of March 2023, 100 m resolution, global coverage). Here, one can download the desired temporal coverage and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.

6_POPULATION_numbers_plot.R

For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.

7_YIELD_plot.R

In this section, we are using the country productivity from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted in a ggplot and combined using the patchwork package in R.

8_GLDAS_read_extract_trend

The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data comes in .nc file format and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to the variable of interest. For the abbreviations, see https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary (last accessed 9th of October 2023) or the respective code chunk when reading a .nc file with the ncdf4 package in R, run print(nc) from the code, or use names(the spatraster collection).
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. For variables such as rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g. March-May (MAM), June-August (JJA), September-November (SON), and December-February (DJF, including Jan/Feb of the consecutive year).
From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
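
The seasonal grouping and the sum-versus-mean distinction described above can be sketched as follows. This is a Python illustration, not the repository's R code, and it assumes the monthly values are held in a mapping keyed by `(year, month)`. The names `SEASONS` and `seasonal_value` are hypothetical.

```python
from statistics import mean

# Months belonging to each season that lies within one calendar year
SEASONS = {"MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def seasonal_value(monthly, year, season, how="mean"):
    """Aggregate one season from a {(year, month): value} mapping.

    DJF takes December of `year` plus January and February of `year + 1`.
    Use how="sum" for variables reported as totals (e.g. rainfall).
    """
    if season == "DJF":
        keys = [(year, 12), (year + 1, 1), (year + 1, 2)]
    else:
        keys = [(year, month) for month in SEASONS[season]]
    values = [monthly[key] for key in keys]  # KeyError if a month is missing
    return sum(values) if how == "sum" else mean(values)
```

For a totals variable such as rainfall, `seasonal_value(monthly, 2022, "DJF", how="sum")` would add December 2022 to January and February 2023.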
Data and scripts associated with a manuscript investigating impacts of solid...
search.dataone.org
data.nceas.ucsb.edu
Updated Aug 21, 2023
Alan Roebuck; Brieanne Forbes; Vanessa A. Garayburu-Caruso; Samantha Grieger; Khadijah Homolka; James C. Stegen; Allison Myers-Pigg (2023). Data and scripts associated with a manuscript investigating impacts of solid phase extraction on freshwater organic matter optical signatures and mass spectrometry pairing [Dataset]. http://doi.org/10.15485/1995543
Explore at:
Unique identifier
https://doi.org/10.15485/1995543
Dataset updated
Aug 21, 2023
Dataset provided by
ESS-DIVE
Authors
Alan Roebuck; Brieanne Forbes; Vanessa A. Garayburu-Caruso; Samantha Grieger; Khadijah Homolka; James C. Stegen; Allison Myers-Pigg
Time period covered
Aug 30, 2021 - Sep 15, 2021
Description
This data package is associated with the publication “Investigating the impacts of solid phase extraction on dissolved organic matter optical signatures and the pairing with high-resolution mass spectrometry data in a freshwater system” submitted to “Limnology and Oceanography: Methods.” This data is an extension of the River Corridor and Watershed Biogeochemistry SFA’s Spatial Study 2021 (https://doi.org/10.15485/1898914). Other associated data and field metadata can be found at the link provided. The goal of this manuscript is to assess the impact of solid phase extraction (SPE) on the ability to pair ultra-high resolution mass spectrometry data collected from SPE extracts with optical properties collected on ambient stream samples. Forty-seven samples collected from within the Yakima River Basin, Washington were analyzed for dissolved organic carbon (DOC, measured as non-purgeable organic carbon, NPOC), absorbance, and fluorescence. Samples were subsequently concentrated with SPE and reanalyzed for each measurement. The extraction efficiency for the DOC and common optical indices were calculated. In addition, SPE samples were subjected to ultra-high resolution mass spectrometry and compared with the ambient and SPE-generated optical data. Finally, in addition to this cross-platform inter-comparison, we further performed an intra-comparison among the high-resolution mass spectrometry data to determine the impact of sample preparation on the interpretability of results. Here, the SPE samples were prepared at 40 milligrams per liter (mg/L) based on the known DOC extraction efficiency of the samples (ranging from ~30 to ~75%) compared to the common practice of assuming the DOC extraction efficiency of freshwater samples at 60%. This data package folder consists of one main data folder with one subfolder (Data_Input).
The main data folder contains (1) readme; (2) data dictionary (dd); (3) file-level metadata (flmd); (4) final data summary output from processing script; and (5) the processing script. The R-markdown processing script (SPE_Manuscript_Rmarkdown_Data_Package.rmd) contains all code needed to reproduce manuscript statistics and figures (with the exception of that stated below). The Data_Input folder has two subfolders: (1) FTICR and (2) Optics. Additionally, the Data_Input folder contains dissolved organic carbon (DOC, measured as non-purgeable organic carbon, NPOC) data (SPS_NPOC_Summary.csv) and relevant supporting Solid Phase Extraction Volume information (SPS_SPE_Volumes.csv). Methods information for the optical and FTICR data is embedded in the header rows of SPS_EEMs_Methods.csv and SPS_FTICR_Methods.csv, respectively. In addition, the data dictionary (SPS_SPE_dd.csv), file level metadata (SPS_SPE_flmd.csv), and methods codes (SPS_SPE_Methods_codes.csv) are provided. The FTICR subfolder contains all raw FTICR data as well as instructions for processing. In addition, post processed FTICR molecular information (Processed_FTICRMS_Mol.csv) and sample data (Processed_FTICRMS_Data.csv) is provided that can be directly read into R with the associated R-markdown file. The Optics subfolder contains all Absorbance and Fluorescence Spectra. Fluorescence spectra have been blank corrected, inner filter corrected, and undergone scatter removal. In addition, this folder contains Matlab code used to make a portion of Figure 1 within the manuscript, derive various spectral parameters used within the manuscript, and used for parallel factor analysis (PARAFAC) modeling. Spectral indices (SPS_SpectralIndices.csv) and PARAFAC outputs (SPS_PARAFAC_Model_Loadings.csv and SPS_PARAFAC_Sample_Scores.csv) are directly read into the associated R-markdown file.
Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...
data.niaid.nih.gov
zenodo.org
zip
Updated Dec 7, 2023
Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. https://data.niaid.nih.gov/resources?id=dryad_w3r2280w0
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.w3r2280w0
Dataset updated
Dec 7, 2023
Dataset provided by
HIV Prevention Trials Network
HIV Vaccine Trials Network (http://www.hvtn.org/)
National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
PEPFAR
Authors
Dylan Westfall; James Mullins
License
https://spdx.org/licenses/CC0-1.0.html
Description
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR and the use of UMI allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

Methods

This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies". Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005. For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub. The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub. The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz).
Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results. Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, 2 fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py which identifies any matched sequences that are different between sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program. To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence.
By examining the alignment and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper. Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
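
The comparison of matched sUMI and dUMI sequences mentioned above can be illustrated with a short sketch. This is not the authors' compare_seqs.py. It assumes that matched records share the same FASTA header in both files, which the description does not confirm, and the function names are hypothetical.

```python
def read_fasta(path):
    """Return {header: sequence} for a FASTA file (header without the '>')."""
    records, current = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                current = line[1:].strip()
                records[current] = ""
            elif line and current is not None:
                # Sequences may span several lines; compare case-insensitively
                records[current] += line.upper()
    return records


def discordant_ids(sumi_path, dumi_path):
    """Headers present in both files whose sequences differ."""
    sumi, dumi = read_fasta(sumi_path), read_fasta(dumi_path)
    shared = sumi.keys() & dumi.keys()
    return sorted(rid for rid in shared if sumi[rid] != dumi[rid])
```

Records found in only one of the two files are ignored here; whether that matches the original script is unknown.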
Data from: Solar self-sufficient households as a driving factor for...
data.uni-hannover.de
.zip, r, rdata +2
Updated Dec 12, 2024
Institut für Kartographie und Geoinformatik (2024). Solar self-sufficient households as a driving factor for sustainability transformation [Dataset]. https://data.uni-hannover.de/sk/dataset/19503682-5752-4352-97f6-511ae31d97df
Explore at:
Available download formats: r(63854), .zip, r(21968), r(24773), rdata(1024592), r(3406), r(6280), txt(1431), rdata(426), rdata(408277), text/x-sh(183)
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Kartographie und Geoinformatik
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
To get the consumption model from Section 3.1, one needs to execute the file consumption_data.R. Load the data for the 3 phases (./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv), transform the data, and build the model (starting at line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata

To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).

To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures.

To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R.

To reproduce the radiation data from ERA5, which can be found in data.zip, do the following steps:
1. ERA5: download the reanalysis datasets as GRIB files. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo".
2. Convert GRIB to csv with the file era5toGRID.sh.
3. Convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R.
R Program - Claims-Based Frailty Index
search.dataone.org
dataverse.harvard.edu
Updated Sep 25, 2024
Cite
Bedell, Douglas (2024). R Program - Claims-Based Frailty Index [Dataset]. http://doi.org/10.7910/DVN/4Y3Y23
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/4Y3Y23
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Bedell, Douglas
Description
This R program calculates CFI for each patient from analytic data files containing information on patient identifiers, ICD-9-CM diagnosis codes (version 32), ICD-10-CM Diagnosis Codes (version 2020), CPT codes, and HCPCS codes. NOTE: When downloading, store "CFI_ICD9CM_V32.tab" and "CFI_ICD10CM_V2020.tab" as csv files (these files are originally stored as csv files, but Dataverse automatically converts them to tab files). Please read "Frailty-Index-R-code-Guide" before proceeding. Interpretation, validation data, and annotated references are provided in "Research Background - Claims-Based Frailty Index".
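The note about storing the .tab files as csv can be handled with a small conversion step. The sketch below is a minimal illustration, assuming the downloaded .tab files are plain tab-delimited text with a header row; the helper name `tab_to_csv` and the sample columns are hypothetical and not taken from the dataset.

```python
import csv
import io


def tab_to_csv(tab_text: str) -> str:
    """Convert tab-delimited text to comma-separated text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    out = io.StringIO()
    # csv.writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Made-up rows standing in for the contents of a downloaded .tab file.
sample = "code\tweight\n4280\t0.5\n"
converted = tab_to_csv(sample)
```

For a real file, the same function could be applied to the text read from "CFI_ICD9CM_V32.tab" and the result written to a file with a .csv extension.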
Data from: Data and code from: Cover crop and crop rotation effects on...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Aug 2, 2024
Cite
Agricultural Research Service (2024). Data and code from: Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield in no-till system - V2 [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-cover-crop-and-crop-rotation-effects-on-tissue-and-soil-population-dyna-831b9
Explore at:
Dataset updated
Aug 2, 2024
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
[Note 2023-08-14 - Supersedes version 1, https://doi.org/10.15482/USDA.ADC/1528086 ]

This dataset contains all code and data necessary to reproduce the analyses in the manuscript: Mengistu, A., Read, Q. D., Sykes, V. R., Kelly, H. M., Kharel, T., & Bellaloui, N. (2023). Cover crop and crop rotation effects on tissue and soil population dynamics of Macrophomina phaseolina and yield under no-till system. Plant Disease. https://doi.org/10.1094/pdis-03-23-0443-re

The .zip archive cropping-systems-1.0.zip contains data and code files.

Data

stem_soil_CFU_by_plant.csv: Soil disease load (SoilCFUg) and stem tissue disease load (StemCFUg) for individual plants in CFU per gram, with columns indicating year, plot ID, replicate, row, plant ID, previous crop treatment, cover crop treatment, and comments. Missing data are indicated with .

yield_CFU_by_plot.csv: Yield data (YldKgHa) at the plot level in units of kg/ha, with columns indicating year, plot ID, replicate, and treatments, as well as means of soil and stem disease load at the plot level.

Code

cropping_system_analysis_v3.0.Rmd: RMarkdown notebook with all data processing, analysis, and visualization code

equations.Rmd: RMarkdown notebook with formatted equations

formatted_figs_revision.R: R script to produce figures formatted exactly as they appear in the manuscript

The Rproject file cropping-systems.Rproj is used to organize the RStudio project. Scripts and notebooks used in older versions of the analysis are found in the testing/ subdirectory. Excel spreadsheets containing raw data from which the cleaned CSV files were created are found in the raw_data subdirectory.
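The missing-data convention for stem_soil_CFU_by_plant.csv matters when the file is read outside the provided R notebooks. The sketch below is a minimal illustration in Python with pandas, assuming the missing-value marker is a literal "." in the cell; the column names `year` and `plot` and all sample values are made up, while SoilCFUg and StemCFUg come from the description.

```python
import io

import pandas as pd

# Made-up rows imitating the described layout; "." marks a missing value (assumption).
sample = io.StringIO(
    "year,plot,SoilCFUg,StemCFUg\n"
    "2019,101,120.5,.\n"
    "2019,102,.,33.0\n"
)

# Treat "." as NaN so that the CFU columns are parsed as numbers, not text.
plants = pd.read_csv(sample, na_values=["."])
n_missing_stem = int(plants["StemCFUg"].isna().sum())
```

Without `na_values`, pandas would read both CFU columns as strings, and numeric summaries would fail or silently misbehave.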
Data from: Data and code from: Environmental influences on drying rate of...
s.cnmilf.com
agdatacommons.nal.usda.gov
Updated May 31, 2024
Cite
Agricultural Research Service (2024). Data and code from: Environmental influences on drying rate of spray applied disinfestants from horticultural production services [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/data-and-code-from-environmental-influences-on-drying-rate-of-spray-applied-disinfestants-
Explore at:
Dataset updated
May 31, 2024
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript: Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.

Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials were quickly absorbed into the body of the material, such as wood and concrete. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.

Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created.

These files are included:

Drying2022.csv: drying rate data for the 2022 experimental run

Weather2022.csv: weather data for the 2022 experimental run

Drying2023.csv: drying rate data for the 2023 experimental run

Weather2023.csv: weather data for the 2023 experimental run

disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code

disinfestant_drying_analysis.html: rendered output of notebook

MS_figures.R: additional R code to create figures formatted for journal requirements

fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This will allow users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster

fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023

data_dictionary.xlsx: descriptions of each column in the CSV data files
ESG rating of general stock indices
narcis.nl
data.mendeley.com
Updated Oct 22, 2021
Cite
Erhart, S (via Mendeley Data) (2021). ESG rating of general stock indices [Dataset]. http://doi.org/10.17632/58mwkj5pf8.1
Explore at:
Unique identifier
https://doi.org/10.17632/58mwkj5pf8.1
Dataset updated
Oct 22, 2021
Dataset provided by
Data Archiving and Networked Services (DANS)
Authors
Erhart, S (via Mendeley Data)
Description
THE FILES HAVE BEEN CREATED BY SZILÁRD ERHART FOR A RESEARCH PAPER: ERHART (2021): ESG RATINGS OF GENERAL STOCK EXCHANGE INDICES, INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS

USERS OF THE FILES AGREE TO QUOTE THE ABOVE PAPER

THE PYTHON SCRIPT (PYTHONESG_ERHART.TXT) HELPS USERS TO GET TICKERS BY STOCK EXCHANGES AND EXTRACT ESG SCORES FOR THE UNDERLYING STOCKS FROM YAHOO FINANCE.

THE R SCRIPT (ESG_UA.TXT) HELPS TO REPLICATE THE MONTE CARLO EXPERIMENT DETAILED IN THE STUDY.

THE EXPORT_ALL CSV CONTAINS THE DOWNLOADED ESG DATA (SCORES, CONTROVERSIES, ETC) ORGANIZED BY STOCKS AND EXCHANGES.

DISCLAIMER

The author takes no responsibility for the timeliness, accuracy, completeness or quality of the information provided. The author is in no event liable for damages of any kind incurred or suffered as a result of the use or non-use of the information presented or the use of defective or incomplete information. The contents are subject to confirmation and not binding. The author expressly reserves the right to alter, amend, whole and in part, without prior notice or to discontinue publication for a period of time or even completely.

READ ME

BEFORE USING THE MONTE CARLO SIMULATIONS SCRIPT:

(1) COPY THE goascores.csv and goalscores_alt.csv FILES ONTO YOUR OWN COMPUTER DRIVE. THE TWO FILES ARE IDENTICAL.

(2) SET THE EXACT FILE LOCATION INFORMATION IN THE 'Read in data' SECTION OF THE MONTE CARLO SCRIPT AND FOR THE OUTPUT FILES AT THE END OF THE SCRIPT

(3) LOAD MISC TOOLS AND MATRIXSTATS IN YOUR R APPLICATION

(4) RUN THE CODE.
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...
zenodo.org
data.niaid.nih.gov
bin, csv, zip
Updated Dec 24, 2022
Cite
Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
Explore at:
bin, zip, csv Available download formats
Unique identifier
https://doi.org/10.5281/zenodo.6965147
Dataset updated
Dec 24, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

Background

This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database and data from the individual experiments is also included.

The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

Usage

The data is licensed through the Creative Commons Attribution 4.0 International.

If you have used our data and are publishing your work, we ask that you please reference both:

this database through its DOI, and

any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

Included Files

Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data

Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory similar to the "Clean_Data"

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Clean_Data_v1-0-0.zip: contains all the downsampled data

The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

Database_References_v1-0-0.bib

Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

File Format: Downsampled Data

These are the "LP_

The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

Time[s]: time in seconds since the start of the test

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: the surface temperature in degC

These data files can be easily loaded using the pandas library in Python through:

import pandas
data = pandas.read_csv(data_file, index_col=0)

The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
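As a minimal sketch of working with the downsampled format described above, the example below loads a small in-memory table with the same column layout (empty first header, Time[s], e_true, Sigma_true) and extracts the peak true stress. The numeric values are invented for illustration and do not come from any test in the database.

```python
import io

import pandas as pd

# Invented sample with the documented layout; the first (unnamed) column
# holds the index of each point in the original, unreduced data.
sample = io.StringIO(
    ",Time[s],e_true,Sigma_true\n"
    "0,0.0,0.0000,0.0\n"
    "12,1.5,0.0010,205.0\n"
    "40,3.0,0.0025,355.0\n"
)

data = pd.read_csv(sample, index_col=0)

# Peak true stress (MPa) and the true strain at which it occurs.
peak_stress = data["Sigma_true"].max()
strain_at_peak = data.loc[data["Sigma_true"].idxmax(), "e_true"]
```

Because `index_col=0` keeps the original sample indices, rows of the downsampled file can be matched back to rows of the corresponding unreduced file.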

File Format: Unreduced Data

These are the "LP_

The first column is the index of each data point

S/No: sample number recorded by the DAQ

System Date: Date and time of sample

Time[s]: time in seconds since the start of the test

C_1_Force[kN]: load cell force

C_1_Déform1[mm]: extensometer displacement

C_1_Déplacement[mm]: cross-head displacement

Eng_Stress[MPa]: engineering stress

Eng_Strain[]: engineering strain

e_true: true strain

Sigma_true: true stress in MPa

(optional) Temperature[C]: specimen surface temperature in degC

The data can be loaded and used similarly to the downsampled data.
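The unreduced files carry both engineering and true quantities. The sketch below shows the standard textbook relation between the two, which holds for uniform deformation before necking; the database description does not state which conversion was actually applied, so this is an assumption for illustration only, and the function name `eng_to_true` is hypothetical.

```python
import math
from typing import Tuple


def eng_to_true(eng_strain: float, eng_stress_mpa: float) -> Tuple[float, float]:
    """Textbook conversion, assuming uniform (pre-necking) deformation."""
    true_strain = math.log1p(eng_strain)               # e_true = ln(1 + e_eng)
    true_stress = eng_stress_mpa * (1.0 + eng_strain)  # sigma_true = sigma_eng * (1 + e_eng)
    return true_strain, true_stress


# Invented values: 2 % engineering strain at 400 MPa engineering stress.
e_true, sigma_true = eng_to_true(0.02, 400.0)
```

Comparing such a conversion against the e_true and Sigma_true columns of a file is one way to check whether the stored values follow this relation.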

File Format: Overall_Summary

The overall summary file provides data on all the test specimens in the database. The columns include:

hidden_index: internal reference ID

grade: material grade

spec: specifications for the material

source: base material for the test specimen

id: internal name for the specimen

lp: load protocol

size: type of specimen (M8, M12, M20)

gage_length_mm_: unreduced section length in mm

avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

fy_n_mpa_: nominal yield stress

fu_n_mpa_: nominal ultimate stress

t_a_deg_c_: ambient temperature in degC

date: date of test

investigator: person(s) who conducted the test

location: laboratory where test was conducted

machine: setup used to conduct test

pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

citekey: reference corresponding to the Database_References.bib file

yield_stress_mpa_: computed yield stress in MPa

elastic_modulus_mpa_: computed elastic modulus in MPa

fracture_strain: computed average true strain across the fracture surface

c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

file: file name of corresponding clean (downsampled) stress-strain data

File Format: Summarized_Mechanical_Props_Campaign

Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv', index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1], keep_default_na=False, na_values='')

citekey: reference in "Campaign_References.bib".

Grade: material grade.

Spec.: specifications (e.g., J2+N).

Yield Stress [MPa]: initial yield stress in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Elastic Modulus [MPa]: initial elastic modulus in MPa

size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

Caveats

The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:

A500

A992_Gr50

BCP325

BCR295

HYP400

S460NL

S690QL/25mm

S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
useNews
osf.io
Updated Sep 26, 2022
Cite
Cornelius Puschmann; Mario Haim (2022). useNews [Dataset]. http://doi.org/10.17605/OSF.IO/UZCA3
Explore at:
Unique identifier
https://doi.org/10.17605/OSF.IO/UZCA3
Dataset updated
Sep 26, 2022
Dataset provided by
Center For Open Science
Authors
Cornelius Puschmann; Mario Haim
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The useNews dataset has been compiled to enable the study of online news engagement. It relies on the MediaCloud and CrowdTangle APIs as well as on data from the Reuters Digital News Report. The entire dataset builds on data from 2019 and 2020 as well as a total of 12 countries. It is free to use (subject to citing/referencing it).

The data originates from both the 2019 and the 2020 Reuters Digital News Report (http://www.digitalnewsreport.org/), media content from MediaCloud (https://mediacloud.org/) for 2019 and 2020 from all news outlets that have been used most frequently in the respective year according to the survey data, and engagement metrics for all available news-article URLs through CrowdTangle (https://www.crowdtangle.com/).

To start using the data, a total of eight data objects exist, namely one each for 2019 and 2020 for the survey, news-article meta information, news-article DFM's, and engagement metrics. To make your life easy, we've provided several packaged download options:

survey data for 2019, 2020, or both (also available in CSV format)

news-article meta data for 2019, 2020, or both (also available in CSV format)

news-article DFM's for 2019, 2020, or both

engagement data for 2019, 2020, or both (also available in CSV format)

all of 2019 or 2020

Also, if you are working with R, we have prepared a simple file to automatically download all necessary data (~1.5 GByte) at once: https://osf.io/fxmgq/

Note that all .rds files are .xz-compressed, which shouldn't bother you when you are in R. You can import all the .rds files through variable_name <- readRDS('filename.rds'), .RData (also .xz-compressed) can be imported by simply using load('filename.RData') which will load several already named objects into your R environment. To import data through other programming languages, we also provide all data in respective CSV files. These files are rather large, however, which is why we have also .xz-compressed them. DFM's, unfortunately, are not available as CSV's due to their sparsity and size.
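For users outside R, the .xz-compressed CSV files can be read without decompressing them first. The sketch below is a minimal illustration in Python with pandas; the file name `engagement_demo.csv.xz` and the columns `url` and `shares` are hypothetical stand-ins, not the actual useNews file names or variables.

```python
import lzma
import os
import tempfile

import pandas as pd

# Write a tiny xz-compressed CSV so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "engagement_demo.csv.xz")  # hypothetical file name
with lzma.open(path, "wt", encoding="utf-8") as fh:
    fh.write("url,shares\nhttps://example.org/a,10\nhttps://example.org/b,3\n")

# pandas decompresses on the fly; compression="xz" makes the choice explicit.
engagement = pd.read_csv(path, compression="xz")
total_shares = int(engagement["shares"].sum())
```

For the larger files, passing `chunksize` to `pandas.read_csv` is one option for keeping memory use bounded.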

Find out more about the data variables and dig into plenty of examples in the useNews-examples workbook: https://osf.io/snuk2/
Data from: Data and code from: Stem borer herbivory dependent on...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Aug 2, 2024
Cite
Agricultural Research Service (2024). Data and code from: Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-stem-borer-herbivory-dependent-on-interactions-of-sugarcane-variety-ass-1e076
Explore at:
Dataset updated
Aug 2, 2024
Dataset provided by
Agricultural Research Service
Description
This dataset contains all the data and code needed to reproduce the analyses in the manuscript: Penn, H. J., & Read, Q. D. (2023). Stem borer herbivory dependent on interactions of sugarcane variety, associated traits, and presence of prior borer damage. Pest Management Science. https://doi.org/10.1002/ps.7843

Included are two .Rmd notebooks containing all code required to reproduce the analyses in the manuscript, two .html files of rendered notebook output, three .csv data files that are loaded and analyzed, and a .zip file of intermediate R objects that are generated during the model fitting and variable selection process.

Notebook files

01_boring_analysis.Rmd: This RMarkdown notebook contains R code to read and process the raw data, create exploratory data visualizations and tables, fit a Bayesian generalized linear mixed model, extract output from the statistical model, and create graphs and tables summarizing the model output including marginal means for different varieties and contrasts between crop years.

02_trait_covariate_analysis.Rmd: This RMarkdown notebook contains R code to read raw variety-level trait data, perform feature selection based on correlations between traits, fit another generalized linear mixed model using traits as predictors, and create graphs and tables from that model output including marginal means by categorical trait and marginal trends by continuous trait.

HTML files

These HTML files contain the rendered output of the two RMarkdown notebooks. They were generated by Quentin Read on 2023-08-30 and 2023-08-15.

01_boring_analysis.html

02_trait_covariate_analysis.html

CSV data files

These files contain the raw data. To recreate the notebook output the CSV files should be at the file path project/data/ relative to where the notebook is run. Columns are described below.

BoredInternodes_26April2022_no format.csv: primary data file with sugarcane borer (SCB) damage

Columns A-C are the year, date, and location. All location values are the same.
Column D identifies which experiment the data point was collected from.
Column E, Stubble, indicates the crop year (plant cane or first stubble)
Column F indicates the variety
Column G indicates the plot (integer ID)
Column H indicates the stalk within each plot (integer ID)
Column I, # Internodes, indicates how many internodes were on the stalk
Columns J-AM are numbered 1-30 and indicate whether SCB damage was observed on that internode (0 if no, 1 if yes, blank cell if that internode was not present on the stalk)
Column AN indicates the experimental treatment for those rows that are part of a manipulative experiment
Column AO contains notes

variety_lookup.csv: summary information for the 16 varieties analyzed in this study

Column A is the variety name
Column B is the total number of stalks assessed for SCB damage for that variety across all years
Column C is the number of years that variety is present in the data
Column D, Stubble, indicates which crop years were sampled for that variety ("PC" if only plant cane, "PC, 1S" if there are data for both plant cane and first stubble crop years)
Column E, SCB resistance, is a categorical designation with four values: susceptible, moderately susceptible, moderately resistant, resistant
Column F is the literature reference for the SCB resistance value

Select_variety_traits_12Dec2022.csv: variety-level traits for the 16 varieties analyzed in this study

Column A is the variety name
Column B is the SCB resistance designation as an integer
Column C is the categorical SCB resistance designation (see above)
Columns D-I are continuous traits from year 1 (plant cane), including sugar (Mg/ha), biomass or aboveground cane production (Mg/ha), TRS or theoretically recoverable sugar (g/kg), stalk weight of individual stalks (kg), stalk population density (stalks/ha), and fiber content of stalk (percent).
Columns J-O are the same continuous traits from year 2 (first stubble)
Columns P-V are categorical traits (in some cases continuous traits binned into categories): maturity timing, amount of stalk wax, amount of leaf sheath wax, amount of leaf sheath hair, tightness of leaf sheath, whether leaf sheath becomes necrotic with age, and amount of collar hair.

ZIP file of intermediate R objects

To recreate the notebook output without having to run computationally intensive steps, unzip the archive. The fitted model objects should be at the file path project/ relative to where the notebook is run.

intermediate_R_objects.zip: This file contains intermediate R objects that are generated during the model fitting and variable selection process. You may use the R objects in the .zip file if you would like to reproduce final output including figures and tables without having to refit the computationally intensive statistical models.

binom_fit_intxns_updated_only5yrs.rds: fitted brms model object for the main statistical model

binom_fit_reduced.rds: fitted brms model object for the trait covariate analysis

marginal_trends.RData: calculated values of the estimated marginal trends with respect to year and previous damage

marginal_trend_trs.rds: calculated values of the estimated marginal trend with respect to TRS

marginal_trend_fib.rds: calculated values of the estimated marginal trend with respect to fiber content

Resources in this dataset:

Resource Title: Sugarcane borer damage data by internode, 1993-2021. File Name: BoredInternodes_26April2022_no format.csv

Resource Title: Summary information for the 16 sugarcane varieties analyzed. File Name: variety_lookup.csv

Resource Title: Variety-level traits for the 16 sugarcane varieties analyzed. File Name: Select_variety_traits_12Dec2022.csv

Resource Title: RMarkdown notebook 2: trait covariate analysis. File Name: 02_trait_covariate_analysis.Rmd

Resource Title: Rendered HTML output of notebook 2. File Name: 02_trait_covariate_analysis.html

Resource Title: RMarkdown notebook 1: main analysis. File Name: 01_boring_analysis.Rmd

Resource Title: Rendered HTML output of notebook 1. File Name: 01_boring_analysis.html

Resource Title: Intermediate R objects. File Name: intermediate_R_objects.zip
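The internode columns of the primary data file (numbered 1-30, with 0, 1, or a blank cell) lend themselves to a per-stalk damage summary. The sketch below is a minimal illustration in Python with pandas, not the authors' R analysis; it uses a made-up four-internode table, and the header names `Variety`, `Plot`, and `Stalk` are assumptions about how the described columns are labelled.

```python
import io

import pandas as pd

# Made-up rows: blank cells mean the internode was not present on that stalk.
sample = io.StringIO(
    "Variety,Plot,Stalk,1,2,3,4\n"
    "A,1,1,0,1,1,\n"
    "A,1,2,0,0,,\n"
    "B,2,1,1,1,1,1\n"
)
stalks = pd.read_csv(sample)

# Internode columns are the ones whose header is a plain number.
internode_cols = [c for c in stalks.columns if c.isdigit()]

# Blank cells become NaN, so they are excluded from counts and means.
stalks["n_internodes"] = stalks[internode_cols].notna().sum(axis=1)
stalks["prop_bored"] = stalks[internode_cols].mean(axis=1)

# Unweighted mean of the per-stalk proportions for each variety.
by_variety = stalks.groupby("Variety")["prop_bored"].mean()
```

The computed `n_internodes` can be compared with the "# Internodes" column of the real file as a consistency check.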
MaizeB73_hic_map_v2_after_manual_curation
doi.ipk-gatersleben.de
Updated Sep 12, 2022
Cite
Marina Püpke Marone; Martin Mascher (2022). MaizeB73_hic_map_v2_after_manual_curation [Dataset]. https://doi.ipk-gatersleben.de/DOI/319a58cb-df9f-4397-b0fe-d2b0fefdfccb/61917344-4c20-4e49-a4e2-1d416372e912/0
Explore at:
Dataset updated
Sep 12, 2022
Dataset provided by
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
Authors
Marina Püpke Marone; Martin Mascher
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example files generated in the TRITEX assembly pipeline. These files were generated using the input datasets of maize B73 downloaded from NCBI SRA with accession numbers SRR11606869 and PRJNA391551. The reference guide map was generated using the RefGen_v5 genome of maize B73 (accession number GCA_000005005.1). The marker guide map was generated using the set of markers from the linkage map of the Intermated B73 x Mo17 (IBM) population (doi:10.1371/journal.pone.0028334). Some files are provided in RDS format (serialized R object), which can be loaded in R; others are CSV files. More details are given in README.txt and in the TRITEX long-read paper.
Data from: Reference transcriptomics of porcine peripheral immune cells...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Mar 30, 2024
Cite
Agricultural Research Service (2024). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
Explore at:
Dataset updated
Mar 30, 2024
Dataset provided by
Agricultural Research Service
Description
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:
matrix of gene counts* (matrix.mtx.gz)
gene names (features.tsv.gz)
cell IDs (barcodes.tsv.gz)
*The 'raw' count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
nCount_RNA = the number of transcripts detected in a cell
nFeature_RNA = the number of genes detected in a cell
Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
prcntMito = percent mitochondrial reads in a cell
Scrublet = doublet probability score assigned to a cell
seurat_clusters = cluster ID assigned to a cell
PaperIDs = sample ID for a cell
celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
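
The description points to R's Read10X() for loading the 10X-format files. For readers working outside R, the following is a minimal sketch using only the Python standard library. It assumes the unzipped folder holds gzip-compressed `matrix.mtx.gz`, `features.tsv.gz` and `barcodes.tsv.gz` in the usual Matrix Market coordinate layout (rows are genes, columns are cells); the function name `read_10x_triplet` is hypothetical, and the sketch keeps the first column of the features file as the gene label, which may differ from what Read10X() selects.

```python
import gzip
from pathlib import Path


def read_10x_triplet(folder):
    """Read a 10X-style folder into (genes, cells, counts).

    counts maps (gene_index, cell_index) -> float, both 0-based.
    """
    folder = Path(folder)
    with gzip.open(folder / "features.tsv.gz", "rt") as fh:
        # Keep the first tab-separated field of each row as the gene label.
        genes = [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]
    with gzip.open(folder / "barcodes.tsv.gz", "rt") as fh:
        cells = [line.strip() for line in fh if line.strip()]

    counts = {}
    with gzip.open(folder / "matrix.mtx.gz", "rt") as fh:
        # Skip the banner and comment lines, which start with '%'.
        line = fh.readline()
        while line.startswith("%"):
            line = fh.readline()
        # The first non-comment line gives rows, columns, and entry count.
        n_rows, n_cols, _ = (int(x) for x in line.split())
        for line in fh:
            if not line.strip():
                continue
            row, col, value = line.split()
            # Matrix Market indices are 1-based; values stay floats because
            # the counts are non-integer after ambient RNA removal.
            counts[(int(row) - 1, int(col) - 1)] = float(value)
    assert (n_rows, n_cols) == (len(genes), len(cells)), "dimension mismatch"
    return genes, cells, counts
```

The sparse dictionary is only meant for inspection; the values are still unnormalized and need the normalization/transformation the description mentions before analysis.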
Storage and Transit Time Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
Cite
Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8136816
Dataset updated
Jun 12, 2024
Dataset authored and provided by
Andrew Felton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Andrew J. Felton
Date: 5/5/2024

This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

"Global estimates of the storage and transit time of water through vegetation"

Please note that 'turnover' and 'transit' are used interchangeably in this project.

Data information:

The data folder contains key data sets used for analysis. In particular:

"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five-year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available in separate .csv files for each land cover type. Other data for the minimum, monthly, and seasonal transit time can be found in their respective folders. These data were produced using the Python code found in the "supporting_code" folder, because the xarray Python module makes it easy to work with .nc files and the EASE grid. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.

Code information

Python scripts can be found in the "supporting_code" folder.

Each R script in this project has a particular function:

01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.

02_functions.R: This script contains the custom functions for this analysis, primarily for importing the seasonal transit data. Load this using the source() function in the 01_start.R script.

03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.

04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.

05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.

06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug...
explore.openaire.eu
data.niaid.nih.gov
Updated Aug 6, 2021
Cite
Vaidehi Krishnan; Florian Schmidt; Zahid Nawaz; Prasanna Nori Venkatesh; Lee Kian Leong; Chan Zhu En; Alice Man Sze Cheung; Sudipto Bari; Meera Makheja; Ahmad Lajam; Pavanish Kumar; John Ouyang; Owen Rackham; William Ying Khee Hwang; Salvatore Albani; Charles Chuah; Shyam Prabhakar; Sin Tiong Ong (2021). Single-cell Atlas Reveals Diagnostic Features Predicting Progressive Drug Resistance in Chronic Myeloid Leukemia [Dataset]. http://doi.org/10.5281/zenodo.5118610
Unique identifier
https://doi.org/10.5281/zenodo.5118610
Dataset updated
Aug 6, 2021
Authors
Vaidehi Krishnan; Florian Schmidt; Zahid Nawaz; Prasanna Nori Venkatesh; Lee Kian Leong; Chan Zhu En; Alice Man Sze Cheung; Sudipto Bari; Meera Makheja; Ahmad Lajam; Pavanish Kumar; John Ouyang; Owen Rackham; William Ying Khee Hwang; Salvatore Albani; Charles Chuah; Shyam Prabhakar; Sin Tiong Ong
Description
This archive contains scRNAseq and CyTOF data in the form of Seurat objects, txt and csv files, as well as R scripts for data analysis and figure generation. A summary of the content is provided in the following.

R scripts

Script to run machine learning models predicting group-specific marker genes: CML_Find_Markers_Zenodo.R
Script to reproduce the majority of Main and Supplementary Figures shown in the manuscript: CML_Paper_Figures_Zenodo.R
Script to run inferCNV analysis: inferCNV_Zenodo.R
Script to plot NATMI analysis results: NATMI_CvsA_FC0.32_Updown_Column_plot_Zenodo.R
Script to conduct sub-clustering and filtering of NK cells: NK_Marker_Detection_Zenodo.R
Helper scripts for plotting and DEG calculation: ComputePairWiseDE_v2.R, Seurat_DE_Heatmap_RCA_Style.R

RDS files

General scRNA-seq Seurat objects:
scRNA-seq Seurat object after QC and cell type annotation, used for most analysis in the manuscript: DUKE_DataSet_Doublets_Removed_Relabeled.RDS
scRNA-seq including findings, e.g. from NK analysis, used in the shiny app: DUKE_final_for_Shiny_App.rds
Neighborhood enrichment score computed for group A across all HSPCs: Enrichment_score_global_groupA.RDS
UMAP coordinates used in the article: Layout_2D_nNeighbours_25_Metric_cosine_TCU_removed.RDS

SCENIC files:
Regulon set used in SCENIC: 2.6_regulons_asGeneSet.Rds
AUC values computed for regulons: 3.4_regulonAUC.Rds
MetaData used in SCENIC: cellInfo.Rds
Group-specific regulons for LSC: groupSpecificRegulonsBCRAblP.RDS
Patient-specific regulons for LSC: patientSpecificRegulonsBCRAblP.RDS
Patient specificity score for LSC: PatientSpecificRegulonSpecificityScoreBCRAblP.RDS
Regulon specificity score for LSC: RegulonSpecificityScoreBCRAblP.RDS

BCR-ABL1 inference:
HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label.RDS
UMAP for HSC with inferred BCR-ABL1 label: HSCs_CML_with_BCR-Abl_label_UMAP.RDS
HSPCs with BCR-ABL1 module scores: HSPC_metacluster_74K_with_modscore_27thmay.RDS

NK sub-clustering and filtering:
NK object with module scores: NK_8617cells_with_modscore_1stjune.RDS
Feature genes for NK cells computed with DubStepR: NK_Cells_DubStepR
NK cells Seurat object excluding contaminating T and B cells: NK_cells_T_B_17_removed.RDS
NK Seurat object including neighbourhood enrichment score calculations: NK_seurat_object_with_enrichment_labels_V2.RDS

txt and csv files

Proportions per cluster calculated from CyTOF: CyTOF_Proportions.txt
Correlation between scRNAseq and CyTOF cell type abundance: scRNAseq_Cor_Cytof.txt
Correlation between manual gating and FlowSOM clustering: Manual_vs_FlowSOM.txt
GSEA results, HSPC, HSC and LSC: FINAL_GSEA_DATA_For_GGPLOT.txt
GSEA results, NK: NK_For_Plotting.txt
TFRC and HLA expression: TFRC_and_HLA_Values.txt
NATMI result files: UP-regulated_mean.csv, DOWN-regulated_mean.csv
Gene position file used in inferCNV: inferCNV_gene_positions_hg38.txt
Module scores for NK subclusters per cell: NK_Supplementary_Module_Scores.csv

Compressed folders

All CyTOF raw data files: CyTOF_Data_raw.zip
Results of the patient-based classifier: PatientwiseClassifier.zip
Results of the single-cell based classifier: SingleCellClassifierResults.zip

For general new data analysis approaches, we recommend that readers use the Seurat object stored in DUKE_final_for_Shiny_App.rds or use the shiny app (http://scdbm.ddnetbio.com/) and perform further analysis from there.

RAW data is available at EGA upon request using Study ID: EGAS00001005509

Revision

The for_CML_manuscript_revision.tar.gz folder contains scripts and data for the paper revision, including 1) Detection of the BCR-ABL fusion with long read sequencing; 2) Identification of BCR-ABL junction reads with scRNAseq; 3) Detection of expressed mutations using scRNAseq.
SDSS Galaxy Subset
data.niaid.nih.gov
zenodo.org
Updated Sep 6, 2022
Cite
Carvalho, Nuno Ramos (2022). SDSS Galaxy Subset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_6393487
Dataset updated
Sep 6, 2022
Dataset authored and provided by
Carvalho, Nuno Ramos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of this survey: 100077 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS and spectra data. This dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

The dataset includes a CSV data file where each row is an object from the SDSS database, and with the following columns (note that some data may not be available for all objects):

objid: unique SDSS object identifier

mjd: MJD of observation

plate: plate identifier

tile: tile identifier

fiberid: fiber identifier

run: run number

rerun: rerun number

camcol: camera column

field: field number

ra: right ascension

dec: declination

class: spectroscopic class (only objects classified as GALAXY are included)

subclass: spectroscopic subclass

modelMag_u: better of DeV/Exp magnitude fit for band u

modelMag_g: better of DeV/Exp magnitude fit for band g

modelMag_r: better of DeV/Exp magnitude fit for band r

modelMag_i: better of DeV/Exp magnitude fit for band i

modelMag_z: better of DeV/Exp magnitude fit for band z

redshift: final redshift from SDSS data z

stellarmass: stellar mass extracted from the eBOSS Firefly catalog

w1mag: WISE W1 "standard" aperture magnitude

w2mag: WISE W2 "standard" aperture magnitude

w3mag: WISE W3 "standard" aperture magnitude

w4mag: WISE W4 "standard" aperture magnitude

gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013

gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)

Besides the CSV file, the dataset includes a set of directories. Each directory contains files named after the objid column from the CSV file, holding the corresponding data. The following directory tree is available:

sdss-gs/
├── data.csv
├── fits
├── img
├── spectra
└── ssel

where each directory contains:

img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API

fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library

spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths

ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010
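
Because files in each directory are named after the objid column, the files belonging to one object can be collected by name. The following is a minimal sketch under stated assumptions: the helper name `files_for_object` is hypothetical, and since file extensions are not documented here, it matches on the part of the file name before the first dot.

```python
from pathlib import Path


def files_for_object(root, objid, subdirs=("img", "fits", "spectra", "ssel")):
    """Map each data directory to the files named after the given objid."""
    root = Path(root)
    found = {}
    for sub in subdirs:
        folder = root / sub
        names = []
        if folder.is_dir():
            for path in folder.glob(f"{objid}*"):
                # Compare the name up to the first dot so that an objid that
                # is a prefix of a longer objid is not matched by accident.
                if path.name.split(".")[0] == str(objid):
                    names.append(path.name)
        found[sub] = sorted(names)
    return found
```

A call such as `files_for_object("sdss-gs", objid)` with an objid taken from `data.csv` returns an empty list for any directory where that object has no data, which is consistent with the note that some data may not be available for all objects.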

Changelog

v0.0.4 - Increase number of objects to ~100k.

v0.0.3 - Increase number of objects to ~80k.

v0.0.2 - Increase number of objects to ~60k.

v0.0.1 - Initial import.
Data from: Novakova.et.al_altruism.attractiveness.csv
figshare.com
txt
Updated Jun 4, 2023
Cite
Julie Novakova; Jaroslav Flegr; Kamila Machová; Kateřina Sýkorová; Vojtěch Zíka (2023). Novakova.et.al_altruism.attractiveness.csv [Dataset]. http://doi.org/10.6084/m9.figshare.12067407.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12067407.v1
Dataset updated
Jun 4, 2023
Dataset provided by
figshare
Authors
Julie Novakova; Jaroslav Flegr; Kamila Machová; Kateřina Sýkorová; Vojtěch Zíka
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Dataset from Novakova et al., submitted. Main aim: to investigate the effect of attractiveness priming on altruism.

Data in csv format, comma as separator. Suitable for R.

Columns:

First column - row numbers
ID - participant ID, randomly assigned from a batch of numbers for each session
Sex - participant sex (0 female, 1 male)
DG_control - control question assessing whether participant correctly understood rules of the Dictator Game (correct answer: "získá zbylých 100 žetonů")
DG_given - number of tokens given to another player in the DG (0-400 tokens)
UG_control - control question assessing whether participant correctly understood rules of the Ultimatum Game (correct answer: "pokud nabídku přijme, dostane 100 žetonů a Hráč 1 si ponechá 300 žetonů; pokud odmítne, ani jeden nedostane nic")
UG_offer - offer made in the UG (0-400 tokens)
UG_120offer - would the player accept a hypothetical offer of 120 out of 400 tokens from another player (0 reject, 1 accept)
UG_MAO - minimum acceptable offer in the UG (0-400 tokens)
DG_UG_knowledge - any previous knowledge of the games (0 no, 1 yes)
DG_UG_knowledge_where - source of any previous knowledge of the games (text)
age - participant age
family_situation - participant's satisfaction with their family situation (ordinal; very bad 0 - very good 5)
economic_situation - participant's satisfaction with their economic situation (ordinal; very bad 0 - very good 5)
own.face_attractive - participant's assessment of the attractiveness of their face relative to other members of their sex (ordinal; 0 very unattractive - 5 very attractive)
own.body_attractive - participant's assessment of the attractiveness of their body relative to other members of their sex (ordinal; 0 very unattractive - 5 very attractive)
own.behav_attractive - participant's assessment of the attractiveness of their behavior (personality) relative to other members of their sex (ordinal; 0 very unattractive - 5 very attractive)
Self.perceived.attractiveness - sum of values from the previous three variables
romantic.part_life - number of romantic partners in the participant's life so far
sexual.part_life - number of sexual partners in the participant's life so far
romantic.part_year - number of romantic partners of the participant in the previous year
sexual.part_year - number of sexual partners of the participant in the previous year
extrov - question from the Ten-Item Personality Inventory: I see myself as extraverted, enthusiastic (ordinal; 1 strongly disagree, 7 strongly agree)
critic_reversed - question from the Ten-Item Personality Inventory: I see myself as critical, quarrelsome (ordinal; 7 strongly disagree, 1 strongly agree)
depend - question from the Ten-Item Personality Inventory: I see myself as dependable, self-disciplined (ordinal; 1 strongly disagree, 7 strongly agree)
anxio_reversed - question from the Ten-Item Personality Inventory: I see myself as anxious, easily upset (ordinal; 7 strongly disagree, 1 strongly agree)
open - question from the Ten-Item Personality Inventory: I see myself as open to new experiences, complex (ordinal; 1 strongly disagree, 7 strongly agree)
reserved_reversed - question from the Ten-Item Personality Inventory: I see myself as reserved, quiet (ordinal; 7 strongly disagree, 1 strongly agree)
sympat - question from the Ten-Item Personality Inventory: I see myself as sympathetic, warm (ordinal; 1 strongly disagree, 7 strongly agree)
disorg_reversed - question from the Ten-Item Personality Inventory: I see myself as disorganized, careless (ordinal; 7 strongly disagree, 1 strongly agree)
calm - question from the Ten-Item Personality Inventory: I see myself as calm, emotionally stable (ordinal; 1 strongly disagree, 7 strongly agree)
convent_reversed - question from the Ten-Item Personality Inventory: I see myself as conventional, uncreative (ordinal; 7 strongly disagree, 1 strongly agree)
Extraversion - extraversion measure from the Ten-Item Personality Inventory, computed as average of extrov and reserved_reversed
Agreeableness - agreeableness measure from the Ten-Item Personality Inventory, computed as average of sympat and critic_reversed
Conscientiousness - conscientiousness measure from the Ten-Item Personality Inventory, computed as average of depend and disorg_reversed
Neuroticism - neuroticism measure from the Ten-Item Personality Inventory, computed as average of calm and anxio_reversed
Openness - openness measure from the Ten-Item Personality Inventory, computed as average of open and convent_reversed
order_attractiveness - order of appearance of attractive or unattractive images shown to the participant (randomly selected by Qualtrics)
priming - type of priming (0 by unattractive images, 1 by attractive images)
altruism_score - scale measure of altruism (sum of z-scores of DG_given and UG_offer)

Collection methodology (from Novakova et al., submitted):

Subjects and course of the experiment

The data was collected at the Faculty of Science, Charles University in Prague, in June and July 2018 and 2019. Participants were invited via online recruitment on Facebook, with a maximum of 18 participants invited for each session, and took part in several tasks for a set of studies. Each session consisted of either men or women only. Only self-reported heterosexuals were recruited for the study in order to clearly discern relationships between the measured variables on a sample of the target size. In total, 158 people participated in the experiments (74 men, 84 women, mean age=21, median of age=21).

Participants were greeted at the reception and led to the computer lab with seats separated by cardboard screens, so that the subjects could not see each other or otherwise interact during the experiment. They read and signed their informed consent and were able to ask questions about it before the commencement of the experiments. At the beginning of the computer-based survey, participants viewed twenty neutral-expression frontal portrait color photos of people of the opposite sex. The priming part of the session was always overseen by an experimenter of the same sex as the subjects.

The subjects were asked to rate the attractiveness of the faces on an 8-point Likert scale. One half of the subjects was randomly allocated photos rated as less attractive on the same Likert scale by an independent sample in a previous online questionnaire (Machová 2018); the other half was shown the twenty faces rated as more attractive. The unattractive sample of men was originally rated 2.73 on average, the attractive 3.97 (t=7.98, df=37.67, p
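
The derived columns in the column list for this dataset (altruism_score and the Ten-Item Personality Inventory scales) follow simple formulas, so they can be recomputed as a consistency check. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes z-scores use the sample standard deviation (as R's scale() does), which the description does not state.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def altruism_scores(dg_given, ug_offer):
    """Sum of the z-scores of DG_given and UG_offer, one value per participant."""
    return [a + b for a, b in zip(z_scores(dg_given), z_scores(ug_offer))]


def tipi_scale(item, reversed_item):
    """Average of a regular item and an item that is already reverse-coded."""
    return (item + reversed_item) / 2
```

For example, `tipi_scale(extrov, reserved_reversed)` would reproduce Extraversion for one participant; the `_reversed` columns are used as given because the description lists them with their scale already reversed.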
96 wells fluorescence reading and R code statistic for analysis
zenodo.org
bin, csv, doc, pdf
Updated Aug 2, 2024
Cite
JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
Available download formats: doc, csv, pdf, bin
Unique identifier
https://doi.org/10.5281/zenodo.1119285
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
JVD Molino
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

Data points in this dataset were obtained as follows. To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker set to 150 rpm under constant illumination (50 μmol photons/m²s). Then 100 μL of sample was transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R version 3.3.3 was used to perform one-way ANOVA (with Tukey's test); the significance level for hypothesis tests was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
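
The comparison of constructs relies on one-way ANOVA. As an illustration of what that test computes, here is a minimal Python sketch of the F statistic only; it is not the deposited R code, the function name `one_way_anova_f` is hypothetical, and it does not compute the p-value or Tukey's post hoc test used in the actual analysis.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean, weighted by size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each group would hold the fluorescence readings of the colonies for one construct; a large F relative to the chosen significance level (0.05 in the study) indicates that at least one construct mean differs.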

Info

ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3

barplot_R.R -> code to generate bar plot in R statistic 3.3.3

boxplotv2.R -> code to generate boxplot in R statistic 3.3.3

pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files

ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

Consider citing our work.

Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433

Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem - the dataset

Available download formats: text/x-python, zip, bin, application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.1297925

Dataset updated

Aug 2, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Marat Valiev; Bogdan Vasilescu; James Herbsleb

License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description

Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)

Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on the level of detail (see Step 2 for more details):
- up to 2Tb of disk space (see the levels of detail in Step 2)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   cd ghd
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt`
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/init.py`, comment out everything except GitHub support
 in `PROVIDERS`.
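
The two settings named in Step 1 could look like the sketch below. This is an illustration under stated assumptions, not the shipped template: the folder path and token value are placeholders, and treating `SCRAPER_GITHUB_API_TOKENS` as a list of strings is a guess at the expected type, so check the comments in the provided `settings.py` before relying on it.

```python
import os

# Folder that will hold the generated dataset. The path is a placeholder;
# the guide asks for a newly created folder, so create it before running.
DATASET_PATH = os.path.join(os.path.expanduser("~"), "ghd_dataset")

# At least one GitHub API token is required. A list of strings is an
# assumption about the expected type; the value is not a real token.
SCRAPER_GITHUB_API_TOKENS = ["<your-github-token>"]
```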

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up
the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step
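
Option 2.a can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the helper name `extract_member` is hypothetical, and because the folder layout inside the archive is not documented here, it searches for the file by base name.

```python
import zipfile
from pathlib import Path


def extract_member(archive, member_name, dest_dir):
    """Extract the first archive entry whose file name equals member_name."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # The internal folder layout is unknown, so match on the base name.
            if not info.is_dir() and Path(info.filename).name == member_name:
                return Path(zf.extract(info, dest_dir))
    raise FileNotFoundError(f"{member_name} not found in {archive}")
```

For instance, `extract_member("dataset_minimal_Jan_2018.zip", "survival_data.csv", ".")` returns the path of the extracted CSV, which can then be passed to the next step.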

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15 to 30 minutes.

- create a folder `
