95 datasets found
  1. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    + more versions
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python. Available download formats
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34 GB unpacked. This dataset still doesn't include the PyPI packages
     themselves, which take around 2 TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space (see Step 2 detail levels)
    - at least 16 GB of RAM (64 GB preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/init.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    #### Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
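    If you just want to sanity-check the precomputed file before moving on, here is a minimal R sketch (not part of the replication pack; it only assumes `survival_data.csv` sits in the current directory):
    
      # Minimal sketch: inspect the precomputed survival table from
      # dataset_minimal_Jan_2018.zip before passing it to build_model.r.
      survival_data <- read.csv("survival_data.csv", stringsAsFactors = FALSE)
      str(survival_data)      # column names and types
      summary(survival_data)  # value ranges and missing data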
    
    #### Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15-30 minutes.
    
    - create a folder `
  2. Longitudinal Study on Reading and Writing at the Word, Sentence, and Text...

    • ldbase.org
    Updated Apr 9, 2025
    Cite
    Yusra Ahmed; Richard Wagner; Danielle Lopez (2025). Longitudinal Study on Reading and Writing at the Word, Sentence, and Text Levels [Dataset]. https://ldbase.org/datasets/29ea8617-957a-4d54-afc9-4754261c3d96
    Explore at:
    Dataset updated
    Apr 9, 2025
    Authors
    Yusra Ahmed; Richard Wagner; Danielle Lopez
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset is longitudinal in nature, comprising data from school years (2007/2008-2010/2011) following students in grade 1 to grade 4. Measures were chosen to provide a wide array of both reading and writing measures, encompassing reading and writing skills at the word, sentence, and larger passage or text levels. Participants were tested on all measures once a year, approximately one year apart. Participants were first grade students in the fall of 2007 whose parents consented to participate in the longitudinal study. Participants attended six different schools in a metropolitan school district in Tallahassee, Florida. Data was gathered by trained testers during thirty to sixty minute sessions in a quiet room designated for testing at the schools. The test battery was scored in a lab by two or more raters and discrepancies in the scoring were resolved by an additional rater.

    Reading Measures. Decoding Measures. The Woodcock Reading Mastery Tests-Revised (WRMT-R; Woodcock, 1987): Word Attack subtest was used to assess accuracy for decoding non-words. The Test of Word Reading Efficiency (TOWRE; Torgesen, Wagner, & Rashotte, 1999): Phonetic Decoding Efficiency (PDE) subtest was also used to assess pseudo-word reading fluency and accuracy. Both subtests were used to form a word-level decoding latent factor. The WRMT-R Word Attack subtest consists of a list of non-words that are read out loud by the participant. The lists start with letters and become increasingly difficult, ending with complex non-words. Testing is discontinued after six consecutive incorrect items. The median reliability is reported to be .87 for Word Attack (Woodcock, McGrew, & Mather, 2001). The TOWRE PDE requires accurately reading as many non-words as possible in 45 seconds. The TOWRE test manual reports test-retest reliability to be .90 for the PDE subtest. Sentence Reading Measures. Two forms of the Test of Silent Reading Efficiency and Comprehension (TOSREC, forms A and D; Wagner et al., 2010) were used as measures of silent reading fluency. Students were required to read brief statements (e.g., “a cow is an animal”) and verify the truthfulness of the statement by circling yes or no. Students are given three minutes to read and answer as many sentences as possible. The mean alternate-forms reliability for the TOSREC ranges from .86 to .95.

    Reading Comprehension Measures. The Woodcock-Johnson-III (WJ-III) Passage Comprehension subtest (Woodcock et al., 2001) and the Woodcock Reading Mastery Test-Revised Passage Comprehension subtest (WRMT-R; Woodcock, 1987) were used to provide two indicators of reading comprehension. For both of the passage comprehension subtests, students read brief passages to identify missing words. Testing is discontinued when the ceiling is reached (six consecutive wrong answers or until the last page is reached). According to the test manuals, test-retest reliability is reported to be above .90 for the WRMT-R, and the median reliability coefficient for the WJ-III is reported to be .92.

    Spelling Measures. The Spelling subtest from the Wide Range Achievement Test-3 (WRAT-3; Wilkinson, 1993) and the Spelling subtest from the Wechsler Individual Achievement Test-II (WIAT-II; The Psychological Corporation, 2002) were used to form a spelling factor. Both spelling subtests required students to spell words of increasing difficulty from dictation. The ceiling for the WRAT-3 Spelling subtest is misspelling ten consecutive words. If the first five words are not spelled correctly, the student is required to write his or her name and a series of letters and then continue spelling until they have missed ten consecutive items. The ceiling for the WIAT-II is misspelling six consecutive words. The reliability of the WRAT-3 Spelling subtest is reported to be .96 and the reliability of the WIAT-II Spelling subtest is reported to be .94.

    Written Expression Measures. The Written Expression subtest from the Wechsler Individual Achievement Test-II (WIAT-II; The Psychological Corporation, 2002) was administered. Written Expression score is based on a composite of Word Fluency and Combining Sentences in first and second grades and a composite of Word Fluency, Combining Sentences, and Paragraph tasks in third grade. In this study the Combining Sentences task was used as an indicator of writing ability at the sentence level. For this task students are asked to combine various sentences into one meaningful sentence. According to the manual, the test-retest reliability coefficient for the Written Expression subtest is .86.

    Writing Prompts. A writing composition task was also administered. Participants were asked to write a passage on a topic provided by the tester. Students were instructed to scratch out any mistakes and were not allowed to use erasers. The task was administered in groups and lasted 10 minutes. The passages for years 1 and 2 required expository writing and the passage for year 3 required narrative writing. The topics were as follows: choosing a pet for the classroom (year 1), favorite subject (year 2), a day off from school (year 3). The writing samples were transcribed into a computer database by two trained coders. In order to submit the samples to Coh-Metrix (described below) the coders also corrected the samples. Samples were corrected once for spelling and punctuation using a hard criterion (i.e., words were corrected individually for spelling errors regardless of the context, and run-on sentences were broken down into separate sentences). In addition, the samples were completely corrected using the soft criterion: corrections were made for spelling based on context (e.g., correcting there for their), punctuation, grammar, usage, and syntax (see Appendix A for examples of original and corrected transcripts). The samples that were corrected only for spelling and punctuation using the hard criterion were used for several reasons: (a) developing readers make many spelling errors which make their original samples illegible, and (b) the samples that were completely corrected do not stay true to the child’s writing ability. Accuracy of writing was not reflected in the corrected samples because of the elimination of spelling errors. However, as mentioned above, spelling ability was measured separately. Data on compositional fluency and complexity were obtained from Coh-Metrix. Compositional fluency refers to how much writing was done and complexity refers to the density of writing and length of sentences (Berninger et al., 2002; Wagner et al., 2010).

    Coh-Metrix Measures. The transcribed samples were analyzed using Coh-Metrix (McNamara et al., 2005; Graesser et al., 2004). Coh-Metrix is a computer scoring system that analyzes over 50 measures of coherence, cohesion, language, and readability of texts. Appendix B contains the list of variables provided by Coh-Metrix. In the present study, the variables were broadly grouped into the following categories: a) syntactic, b) semantic, c) compositional fluency, d) frequency, e) readability and f) situation model. Syntactic measures provide information on pronouns, noun phrases, verb and noun constituents, connectives, type-token ratio, and number of words before the main verb. Connectives are words such as so and because that are used to connect clauses. Causal, logical, additive and temporal connectives indicate cohesion and logical ordering of ideas. Type-token ratio is the ratio of unique words to the number of times each word is used. Semantic measures provide information on nouns, word stems, anaphors, content word overlap, Latent Semantic Analysis (LSA), concreteness, and hypernyms. Anaphors are words (such as pronouns) used to avoid repetition (e.g., she refers to a person that was previously described in the text). LSA refers to how conceptually similar each sentence is to every other sentence in the text. Concreteness refers to the level of imaginability of a word, or the extent to which words are not abstract. Concrete words have more distinctive features and can be easily pictured in the mind. Hypernym is also a measure of concreteness and refers to the conceptual taxonomic level of a word (for example, chair has 7 hypernym levels: seat -> furniture -> furnishings -> instrumentality -> artifact -> object -> entity). Compositional fluency measures include the number of paragraphs, sentences and words, as well as their average length and the frequencies of content words. Frequency indices provide information on the frequency of content words, including several transformations of the raw frequency score. Content words are nouns, adverbs, adjectives, main verbs, and other categories with rich conceptual content. Readability indices are related to fluency and include two traditional indices used to assess difficulty of text: Flesch Reading Ease Score and Flesch-Kincaid Grade Level. Finally, situation model indices describe what the text is about, including causality of events and actions, intentionality of performing actions, tenses of actions and spatial information. Because Coh-Metrix hasn’t been widely used to study the development of writing in primary grade children (Puranik et al., 2010), the variables used in the present study were determined in an exploratory manner described below. Out of the 56 variables, 3 were used in the present study: total number of words, total number of sentences and average sentence length (or average number of words per sentence). Nelson and Van Meter (2007) report that total word productivity is a robust measure of developmental growth in writing. Therefore, indicators for a paragraph level factor included total number of words and total number of sentences. Average words per sentence was used as an indicator for a latent sentence level factor, along with the WIAT-II Combining Sentences task.

    Following the Sunshine State Standards, students are required to take the Florida

  3. Using R to get data from Twitter and Binance

    • kaggle.com
    Updated Nov 3, 2019
    Cite
    Medou Neine (2019). Using R to get data from Twitter and Binance [Dataset]. https://www.kaggle.com/datasets/dodu63/using-r-to-get-data-from-twitter-and-binance
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 3, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Medou Neine
    Description

    Dataset

    This dataset was created by Medou Neine

    Contents

  4. Data from: WoSIS snapshot - September 2019

    • repository.soilwise-he.eu
    • data.isric.org
    Updated Sep 6, 2019
    + more versions
    Cite
    WoSIS snapshot - September 2019 [Dataset]. https://repository.soilwise-he.eu/cat/collections/metadata:main/items/ca880bd4-cff8-11e9-8046-0cc47adaa92c
    Explore at:
    Dataset updated
    Sep 6, 2019
    Description

    The World Soil Information Service (WoSIS) provides quality-assessed and standardised soil profile data to support digital soil mapping and environmental applications at broad scale levels. Since the release of the first ‘WoSIS snapshot’, in July 2016, many new soil data were shared with us, registered in the ISRIC data repository, and subsequently standardised in accordance with the licences specified by the data providers. Soil profile data managed in WoSIS were contributed by a wide range of data providers, therefore special attention was paid to measures for soil data quality and the standardisation of soil property definitions, soil property values (and units of measurement), and soil analytical method descriptions.

    We presently consider the following soil chemical properties (organic carbon, total carbon, total carbonate equivalent, total Nitrogen, Phosphorus (extractable-P, total-P, and P-retention), soil pH, cation exchange capacity, and electrical conductivity) and physical properties (soil texture (sand, silt, and clay), bulk density, coarse fragments, and water retention), grouped according to analytical procedures (aggregates) that are operationally comparable.

    Further, for each profile, we provide the original soil classification (FAO, WRB, USDA, and version) and horizon designations insofar as these have been specified in the source databases. Measures for geographical accuracy (i.e. location) of the point data as well as a first approximation for the uncertainty associated with the operationally defined analytical methods are presented, for possible consideration in digital soil mapping and subsequent earth system modelling.

    The present snapshot, referred to as ‘WoSIS snapshot - September 2019’, comprises 196,498 geo-referenced profiles originating from 173 countries. They represent over 832 thousand soil layers (or horizons), and over 6 million records. The actual number of observations for each property varies (greatly) between profiles and with depth, this generally depending on the objectives of the initial soil sampling programmes.

    The downloadable ZIP file has the data in TSV (tab separated values) and GeoPackage format. It contains the following files:
    - ReadmeFirst_WoSIS_2019dec04.pdf (546.7 KB)
    - wosis_201909.gpkg (2.2 GB, same data as in the tsv)
    - wosis_201909_attributes.tsv (8.7 KB)
    - wosis_201909_layers_chemical.tsv (893.5 MB)
    - wosis_201909_layers_physical.tsv (890.7 MB)
    - wosis_201909_profiles.tsv (18.8 MB)

    To read the data in R, please uncompress the ZIP file and set the working directory to the uncompressed folder. Then use read_tsv to read the TSV files, specifying the data types for each column (c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time).

    setwd("/YourFolder/WoSIS_2019_September/")
    attributes = readr::read_tsv('wosis_201909_attributes.tsv', col_types='cccciicd')
    profiles = readr::read_tsv('wosis_201909_profiles.tsv', col_types='icccdddiicccciccccicccc')
    chemical = readr::read_tsv('wosis_201909_layers_chemical.tsv', col_types='iiddclcdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccc')
    physical = readr::read_tsv('wosis_201909_layers_physical.tsv', col_types='iiddclcdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccccdccccc')

    For more detailed instructions on how to read the data with R, please visit https://www.isric.org/accessing-wosis-using-r.

    Citation: Batjes N.H, Ribeiro E, and van Oostrum A.J.M, 2019. Standardised soil profile data for the world (WoSIS snapshot - September 2019), https://doi.org/10.17027/isric-wdcsoils.20190901. The dataset accompanies the following data paper: Batjes N.H., Ribeiro E., and van Oostrum A.J.M., 2019. Standardised soil profile data to support global mapping and modelling (WoSIS snapshot - 2019). Earth System Science Data, https://doi.org/10.5194/essd-12-299-2020.

  5. Storage and Transit Time Data and Code

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 12, 2024
    + more versions
    Cite
    Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8136816
    Explore at:
    Dataset updated
    Jun 12, 2024
    Dataset authored and provided by
    Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 5/5/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably in this project.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.

    Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a particular function:

    01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.

    02_functions.R: This script contains the custom function for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.

    03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.

    04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each land cover type. You load in these data from the 01_start.R script using the source() function.

    05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each land cover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.

    06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
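    A minimal R sketch of the documented workflow, assuming the scripts are run from the project root (the import scripts can also be sourced from within 01_start.R, as described above):

      # Minimal sketch: load packages and custom functions, then import the
      # annual and minimum turnover/storage datasets via the provided scripts.
      source("01_start.R")                             # packages, directory, custom functions
      source("04_annual_turnover_storage_import.R")    # annual turnover and storage data
      source("05_minimum_turnover_storage_import.R")   # minimum (lowest monthly) estimates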

  6. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • datasearch.gesis.org
    • openicpsr.org
    Updated Feb 19, 2020
    Cite
    Kaplan, Jacob (2020). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Property Stolen and Recovered (Supplement to Return A) 1960-2017 [Dataset]. http://doi.org/10.3886/E105403V3
    Explore at:
    Dataset updated
    Feb 19, 2020
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Kaplan, Jacob
    Description

    For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

    Version 3 release notes: Adds data in the following formats: Excel. Changes project name to avoid confusing this data for the ones done by NACJD.

    Version 2 release notes: Adds data for 2017. Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

    Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. for robbery it is broken down into subcategories including highway robbery, bank robbery, gas station robbery). The majority of the data relates to theft. Theft is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

    All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document file available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data.

    There may be inaccuracies in the data, particularly in the group of columns starting with "auto." To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto", as they are common data entry error values (e.g. are larger than the agency's population, are much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value."

    For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative number indicators are not included in the FBI's codebook for this data but are present in the data. I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) codes and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
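    As an illustration of the reading step described above, a hedged R sketch using the asciiSetupReader package; the file names are hypothetical placeholders, not the actual names shipped with this release:

      # Hedged sketch: read an FBI fixed-width ASCII file with its SPSS setup file.
      # Both file names below are hypothetical placeholders.
      library(asciiSetupReader)
      property <- read_ascii_setup("property_stolen_recovered.txt",  # raw ASCII data
                                   "property_stolen_recovered.sps")  # SPSS setup file
      head(property)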

  7. Data from: Decision-Support Framework for Linking Regional-Scale Management...

    • catalog.data.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Decision-Support Framework for Linking Regional-Scale Management Actions to Continental-Scale Conservation of Wide-Ranging Species [Dataset]. https://catalog.data.gov/dataset/decision-support-framework-for-linking-regional-scale-management-actions-to-continental-sc
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release presents the data, JAGS models, and R code used to manipulate data and to produce results and figures presented in the USGS Open File Report, "Decision-Support Framework for Linking Regional-Scale Management Actions to Continental-Scale Conservation of Wide-Ranging Species" (https://doi.org/10.5066/P93YTR3X). The zip folder is provided so that others can reproduce results from the integrated population model, inspect model structure and posterior simulations, conduct analyses not presented in the report, and use and modify the code. Raw source data can be sourced from the USGS Bird Banding Laboratory, USFWS Surveys and Monitoring Branch, National Oceanic and Atmospheric Administration, and Ducks Unlimited Canada. The zip file contains the following objects when extracted:

    * Readme.txt: A plain text file describing each file in this directory.
    * Figures-Pintail-IPM.r: R code that generates report figures in png, pdf, and eps format. Generates Figures 2-11 and calls source code for figures 12 and 13 found in other files.
    * get pintail IPM data.r: R source code that must be run to format data for the IPM code file.
    * getbandrecovs.r: R code that takes Bird Banding Lab data for pintail band releases and recoveries and formats it for analysis. This file is called by 'get pintail IPM data.r'. File was originally written by Scott Boomer (USFWS) and modified by Erik Osnas for use in the IPM.
    * Model_1_post.txt: Text representation of the posterior simulations from Model 1. This file can be read by the R function dget() to produce an R list object that contains posterior draws from Model 1. The list is the BUGSoutput$sims.list object from a call to rjags::jags.
    * Model_2_post.txt: As above but for Model 2.
    * Model_S1_post.txt: As above but for Model S1.
    * Pintail IPM.r: This is the main file that defines the IPM models in JAGS, structures the data for JAGS, defines initial values, and runs the models. Outputs are text files that contain the JAGS model files, and R workspaces that contain all data, models, and results, including the output from the jags() function. From this the BUGSoutput$sims.list object was written to text for each model.
    * MSY_metrics.txt: Summary of results produced from running code in source_figure_12.R. This table is a text representation of a summary of the maximum sustained yield analysis at various mean rainfall levels, used for Table 1 of the report, and can be reproduced by running the code in source_figure_12.R. To understand the structure of this file, you must consult the code file and understand the structure of the R objects created from that code. Otherwise, consult Figure 12 and Table 1 in the report.
    * source_figure_12.R: R code to produce Figure 12. Code is written to work with R workspace output from Model 1, but can be modified to use the Model_1_post.txt file without re-running the model. This would allow use of the same posterior realizations as used in the report.
    * source_figure_13.R: This is the code used to produce the results for Figure 13. Required here is the posterior from Model 1 and data for the Prairie Parkland Model based on Jim Devries/Ducks Unlimited data. These are described in the report text.
    * Data: A directory that contains the raw data used for this report.
    * Data/2015_LCC_Networks_shapefile: A directory that contains ESRI shapefiles used in Figure 1 and to define the boundaries of the Landscape Conservation Cooperatives. Found at https://www.sciencebase.gov/catalog/item/55b943ade4b09a3b01b65d78
    * Data/bndg_1430_yr1960up_DBISC_03042014.csv: A comma delimited file for banded pintail from 1960 to 2014. Obtained from the USGS Bird Banding Lab. This file is used by 'getbandrecovs.r' to produce an 'm-array' used in the Integrated Population Model (IPM). A data dictionary describing the codes for each field can be found here: https://www.pwrc.usgs.gov/BBL/manual/summary.cfm
    * Data/cponds.csv: A comma delimited file of estimated Canadian ponds based on counts from the North American Breeding Waterfowl and Habitat Survey, 1955-2014. Given is the year, point estimate, and estimated standard error.
    * Data/enc_1430_yr1960up_DBISC_03042014.csv: A comma delimited file for encounters of banded pintail. Obtained from the USGS Bird Banding Lab. This file is used by 'getbandrecovs.r' to produce an 'm-array' used in the Integrated Population Model (IPM). A data dictionary describing the codes for each field can be found here: https://www.pwrc.usgs.gov/BBL/manual/enc.cfm
    * Data/nopiBPOP19552014.csv: A comma delimited file of estimated northern pintail based on counts from the North American Breeding Waterfowl and Habitat Survey, 1955-2014. Given is the year, pintail point estimate (bpop), pintail estimated standard error (bpopSE), mean latitude of the pintail population (lat), latitude variance of the pintail population (latVAR), mean longitude of the pintail population (lon), and the variance in longitude of the pintail population (lonVAR).
    * Data/Summary Climate Data California CV 2.csv: Rainfall data for the California Central Valley downloaded from the National Climate Data Center (www.ncdc.noaa.gov/cdo-web/) as described in the report text (https://doi.org/10.5066/P93YTR3X) and the publication found at https://doi.org/10.1002/jwmg.21124. Used in 'get pintail IPM data.r' for the IPM.
    * Data/Summary data MAV.csv: Rainfall data for the Mississippi Alluvial Valley downloaded from the National Climate Data Center (www.ncdc.noaa.gov/cdo-web/) as described in the report text (https://doi.org/10.5066/P93YTR3X) and the publication found at https://doi.org/10.1002/jwmg.21124. Used in 'get pintail IPM data.r' for the IPM.
    * Data/Wing data 1961 2011 NOPI.txt: Comma delimited text file of pintail wing age data for 1961 to 2011 from the Parts Collection Survey. Each row is an individual wing with sex cohorts 4 = male, 5 = female and age cohorts 1 = After Hatch Year and 2 = Hatch Year. Wt is a weighting factor that determines how many harvested pintails this wing represents. See USFWS documentation for the Parts Collection Survey for descriptions. Summing Wt for each age, sex, and year gives an estimate of the number of pintail harvested. Used in 'get pintail IPM data.r' for the IPM.
    * Data/Wing data 2012 2013 NOPI.csv: Same as 'Wing data 1961 2011 NOPI.txt' but for years 2012 and 2013.
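    A minimal R sketch of the dget() step described above for loading the posterior simulations:

      # Minimal sketch: read the text representation of Model 1's posterior draws
      # (the BUGSoutput$sims.list object) back into an R list.
      post_model1 <- dget("Model_1_post.txt")
      names(post_model1)   # parameter names available in the posterior list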

  8. Data from: Benzoxazinoids in roots and shoots of cereal rye (Secale cereale)...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Benzoxazinoids in roots and shoots of cereal rye (Secale cereale) and their fates in soil after cover crop termination [Dataset]. https://catalog.data.gov/dataset/data-from-benzoxazinoids-in-roots-and-shoots-of-cereal-rye-secale-cereale-and-their-fates--00c2e
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Cover crops provide many agroecosystem services, including weed suppression, which is partially exerted through release of allelopathic benzoxazinoid (BX) compounds. This research characterizes (1) changes in concentrations of BX compounds in shoots, roots, and soil at three growth stages (GS) of cereal rye (Secale cereale L.), and (2) their degradation over time following termination. Concentrations of shoot dominant BX compounds, DIBOA-glc and DIBOA, were least at GS 83 (boot). The root dominant BX compound, HMBOA-glc, concentration was least at GS 54 (elongation). Rhizosphere soil BX concentrations were 1000 times smaller than in root tissues. Dominant compounds in soil were HMBOA-glc and HMBOA. Concentrations of BX compounds were similar for soil near root crowns and between-rows. Soil BX concentrations following cereal rye termination declined exponentially over time in three of four treatments: incorporated shoots (S) and roots (R), no-till S+R (cereal rye rolled flat), and no-till R (shoots removed), but not in no-till S. On the day following cereal rye termination, soil concentrations of HMBOA-glc and HMBOA in these three treatments increased above initial concentrations. Concentrations of these two compounds decreased the fastest while DIBOA-glc declined the slowest (half-life of 4 d in no-till S+R soil). Placement of shoots on the surface of an area where cereal rye had not grown (no-till S) did not increase soil concentrations of BX compounds. The short duration and complex dynamics of BX compounds in soil prior to and following termination illustrate the limited window for enhancing weed suppression by cereal rye allelochemicals; valuable information for programs breeding for enhanced weed suppression. In addition to the data analyzed for this article, we also include the R code.

    Resources in this dataset:
    - BX data following termination. File Name: FinalBXsForMatt-20200908.csv. Description: For each sample, gives the time, depth, location, and plot treatment, and then the compound concentrations. This is the principal data set analyzed with the R (anal2-cleaned.r) code; see that code for use.
    - BX compounds from 3rd sampling time before termination. File Name: soil2-20201123.csv. Description: These data are for comparison with the post-termination data. They were taken at the 3rd sampling time (pre-termination), a day prior to termination. Each sample is identified with a treatment, date, and plot location, in addition to the BX concentrations. See R code (anal2-cleaned.r) for how this file is used.
    - Soil location (within row versus between row) values of BX compounds. File Name: s2b.csv. Description: Each row gives the average BX compound for each soil location (within row versus between row) for the second sample for each plot. These data are combined with bx3 (the data set read in from the file "FinalBXsForMatt-20200908.csv"). See R code (anal2-cleaned.r) for use.
    - R code for analysis of the decay (post-termination) BX data. File Name: anal2-cleaned.r. Description: This is the R code used to analyze the termination data. It also creates and writes out some data subsets (used for analysis and plots) that are later read in. Software Recommended: R version 3.6.3, url: https://www.R-project.org/
    - Tissue BX compounds. File Name: tissues20210728b.csv. Description: Data file holding results from a tissue analysis for BX compounds, in ug, from shoots and roots, and at various sampling times. Read into the R file anal1-cleaned.r, where it is used in a statistical analysis and to create figures.
    - BX compounds from soil with a live rye cover crop. File Name: soil2-20201214.csv. Description: BX compounds (in ng/g dry wt), by treatment, sampling time, date, and plot ID. These data are read into the R program anal1-cleaned.r for analysis and to create figures. These are soil samples taken from locations with a live rye plant cover crop.
    - R code for BX analyses of soil under rye and plant tissues. File Name: anal1-cleaned.r. Description: R code for analysis of the soil BX compounds under a live rye cover crop at different growing stages, and for the analysis of tissue BX compounds. In addition to statistical analyses, code in this file creates figures, and some statistical output that is used to create a file that is later read in for figure creation (s2-CLD20220730-Stage.csv). Software Recommended: R version 3.6.3, url: https://www.R-project.org/
    - Description of data files for anal2-cleaned.r. File Name: readme2.txt. Description: Describes the input files used in the R code in anal2-cleaned.r, including descriptions and formats for each field. The file also describes some output (results) files that were uploaded to this site. This is a plain ASCII text file.
    - Estimates produced by anal2-cleaned.r from statistical modeling. File Name: Estimates20201110.csv. Description: Estimates produced by anal2-cleaned.r from statistical modeling (see readme2.txt).
    - Summary statistics from anal2-cleaned.r. File Name: CV20210412.csv. Description: Summary statistics from anal2-cleaned.r, used for plots.
    - Data summaries (same as CV20210412.csv), rescaled. File Name: RESCALE-20210412.csv. Description: Same as "CV20210412.csv" except log of data have been rescaled to minimum at least zero and maximum one; see readme2.txt.
    - Statistical summaries for different stages. File Name: s2-CLD20220730-Stage.csv. Description: Statistical summaries used for creating a figure (not used in paper), used in anal1-cleaned.r; data for soil BX under living rye.
    - Description of data files for anal1-cleaned.r. File Name: readme1.txt. Description: Contains general descriptions of data imported into anal1-cleaned.r, and a description of each field. Also contains some descriptions of files output by anal1-cleaned.r, used to create tables or figures.
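    For readers unfamiliar with the half-life figures quoted above, a small illustrative R sketch of the underlying first-order decay relationship (values are illustrative, not taken from the dataset):

      # First-order decay: C(t) = C0 * exp(-k * t); a 4-day half-life implies
      # k = log(2) / 4 per day. Values below are illustrative only.
      k  <- log(2) / 4              # decay rate (1/day)
      t  <- 0:14                    # days after termination
      C0 <- 100                     # arbitrary initial concentration
      Ct <- C0 * exp(-k * t)
      plot(t, Ct, type = "b", xlab = "Days after termination",
           ylab = "Relative BX concentration")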

  9. Data from: The R package enerscape: A general energy landscape framework for...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 20, 2021
    Cite
    Emilio Berti; Marco Davoli; Robert Buitenwerf; Alexander Dyer; Oskar Hansen; Myriam Hirt; Jens-Christian Svenning; Jördis Terlau; Ulrich Brose; Fritz Vollrath (2021). The R package enerscape: A general energy landscape framework for terrestrial movement ecology [Dataset]. http://doi.org/10.5061/dryad.wwpzgmskm
    Explore at:
    zip. Available download formats
    Dataset updated
    Oct 20, 2021
    Dataset provided by
    Dryad
    Authors
    Emilio Berti; Marco Davoli; Robert Buitenwerf; Alexander Dyer; Oskar Hansen; Myriam Hirt; Jens-Christian Svenning; Jördis Terlau; Ulrich Brose; Fritz Vollrath
    Time period covered
    2021
    Description

    Ecological processes and biodiversity patterns are strongly affected by how animals move through the landscape. However, it remains challenging to predict animal movement and space use. Here we present our new R package enerscape to quantify and predict animal movement in real landscapes based on energy expenditure.

    Enerscape integrates a general locomotory model for terrestrial animals with GIS tools in order to map energy costs of movement in a given environment, resulting in energy landscapes that reflect how energy expenditures may shape habitat use. Enerscape only requires topographic data (elevation) and the body mass of the studied animal. To illustrate the potential of enerscape, we analyze the energy landscape for the Marsican bear (Ursus arctos marsicanus) in a protected area in central Italy in order to identify least-cost paths and high-connectivity areas with low energy costs of travel.
    
    
    Enerscape allowed us to identify travel routes for the bear that minimize...
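    A hedged R sketch of the kind of call the description implies; the function name, arguments, and input handling are assumptions to be checked against the package documentation:

      # Hedged sketch only: assumes enerscape() is the package's main entry point
      # and accepts an elevation raster plus body mass; check ?enerscape first.
      library(enerscape)
      library(terra)
      dem <- rast("elevation.tif")      # hypothetical digital elevation model
      en  <- enerscape(dem, m = 140)    # hypothetical ~140 kg animal (e.g., a bear)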
    
  10. Ultra high-density 255-channel EEG-AAD dataset

    • data.niaid.nih.gov
    Updated Jun 13, 2024
    Cite
    Zink, Rob (2024). Ultra high-density 255-channel EEG-AAD dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4518753
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Zink, Rob
    Mundanad Narayanan, Abhijith
    Bertrand, Alexander
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    If using this dataset, please cite the following paper and the current Zenodo repository: A. Mundanad Narayanan, R. Zink, and A. Bertrand, "EEG miniaturization limits for stimulus decoding with EEG sensor networks", Journal of Neural Engineering, vol. 18, 2021, doi: 10.1088/1741-2552/ac2629

    Experiment

    This dataset contains 255-channel electroencephalography (EEG) data collected during an auditory attention decoding experiment (AAD). The EEG was recorded using a SynAmps RT device (Compumedics, Australia) at a sampling rate of 1 kHz and using active Ag/Cl electrodes. The electrodes were placed on the head according to the international 10-5 (5%) system. 30 normal hearing male subjects between 22 and 35 years old participated in the experiment. All of them signed an informed consent form approved by the KU Leuven ethical committee.

    Two Dutch stories, narrated by different male speakers and each divided into two parts of 6 minutes, were used as the stimuli in the experiment [1]. A single trial of the experiment involved the presentation of two such parts (one from each story) to the subject through insert earphones (Etymotic ER3A) at 60 dBA. These speech stimuli were filtered using a head-related transfer function (HRTF) such that the stories seemed to arrive from two distinct spatial locations, namely left and right with respect to the subject, with 180 degrees of separation. In each trial, the subjects were asked to attend to only one ear while ignoring the other. Four trials of 6 minutes each were carried out, in which each story part was used twice. The order of presentations was randomized and balanced over subjects. Thus approximately 24 minutes of EEG data were recorded per subject.

    File organization and details

    The EEG data of each of the 30 subjects are uploaded as a ZIP file with the name Sx.tar.gzip, where x = 0, 1, 2, ..., 29. When a zip file is extracted, the EEG data are in their original raw format as recorded by the CURRY software [2]. The data files of each recording consist of four files with the same name but different extensions, namely .dat, .dap, .rs3 and .ceo. The name of each file follows the convention Sx_AAD_P, with P taking one of the following values for each file: 1L, 1R, 2L, 2R.

    The letter 'L' or 'R' in P indicates the attended direction of each subject in a recording: left and right, respectively. A MATLAB function to read the recordings is provided in the directory called scripts. A Python function to read the files is available in this GitHub repository [3]. The original version of the stimuli presented to subjects, i.e. without the HRTF filtering, can be found in WAV format after extracting the stimuli.zip file. There are 4 WAV files corresponding to the two parts of each of the two stories. These files have been sampled at 44.1 kHz. The order of presentation of these WAV files is given in the table below: Stimuli presentation and attention information of files

    Trial (P) Stimuli: Left-ear Stimuli: Right-ear Attention

    1L part1_track1_dry part1_track2_dry Left

    1R part1_track1_dry part1_track2_dry Right

    2L part2_track2_dry part2_track1_dry Left

    2R part2_track2_dry part2_track1_dry Right

    Additional files (after extracting scripts.zip and misc.zip):

    scripts/sample_script.m: Demonstrates reading an EEG-AAD recording and extracting the start and end of the experiment.

    misc/channel-layout.jpeg: The 255-channel EEG cap layout

    misc/eeg255ch_locs.csv: The channel names, numbers and their spherical (theta and phi) scalp coordinates.

    [1] Radioboeken voor kinderen, http://radioboeken.eu/kinderradioboeken.php?lang=NL, 2007 (Accessed: 8 Feb 2021)

    [2] CURRY 8 X – Data Acquisition and Online Processing, https://compumedicsneuroscan.com/product/curry-data-acquisition-online-processing-x/ (Accessed: 8, Feb, 2021)

    [3] Abhijith Mundanad Narayanan, "EEG analysis in python", 2021. https://github.com/mabhijithn/eeg-analyse , (Accessed: 8 Feb, 2021)

  11. R Program - Claims-Based Frailty Index

    • search.dataone.org
    Updated Sep 25, 2024
    Cite
    Bedell, Douglas (2024). R Program - Claims-Based Frailty Index [Dataset]. http://doi.org/10.7910/DVN/4Y3Y23
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Bedell, Douglas
    Description

    This R program calculates CFI for each patient from analytic data files containing information on patient identifiers, ICD-9-CM diagnosis codes (version 32), ICD-10-CM Diagnosis Codes (version 2020), CPT codes, and HCPCS codes. NOTE: When downloading, store "CFI_ICD9CM_V32.tab" and "CFI_ICD10CM_V2020.tab" as csv files (these files are originally stored as csv files, but Dataverse automatically converts them to tab files). Please read "Frailty-Index-R-code-Guide" before proceeding. Interpretation, validation data, and annotated references are provided in "Research Background - Claims-Based Frailty Index".
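    A minimal R sketch of the re-saving step described in the NOTE above (adjust paths to wherever the Dataverse files were downloaded):

      # Minimal sketch: Dataverse serves the lookup tables as tab-delimited files;
      # re-save them as CSV as instructed before running the CFI program.
      icd9  <- read.delim("CFI_ICD9CM_V32.tab")
      icd10 <- read.delim("CFI_ICD10CM_V2020.tab")
      write.csv(icd9,  "CFI_ICD9CM_V32.csv",    row.names = FALSE)
      write.csv(icd10, "CFI_ICD10CM_V2020.csv", row.names = FALSE)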

  12. Spire live and historical data

    • earth.esa.int
    Cite
    European Space Agency, Spire live and historical data [Dataset]. https://earth.esa.int/eogateway/catalog/spire-live-and-historical-data
    Explore at:
    Dataset authored and provided by
    European Space Agency (http://www.esa.int/)
    License

    https://earth.esa.int/eogateway/documents/20142/1560778/ESA-Third-Party-Missions-Terms-and-Conditions.pdf

    Description

    The data collected by Spire from its 100 satellites launched into Low Earth Orbit (LEO) has a diverse range of applications, from analysis of global trade patterns and commodity flows to aircraft routing to weather forecasting. The data also provides interesting research opportunities on topics as varied as ocean currents and GNSS-based planetary boundary layer height. The following products can be requested:

    GNSS Polarimetric Radio Occultation (STRATOS): Novel Polarimetric Radio Occultation (PRO) measurements collected by three Spire satellites are available over 15 May 2023 to 30 November 2023. PRO differs from regular RO (described below) in that the H and V polarizations of the signal are available, as opposed to only Right-Handed Circularly Polarized (RHCP) signals in regular RO. The differential phase shift between H and V correlates with the presence of hydrometeors (ice crystals, rain, snow, etc.). When combined, the H and V information provides the same information on atmospheric thermodynamic properties as RO: temperature, humidity, and pressure, based on the signal's bending angle. Various levels of the products are provided.

    GNSS Reflectometry (STRATOS): GNSS Reflectometry (GNSS-R) is a technique to measure Earth's surface properties using reflections of GNSS signals in the form of a bistatic radar. Spire collects two types of GNSS-R data: Near-Nadir incidence LHCP reflections collected by the Spire GNSS-R satellites, and Grazing-Angle GNSS-R (i.e., low elevation angle) RHCP reflections collected by the Spire GNSS-RO satellites. The Near-Nadir GNSS-R collects DDM (Delay Doppler Map) reflectivity measurements. These are used to compute ocean wind / wave conditions and soil moisture over land. The Grazing-Angle GNSS-R collects 50 Hz reflectivity and additionally carrier phase observations. These are used for altimetry and characterization of smooth surfaces (such as ice and inland water). Derived Level 1 and Level 2 products are available, as well as some special Level 0 raw intermediate frequency (IF) data. Historical grazing angle GNSS-R data are available from May 2019 to the present, while near-nadir GNSS-R data are available from December 2020 to the present.

    Polarimetric Radio Occultation (PRO) measurements
    - Temporal coverage: 15 May 2023 to 30 November 2023
    - Spatial coverage: Global
    - Description: PRO measurements observe the properties of GNSS signals as they pass through Earth's atmosphere, similar to regular RO measurements. The polarization state of the signals is recorded separately for H and V polarizations to provide information on the anisotropy of hydrometeors along the propagation path.
    - Data format and content: leoOrb.sp3 - estimated position, velocity and receiver clock error of a given Spire satellite after processing of the POD observation file. proObs (Level 0) - raw open loop carrier phase measurements at 50 Hz sampling for both linear polarization components (horizontal and vertical) of the occulted GNSS signal. hatmPhs / vatmPhs / catmPhs (Level 1B) - atmospheric excess phase delay computed for each individual linear polarization component (hatmPhs, vatmPhs) and for the combined ("H" + "V") signal (catmPhs); also contains values for signal-to-noise ratio, transmitter and receiver positions and open loop model information. polPhs (Level 1C) - combines the information from the hatmPhs and vatmPhs files while removing phase discontinuities due to phase wrapping and navigation bit modulation. patmPrf (Level 2) - bending angle, dry refractivity, and dry temperature as a function of mean sea level altitude and impact parameter, derived from the "combined" excess phase delay (catmPhs).
    - Application: PRO measurements add sensitivity to ice and precipitation content alongside the traditional RO measurements of atmospheric temperature, pressure, and water vapor.

    Near-Nadir GNSS Reflectometry (NN GNSS-R) measurements
    - Temporal coverage: 25 January 2024 to 24 July 2024
    - Spatial coverage: Global
    - Description: Tracks of surface reflections as observed by the near-nadir pointing GNSS-R antennas, based on Delay Doppler Maps (DDMs).
    - Data format and content: gbrRCS.nc (Level 1B) - along-track calibrated bistatic radar cross-sections measured by Spire conventional GNSS-R satellites. gbrNRCS.nc (Level 1B) - along-track calibrated bistatic and normalized radar cross-sections measured by Spire conventional GNSS-R satellites. gbrSSM.nc (Level 2) - along-track SNR, reflectivity, and retrievals of soil moisture (and associated uncertainties) and probability of frozen ground. gbrOcn.nc (Level 2) - along-track retrievals of mean square slope (MSS) of the sea surface, wind speed, sigma0, and associated uncertainties.
    - Application: NN GNSS-R measurements are used to measure ocean surface winds and characterize land surfaces for applications such as soil moisture, freeze/thaw monitoring, flooding detection, inland water body delineation, sea ice classification, etc.

    Grazing angle GNSS Reflectometry (GA GNSS-R) measurements
    - Temporal coverage: 25 January 2024 to 24 July 2024
    - Spatial coverage: Global
    - Description: Tracks of surface reflections as observed by the limb-facing RO antennas, based on open-loop tracking outputs: 50 Hz collections of accumulated I/Q observations.
    - Data format and content: grzRfl.nc (Level 1B) - along-track SNR, reflectivity, phase delay (with respect to an open loop model) and low-level observables and bistatic radar geometries such as receiver, specular reflection, and transmitter locations. grzIce.nc (Level 2) - along-track water vs sea ice classification, along with sea ice type classification. grzAlt.nc (Level 2) - along-track phase delay, ionosphere-corrected altimetry, tropospheric delay, and ancillary models (mean sea surface, tides).
    - Application: GA GNSS-R measurements are used to 1) characterize land surfaces for applications such as sea ice classification, freeze/thaw monitoring, inland water body detection and delineation, etc., and 2) measure relative altimetry with dm-level precision for inland water bodies, river slopes, sea ice freeboard, etc., as well as water vapor characterization based on tropospheric delays.

    Additionally, the following products (better detailed in the ToA) can be requested, but acceptance is not guaranteed and shall be evaluated on a case-by-case basis:
    - Other STRATOS measurements: profiles of the Earth's atmosphere and ionosphere, from December 2018
    - ADS-B Data Stream: monthly subscription to global ADS-B satellite data, available from December 2018
    - AIS messages: AIS messages observed from Spire satellites (S-AIS) and terrestrial from partner sensor stations (T-AIS), monthly subscription available from June 2016

    The products are available as part of the Spire provision with worldwide coverage. All details about the data provision, data access conditions and quota assignment procedure are described in the Terms of Applicability.
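    The reflectometry products listed above appear to be NetCDF (.nc) files; a hedged R sketch for inspecting one of them (the file name is taken from the product list, and variable names are deliberately not assumed):

      # Hedged sketch: open a Level 2 NetCDF product and list its variables
      # before extracting anything; requires the ncdf4 package.
      library(ncdf4)
      nc <- nc_open("gbrSSM.nc")   # e.g. the Level 2 soil-moisture product
      print(names(nc$var))         # discover variable names rather than assume them
      nc_close(nc)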

  13. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • openicpsr.org
    Updated May 18, 2018
    + more versions
    Cite
    Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2021 [Dataset]. http://doi.org/10.3886/E103500V9
    Explore at:
    Dataset updated
    May 18, 2018
    Dataset provided by
    Princeton University
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1991 - 2021
    Area covered
    United States
    Description

    !!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

    For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

    Version 9 release notes: Adds 2021 data.

    Version 8 release notes: Adds 2019 and 2020 data. Please note that the FBI has retired UCR data ending in 2020, so this will be the last UCR hate crime data they release. Changes .rda file to .rds.

    Version 7 release notes: Changes release notes description; does not change data.

    Version 6 release notes: Adds 2018 data.

    Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data with the ones done by NACJD. Adds data for 1991. Fixes a bug where the bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.

    Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e., that agency reported no hate crimes in the year); this applies to all years 1992-2017. Makes changes to categorical variables (e.g., bias motivation columns) to make categories consistent over time; different years had slightly different names (e.g., 'anti-am indian' and 'anti-american indian') which I made consistent. Adds the 'population' column, which is the total population in that agency.

    Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.

    Version 2 release notes: Fixes a bug where Philadelphia Police Department had an incorrect FIPS county code.

    The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating whether the victim of each offense was a certain type of victim or not (e.g., individual victim, business victim, religious victim, etc.). The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so the data can be saved in Stata format), making all character values lower case, and reordering columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
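    As a minimal, illustrative R sketch of getting started with the .rds version (the file name below is a placeholder for whatever file is downloaded from openICPSR):

        # Load the concatenated hate crime file (.rds format); the file name is illustrative.
        hate_crimes <- readRDS("hate_crime_1991_2021.rds")

        # Each row is one reported hate crime incident for an agency in a given year;
        # "unique_id" combines year, agency ORI9, and incident number (see description above).
        head(hate_crimes$unique_id)
        table(hate_crimes$year)  # count of reported incidents per year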

  14. Multi-site assessment of reproducibility in high-content live cell imaging...

    • figshare.scilifelab.se
    • researchdata.se
    bin
    Updated Jan 15, 2025
    Cite
    Jianjiang Hu; Xavier Serra-Picamal; Gert-Jan Bakker; Marleen Van Troys; Sabina Winograd-katz; Nil Ege; Xiaowei Gong; Yuliia Didan; Inna Grosheva; Omer Polansky; Karima Bakkali; Evelien Van Hamme; Merijn van Erp; Manon Vullings; Felix Weiss; Jarama Clucas; Anna Dowbaj; Erik Sahai; Christophe Ampe; Benjamin Geiger; Peter Friedl; Matteo Bottai; Staffan Strömblad (2025). Multi-site assessment of reproducibility in high-content live cell imaging data [Dataset]. http://doi.org/10.17044/scilifelab.21407402.v2
    Explore at:
    bin
    Available download formats
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Karolinska institutet; Radboud University Medical Center
    Authors
    Jianjiang Hu; Xavier Serra-Picamal; Gert-Jan Bakker; Marleen Van Troys; Sabina Winograd-katz; Nil Ege; Xiaowei Gong; Yuliia Didan; Inna Grosheva; Omer Polansky; Karima Bakkali; Evelien Van Hamme; Merijn van Erp; Manon Vullings; Felix Weiss; Jarama Clucas; Anna Dowbaj; Erik Sahai; Christophe Ampe; Benjamin Geiger; Peter Friedl; Matteo Bottai; Staffan Strömblad
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the raw images as well as the analysis pipelines and scripts used in the paper "Multi-site assessment of reproducibility in high-content live cell imaging data".

    The Original data-2D.rar file contains the raw timelapse images of the HT1080 cell line stably expressing H2B-EGFP and Lifeact-mCherry seeded on a collagen I coated glass surface. Migration behavior of the cells was recorded at 5 min intervals for 6 h with fluorescence light microscopes equipped with an environmental chamber. The experiment was performed by 3 labs, with 3 persons in each lab, 3 independent experiments by each person, 3 technical replicates in each experiment, and two conditions (control and ROCK inhibition) for each technical replicate.

    The Data processing and analysis-2D.rar file contains the Matlab, CellProfiler, ImageJ, and R pipelines and scripts used in this study to process, quantify, and analyze the images. A detailed procedure can be found in the "Image processing and analysis procedures.txt" file within this .rar file.

    The 3D Image data from Lab 1.zip and 3D Image data from Lab 2.zip contain the raw images and the quantified results of the 3D migration assay from Lab 1 and Lab 2, respectively. The experiment was performed with the HT1080 cell line stably expressing H2B-EGFP and Lifeact-mCherry embedded in 2.5 mg/ml or 6 mg/ml collagen I gels. The invasion of the cells from 3D spheroids was recorded with confocal microscopy 24 h after seeding. The experiment was performed by 2 labs, with 3 independent experiments in each lab, 3 technical replicates in each experiment, and two conditions (2.5 mg/ml and 6 mg/ml of collagen I) for each technical replicate.

    The Meta data of the 3D experiment.zip contains the metadata of the 3D image data from Lab 1 (Radboudumc) and Lab 2 (Crick), as well as the software to read the metadata. After unzipping, the ISAcreator program should be used to read the ISA files of Lab 1 or Lab 2.

    The Fiji Plugins and parameters for 3D image data analysis.rar contains the Fiji plugins and also the parameters used during the 3D image data analysis.

    The 3D Data Analysis Scripts.rar contains the R scripts used in this study to analyze the 3D data set, as well as the quantified results needed by the R scripts.

    The Supplementary Materials 2-8.rar contains 2D experimental protocol (supplementary materials 2-4), 2D experimental survey (supplementary materials 3), and 3D experimental and image analysis protocols (supplementary materials 5-8) that are used in this study.

    We encourage reuse using the same CC BY 4.0 License.

  15. Data and scripts for "The importance of within-log sampling replication in...

    • zenodo.org
    bin, csv
    Updated May 6, 2025
    Cite
    Domenica Naranjo Orrico; Domenica Naranjo Orrico; Jenna Purhonen; Jenna Purhonen; Brendan Furneaux; Brendan Furneaux; Katri Ketola; Otso Ovaskainen; Otso Ovaskainen; Nerea Abrego; Nerea Abrego; Katri Ketola (2025). Data and scripts for "The importance of within-log sampling replication in bark- and wood-inhabiting fungal metabarcoding studies" [Dataset]. http://doi.org/10.5281/zenodo.15323471
    Explore at:
    csv, bin
    Available download formats
    Dataset updated
    May 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Domenica Naranjo Orrico; Domenica Naranjo Orrico; Jenna Purhonen; Jenna Purhonen; Brendan Furneaux; Brendan Furneaux; Katri Ketola; Otso Ovaskainen; Otso Ovaskainen; Nerea Abrego; Nerea Abrego; Katri Ketola
    Description

    Data and scripts for reproducing the analyses of Naranjo-Orrico et al, "The importance of within-log sampling replication in bark- and wood-inhabiting fungal metabarcoding studies".

    The input data consists of the following four files, "Alldata.Rdata", "data_SbVenn_meta&morpho.Rdata" , "Xmorpho.csv" and "Ymorpho_1.csv". The former two files are in R format and the latter two in CSV format. The R files need to be loaded using the function load, and the CSV files with the function read.csv2 in R.
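    As a minimal sketch of that loading step (assuming the files sit in the working directory):

        # The two R-format inputs add their stored matrices to the workspace when loaded.
        load("Alldata.Rdata")
        load("data_SbVenn_meta&morpho.Rdata")

        # The two CSV inputs are semicolon-separated, hence read.csv2.
        Xmorpho <- read.csv2("Xmorpho.csv")
        Ymorpho <- read.csv2("Ymorpho_1.csv")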

    "Alldata.Rdata" includes in total 15 input data matrices:

    - Metadata for dataset A (meta22)

    - Metadata for dataset B (meta23)

    - Two sample x OTU tables for dataset A including the number of reads for each OTU (otu.table.plausible.2022 for the plausible OTU taxonomic identifications and otu.table.reliable.2022 for the reliable taxonomic identifications).

    - Two sample x OTU tables for dataset B including the number of reads for each OTU (otu.table.plausible.2023 for the plausible OTU taxonomic identifications and otu.table.reliable.2023 for the reliable OTU taxonomic identifications).

    - Two sample x OTU tables for dataset A including the relative read counts per OTU (otu.table.plausible.w.2022 and otu.table.reliable.w.2022).

    - Two sample x OTU tables for dataset B including the relative read counts per OTU (otu.table.plausible.w.2023 and otu.table.reliable.w.2023).

    - Read counts per sample during the different phases of the bioinformatics pipeline for dataset A (read.counts.plausible.2022) and for dataset B (read.counts.plausible.2023).

    - Taxonomic information at all taxonomic levels (i.e., from species to phylum) of the identified OTUs (taxonomy.plausible)

    - Guild assignment matrices for dataset A (Guilds_plausible_tax_2022) and for dataset B (Guilds_plausible_tax_2023).

    "data_SbVenn_meta&morpho.Rdata" contains four matrices:

    - Occurrence of the lichenized OTUs identified through metabarcoding including identifications at any taxonomic level (i.e., genus or family levels when species level identifications were not achieved) (SbVenn_Lmeta).

    - Occurrence of the lichenized OTUs identified through metabarcoding including identifications at the species-only level (SB_Venn_clean_meta).

    - Occurrence of the morphologically identified lichenized fungi, including identifications at the genus level and morphospecies (SbVenn_Lmorpho)

    - Occurrences of the morphologically identified lichenized fungi, including identifications at the species-only level (SbVenn_clean_morpho).

    "Xmorpho.csv" and "Ymorpho_1.csv" contain respectively the metadata and the presence-absence data of the morphologically identified lichens.

    “Alldata.Rdata” is used in all the scripts, "data_SbVenn_meta&morpho.Rdata" is only needed for the script "S8_Venn Diagrams.R", and the files "Xmorpho.csv" and "Ymorpho_1.csv" are used in "S11_Meta vs Morpho species richnes between tree sp and tree part.R".

    The statistical analyses consist of joint species distribution modelling with the package Hmsc, generalized linear mixed models (GLMM) with the glmer function of the lme4 package, and non-metric multidimensional scaling analysis (NMDS) with the package vegan. To perform the Hmsc analyses, the first four scripts need to be run consecutively from S1 (A and B) to S3. S1A defines the first model using dataset A, and S1B defines the second model using dataset B. S2 fits the models used in the study (presence-absence models with different sets of explanatory variables). S3 shows the parameter estimates from the fitted models, in particular the beta parameters and the variance partitioning across environmental covariates. For fitting and showing the outputs of the GLMM models, only S4 is needed. S5 runs the NMDS analyses. The remaining scripts, S6-S11, are used to produce the different plots shown in the study of Naranjo-Orrico et al., including pie plots, boxplots, barplots, and Venn plots.
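    For orientation only, a generic sketch of an NMDS run with vegan on one of the relative-read-count tables listed above (this is not the authors' exact S5 script):

        library(vegan)

        load("Alldata.Rdata")
        nmds <- metaMDS(otu.table.plausible.w.2022, distance = "bray", k = 2, trymax = 100)
        plot(nmds, type = "t")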

  16. Data from: Data and code from: Mycotoxin contamination & the nutritional...

    • agdatacommons.nal.usda.gov
    • gimi9.com
    • +1more
    xlsx
    Updated Sep 13, 2024
    + more versions
    Cite
    Anthony Pokoo-Aikins; Callie M. McDonough; Trevor R. Mitchell; Jaci A. Hawkins; Lincoln F. Adams; Quentin Read; Xiang Li; Revathi Shanmugasundaram; ElsiAnna Rodewald; Pratima Acharya; Anthony E. Glenn; Scott E. Gold (2024). Data and code from: Mycotoxin contamination & the nutritional content of corn targeted for animal feed [Dataset]. http://doi.org/10.15482/USDA.ADC/26956279.v1
    Explore at:
    xlsx
    Available download formats
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Anthony Pokoo-Aikins; Callie M. McDonough; Trevor R. Mitchell; Jaci A. Hawkins; Lincoln F. Adams; Quentin Read; Xiang Li; Revathi Shanmugasundaram; ElsiAnna Rodewald; Pratima Acharya; Anthony E. Glenn; Scott E. Gold
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains raw data (Excel spreadsheet, .xlsx), R statistical code (RMarkdown notebook, .Rmd), and rendered output of the R notebook (HTML). This comprises all raw data and code needed to reproduce the analyses in the manuscript: Pokoo-Aikins, A., C. M. McDonough, T. R. Mitchell, J. A. Hawkins, L. F. Adams, Q. D. Read, X. Li, R. Shanmugasundaram, E. Rodewald, P. Acharya, A. E. Glenn, and S. E. Gold. 2024. Mycotoxin contamination and the nutritional content of corn targeted for animal feed. Poultry Science, 104303. DOI: 10.1016/j.psj.2024.104303.

    The data consist of the mycotoxin concentration, nutrient content, and color of different samples of corn (maize). We model the effect of mycotoxin concentration on the concentration of several different nutrients in corn. We include main effects of the different mycotoxins as well as two-way interactions between each pair of mycotoxins. We also include analysis of mycotoxin effects on the L variable from the color analysis, because it seems to be the one most important for determining the overall color of the corn. We use AIC to compare the models with and without interaction terms. We find that the models without interaction terms are better, so we omit the interactions. We present adjusted R-squared values for each model as well as the p-values associated with the average slopes (effect of each mycotoxin on each nutrient). Finally, we produce the figures that appear in the above cited manuscript. Column metadata can be found in the Excel spreadsheet.

    Included files:

    Combined LCMS NIR Color data.xlsx: Excel file with all raw data (sheet 1) and column metadata (sheet 2).

    corn_mycotoxin_analysis_archived.Rmd: RMarkdown notebook with all analysis code.

    corn_mycotoxin_analysis_archived.html: rendered output of the R notebook.
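    A hedged sketch of the modeling approach described above (the nutrient and mycotoxin column names are illustrative, not the actual column names in the spreadsheet):

        # Illustrative only: model one nutrient as a function of mycotoxin concentrations,
        # with and without two-way interactions, and compare the fits by AIC.
        library(readxl)
        dat <- read_excel("Combined LCMS NIR Color data.xlsx", sheet = 1)

        fit_main <- lm(crude_protein ~ aflatoxin + fumonisin + deoxynivalenol, data = dat)
        fit_int  <- lm(crude_protein ~ (aflatoxin + fumonisin + deoxynivalenol)^2, data = dat)

        AIC(fit_main, fit_int)            # the manuscript retains the main-effects models
        summary(fit_main)$adj.r.squared   # adjusted R-squared reported per model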

  17. Data: An injectable meta-biomaterial

    • zenodo.org
    Updated Aug 27, 2021
    Cite
    Amélie Béduer; Fabien Bonini; Connor Verheyen; Patrick Burch; Thomas Braschler; Thomas Braschler; Amélie Béduer; Fabien Bonini; Connor Verheyen; Patrick Burch (2021). Data: An injectable meta-biomaterial [Dataset]. http://doi.org/10.5281/zenodo.2653804
    Explore at:
    Dataset updated
    Aug 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amélie Béduer; Fabien Bonini; Connor Verheyen; Patrick Burch; Thomas Braschler; Thomas Braschler; Amélie Béduer; Fabien Bonini; Connor Verheyen; Patrick Burch
    Description

    Dataset supporting the manuscript "An injectable meta-biomaterial" by the authors of this dataset.

    Where to start

    A recommended starting point in using this data set is to reproduce the figures of the manuscript "An injectable meta-biomaterial". This illustrates the use and meaning of some of the major variables.

    For this, unzip "08 Evaluation.zip" and, best in a separate location, "09 R Figure plotting.zip". The file "09 R Figure plotting.zip" contains the scripts used to generate the figures of the manuscript. Each of these scripts explains which data files, contained in "08 Evaluation.zip", need to be loaded for this purpose. Basic plotting of the figures does not require any R library installation; however, if the publicly available effsize R library is installed, some effect size statistics are also evaluated by the scripts in "09 R Figure plotting.zip".
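    A minimal sketch of that optional effect-size step (the .rda file and object names are hypothetical; the actual names are documented in the scripts of "09 R Figure plotting.zip"):

        # Load one processed .rda file from "08 Evaluation.zip" (file name illustrative).
        load("evaluation_rheology.rda")

        # Effect-size statistics are only computed if the effsize package is available.
        if (requireNamespace("effsize", quietly = TRUE)) {
          effsize::cohen.d(modulus_EPI, modulus_reference)  # hypothetical object names
        }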

    Another starting point is the particleShear simulation. There is a quick install guide in the supplementary materials of the manuscript that should be followed for this purpose (Supplementary 17).

    Detailed file description

    What follows is a more in-depth description of the various files contained in this archive.

    Raw data files

    zip "01 Raw Simulation.zip": Simulation data of spherical particle assemblies under shear with various degress and geometries of crosslinking, created by the Python package particleShear. For each simulation, there is one text file containing a description of the simulation parameters and the main results, and one .rda file containing the various stress tensor components tabulated over time. These .rda files were generated from the original text output file produced by the particleShear package to both save space and decrease file reading time.

    zip file "02 Raw Rheology.zip": Rheological characterization of the novel "EPI biomaterial" described in the manuscript "An injectable meta-biomaterial", along with reference materials (Juvederm Voluma, Sephacryl S200)

    zip file "03 Raw Porosity.zip": Porosity characterziation of the "EPI biomaterial" and Sephacryl S200.

    zip "04 Raw Ejection Force.zip" Ejection force for injectability of the "EPI biomaterial".

    zip "05 Raw Uniaxial compression.zip" of the "EPI biomaterial".

    zip "06 Raw In vivo.zip" In-vivo performance data of the "EPI biomaterial", comparison with Juvederm Voluma.

    zip "07 Raw Hydrostatic pressure response.zip". Pressure swelling. This is only used to estimate the concentration of the polymer in certain rheological or compression experiments where the material was kept hydrated by constant pressure rather than in a closed container.

    Each raw data file is a zip of numerous files. At the root of each zip, there is a single folder, so that all the zips can be unzipped at the same location without intermingling.

    Data evaluation

    zip "08 Evaluation.zip" contains processed data (.rda files). The aim of these files is to directly produce the figures of the manuscript "An injectable meta-biomaterial" with the R scripts contained in "09 R Figure plotting.zip" below. The data in "08 Evaluation.zip" is provided exclusively in .rda format (R data files) so that it no time-consuming file reading, nor the installation of external libraries should be necessary.

    R script files

    zip "09 R Figure plotting.zip" contains R scripts that can be used to reproduce the data figures in the Manuscript "An injectable meta-biomaterial".

    zip "10 R libraries.zip" contains R libraries that we used for processing. These R libraries are only required when reproducing the entire deta treatment, but not for simple graphing. That is, they are required to run the scripts in "11 R Raw data reading.zip" but not in "11 R Graphing.zip"

    zip "11 R Raw data reading.zip" contains the R scripts that we used to read the raw data files and make the data accessible (or summarize, particularly for the simulation data) in .rda files. The output of these files is stored in the "08 Evaluation.zip" file. To run the R scripts in "11 Raw data reading.zip", the libraries in "10 R libraries.zip" are generally required, in addition to general R libraries stated in the script. For simole replotting of the figures and re-running the statistical tests, this is not necessary. Also, the scripts in 11 R Raw data reading.zip" read and write files on disk, so they must be individually configured by the user to work. This is not the case with the scripts in "09 R Figure plotting.zip", which run on data loaded into memory.

    Python module

    zip "12 particleShear.zip" contains the installable Python module used to generate the simulation data (i.e. the data now stored in "01 Raw Simulation.zip").

  18. Data from: A dataset to model Levantine landcover and land-use change...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 16, 2023
    Cite
    A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10396147
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset authored and provided by
    Kempf, Michael
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders, which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5 year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R

    This is the first code chunk and refers to the extraction of MODIS data from the .hdf file format. The following packages must be installed and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 09 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatially) time series and merge them later. Note that the time series are temporally consistent.
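    A rough sketch of this extraction step, assuming GDAL was built with HDF4 support (the folder name and subdataset pattern are illustrative):

        library(terra)

        hdf_files <- list.files("MODIS_raw", pattern = "\\.hdf$", full.names = TRUE)
        for (f in hdf_files) {
          sd   <- sds(f)                               # all subdatasets of the MOD13Q1 file
          ndvi <- sd[[grep("NDVI", names(sd))[1]]]     # keep the NDVI layer only
          writeRaster(ndvi, sub("\\.hdf$", "_NDVI.tif", f), overwrite = TRUE)
        }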

    2_MERGE_MODIS_tiles.R

    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the package gtools to load the files in order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store the result. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").

    3_CROP_MODIS_merged_tiles.R

    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS. The repository provides the already clipped and merged NDVI datasets.
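    A compact sketch of this cropping step (folder and file names follow the conventions described above):

        library(terra)

        levant <- vect("MERGED_LEVANT.shp")            # study-area mask from the repository
        files  <- list.files("merged", pattern = "^NDVI_final_.*\\.tif$", full.names = TRUE)

        for (i in seq_along(files)) {
          r <- mask(crop(rast(files[i]), levant), levant)   # crop to extent, mask to outline
          writeRaster(r, sprintf("NDVI_merged_clip_%d.tif", i), overwrite = TRUE)
        }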

    4_TREND_analysis_NDVI.R

    Now, we want to perform trend analysis on the derived data. The data we load is tricky, as it contains a 16-day return period across a year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3. To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
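    The z-score normalization mentioned here reduces to the following generic sketch (applied to an annual growing-season series of NDVI sums or a GLDAS variable):

        # Deviation of each annual value from the series mean, in units of standard deviation.
        zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
        # z <- zscore(annual_growing_season_sums)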

    5_BUILT_UP_change_raster.R

    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 03 March 2023, 100 m resolution, global coverage). Here, one can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.

    6_POPULATION_numbers_plot.R

    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.

    7_YIELD_plot.R

    In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted with ggplot and combined using the patchwork package in R.

    8_GLDAS_read_extract_trend

    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data comes in .nc file format, and various variables can be extracted using the [“^a variable name”] command from the spatraster collection. Each time you run the code, this variable name must be adjusted to meet the requirements for the variables (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 09th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection. After choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area. From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year). From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95 % confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe due to the availability of the GLDAS variables.
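    A rough sketch of that variable extraction (the variable name pattern "^Tair" is only an example; folder names are placeholders):

        library(terra)

        nc_files <- list.files("GLDAS", pattern = "\\.nc4?$", full.names = TRUE)

        # For each monthly file, read the collection of variables and keep one by name.
        tair_list <- lapply(nc_files, function(f) {
          v <- sds(f)                              # one sub-dataset per GLDAS variable
          v[[grep("^Tair", names(v))[1]]]
        })
        tair <- rast(tair_list)                    # stack the monthly layers

        levant <- vect("MERGED_LEVANT.shp")
        tair   <- mask(crop(tair, levant), levant) # crop and mask to the study area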

    (9_workflow_diagramme) This simple code can be used to plot a workflow diagram and is detached from the actual analysis.

    Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, and Funding acquisition: Michael Kempf.

  19. UAE6 - Wind Tunnel Tests Data - UAE6 - Sequence R - Raw Data

    • catalog.data.gov
    • data.openei.org
    • +1more
    Updated Aug 7, 2021
    Cite
    Wind Energy Technologies Office (WETO) (2021). UAE6 - Wind Tunnel Tests Data - UAE6 - Sequence R - Raw Data [Dataset]. https://catalog.data.gov/dataset/uae6-wind-tunnel-tests-data-uae6-sequence-k-raw-data
    Explore at:
    Dataset updated
    Aug 7, 2021
    Dataset provided by
    Wind Energy Technologies Office (WETO)
    Description

    Overview

    Sequence R: Step AOA, No Probes (P). This sequence was designed to quantify the effect of the five-hole probes on the 3-D blade static angle-of-attack response in the presence of rotational influences by repeating Sequence K without five-hole probes. This test sequence used an upwind, rigid turbine with a 0° cone angle. The wind speeds ranged from 6 m/s to 20 m/s, and data were collected at yaw angles of 0° and 30°. The rotor rotated at 72 RPM. Blade pressure measurements were collected. The five-hole probes were removed and the plugs were installed. Plastic tape 0.03-mm-thick was used to smooth the interface between the plugs and the blade. The teeter dampers were replaced with rigid links, and these two channels were flagged as not applicable by setting the measured values in the data file to –99999.99 Nm. The teeter link load cell was pre-tensioned to 40,000 N. During post-processing, the probe channels were set to read –99999.99. The blade pitch angle ramped continuously at 0.18°/s over a wide range of increasing and decreasing pitch angles. A step sequence was also performed: the blade pitch was stepped 5°, the flow was allowed to stabilize, and the pitch angle was held for 5 seconds; then the pitch angle step was repeated. Again, a wide range of pitch angles was obtained, both increasing and decreasing. The file lengths for this sequence varied from 96 seconds to 6 minutes, depending on the pitch angle range. Some short points were collected at 0° yaw and 3° pitch to ascertain the functionality of the instrumentation and repeatability over time. The file name convention used the initial letter R, followed by two digits specifying wind speed, followed by two digits for yaw angle, followed by RU, RD, or ST, followed by the repetition digit. The angle-of-attack motion was differentiated by RU (ramp up), RD (ramp down), and ST (step down, then step up). This sequence is related to Sequences K and L.

    Data Details

    File naming information can be found in the attached Word document "Sequence R Filename Key", copied from the Phase VI Test Report.

  20. A time-sorting pitfall trap and temperature datalogger for the sampling of...

    • data.mendeley.com
    Updated Feb 10, 2017
    Cite
    Marshall McMunn (2017). A time-sorting pitfall trap and temperature datalogger for the sampling of surface-active arthropods - Supplemental Files [Dataset]. http://doi.org/10.17632/pn2k2bty4r.1
    Explore at:
    Dataset updated
    Feb 10, 2017
    Authors
    Marshall McMunn
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplemental file list “A time-sorting pitfall trap and temperature datalogger for the sampling of surface-active arthropods.”

    Parts List
    Epitfall_partsList.xslx – spreadsheet of parts needed for construction, prices, and vendors

    CAD
    Epitfall_sampleWheel2d.dwg – 2D CAD file of sampling wheel
    Epitfall_sampleWheel2d.pdf – PDF version of 2D CAD file of sampling wheel
    Epitfall_sampleWheel3d.dwg – 3D CAD file of sampling wheel
    Epitfall_sampleWheel3d.stl – file for 3D printing of sampling wheel

    Example Data
    TRAP2.TXT – example data created by pitfall trap
    TRAP5.TXT – example data created by pitfall trap
    community_matrix_EMPTY.csv – empty matrix with rows of sampled time intervals with unique sample ID codes; this file is generated by “Epitfall_dataPull.R”
    community_matrix_FULL.csv – same as above, but with ant identities and abundances entered

    Software
    Epitfall_24hourlySamples.ino – Arduino script to delay start time by 1 day and collect 24 hourly samples, with temperature measurements every 5 minutes during sample collection
    Epitfall_dataPull.R – R script to read data files from the trap, create summaries of each sampling interval, and create an empty spreadsheet with rows of unique sample IDs in which to enter arthropod abundance data
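    As a rough, hypothetical illustration of what Epitfall_dataPull.R does (the column layout of the TRAP*.TXT files is assumed here, not documented):

        # Placeholder column names; the real format is defined in Epitfall_dataPull.R.
        trap <- read.table("TRAP2.TXT", header = FALSE,
                           col.names = c("sample_id", "timestamp", "temperature_C"))

        # Summarize temperature per sampling interval and write a template to fill in.
        interval_summary <- aggregate(temperature_C ~ sample_id, data = trap, FUN = mean)
        write.csv(interval_summary, "interval_summary_template.csv", row.names = FALSE)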

    Wiring
    Epitfall_wiringDiagram.fzz – Fritzing file of wiring schematic
    Epitfall_wiringDiagram.png – image of wiring schematic

Cite
Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788

Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem

Explore at:
bin, application/gzip, zip, text/x-python
Available download formats
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Marat Valiev; Marat Valiev; Bogdan Vasilescu; James Herbsleb; Bogdan Vasilescu; James Herbsleb
License

https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

Description
Replication pack, FSE2018 submission #164:
------------------------------------------
**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. 
Link to the code will be included in the Camera Ready version as well.


Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
 described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
 This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
 statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
 themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data 
  (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
  `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
  **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)
Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):
- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

   git clone https://gitlab.com/user2589/ghd.git
   git checkout 0.1.0
 
 `cd` into the extracted folder. 
 All commands below assume it as a current directory.
  
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
- install docker. For Ubuntu Linux, the command is 
  `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
 Without this dependency, you might get an error on the next step, 
 but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` . 
- disable all APIs except GitHub (Bitbucket and Gitlab support were
 not yet implemented when this study was in progress): edit
 `scraper/__init__.py`, comment out everything except GitHub support
 in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function 
`common.utils.survival_data()` and save it into a CSV file:

  # copy and paste into a Python console
  from common import utils
  survival_data = utils.survival_data('pypi', '2008', smoothing=6)
  survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speedup
the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table.
The whole process will take 15..30 minutes.

- create a folder `