90 datasets found
  1. A Baseflow Filter for Hydrologic Models in R

    • catalog.data.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). A Baseflow Filter for Hydrologic Models in R [Dataset]. https://catalog.data.gov/dataset/a-baseflow-filter-for-hydrologic-models-in-r-41440
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    A Baseflow Filter for Hydrologic Models in R. Resources in this dataset:
    Resource Title: A Baseflow Filter for Hydrologic Models in R. File Name: Web Page. URL: https://www.ars.usda.gov/research/software/download/?softwareid=383&modecode=20-72-05-00 (download page)
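
    The catalog entry does not describe the filter's algorithm. As a hedged illustration, here is the widely used one-parameter recursive digital baseflow filter (Lyne & Hollick, 1979) in R; the function and parameter value are assumptions for demonstration, not the ARS tool's verified implementation:

    #One-parameter recursive digital baseflow filter (Lyne & Hollick, 1979).
    #Illustrative sketch only; q is a streamflow series, alpha the filter parameter.
    baseflow_filter <- function(q, alpha = 0.925) {
     qf <- numeric(length(q))  #quickflow component
     qf[1] <- q[1]
     for (t in 2:length(q)) {
      qf[t] <- alpha * qf[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
     }
     q - pmax(qf, 0)       #baseflow = total flow minus positive quickflow
    }
    
    #Example: ten days of daily discharge
    flow <- c(10, 12, 30, 55, 40, 28, 20, 15, 12, 11)
    baseflow_filter(flow)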

  2. Dataset from: Browsing is a strong filter for savanna tree seedlings in their first growing season

    • data.niaid.nih.gov
    Updated Oct 1, 2021
    Cite
    Archibald, Sally (2021). Dataset from : Browsing is a strong filter for savanna tree seedlings in their first growing season [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4972083
    Explore at:
    Dataset updated
    Oct 1, 2021
    Dataset provided by
    Archibald, Sally
    Wayne Twine
    Craddock Mthabini
    Nicola Stevens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data presented here were used to produce the following paper:

    Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.

    The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588

    For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za

    Description of file(s):

    File 1: cleanedData_forAnalysis.csv (required to run the R code: "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")

    The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.

    The data consist of one .csv file with the following column names:

    • treatment: Clipping treatment (1 - 5 months clip, plus unclipped control)
    • plot_rep: One of three randomised plots per treatment
    • matrix_no: Where in the plot the individual was placed
    • species_code: First three letters of the genus name plus first three letters of the species name; uniquely identifies the species
    • species: Full species name
    • sample_period: Classification of sampling period into time since clip
    • status: Alive or Dead
    • standing.height: Vertical height above ground (in mm)
    • height.mm: Length of the longest branch (in mm)
    • total.branch.length: Total length of all the branches (in mm)
    • stemdiam.mm: Basal stem diameter (in mm)
    • maxSpineLength.mm: Length of the longest spine (in mm)
    • postclipStemNo: Number of resprouting stems (only recorded after clipping)
    • date.clipped: Date clipped
    • date.measured: Date measured
    • date.germinated: Date germinated
    • Age.of.plant: Date measured minus date germinated
    • newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)

    File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code: "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")

    The data consist of one .csv file with the following column names:

    • treatment: Clipping treatment (1 - 5 months clip, plus unclipped control)
    • plot_rep: One of three randomised plots per treatment
    • matrix_no: Where in the plot the individual was placed
    • species_code: First three letters of the genus name plus first three letters of the species name; uniquely identifies the species
    • species: Full species name
    • sample_period: Classification of sampling period into time since clip
    • status: Alive or Dead
    • standing.height: Vertical height above ground (in mm)
    • height.mm: Length of the longest branch (in mm)
    • total.branch.length: Total length of all the branches (in mm)
    • stemdiam.mm: Basal stem diameter (in mm)
    • maxSpineLength.mm: Length of the longest spine (in mm)
    • postclipStemNo: Number of resprouting stems (only recorded after clipping)
    • date.clipped: Date clipped
    • date.measured: Date measured
    • date.germinated: Date germinated
    • Age.of.plant: Date measured minus date germinated
    • newtreat: Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
    • genus: Genus
    • MAR: Mean annual rainfall for that species' distribution (mm)
    • rainclass: High/medium/low

    File 3: allModelParameters_byAge.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings

    • Age.of.plant: Age in days
    • species_code: Species code
    • pred_SD_mm: Predicted stem diameter (mm)
    • pred_SD_up: Top 75th quantile of stem diameter (mm)
    • pred_SD_low: Bottom 25th quantile of stem diameter (mm)
    • treatdate: Date when clipped
    • pred_surv: Predicted survival probability
    • pred_surv_low: Predicted 25th quantile survival probability
    • pred_surv_high: Predicted 75th quantile survival probability
    • Bite.probability: Daily probability of being eaten
    • max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    • duiker_sd: Standard deviation of duiker bite diameter for this species
    • max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    • kudu_sd: Standard deviation of kudu bite diameter for this species
    • mean_bite_diam_duiker_mm: Mean duiker bite diameter for this species
    • duiker_mean_sd: Standard deviation of the mean duiker bite diameter
    • mean_bite_diameter_kudu_mm: Mean kudu bite diameter for this species
    • kudu_mean_sd: Standard deviation of the mean kudu bite diameter
    • genus: Genus
    • rainclass: Low/medium/high

    File 4: EatProbParameters_June2020.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")

    Consists of a .csv file with the following column headings

    • shtspec: Species name
    • species_code: Species code
    • genus: Genus
    • rainclass: Low/medium/high
    • seed mass: Mass of seed (g per 1000 seeds)
    • Surv_intercept: Intercept of the model predicting survival from age at clipping for this species
    • Surv_slope: Slope of the model predicting survival from age at clipping for this species
    • GR_intercept: Intercept of the model predicting stem diameter from seedling age for this species
    • GR_slope: Slope of the model predicting stem diameter from seedling age for this species
    • max_bite_diam_duiker_mm: Maximum bite diameter of a duiker for this species
    • duiker_sd: Standard deviation of duiker bite diameter for this species
    • max_bite_diameter_kudu_mm: Maximum bite diameter of a kudu for this species
    • kudu_sd: Standard deviation of kudu bite diameter for this species
    • mean_bite_diam_duiker_mm: Mean duiker bite diameter for this species
    • duiker_mean_sd: Standard deviation of the mean duiker bite diameter
    • mean_bite_diameter_kudu_mm: Mean kudu bite diameter for this species
    • kudu_mean_sd: Standard deviation of the mean kudu bite diameter
    • AgeAtEscape_duiker[t]: Age of plant when its stem diameter is larger than a mean duiker bite
    • AgeAtEscape_duiker_min[t]: Age of plant when its stem diameter is larger than a minimum duiker bite
    • AgeAtEscape_duiker_max[t]: Age of plant when its stem diameter is larger than a maximum duiker bite
    • AgeAtEscape_kudu[t]: Age of plant when its stem diameter is larger than a mean kudu bite
    • AgeAtEscape_kudu_min[t]: Age of plant when its stem diameter is larger than a minimum kudu bite
    • AgeAtEscape_kudu_max[t]: Age of plant when its stem diameter is larger than a maximum kudu bite
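
    These column descriptions map directly onto the CSVs. A minimal R sketch for loading File 1 and tabulating survival (the file is assumed to sit in the working directory):

    library(dplyr)
    library(readr)
    
    #Load File 1; column names follow the description above
    seedlings <- read_csv("cleanedData_forAnalysis.csv")
    
    #Proportion of seedlings alive per species, treatment, and sampling period
    seedlings %>%
     group_by(species, treatment, sample_period) %>%
     summarise(prop_alive = mean(status == "Alive"), .groups = "drop")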

  3. Meta-Analysis and modeling of vegetated filter removal of sediment using global dataset

    • catalog.data.gov
    Updated Nov 22, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Meta-Analysis and modeling of vegetated filter removal of sediment using global dataset [Dataset]. https://catalog.data.gov/dataset/meta-analysis-and-modeling-of-vegetated-filter-removal-of-sediment-using-global-dataset
    Explore at:
    Dataset updated
    Nov 22, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Data on vegetated filter strips (VFS), sediment loading into and out of riparian corridors/buffers, removal efficiency of sediment, meta-analysis of removal efficiencies, dimensional analysis of predictor variables, and regression modeling of VFS removal efficiencies. This dataset is associated with the following publication: Ramesh, R., L. Kalin, M. Hantush, and A. Chaudhary. A secondary assessment of sediment trapping effectiveness by vegetated buffers. ECOLOGICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 159: 106094, (2021).

  4. Filter Import Data | Soluciones En Logistica Rcl S De R

    • seair.co.in
    Updated Jan 29, 2025
    Cite
    Seair Exim (2025). Filter Import Data | Soluciones En Logistica Rcl S De R [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 29, 2025
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    United States
    Description

    Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.

  5. nemotron-en-on-filter

    • huggingface.co
    Cite
    Marcus Cedric R. Idia, nemotron-en-on-filter [Dataset]. https://huggingface.co/datasets/marcuscedricridia/nemotron-en-on-filter
    Explore at:
    Authors
    Marcus Cedric R. Idia
    Description

    A post-training Nemotron dataset filtered to English-only entries with reasoning turned on.

  6. Data from: Parentage and relatedness reconstruction in Pinus sylvestris using genotyping by sequencing

    • datadryad.org
    • zenodo.org
    Updated Mar 4, 2020
    Cite
    David Hall; Wei Zhao; Ulfstand Wennström; Bengt Andersson Gull; Xiao-Ru Wang (2020). Parentage and relatedness reconstruction in Pinus sylvestris using genotyping by sequencing [Dataset]. http://doi.org/10.5061/dryad.h44j0zpg5
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Dryad
    Authors
    David Hall; Wei Zhao; Ulfstand Wennström; Bengt Andersson Gull; Xiao-Ru Wang
    Time period covered
    2020
    Description

    The dataset contains several files:

    Vasthus_001_m06.vcf.gz: VCF file which has been slightly pre-filtered to reduce size; see vcf_filter.txt

    Vasterhus.txt: Sample names of samples in the study

    refkeep.txt: Sample names of the samples used as allele frequency reference

    Parental_ID.txt: The registered names for the parental trees in the study

    vcf_filter.txt: Description on how to filter the VCF file according to the manuscript

    Rfiles Dataset-2.RData: The data set resulting from working through the VCF filtering and the R-script files

    Rcode_related.R: R-script for relatedness estimation using the R-package 'related'

    view_relationships.R: Uses the result from the previous R-script to visualize pairwise relatedness and reconstruct some figures from the manuscript
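
    As a hedged sketch of the relatedness step, this is how the 'related' package's readgenotypedata() and coancestry() functions are typically called; the genotype file name and estimator choice are assumptions here, so see Rcode_related.R in this dataset for the authors' actual settings:

    library(related)
    
    #Sketch: estimate pairwise relatedness from a genotype table
    #formatted for readgenotypedata() (file name assumed)
    input <- readgenotypedata("genotypes.txt")
    
    #Wang (2002) estimator enabled; other estimators are available
    rel <- coancestry(input$gdata, wang = 1)
    head(rel$relatedness)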

  7. Theft Filter

    • data.cityofchicago.org
    Updated Jul 4, 2025
    Cite
    Chicago Police Department (2025). Theft Filter [Dataset]. https://data.cityofchicago.org/Public-Safety/Theft-Filter/aqvv-ggim
    Explore at:
    Available download formats: csv, tsv, xml, application/rdfxml, application/rssxml, application/geo+json, kml, kmz
    Dataset updated
    Jul 4, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use.

    Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

  8. Small form factor filter based sampling data - Ambient and chamber

    • catalog.data.gov
    Updated Mar 12, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Small form factor filter based sampling data - Ambient and chamber [Dataset]. https://catalog.data.gov/dataset/small-form-factor-filter-based-sampling-data-ambient-and-chamber
    Explore at:
    Dataset updated
    Mar 12, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Small form factor filter based PM collection data from both co-located ambient sampling and chamber studies conducted on controlled smoke environments. Metadata is contained within files. This dataset is associated with the following publication: Krug, J.D., R. Long, M. Colon, A. Habel, S. Urbanski, and M. Landis. Evaluation of Small Form Factor, Filter-Based PM2.5 Samplers for Temporary Non-Regulatory Monitoring During Wildland Fire Smoke Events. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 265: 0, (2021).

  9. Datasets and Supporting Materials for the IPIN 2016 Competition Track 3 (Smartphone-based, off-site)

    • zenodo.org
    • producciocientifica.uv.es
    Updated Apr 30, 2020
    Cite
    Antonio Ramón Jimenez Ruiz; Antonio Ramón Jimenez Ruiz; Germán Martín Mendoza-Silva; Germán Martín Mendoza-Silva; Raul Montoliu; Raul Montoliu; Fernando Seco; Fernando Seco; Joaquín Torres-Sospedra; Joaquín Torres-Sospedra (2020). Datasets and Supporting Materials for the IPIN 2016 Competition Track 3 (Smartphone-based, off-site) [Dataset]. http://doi.org/10.5281/zenodo.2791530
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio Ramón Jimenez Ruiz; Antonio Ramón Jimenez Ruiz; Germán Martín Mendoza-Silva; Germán Martín Mendoza-Silva; Raul Montoliu; Raul Montoliu; Fernando Seco; Fernando Seco; Joaquín Torres-Sospedra; Joaquín Torres-Sospedra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This package contains the datasets and supplementary materials used in the IPIN 2016 Competition (Alcalá, Spain).

    Contents:

    1. Track3_LogfileDescription_and_SupplementaryMaterial.pdf: Description of the logfiles and supplemental materials.
    2. Track3_TechnicalAnnex.pdf: Technical annex describing the competition
    3. 01-Logfiles: This folder contains a subfolder with the 17 training logfiles and a subfolder with the 9 blind evaluation logfiles as provided to competitors.
    4. 02-Supplementary_Materials: This folder contains the Matlab/octave parser, the raster maps and the visualization of the training routes.
    5. 03-Evaluation: This folder contains the scripts used to calculate the competition metric, the 75th percentile of the positioning error over the 578 evaluation points (sketched below). The ground truth is also provided in MatLab format and as a CSV file. Since results must be provided at a 2 Hz frequency starting from apptimestamp 0, the ground truth includes the closest timestamp matching the timing provided by competitors.
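
    A small illustrative R sketch of that metric (the data frames and the x, y, timestamp column names are hypothetical; the shipped evaluation scripts are Matlab/Octave):

    library(dplyr)
    
    #75th percentile of the 2D positioning error, matching estimates to
    #ground truth on timestamp (column names hypothetical)
    score <- function(estimates, ground_truth) {
     inner_join(estimates, ground_truth, by = "timestamp",
           suffix = c("_est", "_gt")) %>%
      mutate(error = sqrt((x_est - x_gt)^2 + (y_est - y_gt)^2)) %>%
      summarise(p75 = quantile(error, 0.75)) %>%
      pull(p75)
    }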

    Please, cite the following works when using the datasets included in this package:

    • Torres-Sospedra, J.; Jiménez, A.; Knauth, A.; Moreira, A.; Beer, Y.; Fetzer, T.; Ta, V.-C.; Montoliu, R.; Seco, F.; Mendoza, G.; Belmonte, O.; Koukofikis, A.; Nicolau, M.J.; Costa, A.; Meneses, F.; Ebner, F.; Deinzer, F.; Vaufreydaz, D.; Dao, T.-K.; and Castelli, E. The Smartphone-based Off-Line Indoor Location Competition at IPIN 2016: Analysis and Future work Sensors Vol. 17(3), 2017. http://dx.doi.org/10.3390/s17030557
    • Jimenez, A.R.; Mendoza-Silva, G.M.; Montoliu, R.; Seco, F.; Torres-Sospedra, J. Datasets and Supporting Materials for the IPIN 2016 Competition Track 3 (Smartphone-based, off-site). http://dx.doi.org/10.5281/zenodo.2791530

    For any further questions about the database and this competition track, please contact:

    • Joaquín Torres (jtorres@uji.es) Institute of New Imaging Technologies, Universitat Jaume I, Spain.
    • Antonio R. Jiménez (antonio.jimenez@csic.es) Center of Automation and Robotics (CAR)-CSIC/UPM, Spain.

  10. Simulation data and code

    • figshare.com
    Updated Feb 24, 2022
    Cite
    Charlotte de Vries; E Yagmur Erten (2022). Simulation data and code [Dataset]. http://doi.org/10.6084/m9.figshare.19232535.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 24, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Charlotte de Vries; E Yagmur Erten
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
    • PF_simulation_data.zip contains simulation data to create figure 2 of de Vries, Erten and Kokko.
    • Code_PF.zip contains C++ code to create the data used for figure 2 (see PF_simulation_data.zip for the data files produced), as well as the R script to create figure 2 from the data (Figure2_cloud_25.R).
    
    All code files were created by Pen, I., & Flatt, T. (2021). Asymmetry, division of labour and the evolution of ageing in multicellular organisms. Philosophical Transactions of the Royal Society B, 376(1823), 20190729. The C++ code is slightly adjusted to change the output. Note that the R script takes a long time to run (multiple days on our laptops) and uses a lot of swap memory; we advise running it on a server. Alternatively, you can edit the code to use fewer than the last 25 days by changing the line ddead %>% filter(t > 4975) to, for example, ddead %>% filter(t > 4998) to use only the last 2 time steps. However, note that there will then be insufficient data at high ages to estimate mortality rates.
  11. Case Study: Cyclist

    • kaggle.com
    Updated Jul 27, 2021
    Cite
    PatrickRCampbell (2021). Case Study: Cyclist [Dataset]. https://www.kaggle.com/patrickrcampbell/case-study-cyclist/discussion
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PatrickRCampbell
    Description

    Phase 1: ASK

    Key Objectives:

    1. Business Task * Cyclist is looking to increase its earnings and wants to know whether a social media campaign can influence "Casual" users to become "Annual" members.

    2. Key Stakeholders: * The main stakeholder from Cyclist is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project are Marketing & Analytics and the Executive Team.

    3. Business Task: * Comparing the two kinds of users and defining how they use the platform, what variables they have in common, what variables are different, and how they can get Casual users to become Annual members.

    Phase 2: PREPARE:

    Key Objectives:

    1. Determine Data Credibility * Cyclist provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.

    2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.

    #Installing packages
    install.packages("tidyverse", repos = "http://cran.us.r-project.org")
    install.packages("readr", repos = "http://cran.us.r-project.org")
    install.packages("janitor", repos = "http://cran.us.r-project.org")
    install.packages("geosphere", repos = "http://cran.us.r-project.org")
    install.packages("gridExtra", repos = "http://cran.us.r-project.org")
    
    library(tidyverse)
    library(lubridate) #needed for wday() below; older tidyverse versions do not attach it
    library(readr)
    library(janitor)
    library(geosphere)
    library(gridExtra)
    
    #Importing data & verifying the information within the dataset
    all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
    
    glimpse(all_tripdata_clean)
    
    summary(all_tripdata_clean)
    
    

    Phase 3: PROCESS

    Key Objectives:

    1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset and checked for errors, we begin cleaning the data. * Eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis and to provide accurate assessments of who is using the bikes.

    #Eliminating any data that represents the company performing maintenance, and trips without any measureable distance
    all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),] 
    
    #Creating columns for the individual date components (date must be created first)
    all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
    all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
    all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
    all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
    all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
    
    

    Now I will begin calculating the length of rides being taken, distance traveled, and the mean amount of time & distance.

    #Calculating the ride length in miles & minutes
    all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
    
    all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
    all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
    
    #Calculating the mean time and distance based on the user groups
    userType_means <- all_tripdata_clean %>% 
     group_by(member_casual) %>% 
     summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
    

    Adding in calculations that will differentiate between bike types and which type of user is using each specific bike type.

    #Calculations
    
    with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
    
    #Totals by user type, bike type, and weekday
    with_bike_type %>%
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, rideable_type, weekday) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Totals by user type and bike type
    with_bike_type %>%
     group_by(member_casual, rideable_type) %>%
     summarise(totals = n(), .groups = "drop")
    
    #Calculating the ride differential
    all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = "drop") %>% 
     arrange(member_casual, weekday)
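
    A hypothetical continuation of the same pattern, visualizing rides per weekday by user type with ggplot2 (loaded with the tidyverse):

    #Hypothetical visualization sketch
    all_tripdata_clean %>% 
     mutate(weekday = wday(started_at, label = TRUE)) %>% 
     group_by(member_casual, weekday) %>% 
     summarise(number_of_rides = n(), .groups = "drop") %>% 
     ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
     geom_col(position = "dodge") +
     labs(title = "Rides per weekday by user type", x = NULL, y = "Number of rides")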
    
  12. Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA

    • data.niaid.nih.gov
    • search.dataone.org
    Updated Mar 28, 2024
    Cite
    Jin Bai; Michael Caslin; Madhusudan Katti (2024). Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwxr
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    North Carolina State University
    Authors
    Jin Bai; Michael Caslin; Madhusudan Katti
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States, North Carolina
    Description

    The COVID-19 pandemic lockdown worldwide provided a unique research opportunity for ecologists to investigate the human-wildlife relationship under abrupt changes in human mobility, also known as the Anthropause. Here we chose 15 common non-migratory bird species with different levels of synanthropy, and we aimed to compare how human mobility changes could influence the occupancy of fully synanthropic species such as House Sparrow (Passer domesticus) versus casual to tangential synanthropic species such as White-breasted Nuthatch (Sitta carolinensis). We extracted data from the eBird citizen science project during three study periods in the spring and summer of 2020, when human mobility changed unevenly across different counties in North Carolina. We used the COVID-19 Community Mobility reports from Google to examine how community mobility changes towards workplaces, an indicator of overall human movements at the county level, could influence bird occupancy. Methods: The data source we used for bird data was eBird, a global citizen science project run by the Cornell Lab of Ornithology. We used the COVID-19 Community Mobility Reports by Google to represent the pause of human activities at the county level in North Carolina. These data are publicly available and were last updated on 10/15/2022. We used forest land cover data from NC One Map, a high-resolution (1-meter pixel) raster dataset from 2016 imagery, to represent canopy cover at each eBird checklist location. We also used the raster data of the 2019 National Land Cover Database to represent the degree of development/impervious surface at each eBird checklist location. All three measurements were used at the highest resolution available. We downloaded the eBird Basic Dataset (EBD) that contains the 15 study species from February to June 2020, along with the sampling event data that contains the checklist effort information. First, we used the R package auk (version 0.6.0) in R (version 4.2.1) to filter data on the following conditions: (1) date: 02/19/2020 - 03/29/2020; (2) checklist type: stationary; (3) complete checklists; (4) time: 07:00 am - 06:00 pm; (5) checklist duration: 5-20 mins; (6) location: North Carolina. After filtering, we used the zero-fill function from auk to create detection/non-detection data for each study species in NC. Then we used the repeat-visits filter from auk to keep eBird checklist locations where at least 2 checklists (max 10 checklists) were submitted to the same location by the same observer, allowing us to create a hierarchical data frame where both the detection and state processes can be analyzed using occupancy modeling. This data frame is in a matrix format where each row represents a sampling location and the columns represent the detection and non-detection of the 2-10 repeat sampling events. For the Google Community Mobility data, we chose the "Workplaces" category of mobility data to analyze the Anthropause effect because it was highly relevant to the pause of human activities in urban areas. The mobility data from Google is a percentage change compared to a baseline for each day. A baseline day represents a normal value for that day of the week over the 5-week period 01/03/2020-02/06/2020. For example, a mobility value of -30.0 for Wake County on Apr 15, 2020, means the overall mobility in Wake County on that day decreased by 30% compared to the baseline day a few months earlier.
    Because the eBird data we used covers a range of dates rather than each day, we took the average value of mobility before, during, and after lockdown in each county in NC. For the environmental variables, we calculated the values in ArcGIS Pro (version 3.1.0). We created a 200 m buffer at each eligible eBird checklist location. For the forest cover data, we used "Zonal Statistics as Table" to extract the percentage of forest cover within each checklist location's 200-meter circular buffer. For the National Land Cover Database (NLCD) data, we combined low-intensity, medium-intensity, and high-intensity development as development cover and used "Summarize Within" to extract the percentage of development cover using the polygon version of NLCD. We used a correlation matrix of the three predictors (workplace mobility, percent forest cover, and percent development cover) and found no co-linearity. Thus, these three predictors, plus the interaction between workplace mobility and percent development cover, were the site covariates of the occupancy models. For the detection covariates, four predictors were considered: time of observation, checklist duration, number of observers, and workplace mobility. These detection covariates were also not highly correlated. We then merged all data into an unmarked data frame using the "unmarked" R package (version 1.2.5), with eBird sampling locations as sites (rows in the data frame) and repeat checklists at the same sampling locations as repeat visits (columns in the data frame).
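
    A condensed R sketch of the auk filtering steps described above (function names come from the auk package; the file names are hypothetical):

    library(auk)
    library(dplyr)
    
    #EBD and sampling-event files (names hypothetical)
    ebd <- auk_ebd("ebd_NC_relFeb-2020.txt",
            file_sampling = "ebd_sampling_relFeb-2020.txt")
    
    #Filters matching conditions (1)-(6) above
    filters <- ebd %>%
     auk_state("US-NC") %>%
     auk_date(date = c("2020-02-19", "2020-03-29")) %>%
     auk_protocol("Stationary") %>%
     auk_complete() %>%
     auk_time(start_time = c("07:00", "18:00")) %>%
     auk_duration(duration = c(5, 20))
    
    ebd_filtered <- auk_filter(filters, file = "ebd_filtered.txt",
                  file_sampling = "sampling_filtered.txt")
    
    #Zero-fill to detection/non-detection, then keep sites with 2-10 repeat visits
    zf <- auk_zerofill(ebd_filtered, collapse = TRUE)
    zf_visits <- filter_repeat_visits(zf, min_obs = 2, max_obs = 10,
                     site_vars = c("locality_id", "observer_id"))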

  13. Data from: Comparison of capture and storage methods for aqueous macrobial eDNA using an optimized extraction protocol: advantage of enclosed filter

    • zenodo.org
    • dataone.org
    Updated May 29, 2022
    Cite
    Johan Spens; Alice R. Evans; David Halfmaerten; Steen W. Knudsen; Mita E. Sengupta; Sarah S. T. Mak; Eva E. Sigsgaard; Micaela Hellström; Johan Spens; Alice R. Evans; David Halfmaerten; Steen W. Knudsen; Mita E. Sengupta; Sarah S. T. Mak; Eva E. Sigsgaard; Micaela Hellström (2022). Data from: Comparison of capture and storage methods for aqueous macrobial eDNA using an optimized extraction protocol: advantage of enclosed filter [Dataset]. http://doi.org/10.5061/dryad.p2q4r
    Explore at:
    Available download formats: txt
    Dataset updated
    May 29, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Johan Spens; Alice R. Evans; David Halfmaerten; Steen W. Knudsen; Mita E. Sengupta; Sarah S. T. Mak; Eva E. Sigsgaard; Micaela Hellström; Johan Spens; Alice R. Evans; David Halfmaerten; Steen W. Knudsen; Mita E. Sengupta; Sarah S. T. Mak; Eva E. Sigsgaard; Micaela Hellström
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Aqueous environmental DNA (eDNA) is an emerging efficient non-invasive tool for species inventory studies. To maximize performance of downstream quantitative PCR (qPCR) and next-generation sequencing (NGS) applications, quality and quantity of the starting material is crucial, calling for optimized capture, storage and extraction techniques of eDNA. Previous comparative studies for eDNA capture/storage have tested precipitation and 'open' filters. However, practical 'enclosed' filters which reduce unnecessary handling have not been included. Here, we fill this gap by comparing a filter capsule (Sterivex-GP polyethersulfone, pore size 0·22 μm, hereafter called SX) with commonly used methods. Our experimental set-up, covering altogether 41 treatments combining capture by precipitation or filtration with different preservation techniques and storage times, sampled one single lake (and a fish-free control pond). We selected documented capture methods that have successfully targeted a wide range of fauna. The eDNA was extracted using an optimized protocol modified from the DNeasy® Blood & Tissue kit (Qiagen). We measured total eDNA concentrations and Cq-values (cycles used for DNA quantification by qPCR) to target specific mtDNA cytochrome b (cyt b) sequences in two local keystone fish species. SX yielded higher amounts of total eDNA along with lower Cq-values than polycarbonate track-etched filters (PCTE), glass fibre filters (GF) or ethanol precipitation (EP). SX also generated lower Cq-values than cellulose nitrate filters (CN) for one of the target species. DNA integrity of SX samples did not decrease significantly after 2 weeks of storage in contrast to GF and PCTE. Adding preservative before storage improved SX results. In conclusion, we recommend SX filters (originally designed for filtering micro-organisms) as an efficient capture method for sampling macrobial eDNA. Ethanol or Longmire's buffer preservation of SX immediately after filtration is recommended. Preserved SX capsules may be stored at room temperature for at least 2 weeks without significant degradation. Reduced handling and less exposure to outside stress compared with other filters may contribute to better eDNA results. SX capsules are easily transported and enable eDNA sampling in remote and harsh field conditions as samples can be filtered/preserved on site.

  14. Development of Interactive Data Visualization Tool for the Predictive Ecosystem Mapping Project

    • open.library.ubc.ca
    • borealisdata.ca
    Updated Apr 19, 2022
    Cite
    Chan, Wai Chung Wilson (2022). Development of Interactive Data Visualization Tool for the Predictive Ecosystem Mapping Project [Dataset]. http://doi.org/10.14288/1.0412884
    Explore at:
    Dataset updated
    Apr 19, 2022
    Authors
    Chan, Wai Chung Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 14, 2022
    Area covered
    Babine Mountains Provincial Park, British Columbia
    Description

    The Biogeoclimatic Ecosystem Classification (BEC) system is the ecosystem classification adopted in forest management within British Columbia, based on vegetation, soil, and climate characteristics; Site Series is the smallest unit of the system. The Ministry of Forests, Lands, Natural Resource Operations and Rural Development of the Government of British Columbia ("the Ministry") developed a web-based tool known as BEC Map for maintaining and sharing the information of the BEC system, but the Site Series information was not included in the tool due to its quantity and complexity. In order to allow users to explore and interact with the information, this project aimed to develop a web-based tool with high data quality and flexibility for users to explore the Site Series classes, using the "Shiny" and "Leaflet" packages in R. The project started with data classification and pre-processing of the raster images and attribute tables through identification of client requirements, spatial database design, and data cleaning. After data transformation was conducted, spatial relationships among these data were developed for code development. The code development included the setting up of the web map and interactive tools to facilitate user friendliness and flexibility. The code was further tested and enhanced to meet the requirements of the Ministry. The web-based tool provided an efficient and effective platform to present the complicated Site Series features, using a Web Map Service (WMS) for map rendering. Four interactive tools were developed to allow users to examine and interact with the information. The study also found that the mode filter performed well in data preservation and noise minimization but suffered from long processing times and the creation of tiny sliver polygons.
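
    A minimal sketch of that architecture, a Shiny app rendering a WMS layer with Leaflet; the WMS endpoint and layer name below are placeholders, not the Ministry's actual service:

    library(shiny)
    library(leaflet)
    
    ui <- fluidPage(leafletOutput("map", height = 600))
    
    server <- function(input, output, session) {
     output$map <- renderLeaflet({
      leaflet() %>%
       addTiles() %>%
       addWMSTiles("https://example.org/geoserver/wms",   #placeholder endpoint
             layers = "site_series",           #placeholder layer
             options = WMSTileOptions(format = "image/png",
                         transparent = TRUE))
     })
    }
    
    shinyApp(ui, server)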

  15. Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones

    • su.figshare.com
    • researchdata.se
    Updated May 30, 2023
    Cite
    Stefan Wiens; Malina Szychowska (2023). Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones [Dataset]. http://doi.org/10.17045/sthlmuni.12582002.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Stockholm University
    Authors
    Stefan Wiens; Malina Szychowska
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main results files are saved separately:
    • ASSR2.html: R output of the main analyses (N = 33)
    • ASSR2_subset.html: R output of the main analyses for the smaller sample (N = 25)

    FIGSHARE METADATA
    Categories: Biological psychology; Neuroscience and physiological psychology; Sensory processes, perception, and performance
    Keywords: crossmodal attention; electroencephalography (EEG); early-filter theory; task difficulty; envelope following response
    References: https://doi.org/10.17605/OSF.IO/6FHR8; https://github.com/stamnosslin/mn; https://doi.org/10.17045/sthlmuni.4981154.v3; https://biosemi.com/; https://www.python.org/; https://mne.tools/stable/index.html#; https://www.r-project.org/; https://rstudio.com/products/rstudio/

    GENERAL INFORMATION
    1. Title of Dataset: Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones
    2. Author Information
    A. Principal Investigator: Stefan Wiens, Department of Psychology, Stockholm University, Sweden (https://www.su.se/profiles/swiens-1.184142, sws@psychology.su.se)
    B. Co-investigator: Malina Szychowska, Department of Psychology, Stockholm University, Sweden (https://www.researchgate.net/profile/Malina_Szychowska, malina.szychowska@psychology.su.se)
    3. Date of data collection: subjects (N = 33) were tested between 2019-11-15 and 2020-03-12.
    4. Geographic location of data collection: Department of Psychology, Stockholm, Sweden
    5. Funding sources that supported the collection of the data: Swedish Research Council (Vetenskapsrådet) 2015-01181

    SHARING/ACCESS INFORMATION
    1. Licenses/restrictions placed on the data: CC BY 4.0
    2. Links to publications that cite or use the data: Szychowska, M., & Wiens, S. (2020). Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Submitted manuscript. The study was preregistered: https://doi.org/10.17605/OSF.IO/6FHR8
    3. Links to other publicly accessible locations of the data: N/A
    4. Links/relationships to ancillary data sets: N/A
    5. Was data derived from another source? No
    6. Recommended citation for this dataset: Wiens, S., & Szychowska, M. (2020). Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.12582002

    DATA & FILE OVERVIEW
    The files contain the raw data, scripts, and results of the main and supplementary analyses of an electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.
    • ASSR2_experiment_scripts.zip: Python files to run the experiment
    • ASSR2_rawdata.zip: raw data files for each subject (data_EEG: EEG data in .bdf format, generated by Biosemi; data_log: log files of the EEG session, generated by Python)
    • ASSR2_EEG_scripts.zip: Python-MNE scripts to process the EEG data
    • ASSR2_EEG_preprocessed_data.zip: EEG data in .fif format after preprocessing with the Python-MNE scripts
    • ASSR2_R_scripts.zip: R scripts to analyze the data, together with the main data files. The main files in the folder are ASSR2.html (R output of the main analyses) and ASSR2_subset.html (R output of the main analyses after excluding eight subjects who were recorded as pilots before the study was preregistered)
    • ASSR2_results.zip: all figures and tables created by Python-MNE and R

    METHODOLOGICAL INFORMATION
    1. Description of methods used for collection/generation of data: The auditory stimuli were amplitude-modulated tones with a carrier frequency (fc) of 500 Hz and modulation frequencies (fm) of 20.48 Hz, 40.96 Hz, or 81.92 Hz. The experiment was programmed in Python (https://www.python.org/) and used extra functions from https://github.com/stamnosslin/mn. The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and saved in .bdf format. For more information, see the linked publication.
    2. Methods for processing the data: We conducted frequency analyses and computed event-related potentials. See the linked publication.
    3. Instrument- or software-specific information needed to interpret the data: MNE-Python (Gramfort et al., 2013): https://mne.tools/stable/index.html#; RStudio with R (R Core Team, 2020): https://rstudio.com/products/rstudio/; Wiens, S. (2017). Aladins Bayes Factor in R (Version 3): https://www.doi.org/10.17045/sthlmuni.4981154.v3
    4. Standards and calibration information: see the linked publication.
    5. Environmental/experimental conditions: see the linked publication.
    6. Quality-assurance procedures performed on the data: see the linked publication.
    7. People involved with sample collection, processing, analysis and/or submission: data collection by Malina Szychowska with assistance from Jenny Arctaedius; data processing, analysis, and submission by Malina Szychowska and Stefan Wiens.

    DATA-SPECIFIC INFORMATION
    All relevant information can be found in the MNE-Python and R scripts (in the EEG_scripts and analysis_scripts folders) that process the raw data. For example, we added notes to explain what different variables mean.

  16. THEMIS-C: Probe Electric Field Instrument and Search Coil Magnetometer Instrument, Digital Fields Board - digitally computed Filter Bank spectra and E12 peak and average in HF band (FBK)

    • heliophysicsdata.gsfc.nasa.gov
    Updated Jul 30, 2023
    Cite
    Angelopoulos, Vassilis; Bonnell, John, W.; Ergun, Robert, E.; Mozer, Forrest, S.; Roux, Alain (2023). THEMIS-C: Probe Electric Field Instrument and Search Coil Magnetometer Instrument, Digital Fields Board - digitally computed Filter Bank spectra and E12 peak and average in HF band (FBK). [Dataset]. http://doi.org/10.48322/ken1-sn21
    Explore at:
    Available download formats: csv, bin, application/x-cdf
    Dataset updated
    Jul 30, 2023
    Dataset provided by
    NASA (http://nasa.gov/)
    Authors
    Angelopoulos, Vassilis; Bonnell, John, W.; Ergun, Robert, E.; Mozer, Forrest, S.; Roux, Alain
    License

    https://spdx.org/licenses/CC0-1.0

    Variables measured
    thc_fbh, thc_fb_yaxis, thc_fbh_time, thc_fbk_fband, thc_fb_v1_time, thc_fb_v2_time, thc_fb_v3_time, thc_fb_v4_time, thc_fb_v5_time, thc_fb_v6_time, and 12 more
    Description

    The Filter Bank is part of the Digital Fields Board and provides band-pass filtering for EFI and SCM spectra, as well as E12HF peak and average value calculations. The Filter Bank provides band-pass filtering for less computationally and power intensive spectra than the FFT would provide. The process is as follows: signals are fed to the Filter Bank via a low-pass FIR filter with a cut-off frequency half that of the original signal maximum. The output is passed to the band-pass filters, is differenced from the original signal, then the absolute value of the data is taken and averaged. The output from the low-pass filter is also sent to a second FIR filter with 2:1 decimation. This output is then fed back through the system. The process runs through 12 cascades for input at 8,192 samples/s and 13 for input at 16,384 samples/s (EAC input only), reducing the signal and computing power by a factor of 2 at each cascade. At each cascade a set of data is produced at a sampling frequency of 2^n from 2 Hz to the initial sampling frequency (frequency characteristics for each step are shown below in Table 1). The average from the Filter Bank is compressed to 8 bits with a pseudo-logarithmic encoder. The data is stored in sets of six frequency bins at 2.689 kHz, 572 Hz, 144.2 Hz, 36.2 Hz, 9.05 Hz, and 2.26 Hz. The average of the coupled E12HF signal and its peak value are recorded over 62.5 ms windows (i.e. a 16 Hz sampling rate). Accumulation of values from signal 31.25 ms windows is performed externally. The analog signals fed into the FBK are E12DC and SCM1. Sensor and electronics design provided by UCB (J. W. Bonnell, F. S. Mozer); Digital Fields Board provided by LASP (R. Ergun); search coil data provided by CETP (A. Roux).

    Table 1: Frequency Properties.

    Cascade | Frequency content of input signal | Low-pass filter cutoff | Frequency content of low-pass output | Filter Bank frequency band
    0* | 0 - 8 kHz | 4 kHz | 0 - 4 kHz | 4 - 8 kHz
    1 | 0 - 4 kHz | 2 kHz | 0 - 2 kHz | 2 - 4 kHz
    2 | 0 - 2 kHz | 1 kHz | 0 - 1 kHz | 1 - 2 kHz
    3 | 0 - 1 kHz | 512 Hz | 0 - 512 Hz | 512 Hz - 1 kHz
    4 | 0 - 512 Hz | 256 Hz | 0 - 256 Hz | 256 - 512 Hz
    5 | 0 - 256 Hz | 128 Hz | 0 - 128 Hz | 128 - 256 Hz
    6 | 0 - 128 Hz | 64 Hz | 0 - 64 Hz | 64 - 128 Hz
    7 | 0 - 64 Hz | 32 Hz | 0 - 32 Hz | 32 - 64 Hz
    8 | 0 - 32 Hz | 16 Hz | 0 - 16 Hz | 16 - 32 Hz
    9 | 0 - 16 Hz | 8 Hz | 0 - 8 Hz | 8 - 16 Hz
    10 | 0 - 8 Hz | 4 Hz | 0 - 4 Hz | 4 - 8 Hz
    11 | 0 - 4 Hz | 2 Hz | 0 - 2 Hz | 2 - 4 Hz
    12 | 0 - 2 Hz | 1 Hz | 0 - 1 Hz | 1 - 2 Hz

    *Only available for 16,384 Hz sampling.
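
    A toy R sketch of the cascade logic described above (illustrative only, not the flight algorithm; the 8-tap moving-average low-pass is a stand-in for the instrument's dedicated FIR filters):

    #Toy illustration of the Filter Bank cascade: low-pass, difference to get the
    #band-pass residual, rectify and average, then decimate 2:1 and feed back.
    filter_bank <- function(x, n_cascades = 6) {
     h <- rep(1 / 8, 8)              #crude FIR low-pass stand-in
     band_avg <- numeric(n_cascades)
     for (k in seq_len(n_cascades)) {
      lp <- stats::filter(x, h, sides = 2)    #low-pass output
      bp <- x - lp                #band-pass = original minus low-pass
      band_avg[k] <- mean(abs(bp), na.rm = TRUE) #rectified average for this band
      x <- lp[seq(1, length(lp), by = 2)]     #2:1 decimation, fed back in
      x <- x[!is.na(x)]
     }
     band_avg
    }
    
    #Example: white noise at 8,192 samples/s, six cascades
    set.seed(1)
    filter_bank(rnorm(8192))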

  17. airGRdatassim: Ensemble-Based Data Assimilation in GR Hydrological Models. R package version 0.1.4

    • entrepot.recherche.data.gouv.fr
    Updated Apr 11, 2025
    Cite
    Gaia Piazzi; Gaia Piazzi; Olivier Delaigue; Olivier Delaigue (2025). airGRdatassim: Ensemble-Based Data Assimilation in GR Hydrological Models. R package version 0.1.4 [Dataset]. http://doi.org/10.15454/WEYYVZ
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Gaia Piazzi; Gaia Piazzi; Olivier Delaigue; Olivier Delaigue
    License

    https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.5/customlicense?persistentId=doi:10.15454/WEYYVZ

    Description

    Add-on to the 'airGR' package which provides the tools to assimilate observed discharges in daily GR hydrological models. The package consists of two functions that perform the assimilation of observed discharges via an ensemble Kalman filter or a particle filter.
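
    A hedged usage sketch: the package documentation names the two functions CreateInputsPert() and RunModel_DA(); the argument names below are indicative only and should be checked against the manual (BasinObs is the example dataset shipped with airGR):

    library(airGR)      #GR models and the BasinObs example data
    library(airGRdatassim)
    
    #Sketch only: perturb meteorological inputs to build an ensemble,
    #then assimilate observed discharge with a particle filter (DaMethod = "PF").
    #Check ?CreateInputsPert and ?RunModel_DA for the exact signatures.
    InputsPert <- CreateInputsPert(FUN_MOD = RunModel_GR5J,
                    DatesR = BasinObs$DatesR,
                    Precip = BasinObs$P,
                    PotEvap = BasinObs$E,
                    NbMbr = 50)
    
    OutputsDA <- RunModel_DA(InputsPert = InputsPert,
                 Qobs = BasinObs$Qmm,
                 DaMethod = "PF")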

  18. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on item sets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data over a period of time. The retailer will use the results to grow its business and to give customers suggestions on item sets, so that we are able to increase customer engagement, improve customer experience, and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rule mining is most useful when you are planning to find associations between different objects in a set and frequent patterns in a transaction database. It can tell you which items customers frequently buy together, and it allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mat for the mouse, and 8 bought both. For the rule "bought computer mouse => bought mat for mouse":
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08/0.10 = 0.8
    • lift = confidence / P(mat for mouse) = 0.8/0.09 ≈ 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
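
    A small runnable illustration of these numbers with the arules package (the toy transaction list below is fabricated to reproduce exactly the counts above):

    library(arules)
    
    #100 customers: 8 bought mouse and mat, 2 only the mouse, 1 only the mat,
    #89 bought something else
    trans_list <- c(rep(list(c("mouse", "mat")), 8),
            rep(list("mouse"), 2),
            rep(list("mat"), 1),
            rep(list("other"), 89))
    trans <- as(trans_list, "transactions")
    
    rules <- apriori(trans,
             parameter = list(supp = 0.05, conf = 0.5, minlen = 2),
             appearance = list(lhs = "mouse", default = "rhs"))
    inspect(rules) #{mouse} => {mat}: support 0.08, confidence 0.8, lift ~8.9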

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Row: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load the required libraries. Below, I briefly describe each one.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.


    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.


    Next, we will clean our data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...

  19. Data from: Variations in tree growth provide limited evidence of species mixture effects in Interior West U.S.A. mixed-conifer forests

    • datadryad.org
    • data.niaid.nih.gov
    Updated Oct 9, 2020
    Cite
    Christopher Looney; Wilfred Previant; Linda Nagel (2020). Variations in tree growth provide limited evidence of species mixture effects in Interior West U.S.A. mixed-conifer forests [Dataset]. http://doi.org/10.5061/dryad.0vt4b8gx2
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 9, 2020
    Dataset provided by
    Dryad
    Authors
    Christopher Looney; Wilfred Previant; Linda Nagel
    Time period covered
    2020
    Area covered
    United States
    Description
    1. In mixed stands, species complementarity (e.g., facilitation and competition reduction) may enhance forest tree productivity. Although positive mixture effects have been identified in forests worldwide, the majority of studies have focused on two-species interactions in managed systems with high functional diversity. We extended this line of research to examine mixture effects on tree productivity across landscape-scale compositional and environmental gradients in the low functional diversity, fire-suppressed, mixed-conifer forests of the U.S. Interior West.

    2. We investigated mixture effects on the productivity of Pinus ponderosa, Pseudotsuga menziesii, and Abies concolor. Using region-wide forest inventory data, we created individual-tree generalized linear mixed models and examined the growth of these species across community gradients. We compared the relative influences of stand structure, age, competition, and environmental stress on mixture effects using multi-model inference...
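
    For readers unfamiliar with this model class, a generic individual-tree GLMM could be sketched in R with lme4 as follows; the variable names and model structure here are illustrative, not the study's actual specification:

        library(lme4)
        # Tree growth as a function of mixture and stress covariates,
        # with a random intercept per plot
        fit <- glmer(growth ~ mixture_proportion + stand_age + competition +
                       climate_stress + (1 | plot_id),
                     family = Gamma(link = "log"), data = tree_data)
        summary(fit)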

  20. Data from: Generalizable EHR-R-REDCap pipeline for a national...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jan 9, 2022
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Massachusetts General Hospital
    Harvard Medical School
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

    Methods eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown ((https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).
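
    A mock data frame in that 'untidy' 4-column shape might look like the following; the values and column names here are illustrative only, and the actual mock dataset is provided on the eLAB GitHub page:

        dt <- data.frame(
          name_mrn        = "DOE,JANE (0001234)",
          collection_date = "2019-03-01",
          collection_time = "08:15",
          lab_results     = "Potassium 4.1; Sodium 139; Creatinine 0.9"
        )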

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
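
    The remapping idea can be illustrated with a named-vector lookup, a simplified stand-in for eLAB's ~300-entry table (labs is a hypothetical data frame of raw pulls):

        lab_lookup <- c(
          "Potassium"           = "potassium",
          "Potassium-External"  = "potassium",
          "Potassium(POC)"      = "potassium",
          "Potassium,whole-bld" = "potassium"
        )
        labs$dd_code <- lab_lookup[labs$raw_lab_name]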

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.
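
    Because every site shares one DD, multi-site aggregation can be as simple as row-binding the per-site exports; a sketch under assumed file locations:

        site_files <- list.files("site_exports", pattern = "\\.csv$", full.names = TRUE)
        registry   <- dplyr::bind_rows(lapply(site_files, read.csv))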

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
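
    With the survival package (which eLAB already loads), a univariable Cox model for a single baseline lab takes this general form; variable names are illustrative:

        library(survival)
        fit <- coxph(Surv(os_months, death_event) ~ baseline_potassium, data = cohort)
        summary(fit)   # hazard ratio and exploratory p-value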
