Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This report discusses some problems that can arise when attempting to import PostScript images into R, when the PostScript image contains coordinate transformations that skew the image. There is a description of some new features in the ‘grImport’ package for R that allow these sorts of images to be imported into R successfully.
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
Video on importing data into R from the Research Experiences in Microbiomes Network
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has evolved.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
analyze the health and retirement study (hrs) with r: the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
this new github repository contains five scripts:
1992 - 2010 download HRS microdata.R
* loop through every year and every file, download, then unzip everything in one big party
import longitudinal RAND contributed files.R
* create a SQLite database (.db) on the local disk
* load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
longitudinal RAND - analysis examples.R
* connect to the sql database created by the 'import longitudinal RAND contributed files' program
* create two database-backed complex sample survey objects, using a taylor-series linearization design
* perform a mountain of analysis examples with wave weights from two different points in the panel
import example HRS file.R
* load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
* parse through the IF block at the bottom of the sas importation script, blank out a number of variables
* save the file as an R data file (.rda) for fast loading later
replicate 2002 regression.R
* connect to the sql database created by the 'import longitudinal RAND contributed files' program
* create a database-backed complex sample survey object, using a taylor-series linearization design
* exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.
click here to view these five scripts
for more detail about the health and retirement study (hrs), visit:
* michigan's hrs homepage
* rand's hrs homepage
* the hrs wikipedia page
* a running list of publications using hrs
notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
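as a hedged illustration of what a database-backed, taylor-series linearized survey design looks like in r: this is not the actual repository script, and the table and variable names below are placeholders rather than the real rand hrs column names.
# hedged sketch only - table and variable names are placeholders
library(DBI)
library(RSQLite)
library(survey)
hrs_design <-
	svydesign(
		ids = ~psu_var ,        # placeholder sampling unit variable
		strata = ~strata_var ,  # placeholder stratum variable
		weights = ~weight_var , # placeholder wave-specific respondent weight
		nest = TRUE ,
		data = "rand_hrs" ,     # table name inside the sqlite database
		dbtype = "SQLite" ,
		dbname = "hrs.db"
	)
# weighted mean of a placeholder variable
svymean( ~some_variable , hrs_design , na.rm = TRUE )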
Original data, R script (code), and code output for the paper published in the Journal of Dairy Science. For best use, replicate the analysis using R. Importing the data using the .csv file may cause some variables (columns of the spreadsheet) to be imported in the wrong format. If you run into any issues, do not hesitate to contact us. Happy coding!
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LSC (Leicester Scientific Corpus)
April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk), supervised by Prof Alexander Gorban and Dr Evgeny Mirkes. The data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.
[Version 2] A further cleaning is applied in Data Processing for LSC Abstracts in Version 1*. Details of the cleaning procedure are explained in Step 6.
* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1
Getting Started
This text provides information on the LSC (Leicester Scientific Corpus) and the pre-processing steps applied to abstracts, and describes the structure of the files that organise the corpus. The corpus was created to be used in future work on the quantification of the meaning of research texts and to make it available for use in Natural Language Processing projects.
LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:
1. Authors: The list of authors of the paper
2. Title: The title of the paper
3. Abstract: The abstract of the paper
4. Categories: One or more categories from the list of categories [2]. The full list of categories is presented in the file 'List_of_Categories.txt'.
5. Research Areas: One or more research areas from the list of research areas [3]. The full list of research areas is presented in the file 'List_of_Research_Areas.txt'.
6. Total Times Cited: The number of times the paper was cited by other items from all databases within the Web of Science platform [4]
7. Times Cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]
The corpus was collected online in July 2018 and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.
Data Processing
Step 1: Downloading of the Data Online
The dataset was collected manually by exporting documents online as tab-delimited files. All documents are available online.
Step 2: Importing the Dataset to R
The LSC was collected as TXT files. All documents are extracted into R.
Step 3: Cleaning the Data from Documents with Empty Abstract or without Category
As our research is based on the analysis of abstracts and categories, all documents with empty abstracts and all documents without categories are removed.
Step 4: Identification and Correction of Concatenated Words in Abstracts
Medicine-related publications in particular use 'structured abstracts'. Such abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion, etc. The tool used for extracting abstracts concatenates the section headings with the first word of the section. For instance, we observe words such as ConclusionHigher and ConclusionsRT. The detection and identification of such words is done by sampling medicine-related publications with human intervention. Detected concatenated words are split into two words; for instance, the word 'ConclusionHigher' is split into 'Conclusion' and 'Higher'. (A short R illustration of this splitting follows Step 5 below.) The section headings in such abstracts are listed below:
Background Method(s) Design Theoretical Measurement(s) Location Aim(s) Methodology Process Abstract Population Approach Objective(s) Purpose(s) Subject(s) Introduction Implication(s) Patient(s) Procedure(s) Hypothesis Measure(s) Setting(s) Limitation(s) Discussion Conclusion(s) Result(s) Finding(s) Material(s) Rationale(s) Implications for health and nursing policy
Step 5: Extracting (Sub-setting) the Data Based on Lengths of Abstracts
After correction, the lengths of abstracts are calculated. 'Length' indicates the total number of words in the text, calculated by the same rule as the Microsoft Word 'word count' [5]. According to the APA style manual [6], an abstract should contain between 150 and 250 words. In LSC, we decided to limit the length of abstracts to between 30 and 500 words in order to study documents with abstracts of typical length and to avoid the effect of length on the analysis.
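As an aside to Step 4 above, a minimal R illustration of this kind of splitting; the heading list and regular expression here are illustrative assumptions, not the authors' procedure.
# illustrative only: split a section heading that is fused to the following word
headings <- c("Background", "Conclusion", "Conclusions", "Result", "Results", "Method", "Methods")
pattern  <- paste0("\\b(", paste(headings, collapse = "|"), ")(?=[A-Z])")
gsub(pattern, "\\1 ", "ConclusionHigher doses were associated with better outcomes.", perl = TRUE)
# returns "Conclusion Higher doses were associated with better outcomes."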
Step 6: [Version 2] Cleaning Copyright Notices, Permission Policies, Journal Names and Conference Names from LSC Abstracts in Version 1
Publications can include a footer below the abstract text containing a copyright notice, permission policy, journal name, licence, authors' rights or conference name added by conferences and journals. The tool used for extracting and processing abstracts from the WoS database attaches such footers to the text. For example, casual observation shows that copyright notices such as 'Published by Elsevier Ltd.' are placed in many texts. To avoid abnormal appearances of words in further analysis, such as bias in frequency calculations, we performed a cleaning procedure on such sentences and phrases in the abstracts of LSC Version 1. We removed copyright notices, names of conferences, names of journals, authors' rights, licences and permission policies identified by sampling of abstracts.
Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Lengths of Abstracts
The cleaning procedure described in the previous step led to some abstracts having fewer words than our minimum length criterion (30 words). 474 texts were removed.
Step 8: Saving the Dataset into CSV Format
Documents are saved into 34 CSV files. In the CSV files, the information is organised with one record on each line, and the abstract, title, list of authors, list of categories, list of research areas, and times cited are recorded in fields.
To access the LSC for research purposes, please email ns433@le.ac.uk.
References
[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html
[4] Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US
[5] Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3
[6] A. P. Association, Publication Manual. American Psychological Association, Washington, DC, 1983.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/dataset/ds-dga-b4ad5702-ea2b-4f04-833c-d0229bfd689e/details?q=previous
----------------------------------
Geoscape Administrative Boundaries is Australia’s most comprehensive national collection of boundaries, including government, statistical and electoral boundaries. It is built and maintained by Geoscape Australia using authoritative government data. Further information about contributors to Administrative Boundaries is available here.
This dataset comprises seven Geoscape products:
* Localities
* Local Government Areas (LGAs)
* Wards
* Australian Bureau of Statistics (ABS) Boundaries
* Electoral Boundaries
* State Boundaries and
* Town Points
Updated versions of Administrative Boundaries are published on a quarterly basis.
Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.
Notable changes in the May 2025 release
* Victorian Wards have seen almost half of the dataset change, now reflecting the boundaries from the 2024 subdivision review. https://www.vec.vic.gov.au/electoral-boundaries/council-reviews/subdivision-reviews
* There have been spatial changes (area) greater than 1 km2 to 66 wards in Victoria.
* One new locality ‘Kenwick Island’ has been added to the local government area ‘Mackay Regional’ in Queensland.
* There have been spatial changes (area) greater than 1 km2 to the local government areas 'Burke Shire' and 'Mount Isa City' in Queensland.
* There have been spatial changes (area) greater than 1 km2 to the localities ‘Nicholson’, ‘Lawn Hill’ and ‘Coral Sea’ in Queensland and ‘Calguna’, ‘Israelite Bay’ and ‘Balladonia’ in Western Australia.
* An update to the NT Commonwealth Electoral Boundaries has been applied to reflect the redistribution of the boundaries gazetted on 4 March 2025.
* Geoscape has become aware that the DATE_CREATED and DATE_RETIRED attributes in the commonwealth_electoral_polygon MapInfo TAB tables were incorrectly ordered and did not match the product data model. These attributes have been re-ordered to match the data model for the May 2025 release.
IMPORTANT NOTE: correction of issues with the 22 November 2022 release
* On 28 November 2022, the Administrative Boundaries dataset originally released on 22 November 2022 was amended and re-uploaded after Geoscape identified some issues with the original data for 'Electoral Boundaries'.
* As a result of the error, some shapefiles were published in 3D rather than 2D, which may affect some users when importing data into GIS applications.
* The error affected the Electoral Boundaries dataset, specifically the Commonwealth boundary data for Victoria and Western Australia, including 'All States'.
* Only the ESRI Shapefile formats were affected (both GDA94 and GDA2020). The MapInfo TAB format was not affected.
* Because the datasets are zipped into a single file, once the error was fixed by Geoscape all of the Administrative Boundaries shapefiles had to be re-uploaded, rather than only the affected files.
* If you downloaded either of the two Administrative Boundary ESRI Shapefiles between 22 November and 28 November 2022 and plan to use the Electoral Boundary component, you are advised to download the revised version dated 28 November 2022.
Apologies for any inconvenience.
Further information on Administrative Boundaries, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on Administrative Boundaries, including software solutions, consultancy and support.
Note: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.
The Australian Government has negotiated the release of Administrative Boundaries to the whole economy under an open CC BY 4.0 licence.
Users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).
Users must also note the following attribution requirements:
Preferred attribution for the Licensed Material:
Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
Preferred attribution for Adapted Material:
Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
What to Expect When You Download Administrative Boundaries
Administrative Boundaries is a large dataset (around 1.5 GB unpacked), made up of seven themes, each containing multiple layers.
Users are advised to read the technical documentation, including the product change notices and the individual product descriptions, before downloading and using the product.
Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/dataset/ds-dga-b4ad5702-ea2b-4f04-833c-d0229bfd689e/details?q=previous
License Information
Phase 1: ASK
1. Business Task * Cyclistic is looking to increase its earnings and wants to know if creating a social media campaign can influence "Casual" users to become "Annual" members.
2. Key Stakeholders: * The main stakeholder from Cyclistic is Lily Moreno, who is the Director of Marketing and responsible for the development of campaigns and initiatives to promote their bike-share program. The other teams involved with this project will be Marketing & Analytics and the Executive Team.
3. Business Task: * Comparing the two kinds of users and defining how they use the platform, what variables they have in common, what variables are different, and how they can get Casual users to become Annual members.
Phase 2: PREPARE
1. Determine Data Credibility * Cyclistic provided data from years 2013-2021 (through March 2021), all of which is first-hand data collected by the company.
2. Sort & Filter Data: * The stakeholders want to know how the current users are using their service, so I am focusing on using the data from 2020-2021 since this is the most relevant period of time to answer the business task.
#Installing packages
install.packages("tidyverse", repos = "http://cran.us.r-project.org")
install.packages("readr", repos = "http://cran.us.r-project.org")
install.packages("janitor", repos = "http://cran.us.r-project.org")
install.packages("geosphere", repos = "http://cran.us.r-project.org")
install.packages("gridExtra", repos = "http://cran.us.r-project.org")
library(tidyverse)
library(readr)
library(janitor)
library(geosphere)
library(gridExtra)
#Importing data & verifying the information within the dataset
all_tripdata_clean <- read.csv("/Data Projects/cyclist/cyclist_data_cleaned.csv")
glimpse(all_tripdata_clean)
summary(all_tripdata_clean)
Phase 3: PROCESS
1. Cleaning Data & Preparing for Analysis: * Once the data has been placed into one dataset and checked for errors, we begin cleaning the data. * Eliminating data that corresponds to the company servicing the bikes, and any ride with a traveled distance of zero. * New columns will be added to assist in the analysis and to provide accurate assessments of who is using the bikes.
#Eliminating any data that represents the company performing maintenance, and trips without any measurable distance
#(ride_length is created in a later step; it must exist before this filter is run)
all_tripdata_clean <- all_tripdata_clean[!(all_tripdata_clean$start_station_name == "HQ QR" | all_tripdata_clean$ride_length<0),]
#Creating columns for the individual date components (date must be created first; day_of_week is derived from it)
all_tripdata_clean$date <- as.Date(all_tripdata_clean$started_at)
all_tripdata_clean$day <- format(as.Date(all_tripdata_clean$date), "%d")
all_tripdata_clean$month <- format(as.Date(all_tripdata_clean$date), "%m")
all_tripdata_clean$year <- format(as.Date(all_tripdata_clean$date), "%Y")
all_tripdata_clean$day_of_week <- format(as.Date(all_tripdata_clean$date), "%A")
**Now I will begin calculating the length of rides being taken, distance traveled, and the mean amount of time & distance.**
#Calculating the ride length in miles & minutes
all_tripdata_clean$ride_length <- difftime(all_tripdata_clean$ended_at,all_tripdata_clean$started_at,units = "mins")
all_tripdata_clean$ride_distance <- distGeo(matrix(c(all_tripdata_clean$start_lng, all_tripdata_clean$start_lat), ncol = 2), matrix(c(all_tripdata_clean$end_lng, all_tripdata_clean$end_lat), ncol = 2))
all_tripdata_clean$ride_distance = all_tripdata_clean$ride_distance/1609.34 #converting to miles
#Calculating the mean time and distance based on the user groups
userType_means <- all_tripdata_clean %>%
group_by(member_casual) %>%
summarise(mean_time = mean(ride_length), mean_distance = mean(ride_distance))
Adding in calculations that will differentiate between bike types and which type of user is using each specific bike type.
#Calculations
with_bike_type <- all_tripdata_clean %>% filter(rideable_type=="classic_bike" | rideable_type=="electric_bike")
with_bike_type %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual,rideable_type,weekday) %>%
summarise(totals=n(), .groups="drop")
with_bike_type %>%
group_by(member_casual,rideable_type) %>%
summarise(totals=n(), .groups="drop")
#Calculating the ride differential
all_tripdata_clean %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n(),
          average_duration = mean(ride_length), .groups = 'drop') %>%
arrange(me...
[Note: Integrated as part of FoodData Central, April 2019.] The database consists of several sets of data: food descriptions, nutrients, weights and measures, footnotes, and sources of data. The Nutrient Data file contains mean nutrient values per 100 g of the edible portion of food, along with fields to further describe the mean value. Information is provided on household measures for food items. Weights are given for edible material without refuse. Footnotes are provided for a few items where information about food description, weights and measures, or nutrient values could not be accommodated in existing fields. Data have been compiled from published and unpublished sources. Published data sources include the scientific literature. Unpublished data include those obtained from the food industry, other government agencies, and research conducted under contracts initiated by USDA’s Agricultural Research Service (ARS). Updated data have been published electronically on the USDA Nutrient Data Laboratory (NDL) web site since 1992. Standard Reference (SR) 28 includes composition data for all the food groups and nutrients published in the 21 volumes of "Agriculture Handbook 8" (US Department of Agriculture 1976-92), and its four supplements (US Department of Agriculture 1990-93), which superseded the 1963 edition (Watt and Merrill, 1963). SR28 supersedes all previous releases, including the printed versions, in the event of any differences.
Attribution for photos: Photo 1: k7246-9 Copyright free, public domain photo by Scott Bauer. Photo 2: k8234-2 Copyright free, public domain photo by Scott Bauer.
Resources in this dataset:
Resource Title: READ ME - Documentation and User Guide - Composition of Foods Raw, Processed, Prepared - USDA National Nutrient Database for Standard Reference, Release 28. File Name: sr28_doc.pdf. Resource Software Recommended: Adobe Acrobat Reader, url: http://www.adobe.com/prodindex/acrobat/readstep.html
Resource Title: ASCII (6.0Mb; ISO/IEC 8859-1). File Name: sr28asc.zip. Resource Description: Delimited file suitable for importing into many programs. The tables are organized in a relational format, and can be used with a relational database management system (RDBMS), which will allow you to form your own queries and generate custom reports.
Resource Title: ACCESS (25.2Mb). File Name: sr28db.zip. Resource Description: This file contains the SR28 data imported into a Microsoft Access (2007 or later) database. It includes relationships between files and a few sample queries and reports.
Resource Title: ASCII (Abbreviated; 1.1Mb; ISO/IEC 8859-1). File Name: sr28abbr.zip. Resource Description: Delimited file suitable for importing into many programs. This file contains data for all food items in SR28, but not all nutrient values--starch, fluoride, betaine, vitamin D2 and D3, added vitamin E, added vitamin B12, alcohol, caffeine, theobromine, phytosterols, individual amino acids, individual fatty acids, or individual sugars are not included. These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired.
Resource Title: Excel (Abbreviated; 2.9Mb). File Name: sr28abxl.zip. Resource Description: For use with Microsoft Excel (2007 or later), but can also be used by many other spreadsheet programs. This file contains data for all food items in SR28, but not all nutrient values--starch, fluoride, betaine, vitamin D2 and D3, added vitamin E, added vitamin B12, alcohol, caffeine, theobromine, phytosterols, individual amino acids, individual fatty acids, or individual sugars are not included. These data are presented per 100 grams, edible portion. Up to two household measures are also provided, allowing the user to calculate the values per household measure, if desired. Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/
Resource Title: ASCII (Update Files; 1.1Mb; ISO/IEC 8859-1). File Name: sr28upd.zip. Resource Description: Update Files - Contains updates for those users who have loaded Release 27 into their own programs and wish to do their own updates. These files contain the updates between SR27 and SR28. Delimited file suitable for import into many programs.
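As a hedged example of working with the delimited ASCII files in R: the caret field separator, tilde text qualifier, and file name below are assumptions taken from typical SR28 documentation, so check sr28_doc.pdf before relying on them.
# hedged sketch: read one SR28 ASCII table into R
nut_data <- read.table("sr28asc/NUT_DATA.txt",       # placeholder path
                       sep = "^", quote = "~",
                       header = FALSE, stringsAsFactors = FALSE)
head(nut_data)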
This dataset includes all the raw data and all the R statistical software code that we used to analyze the data and produce all the outputs that are in the figures, tables, and text of the associated manuscript:
Mengistu, A., Q. D. Read, C. R. Little, H. M. Kelly, P. M. Henry, and N. Bellaloui. 2025. Severity of charcoal rot disease in soybean genotypes inoculated with Macrophomina phaseolina isolates differs among growth environments. Plant Disease. DOI: 10.1094/PDIS-10-24-2230-RE.
The data included here come from a series of tests designed to evaluate methods for identifying soybean genotypes that are resistant or susceptible to charcoal rot, a widespread and economically significant disease. Four independent experiments were performed to determine the variability in disease severity by soybean genotype and by isolated variant of the charcoal rot fungus: two field tests, a greenhouse test, and a growth chamber test. The tests differed in the number of genotypes and isolates used, as well as the method of inoculation. The accuracy of identifying resistant and susceptible genotypes varied by study, and the same isolate tested across different studies often had highly variable disease severity. Our results indicate that the non-field methods are not reliable ways to identify sources of charcoal rot resistance in soybean.
The models fit in the R script archived here are Bayesian general linear mixed models with AUDPC (area under the disease progress curve) as the response variable. One-dimensional clustering is used to divide the genotypes into resistant and susceptible based on their model-predicted AUDPC values, and this result is compared with the preexisting resistance classification. Posterior distributions of the marginal means for different combinations of genotype, isolate, and other covariates are estimated and compared. Code to reproduce the tables and figures of the manuscript is also included.
The following files are included:
README.pdf: Full description, with column metadata for the data spreadsheets and a text description of each R script
data2023-04-18.xlsx: Excel sheet with data from three of the four trials
cleaned_data.RData: all data in analysis-ready format; generates a set of data frames when imported into an R environment
Modified Cut-Tip Inoculation on DT974290 and LS980358 on first 32 isolates.xlsx: Excel spreadsheet with data from the fourth trial
data_cleaning.R: Script required to format data from .xlsx files into analysis-ready format (running this script is not necessary to reproduce the analysis; instead you may begin with the following script, importing the cleaned .RData object)
AUDPC_fits.R: Script containing code for all model fitting, model predictions and comparisons, and figure and table generation
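For orientation only, a minimal sketch of the kind of Bayesian mixed model described above, fit with brms. This is not the archived AUDPC_fits.R script; the data frame, formula, and variable names are placeholders.
# placeholder formula and data: AUDPC modeled with genotype and isolate effects
# and a random intercept for trial
library(brms)
library(emmeans)
fit <- brm(audpc ~ genotype + isolate + (1 | trial),
           data = charcoal_rot_data, family = gaussian(),
           chains = 4, cores = 4)
# posterior marginal means by genotype (emmeans supports brmsfit objects)
emmeans(fit, ~ genotype)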
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code used to produce Figures 1 and 2. (R)
https://creativecommons.org/publicdomain/zero/1.0/
Bellabeat is a small company that manufactures high-tech, health-focused smart devices and other wellness products for women around the world. Since Urška Sršen and Sando Mur founded the company in 2013, it has grown tremendously. Now they have asked for an analysis of non-Bellabeat smart device usage and of how we can use this data to create new campaign strategies and drive future growth.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
I utilized the Fitbit Fitness Tracker data, located here, for this project.
activity <- read.csv("Fitabase_Data/dailyActivity_merged.csv")
calories <- read.csv("Fitabase_Data/dailyCalories_merged.csv")
sleep <- read.csv("Fitabase_Data/sleepDay_merged.csv")
weight <- read.csv("Fitabase_Data/weightLogInfo_merged.csv")
While using the View() function I'm able to skim through the datasets and make sure everything is imported correctly. I will also use this time to see if I need to clean the data in any way or format the data differently.
View(activity)
View(calories)
View(sleep)
View(weight)
After viewing the datasets I see that I will need to format the Dates and Times to matching formats on all the datasets.
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format="%m/%d/%Y", tz=Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
weight$Date <- as.POSIXct(weight$Date, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
weight$time <- format(weight$Date, format = "%H:%M:%S")
weight$date <- format(weight$Date, format = "%m/%d/%y")
calories$date <- format(as.POSIXct(calories$ActivityDay, format="%m/%d/%Y", tz=Sys.timezone()), format = "%m/%d/%y")
Here I will be using the summary() function to gather information about the minimums, medians, averages, and maximums for certain columns in the datasets (i.e., Total Steps, Calories, Active Minutes, Minutes Asleep, Sedentary Minutes).
activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes, Calories) %>%
summary()
activity %>%
select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
summary()
calories %>%
select(Calories) %>%
summary()
sleep %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
weight %>%
select(WeightKg, BMI) %>%
summary()
ggplot(data=activity, aes(x=TotalSteps, y=Calories)) +
geom_point(color='purple') + geom_smooth() + labs(title="Total Steps vs. Calories")
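The next figure plots minutes asleep against sedentary minutes. A hedged sketch of how such a plot could be produced follows; joining the sleep and activity tables by Id and date is an assumption here, not code from the original analysis.
# hedged sketch: join sleep and activity by Id and the date column created above,
# then plot minutes asleep against sedentary minutes
sleep_activity <- inner_join(sleep, activity, by = c("Id", "date"))
ggplot(data=sleep_activity, aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) +
geom_point(color='purple') + geom_smooth() + labs(title="Minutes Asleep vs. Sedentary Minutes")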
[Figure: scatter plot of total minutes asleep vs. sedentary minutes]
- The second scatter plot showcas...
We provide four datasets as csv files and one R code file that contains code for analyzing the data in these files. See the "README" file for metadata for each dataset. The contents of each file are as follows: * 2018_knockout_CFU_data.csv contains nodule culturing data (counts of colony forming units) from the 2018 Knockout Experiment * 2018_knockout_greenhouse_data.csv contains plant harvest data from the 2018 Knockout Experiment * 2019_GxG_knockout_CFU_data_330plants.csv contains nodule culturing data (counts of colony forming units) from the 2019 GxG Knockout Experiment * 2019_GxG_knockout_greenhouse_data_330plants.csv contains plant harvest data from the 2019 GxG Knockout Experiment * Wendlandt_et_al_2022_JEvolBiol_code.R contains R code for importing, processing, and analyzing data from the above four datasets. It also contains code for producing figures from these data.
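A minimal sketch of importing the four CSV files in R (see the README for column metadata; the full analysis is in Wendlandt_et_al_2022_JEvolBiol_code.R; object names here are illustrative):
# read the four datasets described above
cfu_2018 <- read.csv("2018_knockout_CFU_data.csv")
greenhouse_2018 <- read.csv("2018_knockout_greenhouse_data.csv")
cfu_2019 <- read.csv("2019_GxG_knockout_CFU_data_330plants.csv")
greenhouse_2019 <- read.csv("2019_GxG_knockout_greenhouse_data_330plants.csv")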
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List
collyer_adams_Rcode.txt -- R code for running analysis
collyer_adams_example_data.csv -- example data to input into R routine
collyer_adams_example_xmat.csv -- coding for the design matrix used
collyer_adams_ESA_supplement.zip -- all files at once
Description
The collyer_adams_Rcode.txt file contains a procedure for performing the analysis described in Appendix A, using R. The procedure imports data and a design matrix (collyer_adams_example_data.csv and collyer_adams_example_xmat.csv are provided, and correspond to the example in Appendix A). The default number of permutations is 999, but this can be changed. A matrix of random values (distances, contrasts, angles) and a results summary are created by the program. Users should be aware that importing different data sets will require altering some of the R code to accommodate their data (e.g., matrix dimensions would need to be changed).
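As a hedged sketch of preparing the example inputs for that routine (the object names below are illustrative; the routine itself is defined in collyer_adams_Rcode.txt):
# read the example data and design matrix described above
dat <- read.csv("collyer_adams_example_data.csv")
xmat <- read.csv("collyer_adams_example_xmat.csv")
nperm <- 999  # default number of permutations; can be changed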
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Overview:
The Copernicus DEM is a Digital Surface Model (DSM) which represents the surface of the Earth, including buildings, infrastructure and vegetation. The original GLO-30 provides worldwide coverage at 30 meters (corresponding to 10 arc seconds). Note that ocean areas do not have tiles; height values there can be assumed to be zero. Data is provided as Cloud Optimized GeoTIFFs. Note that the vertical unit for measurement of elevation height is meters.
The Copernicus DEM for Europe at 3 arc seconds (0:00:03 ≈ 0.00083333333°, roughly 90 meters) in COG format has been derived from the Copernicus DEM GLO-30, mirrored on Open Data on AWS, a dataset managed by Sinergise (https://registry.opendata.aws/copernicus-dem/).
Processing steps:
The original Copernicus GLO-30 DEM contains a relevant percentage of tiles with non-square pixels. We created a mosaic map in VRT format and defined within the VRT file the rule to apply cubic resampling while reading the data, i.e. importing them into GRASS GIS for further processing. We chose cubic instead of bilinear resampling since the height-width ratio of non-square pixels is up to 1:5. Hence, artefacts between adjacent tiles in rugged terrain could be minimized:
gdalbuildvrt -input_file_list list_geotiffs_MOOD.csv -r cubic -tr 0.000277777777777778 0.000277777777777778 Copernicus_DSM_30m_MOOD.vrt
In order to reduce the spatial resolution to 3 arc seconds, weighted resampling was performed in GRASS GIS (using r.resamp.stats -w), and the pixel values were scaled by 1000 (storing the pixels as integer values) for data volume reduction. In addition, a hillshade raster map was derived from the resampled elevation map (using r.relief in GRASS GIS). Eventually, we exported the elevation and hillshade raster maps in Cloud Optimized GeoTIFF (COG) format, along with SLD and QML style files.
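For downstream use in R, a minimal example of reading one of the elevation COGs and converting the integer-scaled values back to meters (the file name is a placeholder):
library(terra)
dem <- rast("Copernicus_DSM_Europe_3arcsec_tile.tif")  # placeholder file name
dem_m <- dem / 1000   # pixel values are meters * 1000
plot(dem_m)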
Projection + EPSG code:
Latitude-Longitude/WGS84 (EPSG: 4326)
Spatial extent:
north: 82:00:30N
south: 18N
west: 32:00:30W
east: 70E
Spatial resolution:
3 arc seconds (approx. 90 m)
Pixel values:
meters * 1000 (scaled to Integer; example: value 23220 = 23.220 m a.s.l.)
Software used:
GDAL 3.2.2 and GRASS GIS 8.0.0 (r.resamp.stats -w; r.relief)
Original dataset license:
https://spacedata.copernicus.eu/documents/20126/0/CSCDA_ESA_Mission-specific+Annex.pdf
Processed by:
mundialis GmbH & Co. KG, Germany (https://www.mundialis.de/)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EEG data analysis repository for Shalu et al., "Similar does not mean the same: ERP correlates of processing mental and physical experiencer verbs in Malayalam complex constructions". The preprocessed data from https://doi.org/10.5281/zenodo.14986232 was imported into R (Version 4.4.2; R Core Team, 2024) using the eeguana package (Version 0.1.11.9001; Nicenboim, 2018) for epoching and statistical analysis. Single-trial EEG epochs were used for statistically analysing the mean amplitudes in selected time windows of interest by fitting linear mixed-effects models (LMEM) using the lme4 package (Bates et al., 2015) in R.
The names of the zipped folders describe their respective contents. The Code_In_Context_Analysis_Output_R_Notebooks folder contains the R Notebooks of the LMEM analyses of the behavioural and ERP data. These notebooks show the code, data and output in context, and provide full model summaries and details for all the models reported in the article. The R scripts themselves reside in a folder of their own, which also includes the version of the eeguana package we used for the analyses, and the custom-made helper scripts for processing the EEG data, epoching them and extracting mean amplitudes from them. The EEG and Epochs data resulting from the ERP analysis, as well as the single-trial mean amplitudes and prestimulus mean amplitudes extracted for each time-window of interest are in EEG_And_Epochs_Data_Files. Since the LMEM models computed were quite complex, the model objects are provided as RDS files for easily importing them into R without having to compute the models again. The plots generated at various stages of the analysis are in the Plots folder.
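For orientation, a minimal sketch of the kind of LMEM fit described above and of loading one of the provided RDS model objects; the data frame, variable names, and file path are placeholders, not the actual analysis code.
library(lme4)
# placeholder formula: single-trial mean amplitude modeled by condition, with
# random intercepts for participant and item
m <- lmer(mean_amplitude ~ condition + (1 | participant) + (1 | item),
          data = epoch_means)
summary(m)
# load one of the provided model objects without refitting (placeholder path)
fitted_model <- readRDS("models/some_model.rds")
summary(fitted_model)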
References
Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1. doi: 10.18637/jss.v067.i01.
Nicenboim, B. (2018). eeguana: A package for manipulating EEG data in R. Version 0.1.11.9001. Retrieved from https://github.com/bnicenboim/eeguana. http://doi.org/10.5281/zenodo.2533138.
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org.
Shalu, S., Choudhary, K. K., & Muralikrishnan, R. (2025). Dataset with EEG data from Malayalam speakers [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.14986232
We generated a collection of Ensifer medicae (rhizobia) strains from field soils and used PCR to screen each strain for the presence of hrrP, a gene that can change how cooperative rhizobia are with their legume hosts. We provide the results of the PCR screen for each strain in our collection. A subset of rhizobia from this collection was inoculated onto Medicago polymorpha plants in a greenhouse experiment, and we measured plant growth benefits from inoculation (leaf count, shoot mass) and symbiosis traits (number of root nodules, nodule size, and number of colony-forming units of E. medicae within the nodules). We provide raw plant-level and nodule-level data from this experiment.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List README (MD5: c3c7b6e8b8c2e42f967dff8c15b6d9e6) SiteAndVegParams.RData (MD5: 86dcf70da2c1e4766cde4a222b78f80b) RunSimulations2014-06-11.R (MD5: a7ad23be70e82619e104dbe42746820b) AmerifluxToClimateFile.R (MD5: 093dfe9d2ff6e61fd2e1b92615d60a70) pnetcn_NewPhenology.R (MD5: 7e069c490ec0321a0572c7f73943e96a) pnetcn_NewPhenology_cont.R (MD5: 44d4689a457b964f48e497f9b4d3d0df) pnetcn.R (MD5: d4168f06879c698f46b12d4dfb748d74) spinup_pnetcn_NewPhenology.R (MD5: a64a22fd94e06389292c8c417d2d75d8) spinup_pnetcn.R (MD5: 1dc1a10f75c4e14e8108919e0a2d31a3) run_pnetcn.R (MD5: 204dcd9176dfe57b83474fc4677253d0) AllocateMo.R (MD5: 7bd2c8e7c2573b64fddef23878be7be6) AllocateYr_NewPhenology.R (MD5: 68c4686f579ae19453fbdd73fdaca927) AllocateYr.R (MD5: 17e49c1aaf9d3d045af11344683eaa45) AtmEnviron_NewPhenologyOLD.R (MD5: f9f9ce07b2d1d29b719f58c0c2bf5eec) AtmEnviron_NewPhenology.R (MD5: 6d11dce44856009c05af514b8960cbf5) AtmEnviron.R (MD5: 6a352d22899c3fe1c10cecd2f2bae41d) CalculateYr_NewPhenology.R (MD5: b0d9d700ba3c9503c8d280a6bd50b0f4) CNTrans.R (MD5: 15f7ea8951957fe0b5714d3164bcb739) Decomp.R (MD5: 6f851eea37ba654a2434a3a9df4f7160) GrowthInit_NewPhenology.R (MD5: ce50ce5fa8a8282e7ad38ec913c9f948) initvars.R (MD5: 8503263de5d593fde793622256556209) Leach.R (MD5: 49667570a1eae670cb4121f7ca9bfd64) PhenologyNew.R (MD5: 088b2c1b406fab16c4bc55c27c85a5c6) PhenologyNew2.R (MD5: 6d881331616e178c4b79d0bda723a934) Phenology.R (MD5: 07f72b99a8e981d76100548247d948f2) Photosyn.R (MD5: 44755415a892bedc86b0f6bad0f50853) SoilResp.R (MD5: 018dc6e0d4eb5a5f66bb46179099f74b) storeoutput.R (MD5: f527f8a285c169bdb53aa518f15880c5) StoreYrOutput.R (MD5: 984cbf2a2f518b5795c319e48bac1ac4) Waterbal.R (MD5: c3d9d4516e009b9fad7d08ac488f2abc) YearInit_NewPhenology.R (MD5: 0e9957a828f64f25495bdca9c409d070) YearInit.R (MD5: ceb4bab739a52b78cfcf7ca411939174) Ameriflux/ Daymet/ FluxDataFunctions.R (MD5: ae89d6b9cf4e01ba501941ce12690e4b)
Description
README - Brief notes to get started
SiteAndVegParams.RData – R data structures containing site-specific parameters for the six sites analyzed in the paper
RunSimulations2014-06-11.R – Template code for importing AmeriFlux and Daymet data and running simulations with climate files including spinup years (must be modified; see the sketch after this file list)
AmerifluxToClimateFile.R – Functions needed to import Ameriflux and Daymet data and generate climate files
pnetcn_NewPhenology.R - Top level function to run the version of PnET-CN used in the paper (new phenology routine)
pnetcn.R - Top level function to run traditional version of PnET-CN (old phenology routine)
spinup_pnetcn_NewPhenology.R – Alternative version of top-level function that repeats the climate data from the input climate file an arbitrary number of times for spinup (new phenology routine)
pnetcn_NewPhenology_cont.R – Function to continue a simulation starting with spinup data from a previous run (called by spinup_pnetcn_NewPhenology.R)
spinup_pnetcn.R - Alternative version of top-level function for traditional version of PnET-CN that repeats the climate data from the input climate file an arbitrary number of times for spinup (old phenology routine)
run_pnetcn.R - Functions to run PnET-CN with output as data frames of monthly or annual data instead of as a list containing both formats
AllocateMo.R - Monthly allocation routine for PnET-CN
AllocateYr_NewPhenology.R - Yearly allocation routine for PnET-CN (with new phenology routine)
AllocateYr.R - Yearly allocation routine for PnET-CN (with old phenology routine)
AtmEnviron_NewPhenology.R - Environmental calculations for PnET-CN (for new phenology routine)
AtmEnviron_NewPhenologyOLD.R – Older version of environmental calculations for PnET-CN, with less data saved to data structure (for new phenology routine)
AtmEnviron.R - Environmental calculations for PnET-CN (for old phenology routine)
CalculateYr_NewPhenology.R - Calculate yearly output values for PnET-CN (for new phenology routine)
CNTrans.R - Carbon and nitrogen translocation routine for PnET-CN
Decomp.R - Decomposition routine for PnET-CN
GrowthInit_NewPhenology.R – Initialize annual aggregation variables for each year in PnET-CN
initvars.R - Initialize internal shared variable structures for PnET-CN
Leach.R - Leaching routine for PnET-CN
PhenologyNew.R - Functions to calculate phenology for PnET-CN (new phenology routine)
PhenologyNew2.R - Skeleton code for new functions to calculate phenology for PnET-CN that would use alternative (e.g., water-driven) phenology cues for grasslands (new phenology routine - INCOMPLETE)
Phenology.R - Functions to calculate phenology for PnET-CN (old phenology routine)
Photosyn.R - Photosynthesis routine for PnET-CN
SoilResp.R - Soil respiration routine for PnET-CN
storeoutput.R - Adds variable values to the returned output structure so that the user may work with them (or save them) at the command line after running PnET-CN
StoreYrOutput.R - Routine to save annual results to an output file for PnET-CN (not used)
Waterbal.R - Ecosystem water balance routine for PnET-CN
...
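Based on the file descriptions above, a hedged sketch of getting started in R; the function names and arguments are defined in the archived scripts and may differ from what is shown here.
# load site-specific parameters for the six sites analyzed in the paper
load("SiteAndVegParams.RData")
# load the functions for building climate files and running PnET-CN
source("AmerifluxToClimateFile.R")
source("pnetcn_NewPhenology.R")
# RunSimulations2014-06-11.R is the template that ties these together and
# must be modified for your own sites and file paths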