https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Subsetting of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Simulation of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Loop Functions of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
The Multi-angle Imaging SpectroRadiometer (MISR) is an instrument designed to view Earth with cameras pointed in 9 different directions. As the instrument flies overhead, each piece of Earth's surface below is successively imaged by all 9 cameras, in each of 4 wavelengths (blue, green, red, and near-infrared). The goal of MISR is to improve our understanding of the fate of sunlight in Earth's environment, as well as to distinguish different types of clouds, particles, and surfaces. Specifically, MISR monitors monthly, seasonal, and long-term trends in three areas: 1) the amount and type of atmospheric particles (aerosols), including those formed by natural sources and by human activities; 2) the amounts, types, and heights of clouds; and 3) the distribution of land surface cover, including vegetation canopy structure. MISR Level 1B2 Ellipsoid Data subset for the UAE region V002 contains Ellipsoid-projected TOA Radiance, resampled at the surface and topographically corrected.
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Getting started, Background of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of that survey: 100,077 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS data, and spectra data. This dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.
The dataset includes a CSV data file where each row is an object from the SDSS database, with the following columns (note that some data may not be available for all objects):
objid: unique SDSS object identifier
mjd: MJD of observation
plate: plate identifier
tile: tile identifier
fiberid: fiber identifier
run: run number
rerun: rerun number
camcol: camera column
field: field number
ra: right ascension
dec: declination
class: spectroscopic class (only objects of class GALAXY are included)
subclass: spectroscopic subclass
modelMag_u: better of DeV/Exp magnitude fit for band u
modelMag_g: better of DeV/Exp magnitude fit for band g
modelMag_r: better of DeV/Exp magnitude fit for band r
modelMag_i: better of DeV/Exp magnitude fit for band i
modelMag_z: better of DeV/Exp magnitude fit for band z
redshift: final redshift from SDSS (the SDSS z column)
stellarmass: stellar mass extracted from the eBOSS Firefly catalog
w1mag: WISE W1 "standard" aperture magnitude
w2mag: WISE W2 "standard" aperture magnitude
w3mag: WISE W3 "standard" aperture magnitude
w4mag: WISE W4 "standard" aperture magnitude
gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013
gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)
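The column layout above can be explored with a few lines of R. A minimal sketch using toy rows (the real file is sdss-gs/data.csv; the column names follow the list above, the values here are invented):

```r
# Toy rows mimicking the described columns (real data: sdss-gs/data.csv).
df <- data.frame(objid      = c("1001", "1002", "1003"),
                 class      = "GALAXY",
                 redshift   = c(0.05, 0.12, 0.30),
                 modelMag_r = c(17.1, 18.4, 19.0),
                 stringsAsFactors = FALSE)
# Subset to low-redshift galaxies, as one might before model training.
nearby <- subset(df, redshift < 0.1)
nrow(nearby)  # 1
```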
Besides the CSV file, the dataset includes a set of directories. In each directory you'll find files named after the objid column from the CSV file, containing the corresponding data. The following directory tree is available:
sdss-gs/ ├── data.csv ├── fits ├── img ├── spectra └── ssel
Each directory contains:
img: RGB images of the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API
fits: FITS data subsets around the object across the u, g, r, i, z bands; the cut is done using the ImageCutter library
spectra: full best-fit spectra data from SDSS between wavelengths of 4000 and 9000 Å
ssel: best-fit spectra data from SDSS for specific selected wavelength intervals, as discussed by Sánchez Almeida 2010
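Since every per-object file is named after its objid, the paths to one object's data can be assembled mechanically. A hedged sketch (the helper name is invented, and file extensions are omitted because the description names files by objid only):

```r
# Hypothetical helper: paths to an object's entries in each data directory.
object_paths <- function(objid, root = "sdss-gs") {
  file.path(root, c("img", "fits", "spectra", "ssel"), objid)
}
object_paths("1001")
# "sdss-gs/img/1001" "sdss-gs/fits/1001" "sdss-gs/spectra/1001" "sdss-gs/ssel/1001"
```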
Changelog
v0.0.4 - Increase number of objects to ~100k.
v0.0.3 - Increase number of objects to ~80k.
v0.0.2 - Increase number of objects to ~60k.
v0.0.1 - Initial import.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Proposing relevant catalyst descriptors that can relate information on a catalyst’s composition to its actual performance is an ongoing area in catalyst informatics, as it is a necessary step to improve our understanding of the target reactions. Herein, a small descriptor-engineered data set containing 3289 descriptor variables and the performance of 200 catalysts for the oxidative coupling of methane (OCM) is analyzed, and a descriptor search algorithm based on the workflow of the Basin-hopping optimization methodology is proposed to select the descriptors that best fit a predictive model. The algorithm, which can be considered a wrapper method, consists of successively generating random modifications to the descriptor subset used in a regression model and adopting them depending on their effect on the model’s score. The results are presented after being tested on linear and Support Vector Regression models, with average cross-validation r² scores of 0.8268 and 0.6875, respectively.
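The accept/reject loop described above can be sketched in a few lines of R. This is a toy, greedy variant on synthetic data, not the paper's method: the real workflow uses Basin-hopping-style moves and cross-validated scores, whereas everything here (the data, the in-sample R² proxy, the improvement-only acceptance rule) is an illustrative assumption:

```r
set.seed(1)
# Synthetic stand-in: 50 samples, 10 candidate descriptors, 3 informative.
X <- matrix(rnorm(50 * 10), 50, 10); colnames(X) <- paste0("d", 1:10)
y <- X[, 1] - 2 * X[, 2] + 0.5 * X[, 3] + rnorm(50, sd = 0.1)

score <- function(sel) {  # model score for a descriptor subset (in-sample R^2 as a proxy)
  if (!any(sel)) return(-Inf)
  summary(lm(y ~ X[, sel, drop = FALSE]))$r.squared
}

sel  <- rep(FALSE, 10)
best <- score(sel)
for (i in 1:200) {  # randomly flip one descriptor; keep the flip if the score improves
  cand <- sel
  j <- sample(10, 1)
  cand[j] <- !cand[j]
  s <- score(cand)
  if (s > best) { sel <- cand; best <- s }
}
which(sel)  # the informative d1, d2, d3 should be among those selected
```

Because in-sample R² never decreases when a regressor is added, this greedy proxy tends to keep everything; a cross-validated score, as in the paper, is what makes the search actually prune uninformative descriptors.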
This file contains Ellipsoid-projected TOA Radiance, resampled at the surface and topographically corrected, as well as geometrically corrected by PGE22 for the SAMUM_2006 theme.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”
A peer-reviewed data paper for this dataset is under review for publication in NECSUS – European Journal of Media Studies, an open access journal aiming to enhance data transparency and reusability; it will be available from https://necsus-ejms.org/ and https://mediarep.org
Please cite this when using the dataset.
Detailed description of the dataset:
1 Film Dataset: Festival Programs
The Film Dataset consists of a data scheme image file, a codebook, and two dataset tables in csv format.
The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.
The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.
The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
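The long-to-wide rule described above (keep the first sample festival per unique film) can be illustrated with a toy R snippet; the column names here are invented for the example, not the dataset's actual variable names:

```r
# Toy long-format rows: film 1 appears at two festivals, film 2 at one.
long <- data.frame(film_id  = c(1, 1, 2),
                   festival = c("Berlinale", "Frameline", "Frameline"),
                   stringsAsFactors = FALSE)
# Wide format: one row per unique film, keeping its first sample festival.
wide <- long[!duplicated(long$film_id), ]
wide$festival  # "Berlinale" "Frameline"
```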
2 Survey Dataset
The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.
The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.
The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.
The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.
3 IMDb & Scripts
The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.
The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.
The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.
The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.
The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.
The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to one crew member of a given film. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.
The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.
The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.
The csv file “3_imdb-dataset_release-info_long” contains data about non-festival release (e.g., theatrical, digital, tv, dvd/blueray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.
The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.
The dataset includes 8 text files containing the scripts for web scraping. They were written using R 3.6.3 for Windows.
The R script “r_1_unite_data” demonstrates the structure of the dataset, that we use in the following steps to identify, scrape, and match the film data.
The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found, falling back to an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done using data on directors, production year (+/- one year), and title, with a fuzzy matching approach using two methods: “cosine” and “osa”. The cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
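The description above names the “cosine” and “osa” string-distance methods (as provided by, e.g., the stringdist package); base R's adist(), a generalized Levenshtein distance used here only as a stand-in, shows the same fuzzy-title-matching idea:

```r
# Candidate titles as scraped from a search page (toy values).
titles <- c("The Matrix", "Matrix, The", "Inception")
query  <- "The Matrx"  # core-dataset title containing a typo
# Edit distance between the query and each candidate; pick the closest.
d <- adist(query, titles, ignore.case = TRUE)
titles[which.min(d)]  # "The Matrix"
```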
The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and identifies them for a manual check.
The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.
The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. It does so for the first 100 films as a check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.
The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.
The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to make sure the errors were not caused by disruptions in the internet connection or other technical issues.
The script “r_check_logs” is used for troubleshooting and tracking the progress of all of the R scripts used. It gives information on the number of missing values and errors.
4 Festival Library Dataset
The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.
The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories, units of measurement, data sources and coding and missing data.
The csv file “4_festival-library_dataset_imdb-and-survey” contains data on all unique festivals collected from both IMDb and survey sources. This dataset appears in wide format: all information for each festival is listed in one row.
This dataset provides calculated camera-NDVI data for individual regions of interest (ROIs) for the phenocam named 'GRCA1PJ' (part of the Phenocam Network, https://phenocam.sr.unh.edu/webcam/). The GRCA1PJ phenocam is within a pinyon-juniper woodland in Grand Canyon National Park. Camera-NDVI refers to a modified version of NDVI calculated by the phenopix package (Filippa et al., 2016). The camera-calculated NDVI data are in the folder FinalOutput. File attributes within that folder are described in detail in the entity and attribute information section of this metadata. It should be possible for the user to reproduce the final NDVI dataset using only the ROI definitions, image data downloaded from the Phenocam Network, and the phenopix R package. However, the dataset also contains scripts and intermediate files that may be helpful in reproducing or extending the processing but are not essential to reproducing the data. The complete dataset release includes 1) A workflow spreadsheet file that describes the processing steps, associated scripts, and output filenames (filename: Workflow_With_Filenames.ods). 2) R-code script files used in processing (folder: 'Code'). 3) ROI boundary files and jpg images for the ROIs presented in the linked publication (folder: 'Phenocamdata/grca1pj/ROI'). 4) Ancillary files used to create the NDVI dataset; these include exposure coordinates and training files (folder: 'Phenocamdata/grca1pj/Ancillary'). 5) Files listing exposures for individual photos within the initial processing time period (folder: 'Exposures'). 6) Screening parameters for cloud and poor-light-condition screening of photos, as well as a list of photos that meet the cloud-screening standards (folder: 'Phenocamdata/grca1pj/BlueSkyScreening'). 7) Vegetation index files produced by the phenopix package, organized by ROI and month-year group (folder: 'Phenocamdata/grca1pj/VI_Tables').
8) Supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., folders Rproj.user and packrat, and files .RData and PhenocamPR.Rproj). 9) The graphic 'Fig_4_ROIWithLabels.jpg' shows the phenocam field of view with labelled ROIs. Outline colors correspond to juniper (red), pinyon (blue), and other species (yellow). Labels correspond to NDVI curves in 'Fig_7_PhenocamCurves.JPG' (also included in this data release). The composite area comprises the field of view beneath the approximate horizon line labelled ‘J’ (gray). This image corresponds to Figure 4 in the associated journal article. 10) The graphic 'Fig_7_PhenocamCurves.JPG' shows NDVI curves derived from phenocam images from September 2017 - December 2018 for individual regions of interest (ROIs). Letter designations correspond to ROI labels in Fig_4_ROIWithLabels.jpg (also included in this data release). Data were screened to remove cloudy photos during Aqua and Terra flyover hours. Black ellipses indicate times when the ROI target vegetation was shaded. Red ellipses indicate times when the background of the ROI was shaded. To improve visibility, the Y axis is restricted and excludes 37 extreme values out of a total of 6698 values. The exposure adjustment method used by the phenopix package produces NDVI values that have a strong linear correlation with spectroradiometer-derived NDVI but are negatively shifted, so that vegetated areas often have NDVI values below zero. This image corresponds to Figure 7 in the associated journal article. The file types .Rdata and .rds are commonly used in this release because these are the types created by the phenopix processing package, and these files will be needed (or the user will need to recreate new versions) for further processing.
The scripts enable the user to replicate the processing or to extend it to different times or areas of interest; however, these scripts require as additional input phenocam imagery that the user must download. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the phenopix package documentation and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs. Species-specific phenological curves included in the NDVI output section of this dataset: Juniperus osteosperma, Pinus edulis, Purshia stansburiana, Artemisia tridentata, and Chamaebatiaria millefolium
This dataset provides results from chemistry and field analyses from the California Environmental Data Exchange Network (CEDEN). The dataset contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here.
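Since the example R code is linked rather than reproduced here, a hedged sketch of the bulk-CSV route: stack the per-year resources into one table. The file names below are invented stand-ins written to a temporary directory; real resource names come from the portal:

```r
# Write two toy per-year CSVs to stand in for the downloaded resources.
dir <- tempdir()
for (yr in 2020:2021) {
  write.csv(data.frame(year = yr, result = rnorm(3)),
            file.path(dir, paste0("ceden_", yr, ".csv")), row.names = FALSE)
}
# Stack all per-year files into a single table.
files <- list.files(dir, pattern = "^ceden_.*\\.csv$", full.names = TRUE)
all_years <- do.call(rbind, lapply(files, read.csv))
nrow(all_years)  # 6
```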
Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip-file contains the data and code accompanying the paper 'Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis'. Together, these files should allow for the replication of the results.
The 'raw_data' folder contains the 'MA_database.csv' file, which contains the extracted data from all primary studies that are used in the analysis. Furthermore, this folder contains the file 'MA_database_description.txt', which gives a description of each data column in the database.
The 'derived_data' folder contains the files that are produced by the R-scripts in this study and used for data analysis. The 'MA_database_processed.csv' and 'MA_database_processed.RData' files contain the converted raw database that is suitable for analysis. The 'DB_IA_subsets.RData' file contains the 'Individual Abundance' (IA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria. The 'DB_IA_VCV_matrices.RData' contains for all IA data subsets the variance-covariance (VCV) matrices. The 'DB_AM_subsets.RData' file contains the 'Total Abundance' (TA) and 'Mean Abundance' (MA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria.
The 'output_data' folder contains maps with the output data for each data subset (i.e. for each metric, taxonomic group and set of inclusion criteria). For each data subset, the map contains random effects selection results ('Results1_REsel_
The 'scripts' folder contains all R-scripts that we used for this study. The 'PrepareData.R' script takes the database as input and adjusts the file so that it can be used for data analysis. The 'PrepareDataIA.R' and 'PrepareDataAM.R' scripts make subsets of the data and prepare the data for the meta-regression analysis and mixed-effects regression analysis, respectively. The regression analyses are performed in the 'SelectModelsIA.R' and 'SelectModelsAM.R' scripts to calculate the regression model results for the IA metric and MA/TA metrics, respectively. These scripts require the 'RandomAndFixedEffects.R' script, containing the random and fixed effects parameter combinations, as well as the 'Functions.R' script. The 'CreateMap.R' script creates a global map with the location of all studies included in the analysis (figure 1 in the paper). The 'CreateForestPlots.R' script creates plots showing the IA data distribution for both taxonomic groups (figure 2 in the paper). The 'CreateHeatMaps.R' script creates heat maps for all metrics and taxonomic groups (figure 3 in the paper, figures S11.1 and S11.2 in the appendix). The 'CalculateStatistics.R' script calculates the descriptive statistics that are reported throughout the paper, and creates the figures that describe the dataset characteristics (figures S3.1 to S3.5 in the appendix). The 'CreateFunnelPlots.R' script creates the funnel plots for both taxonomic groups (figures S6.1 and S6.2 in the appendix) and performs Egger's tests. The 'CreateControlGraphs.R' script creates graphs showing the dependency of the nutrient response to control concentrations for all metrics and taxonomic groups (figures S10.1 and S10.2 in the appendix).
The 'figures' folder contains all figures that are included in this study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(a) Before initiating analysis, data exploration (not shown) was carried out to compare the entire cohort of all enrolled children to the subset of children with available information from Health and Demographic Surveillance System data, and no differences were noted. (b) For the multivariable logistic regression model, n = 84 for children who died and n = 781 for children who survived, due to missing data.
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Scoping Rules of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
The dataset and its description are available at https://sites.google.com/site/michelevolpiresearch/data/zurich-dataset. A subset of 4 crops is selected. Morphological profiles are computed over each band (R, G, B, NIR) with the attribute area. The data are provided in LibSVM format. Evaluation is performed with a leave-one-out scheme: one image is held out and the model is trained on the remaining 3 scenes. 1) training_set_sdap_1 / test_set_sdap_1 2) training_set_sdap_2 / test_set_sdap_2 3) training_set_sdap_3 / test_set_sdap_3 4) training_set_sdap_4 / test_set_sdap_4
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Renewable variable energy resources (VER), solar and wind energy, are becoming increasingly important sources of electricity worldwide. Assessing the potential and reliability of these resources requires long-term historical data. Directly measured solar radiation and wind speed are limited to the locations of weather stations, and even when available, the observations are not directly suitable for the evaluation of VER potential (for example, wind speed is rarely measured at wind turbine heights). Reanalysis data based on satellite imagery and Earth system models, such as MERRA-2, offer a broad set of long-term time series on a global grid.
`merra2ools` is a preprocessed subset of MERRA-2 variables together with an R package designed for quick estimation of the hourly output of solar photovoltaics and wind turbines. The MERRA-2 grid has a 0.625° step along longitude (−180° to 180°) and a 0.5° step along latitude (−90° to 90°), making a 576 × 361 grid, or 207,936 locations. The subset of the hourly data covers the period from 1980-Jan-01 00:30 UTC to 2020-Jan-31 23:30 UTC. It includes eight variables: wind speed at 10- and 50-meter heights (W10M and W50M), wind direction (WDIR), atmospheric temperature at 10 meters height (T10M), surface incoming shortwave flux (SWGDN), surface albedo (ALBEDO), bias-corrected total precipitation (PRECTOTCORR), and air density at the surface (RHOA). The dataset's key variables are date-time in Coordinated Universal Time (UTC) and location identifiers (locid). In total, the subset has 290,357,084,160 data points (362,946,355,200 including the key variables). To reduce the dataset's memory footprint (~3 TB uncompressed), the original MERRA-2 variables have been rounded, scaled, and stored as integers in a highly compressed data format with high-speed full random access (the `fst` package for R). The resulting dataset is saved in separate files by month (41 years × 12 months, 492 data files in total). Additionally, summary statistics, such as mean values of each variable by month and location ID, and annual spatial correlations with the nearest neighbors, have been calculated for wind speed and solar irradiance and added to the dataset.
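The grid arithmetic in the description checks out in a couple of lines of R; the longitude endpoint convention below (dropping +180°, which coincides with −180°) is an assumption chosen to be consistent with the stated 576 × 361 size:

```r
# MERRA-2 grid per the description: 0.625° in longitude, 0.5° in latitude.
lon <- seq(-180, 180 - 0.625, by = 0.625)  # 576 values; -180 and 180 coincide
lat <- seq(-90, 90, by = 0.5)              # 361 values
length(lon) * length(lat)                  # 207936 locations
```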
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. This dataset is a shapefile which is a subset for the Hunter subregion containing geographical locations and other characteristics (see below) of streamflow gauging stations. There are 3 files that have been extracted from the Hydstra database to aid in identifying sites in the Hunter subregion and the type of data collected from each one. The 3 files are: Site - lists all sites available in Hydstra from data providers. The data provider is listed in the #Station as _xxx. For example, sites in NSW are _77, QLD are _66. Some sites do not have locational information and will not be able to be plotted. Period - the period table lists all the variables that are recorded at each site and the period of record. Variable - the variable table shows variable codes and names, which can be linked to the period table. Purpose: Locations are used as pour points in order to define reach areas for river system modelling. Dataset History: Subset of data for the Hunter subregion that was extracted from the Bureau of Meteorology's Hydstra system; it includes all gauges where data has been received from the lead water agency of each jurisdiction.
The gauges shapefile for all bioregions was intersected with the Hunter subregion boundary to identify and extract gauges within the subregion. Dataset Citation: Bioregional Assessment Programme (2016) HUN AWRA-R calibration nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/f2da394a-3d08-4cf4-8c24-bf7751ea06a1. Dataset Ancestors:
Derived From Gippsland Project boundary
Derived From Bioregional Assessment areas v04
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Bioregional Assessment areas v03
Derived From Victoria - Seamless Geology 2014
Derived From Bioregional Assessment areas v05
Derived From National Surface Water sites Hydstra
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From GEODATA TOPO 250K Series 3
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip-file contains the data and code accompanying the paper 'Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis'. Together, these files should allow for the replication of the results.
The 'raw_data' folder contains the 'MA_database.csv' file, which contains the extracted data from all primary studies that are used in the analysis. Furthermore, this folder contains the file 'MA_database_description.txt', which gives a description of each data column in the database.
The 'derived_data' folder contains the files that are produced by the R-scripts in this study and used for data analysis. The 'MA_database_processed.csv' and 'MA_database_processed.RData' files contain the converted raw database in a form suitable for analysis. The 'DB_IA_subsets.RData' file contains the 'Individual Abundance' (IA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria. The 'DB_IA_VCV_matrices.RData' file contains the variance-covariance (VCV) matrices for all IA data subsets. The 'DB_AM_subsets.RData' file contains the 'Total Abundance' (TA) and 'Mean Abundance' (MA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria.
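VCV matrices of this kind typically have effect-size sampling variances on the diagonal and non-zero covariances only between effect sizes from the same study (e.g. treatments sharing a control group). A sketch of that block-diagonal construction, using a common meta-analytic approximation (covariance = shared-control variance); the exact rule used in the paper may differ, and all numbers here are made up:

```python
# Illustrative block-diagonal VCV assembly for dependent effect sizes.
# Effects within the same study are assumed to share a control group,
# so their covariance is set to the shared-control variance; effects
# from different studies are independent (covariance 0).

effects = [  # (study id, sampling variance, shared-control variance)
    {"study": "A", "var": 0.30, "ctrl_var": 0.10},
    {"study": "A", "var": 0.25, "ctrl_var": 0.10},
    {"study": "B", "var": 0.40, "ctrl_var": 0.15},
]

n = len(effects)
vcv = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i == j:
            vcv[i][j] = effects[i]["var"]        # diagonal: sampling variances
        elif effects[i]["study"] == effects[j]["study"]:
            vcv[i][j] = effects[i]["ctrl_var"]   # shared control -> covariance
        # different studies: entry stays 0.0 (block-diagonal structure)

for row in vcv:
    print(row)
```

With one such matrix per data subset, multilevel meta-regression models can account for the non-independence of effect sizes within studies.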
The 'output_data' folder contains folders with the output data for each data subset (i.e. for each metric, taxonomic group and set of inclusion criteria). For each data subset, the folder contains random effects selection results ('Results1_REsel_
The 'scripts' folder contains all R-scripts that we used for this study.
The 'PrepareData.R' script takes the database as input and adjusts the file so that it can be used for data analysis.
The 'PrepareDataIA.R' and 'PrepareDataAM.R' scripts make subsets of the data and prepare the data for the meta-regression analysis and mixed-effects regression analysis, respectively.
The regression analyses are performed in the 'SelectModelsIA.R' and 'SelectModelsAM.R' scripts, which calculate the regression model results for the IA metric and the MA/TA metrics, respectively. These scripts require the 'RandomAndFixedEffects.R' script, containing the random and fixed effects parameter combinations, as well as the 'Functions.R' script.
The 'CreateMap.R' script creates a global map with the location of all studies included in the analysis (figure 1 in the paper).
The 'CreateForestPlots.R' script creates plots showing the IA data distribution for both taxonomic groups (figure 2 in the paper).
The 'CreateHeatMaps.R' script creates heat maps for all metrics and taxonomic groups (figure 3 in the paper, figures S11.1 and S11.2 in the appendix).
The 'CalculateStatistics.R' script calculates the descriptive statistics that are reported throughout the paper, and creates the figures that describe the dataset characteristics (figures S3.1 to S3.5 in the appendix).
The 'CreateFunnelPlots.R' script creates the funnel plots for both taxonomic groups (figures S6.1 and S6.2 in the appendix) and performs Egger's tests.
The 'CreateControlGraphs.R' script creates graphs showing the dependency of the nutrient response on control concentrations for all metrics and taxonomic groups (figures S10.1 and S10.2 in the appendix).
The 'figures' folder contains all figures that are included in this study.
This dataset is provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and code provided allow users to replicate, test, or further explore the results. The dataset includes 2 raster datasets (folder: Rasters): 1) 'cntWinterPks2003_2018DR' provides a count of years with winter peaks from 2003-2018 in an 11-state area in the western United States. 2) The 'VegClassGte5_2003_2018' raster, within the zip file 'WinterPeaksVegTypes.zip', identifies the broad vegetation types for locations with common winter peaks (5 or more years out of 16). The dataset also includes the Google Earth Engine and R code files used to create the datasets. Additional files/folders provided include: 1) Google Earth Engine scripts used to download MODIS data via the GEE JavaScript interface (folder: 'Code'). 2) Scripts used to manipulate rasters and to calculate and map the occurrence of winter NDVI peaks from 2003-2018 using the statistical software package 'R'. 3) Supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study, for example the folders 'Rproj.user' and 'packrat', and the files '.RData' and 'WinterPeakExtentPR.Rproj'. 4) Empty folders ('GEE_DataAnnPeak', 'GEE_DataLoose', and 'GEE_DataStrict') that should be used to contain the output from the GEE code files as follows: 'GEE_DataAnnPeak' should contain output from the S3 and S4 scripts, 'GEE_DataLoose' should contain output from the S1 script, and 'GEE_DataStrict' should contain output from the S2 script. 5) The graphic file 'Fig_9_MapsOfExtentPortrait2.jpg' shows the temporal and ecosystem distribution of winter NDVI peaks in the western continental US, 2003 to 2018, derived from the MODIS MCD43A4 product.
TOP: Number of years with winter peaks in areas that meet defined thresholds for biomass (median annual peak NDVI >= 0.15) and temperature (mean December minimum daily temperature <= 0°C). BOTTOM: Predominant LANDFIRE Existing Vegetation Type physiognomy (i.e., mode of each 500-m MODIS pixel) in areas with >= 5 years of winter peaks. Present in lesser proportions but not identified on the map for legibility reasons are conifer-hardwood, exotics, riparian, and sparsely vegetated physiognomic categories as well as non-natural/non-terrestrial ecosystem categories. State abbreviations are AZ (Arizona), CA (California), CO (Colorado), ID (Idaho), MT (Montana), NV (Nevada), NM (New Mexico), OR (Oregon), WA (Washington), and WY (Wyoming). The final steps of overlaying common winter peak extent data on the Landfire data were done using ArcGIS and the publicly available Landfire dataset (see source datasets section of metadata and process steps). To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation within this metadata along with the workflow described in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
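The screening logic above combines two per-pixel masks (biomass and temperature) with a minimum count of winter-peak years. A minimal sketch of that decision rule, with made-up pixel values and field names (the real workflow operates on MODIS rasters, not dictionaries):

```python
# Sketch of the per-pixel criteria described above: a pixel qualifies only
# where median annual peak NDVI >= 0.15 (biomass) and mean December minimum
# daily temperature <= 0 degC, and it is mapped as having "common" winter
# peaks when winter peaks occur in >= 5 of the 16 years (2003-2018).
# All values below are illustrative.

pixels = [
    {"median_peak_ndvi": 0.30, "dec_tmin_mean": -4.0, "winter_peak_years": 7},
    {"median_peak_ndvi": 0.10, "dec_tmin_mean": -4.0, "winter_peak_years": 9},  # fails biomass
    {"median_peak_ndvi": 0.30, "dec_tmin_mean": 2.0, "winter_peak_years": 6},   # fails temperature
    {"median_peak_ndvi": 0.20, "dec_tmin_mean": -1.0, "winter_peak_years": 3},  # too few years
]

def common_winter_peak(px, min_years=5):
    in_mask = px["median_peak_ndvi"] >= 0.15 and px["dec_tmin_mean"] <= 0.0
    return in_mask and px["winter_peak_years"] >= min_years

flags = [common_winter_peak(p) for p in pixels]
print(flags)  # only the first pixel passes all three criteria
```

In the published workflow this thresholding is applied raster-wide; the sketch only illustrates the per-pixel logic.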
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement. The dataset consists of an Excel spreadsheet and shapefile representing the locations of simulation nodes used in the AWRA-R model. Some of the nodes correspond to gauging station locations or dam locations, whereas other locations represent river confluences or catchment outlets which have no gauging; these are marked as "Dummy".
Purpose: Locations are used as pour points in order to define reach areas for river system modelling.
Dataset History: Subset of data for the Hunter that was extracted from the Bureau of Meteorology's Hydstra system; includes all gauges where data has been received from the lead water agency of each jurisdiction. Simulation nodes were added in locations for which the model will provide simulated streamflow. There are 3 files that have been extracted from the Hydstra database to aid in identifying sites in each bioregion and the type of data collected from each one. These data were used to determine the simulation node locations where model outputs were generated. The 3 files contained within the source dataset used for this determination are:
Site - lists all sites available in Hydstra from data providers.
The data provider is listed in the #Station field as _xxx; for example, sites in NSW are _77 and sites in QLD are _66. Some sites do not have locational information and cannot be plotted.
Period - lists all the variables that are recorded at each site and the period of record.
Variable - shows variable codes and names, which can be linked to the Period table.
Relevant location information and other data were extracted to construct the spreadsheet and shapefile within this dataset.
Dataset Citation: Bioregional Assessment Programme (XXXX) HUN AWRA-R simulation nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fda20928-d486-49d2-b362-e860c1918b06.
Dataset Ancestors:
Derived From National Surface Water sites Hydstra
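Building the node shapefile involves two simple filters implied above: dropping sites without coordinates (which cannot be plotted) and separating gauged nodes from ungauged "Dummy" nodes. A sketch with illustrative field names and records:

```python
# Sketch: separating AWRA-R simulation nodes that correspond to gauging
# stations from ungauged "Dummy" nodes (river confluences or catchment
# outlets), and dropping sites without locational information.
# Station IDs, names, and coordinates below are made up.

nodes = [
    {"station": "210001_77", "name": "Hunter R. gauge", "lat": -32.5, "lon": 151.2},
    {"station": "NODE_05",   "name": "Dummy",           "lat": -32.7, "lon": 151.0},
    {"station": "210002_77", "name": "Tributary gauge", "lat": None,  "lon": None},
]

# Only sites with coordinates can be placed on the map.
plottable = [n for n in nodes if n["lat"] is not None and n["lon"] is not None]

# "Dummy" nodes have no gauging; keep them for simulation but flag gauged ones.
gauged = [n for n in plottable if n["name"] != "Dummy"]

print(len(plottable), "plottable nodes,", len(gauged), "gauged")
```

In the actual workflow these attributes live in the shapefile's attribute table rather than Python dictionaries.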