62 datasets found

Data from: Regression with Empirical Variable Selection: Description of a...
plos.figshare.com
txt
Updated Jun 8, 2023
Cite
Anne E. Goodenough; Adam G. Hart; Richard Stafford (2023). Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets [Dataset]. http://doi.org/10.1371/journal.pone.0034338
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0034338
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Anne E. Goodenough; Adam G. Hart; Richard Stafford
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R2), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R2 values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.
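The nested model series described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the description does not say how "empirical support" is computed, so the code assumes it is the sum of Akaike weights over all subsets that contain a variable. The function names (`ols_aic`, `revs_like_series`) and the synthetic data are hypothetical.

```python
import itertools
import numpy as np


def ols_aic(X, y):
    """Fit OLS with an intercept; return the Gaussian AIC up to an additive constant."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k


def revs_like_series(X, y):
    """Rank variables by support across all subsets, then build nested models."""
    n_vars = X.shape[1]
    subsets = [
        s
        for r in range(1, n_vars + 1)
        for s in itertools.combinations(range(n_vars), r)
    ]
    aics = np.array([ols_aic(X[:, list(s)], y) for s in subsets])

    # ASSUMPTION: support = summed Akaike weight of every subset containing the variable.
    weights = np.exp(-0.5 * (aics - aics.min()))
    weights /= weights.sum()
    support = np.zeros(n_vars)
    for s, w in zip(subsets, weights):
        support[list(s)] += w

    # Model 1 holds the best-supported variable, model 2 adds the next, and so on.
    order = [int(i) for i in np.argsort(-support)]
    series = [order[: i + 1] for i in range(n_vars)]
    return support, series


# Synthetic data: only columns 0 and 2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=200)

support, series = revs_like_series(X, y)
for model in series:
    print(model, round(ols_aic(X[:, model], y), 1))
```

With four predictors the series contains four models, which is what keeps the post-hoc comparison small. All-subsets fitting itself still grows as 2^n, so this sketch is only practical for a modest number of predictors.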
Subsetting
paper.erudition.co.in
html
Updated Dec 2, 2025
Cite
Einetic (2025). Subsetting [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Explore at:
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Subsetting" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Data release for solar-sensor angle analysis subset associated with the...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Data release for solar-sensor angle analysis subset associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-solar-sensor-angle-analysis-subset-associated-with-the-journal-article-so
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Western United States, United States
Description
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100 point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. 
Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., the folders Rproj.user and packrat, and the files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
OpenML R Bot Benchmark Data (final subset)
figshare.com
application/gzip
Updated May 18, 2018
Cite
Daniel Kühn; Philipp Probst; Janek Thomas; Bernd Bischl (2018). OpenML R Bot Benchmark Data (final subset) [Dataset]. http://doi.org/10.6084/m9.figshare.5882230.v2
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.5882230.v2
Dataset updated
May 18, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Daniel Kühn; Philipp Probst; Janek Thomas; Bernd Bischl
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a clean subset of the data that was created by the OpenML R Bot, which executed benchmark experiments on the binary classification tasks of the OpenML100 benchmarking suite with six R algorithms: glmnet, rpart, kknn, svm, ranger and xgboost. The hyperparameters of these algorithms were drawn randomly. In total it contains more than 2.6 million benchmark experiments and can be used by other researchers. The subset was created by taking 500000 results of each learner (except for kknn, for which only 1140 results are available). The CSV file for each learner is a table with one row per benchmark experiment, containing: OpenML data ID, hyperparameter values, performance measures (AUC, accuracy, Brier score), runtime, scimark (runtime reference of the machine), and some meta features of the dataset. OpenMLRandomBotResults.RData (format for R) contains all data in separate tables for the results, the hyperparameters, the meta features, the runtime, the scimark results and reference results.
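The per-learner cap described above (a fixed number of results per learner, with smaller groups kept whole) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the description does not say whether the 500000 results were drawn at random, so random sampling is an assumption here, and the helper name `cap_per_group` and the toy rows are hypothetical.

```python
import random


def cap_per_group(rows, key, cap, seed=0):
    """Keep at most `cap` rows per group; groups smaller than `cap` are kept whole."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    subset = []
    for members in groups.values():
        if len(members) > cap:
            # ASSUMPTION: oversized groups are reduced by uniform random sampling.
            members = rng.sample(members, cap)
        subset.extend(members)
    return subset


# Toy stand-in for the benchmark table: one well-populated learner, one sparse learner.
rows = [{"learner": "ranger", "auc": 0.5 + i / 100} for i in range(10)]
rows += [{"learner": "kknn", "auc": 0.6 + i / 100} for i in range(3)]

subset = cap_per_group(rows, key="learner", cap=5)
counts = {}
for row in subset:
    counts[row["learner"]] = counts.get(row["learner"], 0) + 1
print(counts)
```

In the toy data the sparse group plays the role of kknn: it has fewer rows than the cap, so all of its rows are retained.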
SDSS Galaxy Subset
zenodo.org
application/gzip
Updated Sep 5, 2022
Cite
Nuno Ramos Carvalho (2022). SDSS Galaxy Subset [Dataset]. http://doi.org/10.5281/zenodo.6696565
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.6696565
Dataset updated
Sep 5, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nuno Ramos Carvalho
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Sloan Digital Sky Survey (SDSS) is a comprehensive survey of the northern sky. This dataset contains a subset of this survey: 60247 objects classified as galaxies. It includes a CSV file with a collection of information and a set of files for each object, namely JPG image files, FITS data and spectra data. This dataset is used to train and explore the astromlp-models collection of deep learning models for galaxy characterisation.

The dataset includes a CSV data file where each row is an object from the SDSS database, and with the following columns (note that some data may not be available for all objects):

objid: unique SDSS object identifier

mjd: MJD of observation

plate: plate identifier

tile: tile identifier

fiberid: fiber identifier

run: run number

rerun: rerun number

camcol: camera column

field: field number

ra: right ascension

dec: declination

class: spectroscopic class (only objects with GALAXY are included)

subclass: spectroscopic subclass

modelMag_u: better of DeV/Exp magnitude fit for band u

modelMag_g: better of DeV/Exp magnitude fit for band g

modelMag_r: better of DeV/Exp magnitude fit for band r

modelMag_i: better of DeV/Exp magnitude fit for band i

modelMag_z: better of DeV/Exp magnitude fit for band z

redshift: final redshift from SDSS data z

stellarmass: stellar mass extracted from the eBOSS Firefly catalog

w1mag: WISE W1 "standard" aperture magnitude

w2mag: WISE W2 "standard" aperture magnitude

w3mag: WISE W3 "standard" aperture magnitude

w4mag: WISE W4 "standard" aperture magnitude

gz2c_f: Galaxy Zoo 2 classification from Willett et al 2013

gz2c_s: simplified version of Galaxy Zoo 2 classification (labels set)

Besides the CSV file, a set of directories is included in the dataset. In each directory you'll find a list of files named after the objid column from the CSV file, with the corresponding data. The following directory tree is available:

sdss-gs/
├── data.csv
├── fits
├── img
├── spectra
└── ssel

where each directory contains:

img: RGB images from the object in JPEG format, 150x150 pixels, generated using the SkyServer DR16 API

fits: FITS data subsets around the object across the u, g, r, i, z bands; cut is done using the ImageCutter library

spectra: full best fit spectra data from SDSS between 4000 and 9000 wavelengths

ssel: best fit spectra data from SDSS for specific selected intervals of wavelengths discussed by Sánchez Almeida 2010

Changelog

v0.0.3 - Increase number of objects to ~80k.

v0.0.2 - Increase number of objects to ~60k.

v0.0.1 - Initial import.
Data from: Effects of nutrient enrichment on freshwater macrophyte and...
zenodo.org
Updated Dec 13, 2023
Cite
Floris K. Neijnens; Hadassa Moreira; Melinda M.J. De Jonge; Bart B.H.P. Linssen; Mark A.J. Huijbregts; Gertjan W. Geerling; Aafke M. Schipper (2023). Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis [Dataset]. http://doi.org/10.5281/zenodo.10372444
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.10372444
Dataset updated
Dec 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Floris K. Neijnens; Hadassa Moreira; Melinda M.J. De Jonge; Bart B.H.P. Linssen; Mark A.J. Huijbregts; Gertjan W. Geerling; Aafke M. Schipper
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The zip-file contains the data and code accompanying the paper 'Effects of nutrient enrichment on freshwater macrophyte and invertebrate abundance: A meta-analysis'. Together, these files should allow for the replication of the results.

The 'raw_data' folder contains the 'MA_database.csv' file, which contains the extracted data from all primary studies that are used in the analysis. Furthermore, this folder contains the file 'MA_database_description.txt', which gives a description of each data column in the database.

The 'derived_data' folder contains the files that are produced by the R-scripts in this study and used for data analysis. The 'MA_database_processed.csv' and 'MA_database_processed.RData' files contain the converted raw database that is suitable for analysis. The 'DB_IA_subsets.RData' file contains the 'Individual Abundance' (IA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria. The 'DB_IA_VCV_matrices.RData' contains for all IA data subsets the variance-covariance (VCV) matrices. The 'DB_AM_subsets.RData' file contains the 'Total Abundance' (TA) and 'Mean Abundance' (MA) data subsets based on taxonomic group (invertebrates/macrophytes) and inclusion criteria.

The 'output_data' folder contains maps with the output data for each data subset (i.e. for each metric, taxonomic group and set of inclusion criteria). For each data subset, the map contains random effects selection results ('Results1_REsel_

The 'scripts' folder contains all R-scripts that we used for this study. The 'PrepareData.R' script takes the database as input and adjusts the file so that it can be used for data analysis. The 'PrepareDataIA.R' and 'PrepareDataAM.R' scripts make subsets of the data and prepare the data for the meta-regression analysis and mixed-effects regression analysis, respectively. The regression analyses are performed in the 'SelectModelsIA.R' and 'SelectModelsAM.R' scripts to calculate the regression model results for the IA metric and MA/TA metrics, respectively. These scripts require the 'RandomAndFixedEffects.R' script, containing the random and fixed effects parameter combinations, as well as the 'Functions.R' script. The 'CreateMap.R' script creates a global map with the location of all studies included in the analysis (figure 1 in the paper). The 'CreateForestPlots.R' script creates plots showing the IA data distribution for both taxonomic groups (figure 2 in the paper). The 'CreateHeatMaps.R' script creates heat maps for all metrics and taxonomic groups (figure 3 in the paper, figures S11.1 and S11.2 in the appendix). The 'CalculateStatistics.R' script calculates the descriptive statistics that are reported throughout the paper, and creates the figures that describe the dataset characteristics (figures S3.1 to S3.5 in the appendix). The 'CreateFunnelPlots.R' script creates the funnel plots for both taxonomic groups (figures S6.1 and S6.2 in the appendix) and performs Egger's tests. The 'CreateControlGraphs.R' script creates graphs showing the dependency of the nutrient response to control concentrations for all metrics and taxonomic groups (figures S10.1 and S10.2 in the appendix).

The 'figures' folder contains all figures that are included in this study.
Source Code - Characterizing Variability and Uncertainty for Parameter...
catalog.data.gov
s.cnmilf.com
Updated May 1, 2025
Cite
U.S. EPA Office of Research and Development (ORD) (2025). Source Code - Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models [Dataset]. https://catalog.data.gov/dataset/source-code-characterizing-variability-and-uncertainty-for-parameter-subset-selection-in-p
Explore at:
Dataset updated
May 1, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Source Code for the manuscript "Characterizing Variability and Uncertainty for Parameter Subset Selection in PBPK Models" -- This R code generates the results presented in this manuscript; the zip folder contains PBPK model files (for chloroform and DCM) and corresponding scripts to compile the models, generate human equivalent doses, and run sensitivity analysis.
Data Mining Project - Boston
kaggle.com
zip
Updated Nov 25, 2019
Cite
SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
Explore at:
Available download formats: zip (59313797 bytes)
Dataset updated
Nov 25, 2019
Authors
SophieLiu
Area covered
Boston
Description
Context

To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file has both Lyft and Uber, but it is still a cleaned version of the dataset we downloaded from Kaggle.

Use of Data Files

You can easily subset the data into the car types that you will be modeling by first loading the CSV into R. Here is the code to do this:

This loads the file into R:

uber_df <- read.csv('uber.csv')

The next code subsets the data into a specific car type. The example below keeps only Uber 'Black' cars.

df_black <- subset(uber_df, uber_df$name == 'Black')

Next, save the subset so that it can be loaded into R again later. To do this, write the data frame to a CSV file on your computer:

write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

The file will appear in your working directory. If you are not familiar with your working directory, run this code:

getwd()

The output will be the file path to your working directory. You will find the file you just created in that folder.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
ECG Chagas Disease [Balanced]
kaggle.com
zip
Updated Feb 3, 2025
Cite
Matteo Fasulo (2025). ECG Chagas Disease [Balanced] [Dataset]. https://www.kaggle.com/datasets/matteofasuloo/code15-ecg-chagas-balanced/code
Explore at:
Available download formats: zip (741625662 bytes)
Dataset updated
Feb 3, 2025
Authors
Matteo Fasulo
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This code is not mine. The dataset provided here is a balanced subset derived from the original dataset, and I do not claim ownership over the original data.

The CODE dataset was collected by the Telehealth Network of Minas Gerais (TNMG) in the period between 2010 and 2016. TNMG is a public telehealth system assisting 811 out of the 853 municipalities in the state of Minas Gerais, Brazil.

The CODE 15% dataset is obtained from stratified sampling from the CODE dataset. This subset of the CODE dataset is described in and used for assessing model performance:

"Deep neural network estimated electrocardiographic-age as a mortality predictor"
Emilly M Lima, Antônio H Ribeiro, Gabriela MM Paixão, Manoel Horta Ribeiro, Marcelo M Pinto Filho, Paulo R Gomes, Derick M Oliveira, Ester C Sabino, Bruce B Duncan, Luana Giatti, Sandhi M Barreto, Wagner Meira Jr, Thomas B Schön, Antonio Luiz P Ribeiro. MedRXiv (2021) https://www.doi.org/10.1101/2021.02.19.21251232

This dataset is a subset of the CODE 15% dataset obtained by random sampling from the negative class while maintaining all the observations of the positive class to create a balanced dataset without the need to focus on class imbalance.
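The balancing step described above (keep every positive record, randomly sample the negative class) can be sketched as follows. This is a minimal illustration rather than the uploader's actual script: the 1:1 class ratio, the function name `balance_by_undersampling`, and the toy labels are assumptions.

```python
import random


def balance_by_undersampling(labels, seed=0):
    """Return sorted indices keeping all positives and an equal-sized random negative sample."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label]
    negatives = [i for i, label in enumerate(labels) if not label]
    # ASSUMPTION: "balanced" means one negative record per positive record.
    kept_negatives = rng.sample(negatives, min(len(positives), len(negatives)))
    return sorted(positives + kept_negatives)


# Toy labels: 3 positive records followed by 20 negative records.
labels = [True] * 3 + [False] * 20
kept = balance_by_undersampling(labels)
print(kept)
```

Fixing the seed makes the negative sample reproducible, which matters if the balanced subset is to be regenerated from the original data.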

The code15_hdf5 folder contains the exams and labels for the entire CODE 15% dataset. The code15_wfdb folder contains the exam records file in .dat format.

An additional file (signals_features.csv) is provided, containing handcrafted features from the ECG records (lead II) related to P, Q, R, S, and T waves. Features such as P wave duration, PR interval, PR segment, QRS duration, ST segment, and ST slope were computed by first extracting all the points using the neurokit2 Python library and then aggregated for each record ID using descriptive statistics. Heart rate variability features were also included along with the P, Q, R, S, and T waves.

Link to the original dataset: https://doi.org/10.5281/zenodo.4916206
Loop Functions
paper.erudition.co.in
html
Updated Dec 2, 2025
Cite
Einetic (2025). Loop Functions [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Explore at:
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Loop Functions" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Data release for winter peak extent analysis subset, 2003-2018, associated...
catalog.data.gov
data.usgs.gov
+1more
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Data release for winter peak extent analysis subset, 2003-2018, associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-winter-peak-extent-analysis-subset-2003-2018-associated-with-the-journal-
Explore at:
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Western United States, United States
Description
This dataset is provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and code provided allow users to replicate, test, or further explore results. The dataset includes 2 raster datasets (folder: 'Rasters'): 1) 'cntWinterPks2003_2018DR' provides a count of years with winter peaks from 2003-2018 in an 11-state area in the western United States. 2) The 'VegClassGte5_2003_2018' raster, within the zip file 'WinterPeaksVegTypes.zip', identifies the broad vegetation types for locations with common winter peaks (5 or more years out of 16). The dataset also includes Google Earth Engine and R code files used to create the datasets. Additional files/folders provided include 1) Google Earth Engine scripts used to download MODIS data through the GEE JavaScript interface (folder: 'Code'). 2) Scripts used to manipulate rasters and to calculate and map the occurrence of winter NDVI peaks from 2003-2018 using the statistical software package 'R'. 3) Supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study, for example the folders 'Rproj.user' and 'packrat', and the files '.RData' and 'WinterPeakExtentPR.Rproj'. 4) Empty folders ('GEE_DataAnnPeak', 'GEE_DataLoose', and 'GEE_DataStrict') that should be used to contain the output from the GEE code files as follows: 'GEE_DataAnnPeak' should contain output from the S3 and S4 scripts, 'GEE_DataLoose' should contain output from the S1 script, and 'GEE_DataStrict' should contain output from the S2 script. 5) The graphic file 'Fig_9_MapsOfExtentPortrait2.jpg' shows the temporal and ecosystem distribution of winter NDVI peaks in the western continental US, 2003 to 2018, derived from the MODIS MCD43A4 product. 
TOP: Number of years with winter peaks in areas that meet defined thresholds for biomass (median annual peak NDVI >= 0.15) and temperature (mean December minimum daily temperature <= 0°C). BOTTOM: Predominant LANDFIRE Existing Vegetation Type physiognomy (i.e., mode of each 500-m MODIS pixel) in areas with >= 5 years of winter peaks. Present in lesser proportions but not identified on the map for legibility reasons are conifer-hardwood, exotics, riparian, and sparsely vegetated physiognomic categories as well as non-natural/non-terrestrial ecosystem categories. State abbreviations are AZ (Arizona), CA (California), CO (Colorado), ID (Idaho), MT (Montana), NV (Nevada), NM (New Mexico), OR (Oregon), WA (Washington), and WY (Wyoming). The final steps of overlaying common winter peak extent data on the Landfire data were done using ArcGIS and the publicly available Landfire dataset (see source datasets section of metadata and process steps). To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation within this metadata along with the workflow described in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
Computational time (in seconds) for the four combining methods.
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Alexey Miroshnikov; Erin M. Conlon (2023). Computational time (in seconds) for the four combining methods. [Dataset]. http://doi.org/10.1371/journal.pone.0108425.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0108425.t002
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alexey Miroshnikov; Erin M. Conlon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Computational times, in seconds (rounded unless less than 1 second), for the four methods of the R package parallelMCMCcombine, using simulation data and T = 50,000 MCMC samples. The values in parentheses are for our example data sets; d = 2, M = 5 is for the Gamma model, and d = 5, M = 10 is for the logistic model. The results are based on a computer with operating system Windows 7 and an Intel Celeron 1007U CPU 1.5 GHz processor.
Getting started, Background
paper.erudition.co.in
html
Updated Dec 2, 2025
Cite
Einetic (2025). Getting started, Background [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Explore at:
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Getting started, Background" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Simulation
paper.erudition.co.in
html
Updated Dec 2, 2025
Cite
Einetic (2025). Simulation [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Explore at:
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Simulation" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Data release for phenocam analysis subset associated with the journal...
catalog.data.gov
data.usgs.gov
+1more
Updated Oct 1, 2025
Cite
U.S. Geological Survey (2025). Data release for phenocam analysis subset associated with the journal article "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" [Dataset]. https://catalog.data.gov/dataset/data-release-for-phenocam-analysis-subset-associated-with-the-journal-article-solar-and-se
Explore at:
Dataset updated
Oct 1, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This dataset provides calculated camera-NDVI data for individual regions-of-interest (ROI's) for the phenocam named 'GRCA1PJ' (part of the Phenocam Network, https://phenocam.sr.unh.edu/webcam/). The GRCA1PJ phenocam is within a pinyon-juniper woodland in Grand Canyon National Park. Camera-NDVI refers to a modified version of NDVI calculated by the phenopix package (Filippa et al., 2016). The camera-calculated NDVI data are in the folder FinalOutput. File attributes within that folder are described in detail in the entity and attribute information section of this metadata. It should be possible for the user to use only the ROI definitions, image data downloaded from the phenocam network, and the phenopix R-package to reproduce the final NDVI dataset. However, the dataset also contains scripts and intermediate files that may be helpful in reproducing or extending the processing, but are not essential to reproducing the data. The complete dataset release includes 1) A workflow spreadsheet file that describes the processing steps, associated scripts, and output filenames (filename:Workflow_With_Filenames.ods). 2) R-code script files used in processing (folder:'Code'). 3) ROI boundary files and jpg images for the ROIs presented in the linked publication. (folder:"Phenocamdata/grca1pj/ROI") 4) Ancillary files used to create the NDVI dataset; these include exposure coordinates and training files (folder:'Phenocamdata/grca1pj/Ancillary'). 5) Files listing exposures for individual photos within the initial processing time period (folder:'Exposures'). 6) Screening parameters for cloud and poor-light-condition screening of photos, as well as a list of photos that meet the cloud-screening standards (folder:'Phenocamdata/grca1pj/BlueSkyScreening'). 7) Vegetation index files produced by the phenopix package, organized by ROI and month-year group (folder:"Phenocamdata/grca1pj/VI_Tables"). 
8) Supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (eg, .folders Rproj.user, and packrat, and files .RData, and PhenocamPR.Rproj). 9) The graphic 'Fig_4_ROIWithLabels.jpg' shows the phenocam field of view with labelled ROIs. outline colors correspond to juniper (red), pinyon (blue), 238 and other species (yellow). Labels correspond to NDVI curves in 'Fig_7_PhenocamCurves.JPG', (also included in this data release). The composite area comprises the field of view beneath the approximate horizon line labelled ‘J’ (gray). This image corresponds to Figure 4 in the associated journal article. 10) The graphic 'Fig_7_PhenocamCurves.JPG' shows NDVI curves derived from phenocam images from September 2017 - December 2018 for individual regions of interest (ROIs). Letter designations correspond to ROI labels in Fig_4_ROIWithLabels.jpg (also included in this data release). Data were screened to remove cloudy photos during Aqua and Terra flyover hours. Black ellipses indicate times when the ROI target vegetation was shaded. Red ellipses indicate times when the background of the ROI was shaded. To improve visibility, the Y axis is restricted and excludes 37 extreme values out of a total of 6698 values. The exposure adjustment method used by the phenopix package produces NDVI values that have a strong linear correlation with spectroradiometer-derived NDVI but are negatively shifted so that vegetated areas often have NDVI values below zero. This image corresponds to Figure 7 in the associated journal article. The file types .Rdata or .rds are commonly used in this release because these are the types created by the phenopix processing package, and these files will be needed (or the user will need to recreate new versions) for further processing. 
The scripts enable the user to replicate the processing or to extend it to different times or areas of interest; however, these scripts require as additional input phenocam imagery that the user must download. To use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description and references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs. Species-specific phenological curves are included in the NDVI output section of this dataset for Juniperus osteosperma, Pinus edulis, Purshia stansburiana, Artemisia tridentata, and Chamaebatiaria millefolium.
Self-citation analysis data based on PubMed Central subset (2002-2005)
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Cite
Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik, Self-citation analysis data based on PubMed Central subset (2002-2005) [Dataset]. http://doi.org/10.13012/B2IDB-9665377_V1
Unique identifier
https://doi.org/10.13012/B2IDB-9665377_V1
Authors
Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
U.S. National Science Foundation (NSF)
U.S. National Institutes of Health (NIH)
Description
Self-citation analysis data based on PubMed Central subset (2002-2005)
----------------------------------------------------------------------
Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018

## Introduction
This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self-citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015. The dataset is distributed in the form of the following tab-separated text files:
* Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors
* Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors
* Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors
* Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data
* COLUMNS_DESC.txt file - Descriptions of all columns
* model_text_files.tar.gz - Text files containing model coefficients and scores for model selection.
* results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes. v4.reviewer contains models for analysis done after reviewer comments.
* README.txt file

## Dataset creation
Our experiments relied on data from multiple sources, including proprietary data from Thomson Reuters' (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Authors interested in reproducing our experiments should request this data from Clarivate Analytics themselves. However, we do make available a similar but open dataset based on citations from PubMed Central, which can be used to get results similar to those reported in our analysis. Furthermore, we have also freely shared our datasets, which can be used along with the citation datasets from Clarivate Analytics to re-create the dataset used in our experiments. These datasets are listed below. If you wish to use any of those datasets, please make sure you cite both the dataset and the paper introducing the dataset.
* MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
* Citation data from PubMed Central (original paper includes additional citations from Web of Science)
* Author-ity 2009 dataset:
  - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1
  - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304
  - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105
* Genni 2.0 + Ethnea for identifying author gender and ethnicity:
  - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1
  - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’13. ACM Press. https://doi.org/10.1145/2467696.2467720
  - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
* MapAffil for identifying article country of affiliation:
  - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1
  - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib magazine : the magazine of the Digital Library Forum. 2015;21(11-12):10.1045/november2015-torvik
* IMPLICIT journal similarity:
  - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1
* Novelty dataset for identifying article-level novelty:
  - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
  - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra
  - Code: https://github.com/napsternxg/Novelty
* Expertise dataset for identifying author expertise on articles:
* Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLM's data Terms and Conditions. Additional data-related updates can be found at Torvik Research Group.

## Acknowledgments
This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License
Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
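Because the training files listed above are tab-separated, exceed 1 GB each, and come with their column names in a separate header file, it can help to stream them rather than load them whole. The following is a minimal sketch, not the authors' code. It assumes that the data files carry no header row of their own and that the header file holds a single tab-separated line of column names; neither assumption is confirmed by the description. The function name `read_rows` is hypothetical.

```python
import csv
from itertools import islice


def read_rows(data_path, header_path, limit=None):
    """Yield one dict per data row, taking column names from a separate header file.

    Assumption: the header file is a single tab-separated line of column names,
    and the data file has no header row of its own.
    """
    # Read the column names from the first line of the header file.
    with open(header_path, newline="", encoding="utf-8") as header_file:
        columns = next(csv.reader(header_file, delimiter="\t", quoting=csv.QUOTE_NONE))

    # Stream the data file row by row so a multi-gigabyte file never sits in memory.
    with open(data_path, newline="", encoding="utf-8") as data_file:
        reader = csv.DictReader(
            data_file, fieldnames=columns, delimiter="\t", quoting=csv.QUOTE_NONE
        )
        # islice with limit=None iterates over every row.
        yield from islice(reader, limit)
```

Calling `read_rows("Training_data_2002_2005_pmc_pair_First.txt", "Training_data_2002_2005_pmc_pair_txt.header.txt", limit=5)` would then yield the first five first-author rows as dictionaries, which is a cheap way to check the assumed layout against COLUMNS_DESC.txt before running a full analysis.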
Control structures
paper.erudition.co.in
html
Updated Dec 2, 2025
Cite
Einetic (2025). Control structures [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Control structures" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Data from: [Dataset:] Data from Tree Censuses and Inventories in Panama
smithsonian.figshare.com
zip
Updated Apr 18, 2024
+ more versions
Cite
Richard Condit; Rolando Pérez; Salomón Aguilar; Suzanne Lao (2024). [Dataset:] Data from Tree Censuses and Inventories in Panama [Dataset]. http://doi.org/10.5479/data.stri.2016.0622
Available download formats: zip
Unique identifier
https://doi.org/10.5479/data.stri.2016.0622
Dataset updated
Apr 18, 2024
Dataset provided by
Smithsonian Tropical Research Institute
Authors
Richard Condit; Rolando Pérez; Salomón Aguilar; Suzanne Lao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Panama
Description
Abstract: These are results from a network of 65 tree census plots in Panama. At each, every individual stem in a rectangular area of specified size is given a unique number and identified to species, then stem diameter is measured in one or more censuses. Data from these numerous plots and inventories were collected following the same methods as, and species identity harmonized with, the 50-ha long-term tree census at Barro Colorado Island. Precise location of every site, elevation, and estimated rainfall (for many sites) are also included. These data were gathered over many years, starting in 1994 and continuing to the present, by principal investigators R. Condit, R. Perez, S. Lao, and S. Aguilar. Funding has been provided by many organizations.
Description:
1) marenaRecent.full.Rdata5Jan2013.zip: A zip archive holding one R Analytical Table, a version of the Marena plots' census data in R format, designed for data analysis. This and all other tables labelled 'full' have one record per individual tree found in that census. Detailed documentation of the 'full' tables is given in RoutputFull.pdf (see component 10 below); an additional column 'plot' is included because the table includes records from many different locations. Plot coordinates are given in PanamaPlot.txt (component 12 below). This one file, 'marenaRecent.full1.rdata', has data from the latest census at 60 different plots. These are the best data to use if only a single plot census is needed.
2) marena2cns.full.Rdata5Jan2013.zip: R Analytical Tables of the style 'full' for 44 plots with two censuses: 'marena2cns.full1.rdata' for the first census and 'marena2cns.full2.rdata' for the second census. These 44 plots are a subset of the 60 found in marenaRecent.full (component 1): the 44 that have been censused two or more times. These are the best data to use if two plot censuses are needed.
3) marena3cns.full.Rdata5Jan2013.zip: R Analytical Tables of the style 'full' for nine plots with three censuses: 'marena3cns.full1.rdata' for the first census through 'marena2cns.full3.rdata' for the third census. These nine plots are a subset of the 44 found in marena2cns.full (component 2): the nine that have been censused three or more times. These are the best data to use if three plot censuses are needed.
4) marena4cns.full.Rdata5Jan2013.zip: R Analytical Tables of the style 'full' for six plots with four censuses: 'marena4cns.full1.rdata' for the first census through 'marena4cns.full4.rdata' for the fourth census. These six plots are a subset of the nine found in marena3cns.full (component 3): the six that have been censused four or more times. These are the best data to use if four plot censuses are needed.
5) marenaRecent.stem.Rdata5Jan2013.zip: A zip archive holding one R Analytical Table, a version of the Marena plots' census data in R format, designed for data analysis. This one file, 'marenaRecent.full1.rdata', has data from the latest census at 60 different plots. The table has one record per individual stem, necessary because some individual trees have more than one stem. Detailed documentation of these tables is given in RoutputFull.pdf (see component 11 below); an additional column 'plot' is included because the table includes records from many different locations. Plot coordinates are given in PanamaPlot.txt (component 12 below). These are the best data to use if only a single plot census is needed, and individual stems are desired.
6) marena2cns.stem.Rdata5Jan2013.zip: R Analytical Tables of the style 'stem' for 44 plots with two censuses: 'marena2cns.stem1.rdata' for the first census and 'marena3cns.stem2.rdata' for the second census. These 44 plots are a subset of the 60 found in marenaRecent.stem (component 1): the 44 that have been censused two or more times. These are the best data to use if two plot censuses are needed, and individual stems are desired.
7) marena3cns.stem.Rdata5Jan2013.zip: R Analytical Tables of the style 'stem' for nine plots with three censuses: 'marena3cns.stem1.rdata' for the first census through 'marena3cns.stem3.rdata' for the third census. These nine plots are a subset of the 44 found in marena2cns.stem (component 6): the nine that have been censused three or more times. These are the best data to use if three plot censuses are needed, and individual stems are desired.
8) marena4cns.stem.Rdata5Jan2013.zip: R Analytical Tables of the style 'stem' for six plots with four censuses: 'marena3cns.stem1.rdata' for the first census through 'marena3cns.stem3.rdata' for the third census. These six plots are a subset of the nine found in marena3cns.stem (component 7): the six that have been censused four or more times. These are the best data to use if four plot censuses are needed, and individual stems are desired.
9) bci.spptable.rdata: A list of the 1414 species found across all tree plots and inventories in Panama, in R format. The column 'sp' in this table is a code identifying the species in the full census tables (marena.full and marena.stem, components 1-4 and 5-8 above).
10) RoutputFull.pdf: Detailed documentation of the 'full' tables in Rdata format (components 1-4 above).
11) RoutputStem.pdf: Detailed documentation of the 'stem' tables in Rdata format (components 5-8 above).
12) PanamaPlot.txt: Locations of all tree plots and inventories in Panama.
Scoping Rules
paper.erudition.co.in
html
Updated Dec 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Scoping Rules [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Available download formats: html
Dataset updated
Dec 2, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Scoping Rules" of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Appendix S1 - parallelMCMCcombine: An R Package for Bayesian Methods for Big...
plos.figshare.com
doc
Updated May 30, 2023
Cite
Alexey Miroshnikov; Erin M. Conlon (2023). Appendix S1 - parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics [Dataset]. http://doi.org/10.1371/journal.pone.0108425.s001
Available download formats: doc
Unique identifier
https://doi.org/10.1371/journal.pone.0108425.s001
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Alexey Miroshnikov; Erin M. Conlon
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Remarks on kernels and bandwidth selection for semiparametric density product estimator method. (DOC)

Data from: Regression with Empirical Variable Selection: Description of a New Method and Application to Ecological Datasets

38 scholarly articles cite this dataset (View in Google Scholar)

