39 datasets found
  1. GIS Final Project

    • data.cityofchicago.org
    Updated Mar 26, 2025
    Cite
    Chicago Police Department (2025). GIS Final Project [Dataset]. https://data.cityofchicago.org/Public-Safety/GIS-Final-Project/8n2i-4jmi
    Explore at:
    Available download formats: application/rdfxml, csv, tsv, xml, application/rssxml, kmz, application/geo+json, kml
    Dataset updated
    Mar 26, 2025
    Authors
    Chicago Police Department
    Description

    This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org.

    Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that has not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation, and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information, and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of, this information. All data visualizations on maps should be considered approximate, and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use.

    Data is updated daily, Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
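    Since the description recommends the CSV export, the data can also be pulled straight into R. This is a minimal sketch, assuming the Socrata resource endpoint implied by the dataset URL above; $limit is a standard Socrata query parameter.

      # Hedged sketch: endpoint inferred from the dataset URL, not confirmed
      url <- "https://data.cityofchicago.org/resource/8n2i-4jmi.csv?$limit=50000"
      crimes <- read.csv(url)   # block-level incident records
      head(crimes)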

  2. Replication Data for: Data Analysis Class Final Paper

    • dataverse.harvard.edu
    Updated Sep 3, 2019
    Cite
    Vinícius Silva Santana (2019). Replication Data for: Data Analysis Class Final Paper [Dataset]. http://doi.org/10.7910/DVN/8DQFXQ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 3, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Vinícius Silva Santana
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    UFPE Data Analysis Final Paper R data

  3. Subsetting

    • paper.erudition.co.in
    html
    Updated Mar 17, 2025
    + more versions
    Cite
    Einetic (2025). Subsetting [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Subsetting of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024

  4. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Dec 7, 2023
    + more versions
    Cite
    Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. https://data.niaid.nih.gov/resources?id=dryad_w3r2280w0
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    HIV Vaccine Trials Network (http://www.hvtn.org/)
    HIV Prevention Trials Network (http://www.hptn.org/)
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    PEPFAR
    Authors
    Dylan Westfall; James Mullins
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods

    This serves as an overview of the analysis performed on the PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".

    Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.

    For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3-4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.

    The demultiplexed read collections from the chunked_demux pipeline, or CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.

    The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.

    Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.

    To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.

    Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into the Prism software to create the final figures for the paper.
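    The extraction and combination step described in the last paragraph can be sketched in R. This is a minimal sketch, assuming each tagged_<dataset>.tar.gz unpacks into per-sample folders that each contain a dUMI_ranked.csv; it is not the authors' exact code.

      # Hedged sketch: file layout is an assumption based on the description
      tarballs <- list.files("Pipeline_Outputs", pattern = "^tagged_.*\\.tar\\.gz$", full.names = TRUE)
      for (tb in tarballs) untar(tb, exdir = "tagged_tmp")

      csvs <- list.files("tagged_tmp", pattern = "dUMI_ranked\\.csv$", recursive = TRUE, full.names = TRUE)
      dUMI_df <- do.call(rbind, lapply(csvs, function(f) {
        d <- read.csv(f)
        d$sample <- basename(dirname(f))  # record the source sample
        d
      }))
      write.csv(dUMI_df, "dUMI_df.csv", row.names = FALSE)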

  5. Data from: Algorithms for Quantitative Pedology (AQP)

    • agdatacommons.nal.usda.gov
    bin
    Updated Feb 13, 2024
    Cite
    Dylan Beaudette; Pierre Roudier; Andrew Brown (2024). Algorithms for Quantitative Pedology (AQP) [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Algorithms_for_Quantitative_Pedology_AQP_/24853281
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    United States Department of Agriculture (http://usda.gov/)
    Authors
    Dylan Beaudette; Pierre Roudier; Andrew Brown
    License

    https://www.gnu.org/licenses/fdl-1.3.en.html

    Description

    Algorithms for Quantitative Pedology (AQP) is a collection of code, ideas, documentation, and examples wrapped up into several R packages. The theory behind much of the code can be found in Beaudette, D., Roudier, P., & O'Geen, A. (2013). Algorithms for quantitative pedology: A toolkit for soil scientists. Computers & Geosciences, 52, 258-268. doi: 10.1016/j.cageo.2012.10.020. The AQP package was designed to support data-driven approaches to common soils-related tasks such as visualization, aggregation, and classification of soil profile collections. To contribute code, documentation, bug reports, etc., contact Dylan at dylan [dot] beaudette [at] usda [dot] gov. AQP is a collaborative effort, funded in part by the Kearney Foundation of Soil Science (2009-2011) and USDA-NRCS (2011-current). The AQP suite of R packages is used to generate figures for SoilWeb, Series Extent Explorer, and Soil Data Explorer. Soil data presented were derived from the 100+ year efforts of the National Cooperative Soil Survey, c/o USDA-NRCS.

    Resources in this dataset: Resource Title: aqp: Algorithms for Quantitative Pedology (CRAN). File Name: Web Page, url: https://CRAN.R-project.org/package=aqp

    The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb (https://casoilresource.lawr.ucdavis.edu/soilweb-apps/). Components of the AQP project (the aqp, soilDB, sharpshootR, and soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.
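    As a quick illustration of the specialized data structures mentioned above, the sketch below promotes a data.frame of horizon data to a SoilProfileCollection and plots it, using the sp4 example data shipped with the aqp package.

      library(aqp)

      # sp4: example horizon data bundled with aqp
      data(sp4)

      # promote to a SoilProfileCollection: 'id' identifies profiles,
      # 'top' and 'bottom' are horizon depths
      depths(sp4) <- id ~ top + bottom

      # sketch the profiles, coloring horizons by clay content
      plotSPC(sp4, color = "clay")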

  6. Simulation

    • paper.erudition.co.in
    html
    Updated Mar 17, 2025
    Cite
    Einetic (2025). Simulation [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Simulation of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024

  7. NATCOOP dataset

    • heidata.uni-heidelberg.de
    csv, docx, pdf, tsv +1
    Updated Jan 27, 2022
    + more versions
    Cite
    Florian Diekert; Florian Diekert; Robbert-Jan Schaap; Robbert-Jan Schaap; Tillmann Eymess; Tillmann Eymess (2022). NATCOOP dataset [Dataset]. http://doi.org/10.11588/DATA/GV8NBL
    Explore at:
    Available download formats: docx(90179), pdf(432619), csv(3441765), docx(499022), tsv(86553), pdf(473493), pdf(856157), pdf(467245), docx(101203), pdf(351653), pdf(576588), pdf(200225), pdf(124038), type/x-r-syntax(14339), pdf(345323), pdf(69467), docx(43108), pdf(268168), docx(493800), docx(25110), docx(43036), pdf(270379), pdf(77960), pdf(464499), pdf(392748), docx(42158), pdf(374488), docx(498354), pdf(282466), pdf(482954), pdf(302513), pdf(513748), pdf(126342), docx(33772), tsv(2313475), pdf(441389), pdf(92836), pdf(392718)
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    heiDATA
    Authors
    Florian Diekert; Florian Diekert; Robbert-Jan Schaap; Robbert-Jan Schaap; Tillmann Eymess; Tillmann Eymess
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/GV8NBL

    Time period covered
    Jan 1, 2017 - Jan 1, 2021
    Dataset funded by
    European Commission
    Description

    The NATCOOP project set out to study how nature shapes the preferences and incentives of economic agents and how this in turn affects common-pool resource management. Imagine a group of fishermen targeting a species that requires a lot of teamwork to harvest. Do these fishers become more social over time compared to fishers that work in a more solitary manner? If so, does this have implications for how the fishery should be managed? To study this, the NATCOOP team travelled to Chile and Tanzania and collected data using surveys and economic experiments. These two very different countries have a large population of small-scale fishermen, and both host several distinct types of fisheries. Over the course of five field trips, the project team surveyed more than 2,500 fishermen, with each field trip contributing to the main research question by measuring fishermen's preferences for cooperation and risk. Additionally, each field trip aimed to answer another smaller research question that was focused on either risk-taking or cooperation behavior in the fisheries. The data from both surveys and experiments are now publicly available and can be freely studied by other researchers, resource managers, or interested citizens. Overall, the NATCOOP dataset contains participants' responses to a plethora of survey questions and their actions during incentivized economic experiments. It is available in both the .dta and .csv format, and its use is recommended with statistical software such as R or Stata; a minimal loading sketch follows below. For those unaccustomed to statistical analysis, we included a video tutorial on how to use the data set in the open-source program R.
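    This is a minimal loading sketch, assuming placeholder file names for whichever files are downloaded from heiDATA; haven::read_dta handles the Stata version.

      # Hedged sketch: file names are placeholders
      natcoop_csv <- read.csv("natcoop_data.csv")

      library(haven)                              # for Stata files
      natcoop_dta <- read_dta("natcoop_data.dta") # same data, .dta flavor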

  8. Data for: A systematic review showed no performance benefit of machine...

    • data.mendeley.com
    • search.datacite.org
    Updated Mar 14, 2019
    Cite
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc.1
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded files are:

    1) An Excel file containing 6 sheets, in order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information on the number of events for every outcome investigated within a paper), and "Tuning Classification" (data related to the manner of hyperparameter tuning of the machine learning algorithms).

    2) The R script used for the analysis. In order to read the data, save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents refers to the "Outcome Event Data" sheet, and srcompx refers to the "Comparison Data" sheet (a minimal import sketch follows this list).

    3) Supplementary material, including the search string, tables of data, and figures.

    4) PRISMA checklist items
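    The import step described in item 2 might look like the following sketch; the .txt file names are assumptions standing in for whatever names the exported sheets were given.

      # Hedged sketch: file names are placeholders for the exported sheets
      srpap    <- read.delim("paper_level_data.txt")    # "Paper level data"
      srevents <- read.delim("outcome_event_data.txt")  # "Outcome Event Data"
      srcompx  <- read.delim("comparison_data.txt")     # "Comparison Data"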

  9. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Nov 15, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14171251
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 11/15/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis was revised during the peer review process.

    # Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    # Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code"" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  10. Appalachian Basin Play Fairway Analysis: Thermal Quality Analysis in...

    • data.wu.ac.at
    zip
    Updated Mar 6, 2018
    + more versions
    Cite
    HarvestMaster (2018). Appalachian Basin Play Fairway Analysis: Thermal Quality Analysis in Low-Temperature Geothermal Play Fairway Analysis (GPFA-AB) ThermalQualityAnalysisWellDataOutliersRemovedPredTempAt1500m.zip [Dataset]. https://data.wu.ac.at/schema/geothermaldata_org/YjcyY2NkZDUtN2NmNy00NTlhLTg0OTktN2E5MGJiZTFlNDky
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 6, 2018
    Dataset provided by
    HarvestMaster
    Description

    This collection of files are part of a larger dataset uploaded in support of Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin (GPFA-AB, DOE Project DE-EE0006726). Phase 1 of the GPFA-AB project identified potential Geothermal Play Fairways within the Appalachian basin of Pennsylvania, West Virginia and New York. This was accomplished through analysis of 4 key criteria: thermal quality, natural reservoir productivity, risk of seismicity, and heat utilization. Each of these analyses represent a distinct project task, with the fifth task encompassing combination of the 4 risks factors. Supporting data for all five tasks has been uploaded into the Geothermal Data Repository node of the National Geothermal Data System (NGDS).

    This submission comprises the data for Thermal Quality Analysis (project task 1) and includes all of the necessary shapefiles, rasters, datasets, code, and references to code repositories that were used to create the thermal resource and risk factor maps as part of the GPFA-AB project. The identified Geothermal Play Fairways are also provided with the larger dataset. Figures (.png) are provided as examples of the shapefiles and rasters. The regional standardized 1 square km grid used in the project is also provided as points (cell centers), polygons, and as a raster. Two ArcGIS toolboxes are available: 1) RegionalGridModels.tbx for creating resource and risk factor maps on the standardized grid, and 2) ThermalRiskFactorModels.tbx for use in making the thermal resource maps and cross sections. These toolboxes contain item description documentation for each model within the toolbox, and for the toolbox itself. This submission also contains three R scripts: 1) AddNewSeisFields.R to add seismic risk data to attribute tables of seismic risk, 2) StratifiedKrigingInterpolation.R for the interpolations used in the thermal resource analysis, and 3) LeaveOneOutCrossValidation.R for the cross validations used in the thermal interpolations.
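    The cross validation mentioned above follows the standard leave-one-out pattern. The sketch below illustrates the idea with a simple inverse-distance interpolator standing in for the stratified kriging actually used; idw_predict and loocv_rmse are hypothetical helpers, not functions from the project's scripts.

      # Hedged sketch: LOOCV for a spatial interpolator
      idw_predict <- function(x, y, z, x0, y0, p = 2) {
        d <- sqrt((x - x0)^2 + (y - y0)^2)   # distances to the target point
        w <- 1 / d^p                         # inverse-distance weights
        sum(w * z) / sum(w)
      }

      loocv_rmse <- function(x, y, z) {
        pred <- vapply(seq_along(z), function(i) {
          idw_predict(x[-i], y[-i], z[-i], x[i], y[i])  # hold out point i
        }, numeric(1))
        sqrt(mean((pred - z)^2))  # RMSE of held-out predictions
      }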

    Some file descriptions make reference to various 'memos'. These are contained within the final report submitted October 16, 2015.

    Each zipped file in the submission contains an 'about' document describing the full Thermal Quality Analysis content available, along with key sources, authors, citation, use guidelines, and assumptions, with the specific file(s) contained within the .zip file highlighted.

    UPDATE: A newer version of the Thermal Quality Analysis has been added here: https://gdr.openei.org/submissions/879 (also linked below). A newer version of the Combined Risk Factor Analysis has been added here: https://gdr.openei.org/submissions/880 (also linked below). This is one of seven .zip files relating to thermal outlier assessment within the Thermal Quality Analysis task of the Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin. This file contains the data pertinent to the predicted temperature at a depth of 1.5 km.

    The seven files contain the well data sorted for outliers, and the R scripts used to process the data. Before running the outlier identification, wells in AASG_Thermed.xlsx (which may be found within ThermalQualityAnalysisThermalModelDataFilesStateWellTemperatureDatabases.zip) were checked for the same spatial location; only the deepest well in a given location is used, for quality purposes. The R script SortingWells.R contains two functions that were developed to sort the data according to these specifications. Further details about the data processing are provided in 9_GPFA-AB_InterpolationThermalFieldEstimation.pdf (Smith, 2015), provided within the project final report submitted on 10/16/2015. Outlier identification is done using the local identification function of Whealton and Smith (2015) with a 32 km search radius; the nearest 25 points are used to check for outliers. Details about the algorithm are provided in 6_GPFA-AB_ThermalOutlierAssessment.pdf (Whealton and Stedinger, 2015), again within the final report.

    For ease in setting up the data for outlier identification, a function was written in R script DataArrangementAndRunOutlierAnalysis.R. This function sets up the data and runs the outlier identification function.

  11. Google Data Analytics Capstone

    • kaggle.com
    Updated Aug 9, 2022
    Cite
    Reilly McCarthy (2022). Google Data Analytics Capstone [Dataset]. https://www.kaggle.com/datasets/reillymccarthy/google-data-analytics-capstone/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Reilly McCarthy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Hello! Welcome to the capstone project I completed to earn my Data Analytics certificate through Google. I chose to complete this case study in RStudio Desktop because R is the primary new concept I learned throughout this course, and I wanted to embrace my curiosity and learn more about R through this project. At the beginning of this report I will provide the scenario of the case study I was given. After that, I will walk you through my data analysis process based on the steps I learned in this course:

    1. Ask
    2. Prepare
    3. Process
    4. Analyze
    5. Share
    6. Act

    The data I used for this analysis comes from this FitBit data set: https://www.kaggle.com/datasets/arashnic/fitbit

    " This dataset generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. "

  12. Scripts and data to run R-QWTREND models and produce results

    • gimi9.com
    + more versions
    Cite
    Scripts and data to run R-QWTREND models and produce results | gimi9.com [Dataset]. https://www.gimi9.com/dataset/data-gov_scripts-and-data-to-run-r-qwtrend-models-and-produce-results/
    Explore at:
    Description

    This child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2021–XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p.]. To run the R-QWTREND program in R, six files are required, and each is included in this child page: prepQWdataV4.txt, runQWmodelV4XXUEP.txt, plotQWtrendV4XXUEP.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020).

    The folder contains: the six items required to run the R-QWTREND trend analysis tool; a readme.txt file; a flowtrendData.RData file; an allsiteinfo.table.csv file; a folder called "scripts"; and a folder called "waterqualitydata". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS Scientific Investigations Report referenced above. The "waterqualitydata" folder contains .csv files with the naming convention site_ions or site_nuts for major-ion and nutrient constituents; these are machine-readable files with the water-quality data used for the trend analysis at each site.

    R-QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and a general environment for statistical computing and graphics. The following system requirements are necessary for using R-QWTREND:

    • Windows 10 operating system
    • R (version 3.4 or later; 64 bit recommended)
    • RStudio (version 1.1.456 or later)

    An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R-QWTREND. A sketch of a minimal start-up session follows below.

    Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014.
    R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
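    Based on the file list above, getting started in R plausibly reduces to sourcing the starter script and loading the bundled data. This is a sketch under those assumptions, not documented R-QWTREND usage.

      # Hedged sketch: run from a directory containing the six required files
      setwd("C:/R-QWTREND")          # placeholder path

      source("StartQWTrendV4.R")     # loads the R-QWTREND functions
      load("flowtrendData.RData")    # bundled flow/trend data

      siteinfo <- read.csv("allsiteinfo.table.csv")  # site metadata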

  13. Data Analysis with R (GE3B-07), 2nd Semester, Bachelor of Computer...

    • paper.erudition.co.in
    html
    Updated Mar 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Einetic (2025). Data Analysis with R (GE3B-07), 2nd Semester, Bachelor of Computer Application 2023-2024, MAKAUT | Erudition Paper [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of Data Analysis with R (GE3B-07), 2nd Semester, Bachelor of Computer Application 2023-2024, Maulana Abul Kalam Azad University of Technology

  14. Bellabeat case study using R

    • kaggle.com
    Updated Oct 29, 2022
    + more versions
    Cite
    R. Naga Amrutha (2022). Bellabeat case study using R [Dataset]. https://www.kaggle.com/datasets/rnagaamrutha/bellabeatcasestudywithr/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    R. Naga Amrutha
    Description

    Dataset

    This dataset was created by R. Naga Amrutha


  15. Data Visualization of a GL Community: A Cooperative Project

    • datacatalogue.cessda.eu
    • ssh.datastations.nl
    Updated Apr 11, 2023
    + more versions
    Cite
    R. Bartolini; S. Goggi; G. Pardelli (2023). Data Visualization of a GL Community: A Cooperative Project [Dataset]. http://doi.org/10.17026/dans-x3b-fvyj
    Explore at:
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Istituto di Linguistica Computazionale, ILC-CNR
    Institute for Computational Linguistics, ILC-CNR
    Authors
    R. Bartolini; S. Goggi; G. Pardelli
    Description

    In 2012, GreyNet published a page on its website and made accessible the first edition of IDGL, the International Directory of Organizations in Grey Literature. The latest update of this PDF publication was in August 2016, providing a list of some 280 organizations in 40 countries worldwide that have contact with the Grey Literature Network Service. The listing appears by country, followed by the names of the organizations in alphabetical order, which are then linked to a URL.
    This year GreyNet International marks its Twenty-Fifth Anniversary and seeks to more fully showcase organizations whose involvement in grey literature is in one or more ways linked to GreyNet.org. Examples include: members, partners, conference hosts, sponsors, authors, service providers, committee members, associate editors, etc.
    This revised and updated edition of IDGL will benefit from the use of visualization software mapping the cities in which GreyNet's contacts are located. Behind each point of contact are a number of fields that can be grouped and cross-tabulated for further data analysis. Such fields include the source, name of organization, acronym, affiliate's job title, sector of information, subject/discipline, city, state, country, ISO code, continent, and URL. Eight of the twelve fields require input, while the other four fields do not.
    The population of the study was derived by extracting records from GreyNet's in-house administrative file. Only recipients on GreyNet's Distribution List as of February 2017 were included. The records were then further filtered, and only those that allowed for completion of the required fields remained. This set of records was then converted to Excel format, duplications were removed, and further normalization of field entries took place. In the end, 510 records form the corpus of this study. In the coming months, an in-depth analysis of the data will be carried out, the results of which will be recorded and made visually accessible.
    The expected outcome of the project will not only be a revised, expanded, and updated publication of IDGL, but also a visual overview of GreyNet as an international organization serving diverse communities with shared interests in grey literature. It will be a demonstration of GreyNet's commitment to research, publication, open access, education, and public awareness in this field of library and information science. Finally, this study will serve to pinpoint geographic and subject-based areas currently within as well as outside of GreyNet's catchment.

  16. Wheat Corn Soy Estimates Red River of the North Basin

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Wheat Corn Soy Estimates Red River of the North Basin [Dataset]. https://catalog.data.gov/dataset/wheat-corn-soy-estimates-red-river-of-the-north-basin
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Red River
    Description

    These data describe the percent of cropland harvested as wheat, corn, and soybean within each basin (basins 1-8, see accompanying shapefiles). Data are available for other crops; however, these three were chosen because wheat is a traditional crop that has been grown for a long time in the Basin, while corn and soybeans have increased in recent times because of wetter conditions, the demand for biofuels, and advances in breeding short-season, drought-tolerant crops. The data come from the National Agricultural Statistics Service (NASS) Census of Agriculture (COA) and have estimates for 1974, 1978, 1982, 1986, 1992, 1997, 2002, 2007, and 2012. Years with missing data were estimated using multivariate imputation of missing values with principal components analysis (PCA) via the function imputePCA in the R (R Core Team, 2015) package missMDA (Husson and Josse, 2015); a sketch of this step follows below. In the interest of dimension reduction, the scores of the first principal component of a principal component analysis, by basin, of the wheat, corn, and soy variables are included.

    Husson, F., and Josse, J., 2015, missMDA—Handling missing values with multivariate data analysis: R package version 1.9, https://CRAN.R-project.org/package=missMDA.
    R Core Team, 2015, R: A language and environment for statistical computing: R Foundation for Statistical Computing, Vienna, http://www.R-project.org.
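    A minimal sketch of the imputation-plus-PCA approach named above, using the real missMDA::imputePCA and base prcomp; the data frame crops and its columns are hypothetical stand-ins for the basin-level wheat/corn/soy series.

      library(missMDA)

      # 'crops' is a hypothetical data.frame of percent-harvested values
      # (rows = census years; columns = wheat, corn, soy) with NAs for
      # missing census years
      ncp <- estim_ncpPCA(crops)$ncp                    # pick component count
      crops_complete <- imputePCA(crops, ncp = ncp)$completeObs

      # first principal component scores, as used for dimension reduction
      pc1 <- prcomp(crops_complete, scale. = TRUE)$x[, 1]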

  17. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    + more versions
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled in by students in all fields and levels of study offered by the university. In the period analysed, the university was entirely in the online regime amid the Covid-19 pandemic. While the expected learning outcomes formally have not been changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.

    The average SET scores were matched with the characteristics of the teacher (degree, seniority, gender, and SET scores in the past six semesters); the course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); the attributes of the SET survey responses (the percentage of students providing SET feedback); and the grades of the course (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.

    The unit of observation, or a single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). It means that for each pair (j,k) we have nine rows, one for each SET survey question, or sometimes fewer when students did not answer one of the SET questions at all. For example, the dependent variable SET_score_avg(j,k,n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students that took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows.

    The full list of variables or columns in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, it means that the variable takes the same values for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments:
    - Word file with variables description
    - Rdata file with the data set (for the R language); a minimal loading sketch follows Appendix 1

    Appendix 1. The SET questionnaire used for this paper.

    Evaluation survey of the teaching staff of [university name]. Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree.

    1. I learnt a lot during the course.
    2. I think that the knowledge acquired during the course is very useful.
    3. The professor used activities to make the class more engaging.
    4. If it was possible, I would enroll for the course conducted by this lecturer again.
    5. The classes started on time.
    6. The lecturer always used time efficiently.
    7. The lecturer delivered the class content in an understandable and efficient way.
    8. The lecturer was available when we had doubts.
    9. The lecturer treated all students equally regardless of their race, background and ethnicity.
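    Loading the attached Rdata file and averaging the scores per question could look like the sketch below; the file name, object name set_data, and column question_number are assumptions, since the attachment names are not given here.

      # Hedged sketch: names are placeholders for the attached Rdata file
      load("set_data.Rdata")  # assumed to load a data frame 'set_data'

      # mean SET score for each of the nine survey questions
      aggregate(SET_score_avg ~ question_number, data = set_data, FUN = mean)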

  18. Getting started, Background

    • paper.erudition.co.in
    html
    Updated Mar 17, 2025
    Cite
    Einetic (2025). Getting started, Background [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question Paper Solutions of chapter Getting started, Background of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024

  19. Digital Shoreline Analysis System version 4.3 Transects with Long-Term...

    • datasets.ai
    • search.dataone.org
    • +1more
    55
    Updated Sep 8, 2024
    + more versions
    Cite
    Department of the Interior (2024). Digital Shoreline Analysis System version 4.3 Transects with Long-Term Linear Regression Rate Calculations for southern North Carolina (NCsouth) [Dataset]. https://datasets.ai/datasets/digital-shoreline-analysis-system-version-4-3-transects-with-long-term-linear-regression-r-7faa2
    Explore at:
    Available download formats: 55
    Dataset updated
    Sep 8, 2024
    Dataset authored and provided by
    Department of the Interior
    Area covered
    North Carolina
    Description

    Sandy ocean beaches are a popular recreational destination, often surrounded by communities containing valuable real estate. Development is on the rise despite the fact that coastal infrastructure is subjected to flooding and erosion. As a result, there is an increased demand for accurate information regarding past and present shoreline changes. To meet these national needs, the Coastal and Marine Geology Program of the U.S. Geological Survey (USGS) is compiling existing reliable historical shoreline data along open-ocean sandy shores of the conterminous United States and parts of Alaska and Hawaii under the National Assessment of Shoreline Change project. There is no widely accepted standard for analyzing shoreline change. Existing shoreline data measurements and rate calculation methods vary from study to study and prevent combining results into state-wide or regional assessments. The impetus behind the National Assessment project was to develop a standardized method of measuring changes in shoreline position that is consistent from coast to coast. The goal was to facilitate the process of periodically and systematically updating the results in an internally consistent manner.

  20. beachmat: A Bioconductor C++ API for accessing high-throughput biological...

    • plos.figshare.com
    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Aaron T. L. Lun; Hervé Pagès; Mike L. Smith (2023). beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types [Dataset]. http://doi.org/10.1371/journal.pcbi.1006135
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Aaron T. L. Lun; Hervé Pagès; Mike L. Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
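    On the R side, the matrix representations that beachmat abstracts over can be constructed as in the sketch below. This only builds the dense, sparse, and file-backed flavors; beachmat's C++ interface itself is consumed from package-level C++ code, not from an R session like this.

      library(Matrix)      # sparse matrices
      library(HDF5Array)   # file-backed matrices (Bioconductor)

      counts <- matrix(rpois(1e4, lambda = 2), nrow = 100)  # dense matrix

      sparse_counts <- Matrix(counts, sparse = TRUE)  # dgCMatrix (sparse)
      backed_counts <- writeHDF5Array(counts)         # HDF5-backed matrix

      # A beachmat-based C++ function in a package could accept any of
      # these representations through one interface.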
