The whole data and source can be found at https://emilhvitfeldt.github.io/friends/
"The goal of friends to provide the complete script transcription of the Friends sitcom. The data originates from the Character Mining repository which includes references to scientific explorations using this data. This package simply provides the data in tibble format instead of json files."
- friends.csv - contains the scenes and lines for each character, including season and episode.
- friends_emotions.csv - contains sentiments for each scene (first four seasons only).
- friends_info.csv - contains information about each episode, such as imdb_rating, views, episode title and directors.
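A minimal loading sketch in R; it assumes the three CSVs sit in the working directory and that friends.csv and friends_info.csv share season and episode columns, as the file descriptions above suggest:

```r
# Load the three CSVs and attach episode-level metadata to each spoken line.
library(readr)
library(dplyr)

friends          <- read_csv("friends.csv")
friends_info     <- read_csv("friends_info.csv")
friends_emotions <- read_csv("friends_emotions.csv")

# Every spoken line with its episode's title and imdb_rating.
lines_with_info <- friends %>%
  left_join(friends_info, by = c("season", "episode"))
```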
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
https://creativecommons.org/publicdomain/zero/1.0/
The Data Science community on Reddit grows every year. Today the network is a platform where many professionals and enthusiasts share valuable materials and experiences. Analysing the posts dedicated to Data Science opens up a number of interesting tasks:
- finding interesting topics,
- studying how trends change over time,
- predicting the potential popularity of a post on Reddit from its title and text, etc.
Over time, I will increase the size of this dataset by adding posts from other subreddits, so that the quality of the analysis and modeling will improve.
Feel free to leave your comments on this dataset and starter notebook. I will try to make the dataset better and much larger. The current goal is 750k + posts (spoiler: after that, there will be a million!)
r/analytics, r/deeplearning, r/datascience, r/datasets, r/kaggle, r/learnmachinelearning, r/MachineLearning, r/statistics, r/artificial, r/AskStatistics, r/computerscience, r/computervision, r/dataanalysis, r/dataengineering, r/DataScienceJobs, r/datascienceproject, r/data, r/MLQuestions, r/rstats
Data were collected from the pushshift.io API (maintained by Jason Baumgartner).
If you know of any interesting Data Science subreddits, please, let me know in discussions.
| column | description |
|---|---|
| # | row index |
| created_date | post publication date |
| created_timestamp | post publication timestamp |
| subreddit | subreddit name |
| title | post title |
| id | unique post id |
| author | post author |
| author_created_utc | author registration date |
| full_link | hyperlink to the post |
| score | post score (upvotes minus downvotes) |
| num_comments | number of comments |
| num_crossposts | number of crossposts |
| subreddit_subscribers | number of subreddit subscribers at the time the post was published |
| post | post text |
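A quick starter sketch in R for exploring the table above; the CSV file name is a placeholder, since the dataset's actual file name is not given here:

```r
# Count posts and summarise scores per subreddit.
library(readr)
library(dplyr)

posts <- read_csv("reddit_data_science_posts.csv")   # placeholder file name

posts %>%
  group_by(subreddit) %>%
  summarise(n_posts = n(),
            median_score = median(score, na.rm = TRUE),
            .groups = "drop") %>%
  arrange(desc(n_posts))
```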
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Publication
- will_INF.txt and go_INF.txt: the two input files. They represent the co-occurrence frequencies of the top-200 infinitival collocates of will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).
- 1-script-create-input-data-raw.r: preprocesses and combines the two files into a long-format data frame with the columns (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocate with be going to) and (iv) will (frequency of the collocate with will); the result is available in input_data_raw.txt.
- 2-script-create-motion-chart-input-data.R: processes input_data_raw.txt to normalise the co-occurrence frequencies of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output of this second script is input_data_futurate.txt.
- input_data_futurate.txt: contains the input data for generating (i) the static motion chart shown as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R) and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).
- Future Constructions.Rproj: double click this file to open an RStudio session whose working directory is associated with the contents of this repository.
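A minimal sketch of the per-million-words normalisation performed by the second script; it assumes tab-separated files and that coha_size.txt pairs each decade with its corpus size in words, which is an assumption rather than a documented layout:

```r
# Normalise raw co-occurrence frequencies to a per-million-words rate.
library(readr)
library(dplyr)

raw  <- read_tsv("input_data_raw.txt")   # decade, coll, `BE going to`, will
coha <- read_tsv("coha_size.txt")        # assumed columns: decade, size

normalised <- raw %>%
  left_join(coha, by = "decade") %>%
  mutate(going_to_pmw = `BE going to` * 1e6 / size,
         will_pmw     = will * 1e6 / size)
```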
https://creativecommons.org/publicdomain/zero/1.0/
https://spdx.org/licenses/etalab-2.0.html
Datasets from the WallOmics project. Contains phenomics, metabolomics, proteomics and transcriptomics data collected from two organs of five ecotypes of the model plant Arabidopsis thaliana exposed to two temperature growth conditions. Exploratory and integrative analyses of these data are presented in Duruflé et al. (2020) (doi:10.1093/bib/bbaa166) and Duruflé et al. (2020) (doi:10.3390/cells9102249).
The rainfall-runoff erosivity factor (R-factor) quantifies the effects of raindrop impacts and reflects the amount and rate of runoff associated with the rain. The R-factor is one of the parameters used by the Revised Universal Soil Loss Equation (RUSLE) to estimate annual rates of erosion. This product is a raster representation of the R-factor derived from isoerodent maps published in Agriculture Handbook Number 703 (Renard et al., 1997). Lines connecting points of equal rainfall erosivity are called isoerodents. The isoerodents plotted on a map of the Island of Kauai were digitized, and values between these lines were obtained by linear interpolation. The final R-factor data are in raster GeoTIFF format at 30 meter resolution in UTM Zone 4, GRS80, NAD83.
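A minimal sketch for reading the GeoTIFF in R and sampling R-factor values at point locations; the file name and the example coordinates are placeholders:

```r
# Read the R-factor raster and extract values at two placeholder points.
library(terra)

rfac <- rast("kauai_rfactor.tif")   # placeholder file name

# Points must be given in the raster's CRS (UTM Zone 4, GRS80/NAD83).
pts <- vect(data.frame(x = c(445000, 460000), y = c(2435000, 2442000)),
            geom = c("x", "y"), crs = crs(rfac))

extract(rfac, pts)
```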
This child page contains a zipped folder which contains all files necessary to run the trend models and produce the results published in U.S. Geological Survey Scientific Investigations Report 2020–5079 [Nustad, R.A., and Vecchia, A.V., 2020, Water-quality trends for selected sites and constituents in the international Red River of the North Basin, Minnesota and North Dakota, United States, and Manitoba, Canada, 1970–2017: U.S. Geological Survey Scientific Investigations Report 2020–5079, 75 p., https://doi.org/10.3133/sir20205079]. The folder contains six files required to run the R–QWTREND trend analysis tool, a readme.txt file, an alldata.RData file, a siteinfo_appendix.txt file, and a folder called "scripts". R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open-source language and a general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND:
• Windows 10 operating system
• R (version 3.4 or later; 64 bit recommended)
• RStudio (version 1.1.456 or later)
An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND. Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014. R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed June 12, 2019, at https://www.r-project.org.
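A minimal sketch of getting started with the unzipped folder from an RStudio session; it uses only the file and folder names listed above and makes no assumption about the functions inside the R–QWTREND source files:

```r
# Run with the working directory set to the unzipped folder.
load("alldata.RData")        # data distributed with the report
ls()                         # inspect the objects it provides

# List the R-QWTREND source files and the analysis scripts, then follow
# readme.txt for the order in which to source and run them.
list.files(pattern = "\\.R$", ignore.case = TRUE)
list.files("scripts")
```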
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Neural decoding is a powerful method to analyze neural activity. However, the code needed to run a decoding analysis can be complex, which can present a barrier to using the method. In this paper we introduce a package that makes it easy to perform decoding analyses in the R programming language. We describe how the package is designed in a modular fashion, which allows researchers to easily implement a range of different analyses. We also discuss how to format data to be able to use the package, and we give two examples of how to use the package to analyze real data. We believe that this package, combined with the rich data analysis ecosystem in R, will make it significantly easier for researchers to create reproducible decoding analyses, which should help increase the pace of neuroscience discoveries.
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
R+S_3k is a dataset for object detection tasks - it contains CAR annotations for 4,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
California policy is incentivizing rapid adoption of zero-emission electric vehicles for light-duty and freight applications. In this project, we explored how locating charging facilities at California's highway rest stops might impact electricity demand, grid operation, and the integration of renewables like solar and wind into California's energy mix. Assuming a growing population of electric vehicles to meet state goals, we estimated state-wide growth of electricity demand and identified the most attractive rest stop locations for siting chargers. Using a California-specific electricity dispatch model developed at ITS, we estimated how charging vehicles at these stations would affect renewable energy curtailment in California. We also estimated the impacts of this charging infrastructure on California's electricity system and how it could be used to reduce the duck curve effect resulting from the large amount of solar energy penetration expected by 2050.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a repository for codes and datasets for the open-access paper in Linguistik Indonesia, the flagship journal for the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).
Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71
To cite this repository, click Cite (dark-pink button on the top-left) and select the citation style through the dropdown button (the default style is the DataCite option, right-hand side). The repository includes:
- the R Markdown (.Rmd) file used to write the paper and containing the R code to generate the analyses in the paper;
- the data files in .rds format so that all code chunks in the R Markdown file can be run;
- the .csl files for the referencing and bibliography (with APA 6th style);
- an RStudio project file (.Rproj); double click on this file to open an RStudio session associated with the content of this repository (see here and here for details on the project-based workflow in RStudio);
- a .docx template file following the basic stylesheet for Linguistik Indonesia.
The paper was written using the bookdown R package (Xie, 2018); make sure this package is installed in R.
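A minimal sketch of rebuilding the paper from the repository in R; the .Rmd file name is a placeholder because it is not spelled out above:

```r
# Install the package the paper depends on, then render the R Markdown source.
install.packages("bookdown")

# "paper.Rmd" is a placeholder for the repository's actual .Rmd file.
rmarkdown::render("paper.Rmd", output_format = bookdown::word_document2())
```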
The GOES-R PLT Mission Reports dataset consists of various reports filed by the scientists during the GOES-R Post Launch Test (PLT) field campaign, including flight reports, weather forecasts, mission scientist reports, and plan-of-day reports. The campaign took place from March to May of 2017 in support of post-launch L1B and L2+ product validation of the Advanced Baseline Imager (ABI) and the Geostationary Lightning Mapper (GLM). The GOES-R PLT Mission Reports dataset contains reports from March 13 through May 17, 2017 in PDF, PNG, Microsoft Excel and Word (.xlsx and .docx) formats, and in KMZ format for display in Google Earth.
Implementation of GWR in R
This repository contains code and files related to my project on Geographically Weighted Regression (GWR) in R. The dataset is from Badan Pusat Statistik (Statistics Indonesia).
Files
- Dataset.xlsx: the dataset used in the analysis.
- GWLR.R: implements Geographically Weighted Logistic Regression in R.
- GWPR.r: implements Geographically Weighted Poisson Regression in R.
- GWR.R: implements Geographically Weighted Regression in R.
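As a rough illustration of the kind of model GWR.R fits, here is a minimal GWR example in R using the spgwr package; the response, predictor and coordinate columns (y, x, lon, lat) are placeholders, not the actual variables in Dataset.xlsx:

```r
# Minimal GWR sketch (placeholder variable names, not the repository's data).
library(readxl)
library(spgwr)

dat    <- as.data.frame(read_excel("Dataset.xlsx"))
coords <- cbind(dat$lon, dat$lat)

# Select a bandwidth by cross-validation, then fit the local regression.
bw  <- gwr.sel(y ~ x, data = dat, coords = coords)
fit <- gwr(y ~ x, data = dat, coords = coords, bandwidth = bw)
fit
```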
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions or suggestions, feel free to contact me.
# Data and R code used in: Plant geographic distribution influences chemical defenses in native and introduced Plantago lanceolata populations

## Description of the data and file structure

* 00_ReadMe_DescriptonVariables.csv: A list with the description of the variables in each file used.
* 00_Metadata_Coordinates.csv: A dataset that includes the coordinates of each Plantago lanceolata population used.
* 00_Metadata_Climate.csv: A dataset that includes coordinates, bioclimatic parameters, and the results of the PCA. The dataset was created with the script '1_Environmental vairables.qmd'.
* 00_Metadata_Individuals.csv: A dataset with general information about each plant individual. Information about root traits and chemistry is missing for four samples because the samples were lost.
* 01_Datset_PlantTraits.csv: Size-related and resource allocation traits measured in Plantago lanceolata, plus herbivore damage.
* 02_Dataset_TargetedCompounds.csv: Quantification of phytohormones, iridoid glycosides, verbascoside and flavonoids in the leaves and roots of Plantago lanceolata. Data generated from HPLC.
* 03_Dataset_Volatiles_Area.csv: Area of the identified volatile compounds. Data generated from GC-FID.
* 03_Dataset_Volatiles_Compounds.csv: Information on the identified volatile compounds. Data generated from GC-MS.
* 04_Dataset_Metabolome_Negative_Metadata.txt: Metadata for the files in negative mode.
* 04_Dataset_Metabolome_Negative_Intensity.xlsx: Intensity of the metabolite features in negative mode. The file was generated from Metaboscape and adapted as required for the Notame package.
* 04_Dataset_Metabolome_Negative_Intensity_filtered.xlsx: File generated after preprocessing of the features in negative mode. During the Notame preprocessing, 0 values were converted to NA.
* 04_Dataset_Metabolome_Negative.msmsonly.csv: Intensity of the metabolite features in negative mode with MS/MS data. File generated from Metaboscape.
* 04_Results_Metabolome_Negative_canopus_compound_summary.tsv: Feature classification. Results generated from the Sirius software.
* 04_Results_Metabolome_Negative_compound_identifications.tsv: Feature identification. Results generated from the Sirius software.
* 05_Dataset_Metabolome_Positive_Metadata.txt: Metadata for the files in positive mode.
* 05_DatasetMetabolome_Positive_Intensity.xlsx: Intensity of the metabolite features in positive mode. File generated from Metaboscape and adapted as required for the Notame package.
* 05_Dataset_Metabolome_Positive_Intensity_filtered: File generated after preprocessing of the features in positive mode. During the Notame preprocessing, 0 values were converted to NA.

## Code/Software

* 1_Environmental vairables.qmd: R script to retrieve bioclimatic variables based on the coordinates of each population, then perform a principal component analysis to reduce the axes of variation and include the first principal component as an explanatory variable in our model to estimate trait differences between native and introduced populations. Figures 1b and 1d.
* 2_PlantTraits_and_Herbivory: R script for the statistical analysis of size-related traits, resource allocation traits and herbivore damage. Figure 2. It needs to source Model_1_Function.R, Model_2_Function.R and Plots_Function.R.
* 3_Metabolome: R script for the statistical analysis of the Plantago lanceolata metabolome. Figure 3. It needs to source Metabolome_prepocessing.R, Model_1_Function.R, Model_2_Function.R and Plots_Function.R.
* 4_TargetedCompounds: R script for the statistical analysis of Plantago lanceolata targeted compounds. Figure 4. It needs to source Model_1_Function.R, Model_2_Function.R and Plots_Function.R.
* 5_Volatilome: R script for the statistical analysis of the Plantago lanceolata volatilome. Figure 5. It needs to source Model_1_Function.R, Model_2_Function.R and Plots_Function.R.
* Model_1_Function.R: Function to run statistical models.
* Model_2_Function.R: Function to run statistical models.
* Plots_Function.R: Function to plot graphs.
* Metabolome_prepocessing.R: Script to preprocess features.
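A rough sketch of the kind of workflow '1_Environmental vairables.qmd' describes, retrieving bioclimatic variables for the population coordinates and summarising them with a PCA; the package choice (geodata/terra) and the coordinate column names are assumptions, not taken from the repository:

```r
# Illustrative only: the actual script may use different packages and columns.
library(geodata)   # WorldClim download helpers
library(terra)

coords <- read.csv("00_Metadata_Coordinates.csv")          # assumes lon/lat columns
pts    <- vect(coords, geom = c("lon", "lat"), crs = "EPSG:4326")

# Download the 19 bioclimatic variables (10 arc-minute resolution).
bio <- worldclim_global(var = "bio", res = 10, path = tempdir())

# Extract bioclim values at each population and summarise them with a PCA;
# PC1 can then serve as a climate covariate in the trait models.
bio_vals <- extract(bio, pts)
pca <- prcomp(na.omit(bio_vals[, -1]), scale. = TRUE)
summary(pca)
```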
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
semcoder/PyX-R dataset hosted on Hugging Face and contributed by the HF Datasets community
A major objective of biometrical genetics is to explore the nature of gene action in determining quantitative traits. This includes determining the number of major genetic factors, or genes, responsible for the traits. Diallel mating designs were developed for the type of genetic experiments that assess the variability in observed quantitative traits arising from genetic factors, environmental factors, and their interactions. Examples of such designs are North Carolina designs, Line by Tester designs and Diallel designs. AGD-R is a set of R programs that performs the statistical analyses for Diallel, Line by Tester and North Carolina designs. AGD-R includes a graphical Java interface that helps the user easily choose input files, which analysis to run, and which variables to analyze.
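AGD-R itself is driven through its Java interface, but as a rough illustration of the underlying kind of analysis, here is a minimal line-by-tester ANOVA in base R with hypothetical column names (rep, line, tester, yield); this is not AGD-R code:

```r
# Minimal line x tester ANOVA sketch (not AGD-R itself).
# `lxt` is assumed to have factors rep, line, tester and a trait column yield.
lxt <- read.csv("line_by_tester_trial.csv")   # hypothetical file
lxt[c("rep", "line", "tester")] <- lapply(lxt[c("rep", "line", "tester")], factor)

fit <- aov(yield ~ rep + line + tester + line:tester, data = lxt)
summary(fit)   # line and tester terms relate to GCA, line:tester to SCA
```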
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
DxL_ver2_phase3.r is a dataset for object detection tasks - it contains Food And Drink 3 annotations for 6,206 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The GOES-R PLT Field Campaign Ivanpah dataset consists of surface reflectance and total optical depth data measured at Ivanpah Playa, Nevada during the GOES-R Post Launch Test (PLT) field campaign. The atmospheric measurements were made using an Automated Solar Radiometer (ASR), which tracks the sun throughout the day. Surface reflectance measurements were made using an ASD portable spectroradiometer and a Spectralon reference panel. The GOES-R PLT field campaign took place from March to May of 2017 in support of post-launch L1b and L2+ product validation of the Advanced Baseline Imager (ABI) and the Geostationary Lightning Mapper (GLM). The main goal of this dataset is to provide an independent validation of the AVIRIS-NG airborne instrument calibration. Data files in Excel format and browse imagery files in JPEG and PNG formats are only available for March 23 and March 28, 2017.
This dataset provides information about the number of properties, residents, and average property values for R Street cross streets in Wilmington, CA.