CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data Source: https://www.kaggle.com/datasets/gufukuro/movie-scripts-corpus
Data Description: Movie Scripts Corpus
This corpus was collected for screenplay analysis with machine learning methods. It includes movie scripts crawled from different sources, their annotations by script structural elements, and movie metadata.
Corpus description
Screenplay data consists of:
- Movie scripts: TXT documents with raw full text (2858 docs)
- Movie scripts: TXT documents with full-text lemmas (2858 docs)
- Manual annotation TXT documents for some movie scripts (33 docs, more than 6000 annotated rows)
- Movie script annotations: TXT documents obtained by BERT
- Movie script annotations: JSON documents obtained by the rule-based annotator ScreenPy
Movie metadata consists of:
- Cut versions of movie reviews and scores from Metacritic (number of reviews: 21025; number of movies with reviews: 2038)
- Metadata for movies, including: title, akas, launch year, score from Metacritic, IMDb user rating and number of votes from imdb.com, movie awards, opening weekend, producers, budget, script department, production companies, writers, directors, cast info, countries involved in production, age restriction, plot (with outline), keywords, genres, taglines, critics' synopsis
- Screenplay awards information: Academy Awards adapted screenplay, Academy Awards original screenplay, BAFTA, Golden Globe Award for Best Screenplay, Writers Guild Awards Winners & Nominees 2020-2013; nominations information for 462 movies in total
Movie characters data consists of:
- Script text fragments with dialogs and scene descriptions for characters, gathered with annotators: 2153 movies and text fragments for 32114 characters in total
- Gender labels for 4792 characters
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Main folder for Figure 3.
This file includes an annotated R script used for data analysis for this project. Data files called in this script are also uploaded. Annotations within the script equate to metadata. This dataset is associated with the following publication: Wick, M., T. Angradi, M. Pawlowski, D. Bolgrien, R. Debbout, J. Launspach, and M. Nord. Deep Lake Explorer: A web application for crowdsourcing the classification of benthic underwater video from the Laurentian Great Lakes. JOURNAL OF GREAT LAKES RESEARCH. International Association for Great Lakes Research, Ann Arbor, MI, USA, 46(5): 1469-1478, (2020).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2022–XXXX [Nustad, R.A., and Tatge, W.S., 2023, Comprehensive Water-Quality Trend Analysis for Selected Sites and Constituents in the International Souris River Basin, Saskatchewan and Manitoba, Canada and North Dakota, United States, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2023-XXXX, XX p.]. To run the R-QWTREND program in R, six files are required and each is included in this child page: prepQWdataV4.txt, runQWmodelV4.txt, plotQWtrendV4.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020). The folder contains: three items required to run the R–QWTREND trend analysis tool; a README.txt file; a folder called "dataout"; and a folder called "scripts". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
First, "0 logfile processing.txt" was run in R. This was necessary to allow us to later adjust the epoch onsets. (The Presentation triggers are time-locked to word offset, but we later decided to time-lock to word onset, so in the EEG processing we need to move each trigger earlier to line up with the word onset.) This script creates a "delays" file in each participant's folder; that file is later used during the EEG preprocessing to adjust the latencies of the triggers.
Next, "step1_importandfilteranddoica.m" was run in MATLAB. This imports the EEG data from a .cnt file, does preprocessing (e.g. re-referencing, bad channel interpolation, adjusting the trigger latencies as described above, and epoching), and then runs ICA.
After this, the authors manually inspected each ICA decomposition and recorded the bad ICs in the later scripts so that they would be removed.
Last, we run any of the "postprocessing" scripts. The differences between them are described below:
The ones that say "strictercriteria" in the name use the criteria from our stage 1 pre-registration. The ones that don't say that use our looser deviation criteria.
Within each of the pairs described above (the ones with stricter criteria and the ones with looser criteria), there are two separate scripts. The one that has "fieldtrip" in the name is for doing statistics. The one with "eeglab" in the name (or the one just called "step2_postprocess.m") is for making plots.
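The trigger adjustment described above can be illustrated with a minimal Python sketch. This is not the authors' R or MATLAB code. It assumes that each entry in the "delays" file is the duration of the corresponding word in milliseconds, so that subtracting it from an offset-locked trigger latency gives the word-onset latency. The function name and the example values are hypothetical.

```python
def shift_triggers_to_onset(trigger_latencies_ms, word_durations_ms):
    """Move each offset-locked trigger earlier by the duration of its word.

    Assumption: one delay (word duration, in ms) per trigger, in the same order.
    """
    if len(trigger_latencies_ms) != len(word_durations_ms):
        raise ValueError("need exactly one delay per trigger")
    return [
        latency - duration
        for latency, duration in zip(trigger_latencies_ms, word_durations_ms)
    ]


# Hypothetical example values (ms): triggers recorded at word offset.
offset_latencies = [1500.0, 3200.0, 5100.0]
delays = [400.0, 350.0, 500.0]

onset_latencies = shift_triggers_to_onset(offset_latencies, delays)
print(onset_latencies)  # [1100.0, 2850.0, 4600.0]
```

Epochs cut around the shifted latencies would then be time-locked to word onset rather than word offset.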
This data package is associated with the publication “Meta-metabolome ecology reveals that geochemistry and microbial functional potential are linked to organic matter development across seven rivers” submitted to Science of the Total Environment. This data package includes the data necessary to replicate the analyses presented within the manuscript to investigate dissolved organic matter (DOM) development across broad spatial distances and within divergent biomes. Specifically, we included the Fourier transform ion cyclotron mass spectrometry (FTICR-MS) data, geochemistry data, annotated metagenomic data, and results from ecological null modeling analyses in this data package. Additionally, we included the scripts necessary to generate the figures from the manuscript.
Complete metagenomic data associated with this data package can be found at the National Center for Biotechnology Information (NCBI) under Bioproject PRJNA946291.
This dataset consists of (1) four folders; (2) a file-level metadata (flmd) file; (3) a data dictionary (dd) file; (4) a factor sheet describing samples; and (5) a readme. The FTICR Data folder contains (1) the processed FTICR-MS data; (2) a transformation-weighted characteristics dendrogram generated from the FTICR-MS data; and (3) the script used to generate all FTICR-MS related figures. The Geochemical Data folder contains (1) the single geochemistry data file and (2) the R script responsible for generating associated figures. The Metagenomic Data folder contains (1) annotation information across different levels; (2) carbohydrate active enzyme (CAZyme) information from the dbCAN database (Yin et al., 2012); (3) phylogenetic tree data (FASTAs, alignments, and tree file); and (4) the scripts necessary to analyze all of these data and generate figures. The Null Modeling Data folder contains (1) data generated during null modeling for each river and all rivers combined and (2) the R scripts necessary to process the data. All files are .csv, .pdf, .tsv, .tre, .faa, .afa, .tree, or .R.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files were used to calculate the model effect and select candidate committers for the paper.
The folder `calculate-model-effect` contains the MySQL database and the Python script for each metric.
In the MySQL database, there are the following tables:
1. scmlog (the basic information of the commits)
2. files
3. hash_file (the hash of commits and their modified files)
4. sign (signed-off-by of commits)
5. review (reviewed-by of commits)
6. test (tested-by of commits)
7. ack (acked-by of commits)
8. maintainers (created using the file MAINTAINERS in the Linux kernel repository)
9. signer_maintainer
10. i915-committer-no-maintainer
The folder `select-candidate-committers` contains the data and the C++ script for selecting candidate committers for the subsystems.
Compile and run `main.cpp` to get the results.
Data and scripts are provided in support of the manuscript "Efficient inference of paternity and sibship inference given known maternity via hierarchical clustering", and the associated Python package FAPS, available from www.github.com/ellisztamas/faps.
Simulation scripts cover:
1. Performance under different mating scenarios.
2. Comparison with Colony2.
3. Effect of changing the number of Monte Carlo draws.
The final script covers the analysis of half-sib arrays from wild-pollinated seed in an Antirrhinum majus hybrid zone.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains all the images, raw data, and scripts of the MAP2B manuscript, including the figures in both the main text and the supplementary materials. As GitHub has a size limit for single files, a compressed archive of all the data in this directory is provided here.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Main folder for Figure 2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data files in this collection are associated with the paper "Point-to-set lengths, local structure, and glassiness", S. Yaida, P. Charbonneau, L. Berthier, and G. Tarjus, Phys. Rev. E, 2016. They include .dat, .eps and .m files with associated raw data and generating scripts to allow for replication of the figures. The growing sluggishness of glass-forming liquids is thought to be accompanied by growing structural order. The nature of such order, however, remains hotly debated. A decade ago, point-to-set (PTS) correlation lengths were proposed as measures of amorphous order in glass formers, but recent results raise doubts as to their generality. Here, we extend the definition of PTS correlations in order to agnostically capture any type of growing order in liquids, be it local or amorphous. This advance enables the formulation of a clear distinction between slowing down due to conventional critical ordering from that due to glassiness and provides a unified framework to assess the relative importance of specific local order and generic amorphous order in glass formation. ...
This child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Survey Scientific Investigations Report 2021–XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p.]. To run the R–QWTREND program in R, six files are required and each is included in this child page: prepQWdataV4.txt, runQWmodelV4XXUEP.txt, plotQWtrendV4XXUEP.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020). The folder contains: six items required to run the R–QWTREND trend analysis tool; a readme.txt file; a flowtrendData.RData file; an allsiteinfo.table.csv file; a folder called "scripts"; and a folder called "waterqualitydata". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS Scientific Investigations Report referenced above. The "waterqualitydata" folder contains machine-readable .csv files with the water-quality data used for the trend analysis at each site, named site_ions for major ions and site_nuts for nutrients.
R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open source language and a general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND:
• Windows 10 operating system
• R (version 3.4 or later; 64 bit recommended)
• RStudio (version 1.1.456 or later)
An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND.
Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014
R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
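Before starting R–QWTREND, it may help to confirm that the six required files listed above are all present in the unzipped folder. The following Python sketch is only an illustration and is not part of the USGS release. The file names come from the description above; the function name and the assumption that all six files sit directly in one folder are hypothetical.

```python
from pathlib import Path

# File names as listed in the dataset description.
REQUIRED_FILES = [
    "prepQWdataV4.txt",
    "runQWmodelV4XXUEP.txt",
    "plotQWtrendV4XXUEP.txt",
    "qwtrend2018v4.exe",
    "salflibc.dll",
    "StartQWTrendV4.R",
]


def missing_files(folder):
    """Return the required file names that are not found directly in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Check the current directory (assumed here to be the unzipped folder).
missing = missing_files(".")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All six required files are present.")
```

If the files are stored in a subfolder of the archive, pass that path to `missing_files` instead of `"."`.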
https://spdx.org/licenses/etalab-2.0.html
Within the QUAE project, an identification procedure was developed to sort singular behaviours in river temperature time series. This procedure was conceived as a tool to identify particular behaviours in time series despite non-continuous measurements and regardless of the type of measurement (temperature, streamflow...). Three types of singularities are identified: extreme values (in some cases similar to outliers), roughened data (such as the difference between water temperature and air temperature) and buffered data (such as signals caused by groundwater inflows).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data, scripts and web site of the project “Digital mapping of fictional places in Spanish Early Modern Byzantine novels”. Visit the project web page at http://editio.github.io/mapping.literature
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These are the Matlab scripts to import EEG data and perform data preprocessing and dimension reduction. Script files are in MATLAB .m format. Also included are various support and information files for this process; these files are in various formats (.doc, .xls, .ced, .dat). There is a MS-Word .doc file that explains the various files and scripts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scripts in this folder were used to combine all call statistic files per day into one file, resulting in nine files containing all call statistics per day. The script ‘merging_dataset.R’ was used to combine all days' worth of call statistics and create subsets of two frequency ranges (18-32 and 32-96). The script ‘camera_data’ was used to combine all camera and observation data.
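The merge-and-subset step can be sketched in Python as follows. This is not a translation of ‘merging_dataset.R’, whose contents are not shown here. The column name `frequency`, the CSV input format, and the treatment of the shared boundary (32 assigned to the upper range) are all assumptions made for illustration.

```python
import csv


def merge_csv_files(paths):
    """Concatenate the rows of several CSV files that share the same header."""
    rows = []
    for path in sorted(paths):
        with open(path, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def split_by_frequency(rows, column="frequency"):
    """Split rows into the 18-32 and 32-96 ranges.

    Assumption: a value of exactly 32 belongs to the upper range,
    and values outside 18-96 are dropped.
    """
    low = [row for row in rows if 18 <= float(row[column]) < 32]
    high = [row for row in rows if 32 <= float(row[column]) <= 96]
    return low, high


# Hypothetical in-memory rows standing in for merged call statistics.
example_rows = [
    {"call_id": "a", "frequency": "20"},
    {"call_id": "b", "frequency": "32"},
    {"call_id": "c", "frequency": "100"},
]
low_range, high_range = split_by_frequency(example_rows)
print(len(low_range), len(high_range))  # 1 1
```

In practice `merge_csv_files` would be called on the per-day files first, and its result passed to `split_by_frequency`.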
Spectra measured from SRM-2063a and standards at 20 keV, 25 keV and 30 keV. Scripts for processing this data. Scripts for Monte Carlo simulating thin films of ADM-6005a and Al2O3 on CaF2 and for quantifying these simulated spectra.
Scripts and data acquired at the Mirror Lake Research Site, cited by the article submitted to Water Resources Research: "Distributed Acoustic Sensing (DAS) as a Distributed Hydraulic Sensor in Fractured Bedrock", by M. W. Becker (1), T. I. Coleman (2), and C. C. Ciervo (1).
(1) California State University, Long Beach, Geology Department, 1250 Bellflower Boulevard, Long Beach, California, 90840, USA.
(2) Silixa LLC, 3102 W Broadway St, Suite A, Missoula MT 59808, USA.
Corresponding author: Matthew W. Becker (matt.becker@csulb.edu).
Cantonese textual data, 82 million pieces in total; data is collected from Cantonese script text; data set can be used for natural language understanding, knowledge base construction and other tasks.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Main folder for Figure 6