Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Following the procedure in the Jupyter notebook, users can create SUMMA input from *.csv files. Users who want to create new SUMMA input can prepare it in CSV format, then simulate SUMMA with pySUMMA and plot the SUMMA output in various ways.
The notebook follows these steps:
1. Create SUMMA input from *.csv files
2. Run the SUMMA model using pySUMMA
3. Plot the SUMMA output: time-series plots, 2D plots (heatmap, Hovmöller), water-balance variable calculation and plotting, and spatial plotting with a shapefile
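As a rough illustration of step 2, the sketch below runs a configured SUMMA simulation through pySUMMA and plots one output variable. It is a minimal sketch assuming the pysumma Simulation API; the executable path, file manager path, and output variable name are hypothetical placeholders, not taken from the notebook.

```python
# Minimal sketch (not from the notebook): run SUMMA via pysumma and plot one output.
# The paths and the output variable name are hypothetical placeholders.
import pysumma as ps
import matplotlib.pyplot as plt

executable = '/usr/local/bin/summa.exe'        # hypothetical SUMMA executable
file_manager = './settings/file_manager.txt'   # hypothetical file manager built from the *.csv inputs

sim = ps.Simulation(executable, file_manager)
sim.run('local')                               # run SUMMA on the local machine

# sim.output is an xarray Dataset; plot a time series of one variable (name assumed)
sim.output['scalarTotalET'].plot()
plt.show()
```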
The ESS-DIVE reporting format for Comma-separated Values (CSV) file structure is based on a combination of existing guidelines and recommendations, including some from the Earth Science community, with valuable input from the Environmental Systems Science (ESS) community. The CSV reporting format is designed to promote interoperability and machine-readability of CSV data files while also facilitating the collection of some file-level metadata content. Tabular data in the form of rows and columns should be archived in its simplest form, and we recommend submitting these tabular data following the ESS-DIVE reporting format for generic comma-separated values (CSV) text files. In general, the CSV file format is more likely to remain accessible to future systems than a proprietary format, and CSV files are preferred because they are easier to exchange between different programs, increasing the interoperability of a data file. Defining the reporting format and providing guidelines for how to structure CSV files, and some of the field content within them, increases the machine-readability of the data files for extracting, compiling, and comparing data across files and systems.
Data package files are provided in .csv, .png, and .md formats. Open the .csv files with, e.g., Microsoft Excel, LibreOffice, or Google Sheets. Open the .md files by downloading them and using a text editor (e.g., Notepad or TextEdit). Open the .png files in, e.g., a web browser, photo viewer/editor, or Google Drive.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
_p_SUSPHIRE/_I_T31_mealybug/_S_P4_Pcitri_IsoSeq/_A_02_cDNAcupcake-dry/
Trends in nutrient fluxes and streamflow for selected tributaries in the Lake Erie watershed were calculated using monitoring data at 10 locations. Trends in flow-normalized nutrient fluxes were determined by applying a weighted regression approach called WRTDS (Weighted Regression on Time, Discharge, and Season). Site information and streamflow and water-quality records are contained in 3 zipped files named as follows: INFO (site information), Daily (daily streamflow records), and Sample (water-quality records). The INFO, Daily (flow), and Sample files contain the input data, by water-quality parameter and by site as .csv files, used to run the trend analyses. These files were generated by the R (version 3.1.2) software package EGRET - Exploration and Graphics for RivEr Trends (version 2.5.1) (Hirsch and De Cicco, 2015), and can be used directly as input to run graphical procedures and WRTDS trend analyses with the EGRET R software. The .csv files are identified according to water-quality parameter (TP, SRP, TN, NO23, and TKN) and site reference number (e.g., TPfiles.1.INFO.csv, SRPfiles.1.INFO.csv, TPfiles.2.INFO.csv, etc.). Water-quality parameter abbreviations and site reference numbers are defined in the file "Site-summary_table.csv" on the landing page, where there is also a site-location map ("Site_map.pdf"). Parameter information details, including abbreviation definitions, appear in the abstract on the landing page. SRP data records were available at only 6 of the 10 trend sites, which are identified in the file "Site-summary_table.csv" (see landing page) as monitored by the organization NCWQR (National Center for Water Quality Research). The SRP sites are: RAIS, MAUW, SAND, HONE, ROCK, and CUYA. The model-input dataset is presented in 3 parts: (1) INFO.zip (site information), (2) Daily.zip (daily streamflow records), and (3) Sample.zip (water-quality records). Reference: Hirsch, R.M., and De Cicco, L.A., 2015 (revised), User Guide to Exploration and Graphics for RivEr Trends (EGRET) and dataRetrieval: R Packages for Hydrologic Data, Version 2.0: U.S. Geological Survey Techniques and Methods, 4-A10, Reston, VA, 93 p. (at: http://dx.doi.org/10.3133/tm4A10).
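To make the file-naming convention concrete, here is a small, hypothetical Python sketch that assembles the INFO, Daily, and Sample file names for one parameter/site pair and loads them with pandas, after the three zip archives have been extracted into the working directory. It assumes the Daily and Sample files follow the same pattern as the INFO examples above, and no internal column layouts are assumed.

```python
# Sketch: load the EGRET input tables for one water-quality parameter and site,
# following the "<PARAM>files.<site>.<TYPE>.csv" naming pattern described above.
import pandas as pd

def load_site_files(param, site):
    """Read the INFO, Daily, and Sample tables for one parameter/site pair."""
    return {kind: pd.read_csv(f'{param}files.{site}.{kind}.csv')
            for kind in ('INFO', 'Daily', 'Sample')}

tp_site1 = load_site_files('TP', 1)   # total phosphorus, site reference number 1
print(tp_site1['INFO'].head())
print(tp_site1['Daily'].head())
```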
Meteorological information related to the Naples metropolitan area
This dataset contains (a) a script "R_met_integrated_for_modeling.R", and (b) associated input CSV files: 3 CSV files per location to create a 5-variable integrated meteorological dataset file (air temperature, precipitation, wind speed, relative humidity, and solar radiation) for 19 meteorological stations and 1 location within Trail Creek from the modeling team within the East River Community Observatory as part of the Watershed Function Scientific Focus Area (SFA). As meteorological forcings varied across the watershed, a high-frequency database is needed to ensure consistency in the data analysis and modeling. We evaluated several data sources, including gridded meteorological products and field data from meteorological stations, and determined that our modeling efforts required multiple data sources to meet all their needs. As output, this dataset contains (c) a single CSV data file (*_1981-2022.csv) for each location (20 CSV output files total) containing hourly time series data for 1981 to 2022 and (d) five PNG files of time series and density plots for each variable per location (100 PNG files). Detailed location metadata is contained within the Integrated_Met_Database_Locations.csv file for each point location included within this dataset, obtained from Varadharajan et al., 2023, doi:10.15485/1660962. This dataset also includes (e) a file-level metadata (flmd.csv) file that lists each file contained in the dataset with associated metadata and (f) a data dictionary (dd.csv) file that contains column/row headers used throughout the files along with a definition, units, and data type. Review the (g) ReadMe_Integrated_Met_Database.pdf file for additional details on the script, methods, and structure of the dataset. The script integrates the Northwest Alliance for Computational Science and Engineering's PRISM gridded data product, the National Oceanic and Atmospheric Administration's NCEP-NCAR Reanalysis 1 gridded data product (through the RNCEP R package, Kemp et al., doi:10.32614/CRAN.package.RNCEP), and analytical-based calculations. Further, the script downscales the input data to hourly frequency, which is necessary for the modeling efforts.
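As an illustration only, the sketch below loads the location metadata file and one location's hourly 1981-2022 output file with pandas; the station file name and the assumption that the first column holds the timestamp are hypothetical.

```python
# Sketch: read the location metadata and one hourly output file, then take a quick
# daily-mean view. The station file name and timestamp-column position are assumed.
import pandas as pd

locations = pd.read_csv('Integrated_Met_Database_Locations.csv')  # location metadata file named above
print(locations.head())

met = pd.read_csv('ExampleStation_1981-2022.csv',   # hypothetical *_1981-2022.csv file
                  parse_dates=[0], index_col=0)      # assume column 0 is the hourly timestamp

print(met.resample('D').mean().head())               # aggregate hourly values to daily means
```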
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This folder contains the input data for the WaterTAP3 model that was used for the eight NAWI (National Alliance for Water Innovation) source water baselines studies published in the Environmental Science and Technology special issue: Technology Baselines and Innovation Priorities for Water Treatment and Supply. There are also eight other separate DAMS submissions, one per source water, that include the model results for the published studies. In this data submission, all model inputs across the eight baselines are included. The data structure and content are described in a README.txt file. For more details on how to use the data in WaterTAP3 please refer to the model documentation and GitHub site found at "WaterTAP3 Github" linked in the submission resources.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Corresponding peer-reviewed publication
This dataset corresponds to all the RAPID input and output files that were used in the study reported in:
When making use of any of the files in this dataset, please cite both the aforementioned article and the dataset herein.
Time format
The times reported in this description all follow the ISO 8601 format. For example, 2000-01-01T16:00-06:00 represents 4:00 PM (16:00) on Jan 1st 2000 (2000-01-01), Central Standard Time (-06:00). Additionally, when time ranges with inner time steps are reported, the first time corresponds to the beginning of the first time step, and the second time corresponds to the end of the last time step. For example, the 3-hourly time range from 2000-01-01T03:00+00:00 to 2000-01-01T09:00+00:00 contains two 3-hourly time steps. The first one starts at 3:00 AM and finishes at 6:00 AM on Jan 1st 2000, Universal Time; the second one starts at 6:00 AM and finishes at 9:00 AM on Jan 1st 2000, Universal Time.
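To make the convention concrete, this small Python sketch (standard library only, not part of the dataset) parses the example timestamps above and enumerates the two 3-hourly steps in the example range.

```python
# Parse the ISO 8601 examples above and list the 3-hourly steps in the example range.
from datetime import datetime, timedelta

t = datetime.fromisoformat('2000-01-01T16:00-06:00')   # 4:00 PM Central Standard Time
print(t.utcoffset())                                    # UTC offset of -06:00

start = datetime.fromisoformat('2000-01-01T03:00+00:00')
end = datetime.fromisoformat('2000-01-01T09:00+00:00')
step = timedelta(hours=3)

s = start
while s < end:
    print(s.isoformat(), '->', (s + step).isoformat())  # 03:00-06:00 and 06:00-09:00 UTC
    s += step
```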
Data sources
The following sources were used to produce files in this dataset:
Software
The following software were used to produce files in this dataset:
Study domain
The files in this dataset correspond to one study domain:
Description of files
All files below were prepared by Cédric H. David, using the data sources and software mentioned above.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A diverse selection of 1000 empirical time series, along with results of an hctsa feature extraction, using v1.06 of hctsa and Matlab 2019b, computed on a server at The University of Sydney.
The results of the computation are in the hctsa file HCTSA_Empirical1000.mat, for use in Matlab with v1.06 of hctsa. The same data are also provided in .csv format: hctsa_datamatrix.csv (results of feature computation), with information about rows (time series) in hctsa_timeseries-info.csv, information about columns (features) in hctsa_features.csv (and corresponding hctsa code used to compute each feature in hctsa_masterfeatures.csv), and the data of the individual time series (each line a time series, as described in hctsa_timeseries-info.csv) in hctsa_timeseries-data.csv. These .csv files were produced by running >> OutputToCSV(HCTSA_Empirical1000.mat,true,true); in hctsa.
The input file, INP_Empirical1000.mat, is for use with hctsa and contains the time-series data and metadata for the 1000 time series. For example, massive feature extraction from these data on the user's machine, using hctsa, can proceed as >> TS_Init('INP_Empirical1000.mat');
Some visualizations of the dataset are in CarpetPlot.png (first 1000 samples of all time series as a carpet (color) plot) and 150TS-250samples.png (conventional time-series plots of the first 250 samples of a sample of 150 time series from the dataset). More visualizations can be performed by the user using TS_PlotTimeSeries from the hctsa package.
See links in references for more comprehensive documentation on performing methodological comparison using this dataset, and on how to download and use v1.06 of hctsa.
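For users working outside Matlab, a minimal pandas sketch (not part of the dataset) that loads the exported CSVs is shown below; whether hctsa_datamatrix.csv includes a header row is an assumption to check, and no internal column names are assumed.

```python
# Sketch: load the exported hctsa CSVs in Python and inspect their shapes.
import pandas as pd

data = pd.read_csv('hctsa_datamatrix.csv', header=None)   # feature matrix; header assumed absent
ts_info = pd.read_csv('hctsa_timeseries-info.csv')        # one row per time series
feat_info = pd.read_csv('hctsa_features.csv')             # one row per feature

print(data.shape)        # expect 1000 rows (time series) by number of features
print(ts_info.head())
print(feat_info.head())
```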
https://crawlfeeds.com/privacy_policy
The Dog Food Data Extracted from Chewy (USA) dataset contains 4,500 detailed records of dog food products sourced from one of the leading pet supply platforms in the United States, Chewy. This dataset is ideal for businesses, researchers, and data analysts who want to explore and analyze the dog food market, including product offerings, pricing strategies, brand diversity, and customer preferences within the USA.
The dataset includes essential information such as product names, brands, prices, ingredient details, product descriptions, weight options, and availability. Organized in a CSV format for easy integration into analytics tools, this dataset provides valuable insights for those looking to study the pet food market, develop marketing strategies, or train machine learning models.
Key Features:
CSV exports and imports of company export-import records. Follow the Eximpedia platform for HS codes, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Project: Human Resources Analysis - Human_Resources.csv
Description:
The dataset, named "Human_Resources.csv", is a comprehensive collection of employee records from a fictional company. Each row represents an individual employee, and the columns represent various features associated with that employee.
The dataset is rich, highlighting features like 'Age', 'MonthlyIncome', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'EducationField', 'JobSatisfaction', and many more. The main focus is the 'Attrition' variable, which indicates whether an employee left the company or not.
Employee data were sourced from various departments, encompassing a diverse array of job roles and levels. Each employee's record provides an in-depth look into their background, job specifics, and satisfaction levels.
The dataset further includes specific indicators and parameters that were considered during employee performance assessments, offering a granular look into the complexities of each employee's experience.
For privacy reasons, certain personal details and specific identifiers have been anonymized or fictionalized. Instead of names or direct identifiers, each entry is associated with a unique 'EmployeeNumber', ensuring data privacy while retaining data integrity.
The employee records were subjected to rigorous examination, encompassing both manual assessments and automated checks. The end result of this examination, specifically whether an employee left the company or not, is clearly indicated for each record.
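As a quick, hypothetical illustration of how the file can be explored, the pandas sketch below loads Human_Resources.csv and summarizes attrition by department, using column names listed in the description above; the 'Yes'/'No' coding of 'Attrition' is an assumption.

```python
# Sketch: load Human_Resources.csv and summarize attrition using columns named above.
import pandas as pd

hr = pd.read_csv('Human_Resources.csv')

# Overall attrition rate (assumes 'Attrition' is coded as 'Yes'/'No')
print(f"Overall attrition rate: {(hr['Attrition'] == 'Yes').mean():.1%}")

# Attrition rate and median monthly income by department
by_dept = hr.groupby('Department').agg(
    attrition_rate=('Attrition', lambda s: (s == 'Yes').mean()),
    median_income=('MonthlyIncome', 'median'),
)
print(by_dept)
```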
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The dataset consists of 94,663 samples representing the training and test set. It includes the following columns: Timestamp of query (‘Time’), Cryptocurrency name (‘Cryptocurrency’), Rate (‘Rate’), Trading Volume (‘Volume’), Number of tweets (‘NumTweets’), Mean positive VADER Score (‘Positive’), Mean negative VADER Score (‘Negative’), Mean compound VADER Score (‘Compound’) and Mean neutral VADER Score (‘Neutral’). (ZIP)
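As an illustration only, the sketch below reads the table after the ZIP archive has been extracted and aggregates the VADER scores per cryptocurrency; the extracted CSV file name is a placeholder, while the column names are those listed above.

```python
# Sketch: summarize sentiment and activity per cryptocurrency using the columns above.
import pandas as pd

df = pd.read_csv('crypto_sentiment.csv', parse_dates=['Time'])   # hypothetical extracted file name

summary = df.groupby('Cryptocurrency').agg(
    mean_compound=('Compound', 'mean'),
    mean_rate=('Rate', 'mean'),
    total_tweets=('NumTweets', 'sum'),
)
print(summary.sort_values('mean_compound', ascending=False))
```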
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
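A minimal example of the pandas.read_csv() usage mentioned above; the file name is a placeholder for one of the provided daily or hourly CSV files.

```python
# Read one of the provided CSV files into a DataFrame and take a first look.
import pandas as pd

daily = pd.read_csv('lifesnaps_daily.csv')   # placeholder name for a daily-granularity CSV
print(daily.shape)
print(daily.columns.tolist())
print(daily.head())
```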
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
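Once the collections are restored, they can also be queried from Python with pymongo; the sketch below is a minimal illustration using the database and collection names given above (add username/password arguments if access control is enabled).

```python
# Sketch: connect to the restored database and inspect the Fitbit collection.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client['rais_anonymized']

print(db.list_collection_names())         # expect: fitbit, sema, surveys
print(db['fitbit'].count_documents({}))   # number of Fitbit documents
print(db['fitbit'].find_one())            # inspect the structure of one document
```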
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
This is a Wikidata 2015 NTriple dump in which the delimiter is changed to ','. The file is used in subsetting experiment via Radlog.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Input data (viscosities, in .csv format) for the modelling workflow, collected from literature, see associated publication for details.
Appendix.pdf
Tags-topics.md
Stack-exchange-query.md
RQ1/
    LDA_input/
        combined-so-quora-mallet-metadata.csv
        topic-input.mallet
    LDA_output/
        Mallet/
            output_csv/
                docs-in-topics.csv
                topic-words.csv
                topics-in-docs.csv
                topics-metadata.csv
            output_html/
                all_topics.html
                Docs/
                Topics/
RQ2/
    datasource_rawdata/
        quora.csv
        stackoverflow.csv
    manual_analysis_output/
        stackoverflow_quora_taxonomy.xlsx

## Contents of the Replication Package

---

- Appendix.pdf - appendix of the paper containing supplementary tables
- Tags-topics.md - tags selected from Stack Overflow and topics selected from Quora for the study (RQ1 & RQ2)
- Stack-exchange-query.md - the query interface used to extract the posts from the Stack Exchange explorer
- RQ1/ - contains the data used to answer RQ1
  - LDA_input/ - input data used for the LDA analysis
    - combined-so-quora-mallet-metadata.csv - Stack Overflow and Quora questions used to perform the LDA analysis
    - topic-input.mallet - input file to the MALLET tool
  - LDA_output/
    - Mallet/ - contains the LDA output generated by the MALLET tool
      - output_csv/
        - docs-in-topics.csv - documents per topic
        - topic-words.csv - most relevant topic words
        - topics-in-docs.csv - topic probability per document
        - topics-metadata.csv - metadata per document and topic probability
      - output_html/ - browsable results of the MALLET output
        - all_topics.html
        - Docs/
        - Topics/
- RQ2/ - contains the data used to answer RQ2
  - datasource_rawdata/ - contains the raw data for each source
    - quora.csv - contains the processed Quora dataset (e.g., HTML tags removed); to know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data were preprocessed using the Makar tool.
    - stackoverflow.csv - contains the processed Stack Overflow dataset; to know more about the preprocessing steps, please refer to the reproducibility section in the paper. The data were preprocessed using the Makar tool.
  - manual_analysis_output/
    - stackoverflow_quora_taxonomy.xlsx - contains the classified dataset of Stack Overflow and Quora and a description of the taxonomy.
      - Taxonomy - contains the description of the first-dimension and second-dimension categories; second-dimension categories are further divided into levels, separated by the | symbol.
      - stackoverflow-posts - the questions are labelled relevant or irrelevant and categorized into the first-dimension and second-dimension categories.
      - quora-posts - the questions are labelled relevant or irrelevant and categorized into the first-dimension and second-dimension categories.

---
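The column layouts of the MALLET CSV exports are not documented in this package description, so the hypothetical Python sketch below simply loads and previews them from the paths listed above.

```python
# Sketch: preview the MALLET output CSVs; their exact column layouts are not assumed.
import pandas as pd

base = 'RQ1/LDA_output/Mallet/output_csv'
for name in ('docs-in-topics.csv', 'topic-words.csv',
             'topics-in-docs.csv', 'topics-metadata.csv'):
    df = pd.read_csv(f'{base}/{name}')
    print(name, df.shape)
    print(df.head(3))
```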