Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A renewable energy resource-based sustainable microgrid model for a residential area is designed in the HOMER Pro microgrid software. A small residential area of 20 buildings housing about 60 families, with an annual energy consumption of 219 MWh, together with an electric vehicle charging station charging 10 batteries daily, with an annual energy consumption of 18.3 MWh, in the Padma residential area, Rajshahi (24°22.6'N, 88°37.2'E), is selected as the case study. The proposed model requires solar panels, a natural gas generator, an inverter and Li-ion batteries. HOMER Pro is used to optimize the designed microgrid model; data were collected from HOMER Pro for the year 2007. The daily load demand of 650 kW is compared against results obtained by varying the load by 10%, 5% and 2.5% more and less, to find the best case for the demand. There are 7 datasets for the different load conditions; each dataset contains 8760 records with 6 parameters per record (plus a timestamp).
Data file contents:
Data 1:: original_load.csv: This file contains data for the 650 kW load demand (8760 records, 6 parameters each). Data arrangement is given below; a loading sketch follows the file list.
Column 1: Date and time of data recording in the format MM-DD-YYYY [hh]:[mm], 24-hour time.
Column 2: Solar power output in kW.
Column 3: Generator power output in kW.
Column 4: Total electrical load served in kW.
Column 5: Excess electrical production in kW.
Column 6: Li-ion battery energy content in kWh.
Column 7: Li-ion battery state of charge in %.
Data 2:: 2.5%_more_load.csv: Data for the 677 kW load demand; 8760 records with the same 6 parameters. Column information is the same for every dataset.
Data 3:: 2.5%_less_load.csv: Data for the 622 kW load demand; 8760 records with the same 6 parameters.
Data 4:: 5%_more_load.csv: Data for the 705 kW load demand; 8760 records with the same 6 parameters.
Data 5:: 5%_less_load.csv: Data for the 595 kW load demand; 8760 records with the same 6 parameters.
Data 6:: 10%_more_load.csv: Data for the 760 kW load demand; 8760 records with the same 6 parameters.
Data 7:: 10%_less_load.csv: Data for the 540 kW load demand; 8760 records with the same 6 parameters.
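A minimal loading sketch in Python/pandas, assuming the CSV carries a header row followed by the documented column order; the column names used here are illustrative, not the names in the exported file:

```python
import pandas as pd

# Illustrative names matching the documented order (timestamp + 6 parameters).
cols = ["timestamp", "solar_kw", "generator_kw", "load_served_kw",
        "excess_kw", "battery_kwh", "battery_soc_pct"]
df = pd.read_csv("original_load.csv", names=cols, header=0)

# Timestamps are documented as MM-DD-YYYY hh:mm (24-hour clock).
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m-%d-%Y %H:%M")

print(df.describe())          # quick sanity check: 8760 hourly records expected
print(df["excess_kw"].sum())  # total excess production over the year
```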
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database comprises the stator voltages and currents, field voltage and rotor position of a three-phase synchronous generator under resistive load. The data were acquired with two Tektronix MSO 2014B oscilloscopes with 4 channels each. Data were collected on 8 channels corresponding to the phase voltages (Va, Vb, Vc), phase currents (Ia, Ib, Ic), field current of the generator (Ifd) and a pulse signal for the angular position reference of the rotor (theta_m). For simultaneous collection of the signals, a trip circuit was designed and implemented, whose output was used as the triggering signal for the oscilloscopes. The synchronous generator was connected to a synchronous motor (Y-Y) and to a resistive circuit (18 units of 40 W lamps), which served as loads. Voltage data were collected with Keysight N2791 voltage probes; current data were collected with Tektronix A622 current probes. A PHCT203 optical switch was used to collect the rotor position pulse signal. The generator model is MOTROM M610-75-B-1K8-GS, 0.5 cv, 1800 rpm, 4 poles. The generator parameters obtained from physical bench tests were:
Rs = 32.5 ohms (stator winding resistance)
Rfd = 358.9 ohms (field winding resistance)
Ld = 0.803H (direct axis stator inductance)
Lq = 0.691H (quadrature axis stator inductance)
Lls = 0.12H (stator winding leakage inductance)
Lfd = 2.23H (field winding inductance)
Vf = 64V (field voltage applied during the experiment, supplied by a regulated DC source)
The database comprises the following files of preprocessed data, sampled at 10 kHz, in which the variables are given in this order: time, Va, Vb, Vc, Theta_r, Ia, Ib, Ic (a loading sketch follows the list):
1) data0001.txt
2) data0002.txt
3) data0003.txt
4) data0001.csv
5) data0002.csv
6) data0003.csv
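A minimal reading sketch in Python/pandas, assuming comma-separated values in the documented column order; if the files carry their own header row, drop the explicit names:

```python
import pandas as pd

cols = ["time", "Va", "Vb", "Vc", "Theta_r", "Ia", "Ib", "Ic"]

# CSV variant: comma-separated columns in the documented order.
# If a header row is present in the file, remove names= and use its own names.
df = pd.read_csv("data0001.csv", names=cols)

# 10 kHz sampling -> expected time step of about 100 microseconds.
dt = df["time"].diff().median()
print(f"median sample period: {dt:.6e} s")
```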
Further, the database contains the following files of raw data:
1) T0001A.txt
2) T0001B.txt
3) T0002A.txt
4) T0002B.txt
5) T0003A.txt
6) T0003B.txt
Each realization is contained in two files (suffixes A and B), which contain:
Suffix A
time
CH1 (Va)
CH1_peak (Va peak)
CH2 (Vb)
CH2_peak (Vb peak)
CH3 (Vc)
CH3_peak (Vc peak)
CH4 (Theta_r)
CH4_peak (Theta_r peak)
Suffix B
time
CH1 (Ia)
CH1_peak (Ia peak)
CH2 (Ib)
CH2_peak (Ib peak)
CH3 (Ic)
CH3_peak (Ic peak)
CH4 (EMPTY)
CH4_peak (EMPTY)
When using the raw data, it is important to consider that the values were acquired in the following conditions:
- Voltage probe scale: 100:1
- Current probe scale: 100mV/A
- Oscilloscope probe scale: 10x
- Oscilloscope configuration: check header of raw files.
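As a hedged example, the following Python sketch converts raw suffix-A/suffix-B channel readings to physical units using the probe scales listed above; whether these factors are already compensated in the stored values depends on the oscilloscope configuration, so confirm against the raw-file headers first:

```python
import pandas as pd

# Placeholder conversion factors taken from the acquisition conditions above.
# Verify against the raw-file headers whether they are already applied.
VOLTAGE_PROBE = 100.0      # 100:1 voltage probe
CURRENT_GAIN = 1.0 / 0.1   # A622 at 100 mV/A -> 10 A per volt

# The Tektronix exports may include metadata lines before the data table;
# adjust skiprows/delimiter to match the actual file layout.
raw_v = pd.read_csv("T0001A.txt")   # columns: time, CH1, CH1_peak, ...
raw_i = pd.read_csv("T0001B.txt")

va = raw_v["CH1"] * VOLTAGE_PROBE   # phase voltage in volts
ia = raw_i["CH1"] * CURRENT_GAIN    # phase current in amperes
```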
Contact information: jose.grzybowski@uffs.edu.br
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the code we used to conduct the SNMF analysis and generate the tables of admixture coefficients. In Excel, we used the output CSV files to generate the plots presented in the manuscript. Files included are a text file detailing the analysis, the R script we used, the VCF file we used in the analysis, and two .csv files that were the output of the analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes MATLAB files and datasets related to the IEEE IIRW 2023 conference proceeding: T. Zanotti et al., "Reliability Analysis of Random Telegraph Noise-based True Random Number Generators," 2023 IEEE International Integrated Reliability Workshop (IIRW), South Lake Tahoe, CA, USA, 2023, pp. 1-6, doi: 10.1109/IIRW59383.2023.10477697
The repository includes:
The data of the bitmaps reported in Fig. 4, i.e., the results of the simulation of the ideal RTN-based TRNG circuit for different reseeding strategies. To load and plot the data use the "plot_bitmaps.mat" file.
The result of the circuit simulations considering the EvolvingRTN from the HfO2 device shown in Fig. 7, for two Rgain values. Specifically, the data is contained in the following csv files:
"Sim_TRNG_Circuit_HfO2_3_20s_Vth_210m_no_Noise_Ibias_11n.csv" (lower Rgain)
"Sim_TRNG_Circuit_HfO2_3_20s_Vth_210m_no_Noise_Ibias_4_8n.csv" (higher Rgain)
The result of the circuit simulations considering the temporary RTN from the SiO2 device shown in Fig. 8. Specifically, the data is contained in the following csv files:
"Sim_TRNG_Circuit_SiO2_1c_300s_Vth_180m_Noise_Ibias_1.5n.csv" (ref. Rgain)
"Sim_TRNG_Circuit_SiO2_1c_100s_200s_Vth_180m_Noise_Ibias_1.575n.csv" (lower Rgain)
"Sim_TRNG_Circuit_SiO2_1c_100s_200s_Vth_180m_Noise_Ibias_1.425n.csv" (higher Rgain)
Instructions on how to add a layer of recent earthquakes to a Web Map from a CSV file downloaded from GNS Science's GeoNet website. The CSV file must contain latitude and longitude fields for the earthquake location for it to be added to a Web Map as a point layer. This document is designed to support the Natural Hazards - Earthquakes story map.
Listing of low carbon energy generators installed on GLA group properties as requested in question 2816/2010 to the Mayor during the September 2010 Mayor's Question Time.
To date information has been provided by the London Fire and Emergency Planning Authority, the GLA and the Metropolitan Police Service (MPS). Transport for London has provided interim data, and further data will follow.
GLA csv
LFEPA csv
MPS csv
TfL csv
LFEPA Data
Details of low carbon energy generators located at fire stations in London operated by the London Fire Brigade (London Fire and Emergency Planning Authority). The data provides the location of the fire stations (including post code) and the type of generators at those premises including photovoltaic (PV) array, combined heat and power (CHP), wind turbines (WT) and solar thermal panels (STU). Data correct as of June 2016. The previous LFEPA data from October 2010 is available in csv, tab and shp formats. Previous LFEPA data from May 2011, April 2014 and April 2015 is available.
For further information please contact david.wyatt@london-fire.gov.uk
GLA Data Details of the photovoltaic (PV) installation at City Hall. Data correct as of 4th May 2011.
MPS Data The table provides details of low carbon energy generation installations on MPS buildings in London. The data provides the site locations (including post code, grid reference and latitude/longitude) and the type of generators at those premises which includes Photovoltaic (PV) arrays, Combined Heat and Power (CHP), Ground Source Heat Pumps (GSHP) and Solar Thermal panels (STU). This data is correct as at the 20th May 2011.
TfL Data Details of low carbon energy generators located at Transport for London’s buildings such as stations, depots, crew accommodation and head offices are provided. The data includes the postcode of the buildings and the type of generators at those premises including photovoltaic (PV) array, combined heat and power (CHP), wind turbines (WT) and solar thermal panels (STU). Data correct as of 24th May 2011.
For further information please contact helenwoolston@tfl.gov.uk
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Replication pack, FSE2018 submission #164:
------------------------------------------

**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. Link to the code will be included in the Camera Ready version as well.

Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset. This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, `common.cache/survival_data.pypi_2008_2017-12_6.csv` in **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)
Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):

- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- few hours to few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

      git clone https://gitlab.com/user2589/ghd.git
      git checkout 0.1.0

  `cd` into the extracted folder. All commands below assume it as a current directory.
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS`
- install docker. For Ubuntu Linux, the command is `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`. Without this dependency, you might get an error on the next step, but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt`
- disable all APIs except GitHub (Bitbucket and Gitlab support were not yet implemented when this study was in progress): edit `scraper/init.py`, comment out everything except GitHub support in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get the output of the Python function `common.utils.survival_data()` and save it into a CSV file:

    # copy and paste into a Python console
    from common import utils
    survival_data = utils.survival_data('pypi', '2008', smoothing=6)
    survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table. The whole process will take 15..30 minutes.

- create a folder `
Included in this content:
0045.perovksitedata.csv - main dataset used in this article. A more detailed description can be found in the “dataset overview” section below.
Chemical Inventory.csv - the hand-curated file of all chemicals used in the construction of the perovskite dataset. This file includes identifiers, chemical properties, and other information.
ExcessMolarVolumeData.xlsx - record of experimental data, computations, and final dataset used in the generation of the excess molar volume plots.
MLModelMetrics.xlsx - all of the ML metrics organized in one place (excludes the reactant-set-specific breakdown; see ML_Logs.zip for those files).
OrganoammoniumDensityDataset.xlsx - complete set of the data used to generate the density values. Example calculations included.
model_matchup_main.py - python pipeline used to generate all of the ML runs associated with the article. More detailed instructions on the operation of this code are included in the “ML Code” section below. This file is also hosted on GIT: https://github.com/ipendlet/MLScripts/blob/master/temp_densityconc/model_matchup_main_20191231.py
SolutionVolumeDataset - complete set of 219 solutions in the perovskite dataset. Tabs include the automatically generated reagent information from ESCALATE, hand curated reagent information from early runs, and the generation of the dataset used in the creation of Figure 5.
error_auditing.zip - code and historical datasets used for reporting the dataset auditing.
“AllCode.zip” which contains:
model_matchup_main_20191231.py - python pipeline used to generate all of the ML runs associated with the article. More detailed instructions on the operation of this code are included in the “ML Code” section below. This file is also hosted on GIT: https://github.com/ipendlet/MLScripts/blob/master/temp_densityconc/0045.perovskitedata.csv
VmE_CurveFitandPlot.py - python code for generating the third order polynomial fit to the VmE vs mole fraction of FAH included in the main text. Requires the ‘MolFractionResults.csv’ to function (also included).
Calculation_Vm_Ve_CURVEFITTING.nb - mathematica code for generating the third order polynomial fit to the VmE vs mole fraction of FAH included in the main text.
Covariance_Analysis.py - python code for ingesting and plotting the covariance of features and volumes in the perovskite dataset. Includes renaming dictionaries used for the publication.
FeatureComparison_Plotting.py - python code for reading in and plotting features for the ‘GBT’ and ‘OHGBT’ folders in this directory. The code parses the contents of these folders and generates feature comparison metrics used for Figure 9 and the associated Figure S8. Some assembly required.
Requirements.txt - all of the packages used in the generation of this paper
0045.perovskitedata.csv - the main dataset described throughout the article. This file is required to run some of the code and is therefore kept near the code.
“ML_Logs.zip” which contains:
A folder describing every model generated for this article. In each folder there are a number of files:
Features_named_important.csv and features_value_importance.csv - these files are linked together and describe the weighted feature contributions from features (only present for GBT models)
AnalysisLog.txt - Log file of the run including all options, data curation and model training summaries
LeaveOneOut_Summary.csv - Results of the leave-one-reactant set-out studies on the model (if performed)
LOOModelInfo.txt - Hyperparameter information for each model in the study (associated with the given dataset, sometimes includes duplicate runs).
STTSModelInfo.txt - Hyperparameter information for each model in the study (associated with the given dataset, sometimes includes duplicate runs).
StandardTestTrain_Summary.csv - Results of the 6 fold cross validation ML performance (for the hold out case)
LeaveOneOut_FullDataset_ByAmine.csv - Results of the leave-one-reactant set-out studies performed on the full dataset (all experiments) specified by reactant set (delineated by the amine)
LeaveOneOut_StratifiedData_ByAmine.csv - Results of the leave-one-reactant set-out studies performed on a random stratified sample (96 random experiments) specified by reactant set (delineated by the amine)
model_matchup_main_*.py - code used to generate all of the runs contained in a particular folder. The code is exactly what was used at run time to generate a given dataset (requires 0045.perovskitedata.csv file to run).
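As a convenience, a hedged Python/pandas sketch for stacking the per-model cross-validation summaries across the ML_Logs folders; the folder layout follows the description above but should be verified against the extracted archive:

```python
import glob
import os
import pandas as pd

# Walk every model folder inside the extracted ML_Logs directory and stack
# the 6-fold cross-validation summaries into a single comparison table.
frames = []
for path in glob.glob("ML_Logs/*/StandardTestTrain_Summary.csv"):
    summary = pd.read_csv(path)
    summary["model_folder"] = os.path.basename(os.path.dirname(path))
    frames.append(summary)

all_runs = pd.concat(frames, ignore_index=True)
all_runs.to_csv("ml_logs_combined_summary.csv", index=False)
```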
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The authors have designed and simulated a hybrid AC-DC microgrid for Payra (22.1493° N, 90.1352° E), a geographical area of Bangladesh, aiming at economical electricity supply. The design was carried out in the popular simulation software HOMER Pro. In the microgrid, a solar photovoltaic panel, a wind turbine, and a natural gas generator were used as sources, with the AC load of 100 households and the DC load of 30 electric vehicles as consumers. Solar Global Horizontal Irradiance, wind speed, and hourly annual load demand were the corresponding input parameters. To utilize the generated power, the load demand was also simulated with the demand increased and decreased by 10%, 5% and 2.5%. This yields 7 datasets: the original and the 6 varied load scenarios. Each dataset has a total of 8761 instances, where each instance has 14 data samples.
Data file contents:
Data 1:: Payra_Original_load.csv: Contains the raw output data corresponding to the original load demand of Payra, collected from the microgrid simulation. This file has 8762 sets of data, each with 14 data samples from the simulation output. Data arrangement is described below; a loading sketch follows the file list.
Column 1: Date and time of data recording in the format DD-MM-YY [hh]:[mm], 24-hour time.
Column 2: Output power of the flat plate photovoltaic panel in kW.
Column 3: Output power of the Northern Power NPS 60-24 wind turbine in kW.
Column 4: Output power of the Autosize Genset natural gas generator in kW.
Column 5: Fuel consumption of the Autosize Genset natural gas generator in m³. Values have a maximum of seven decimal places.
Column 6: Total electrical load served in kW.
Column 7: Percentage penetration of renewable energy sources.
Column 8: Excess electricity production in kW.
Column 9: Total output power of the renewable energy sources in kW.
Column 10: Output power of the inverter used in the model in kW.
Column 11: Output power of the rectifier used in the model in kW.
Column 12: CELLCUBE® FB 20-130 battery charge power in kW.
Column 13: CELLCUBE® FB 20-130 battery discharge power in kW.
Column 14: Percentage state of charge of the CELLCUBE® FB 20-130 battery.
Data 2:: Payra_10%more_load.csv, Data 3:: Payra_10%less_load.csv, Data 4:: Payra_5%more_load.csv, Data 5:: Payra_5%less_load.csv, Data 6:: Payra_2.5%more_load.csv, and Data 7:: Payra_2.5%less_load.csv: These contain the raw output data corresponding to the 10% more, 10% less, 5% more, 5% less, 2.5% more and 2.5% less load demands of Payra, respectively, collected from the microgrid simulation, with the same shape and data arrangement as above.
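A minimal loading sketch in Python/pandas based on the column order above; the positional access and the day-first timestamp format are taken from the description, so verify them against the file:

```python
import pandas as pd

df = pd.read_csv("Payra_Original_load.csv")

# First column holds DD-MM-YY hh:mm timestamps (24-hour clock).
time_col = df.columns[0]
df[time_col] = pd.to_datetime(df[time_col], format="%d-%m-%y %H:%M")

# Example check: compare served load (column 6) against total renewable
# output (column 9) to see how often renewables alone cover the demand.
served = df.iloc[:, 5]
renewable = df.iloc[:, 8]
print((renewable >= served).mean())  # fraction of hours fully covered
```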
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purpose of this document is to accompany the public release of data collected from OpenCon 2015 applications.
Download & Technical Information
The data can be downloaded in CSV format from GitHub here: https://github.com/RightToResearch/OpenCon-2015-Application-Data
The file uses UTF-8 encoding, comma as field delimiter, quotation marks as text delimiter, and no byte order mark.
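For reference, a minimal Python/pandas sketch that reads the file with the stated CSV conventions; the exact raw-file path within the repository is an assumption:

```python
import pandas as pd

# UTF-8, comma-delimited, quotation marks as text delimiter, no BOM.
# The raw-file path inside the GitHub repository may differ.
url = ("https://raw.githubusercontent.com/RightToResearch/"
       "OpenCon-2015-Application-Data/master/opencon-2015-applications.csv")
apps = pd.read_csv(url, encoding="utf-8", sep=",", quotechar='"')
print(apps.shape)
```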
This data is released to the public for free and open use under a CC0 1.0 license. We have a couple of requests for anyone who uses the data. First, we’d love it if you would let us know what you are doing with it, and share back anything you develop with the OpenCon community (#opencon / @open_con ). Second, it would also be great if you would include a link to the OpenCon 2015 website (www.opencon2015.org) wherever the data is used. You are not obligated to do any of this, but we’d appreciate it!
Unique ID
This is a unique ID assigned to each applicant. Numbers were assigned using a random number generator.
Timestamp
This was the timestamp recorded by Google Forms. Timestamps are in EDT (Eastern U.S. Daylight Time). Note that the application process officially began at 1:00pm EDT on June 1 and ended at 6:00am EDT on June 23. Some applications have timestamps later than this date, due to a variety of reasons including exceptions granted for technical difficulties, error corrections (which required re-submitting the form), and applications sent in via email and later entered manually into the form.
Gender
Mandatory. Choose one from list or fill-in other. Options provided: Male, Female, Other (fill in).
Country of Nationality
Mandatory. Choose one option from list.
Country of Residence
Mandatory. Choose one option from list.
What is your primary occupation?
Mandatory. Choose one from list or fill-in other. Options provided: Undergraduate student; Masters/professional student; PhD candidate; Faculty/teacher; Researcher (non-faculty); Librarian; Publisher; Professional advocate; Civil servant / government employee; Journalist; Doctor / medical professional; Lawyer; Other (fill in).
Select the option below that best describes your field of study or expertise
Mandatory. Choose one option from list.
What is your primary area of interest within OpenCon’s program areas?
Mandatory. Choose one option from list. Note: for the first approximately 24 hours the options were listed in this order: Open Access, Open Education, Open Data. After that point, we set the form to randomize the order, and noticed an immediate shift in the distribution of responses.
Are you currently engaged in activities to advance Open Access, Open Education, and/or Open Data?
Mandatory. Choose one option from list.
Are you planning to participate in any of the following events this year?
Optional. Choose all that apply from list. Multiple selections separated by semi-colon.
Do you have any of the following skills or interests?
Mandatory. Choose all that apply from list or fill-in other. Multiple selections separated by semi-colon. Options provided: Coding; Website Management / Design; Graphic Design; Video Editing; Community / Grassroots Organizing; Social Media Campaigns; Fundraising; Communications and Media; Blogging; Advocacy and Policy; Event Logistics; Volunteer Management; Research about OpenCon's Issue Areas; Other (fill-in).
This data consists of information collected from people who applied to attend OpenCon 2015. In the application form, questions that would be released as Open Data were marked with a caret (^) and applicants were asked to acknowledge before submitting the form that they understood that their responses to these questions would be released as such. The questions we released were selected to avoid any potentially sensitive personal information, and to minimize the chances that any individual applicant can be positively identified. Applications were formally collected during a 22 day period beginning on June 1, 2015 at 13:00 EDT and ending on June 23 at 06:00 EDT. Some applications have timestamps later than this date, and this is due to a variety of reasons including exceptions granted for technical difficulties, error corrections (which required re-submitting the form), and applications sent in via email and later entered manually into the form. Applications were collected using a Google Form embedded at http://www.opencon2015.org/attend, and the shortened bit.ly link http://bit.ly/AppsAreOpen was promoted through social media. The primary work we did to clean the data focused on identifying and eliminating duplicates. We removed all duplicate applications that had matching e-mail addresses and first and last names. We also identified a handful of other duplicates that used different e-mail addresses but were otherwise identical. In cases where duplicate applications contained any different information, we kept the information from the version with the most recent timestamp. We made a few minor adjustments in the country field for cases where the entry was obviously an error (for example, electing a country listed alphabetically above or below the one indicated elsewhere in the application). We also removed one potentially offensive comment (which did not contain an answer to the question) from the Gender field and replaced it with “Other.”
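A sketch of the de-duplication logic described above, written in Python/pandas with placeholder column names (email, first_name, last_name, timestamp), since those fields are not part of the released data:

```python
import pandas as pd

def drop_duplicate_applications(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the most recent submission per applicant.

    Column names are placeholders illustrating the procedure; the released
    public dataset does not contain email or name fields.
    """
    df = df.sort_values("timestamp")
    return df.drop_duplicates(
        subset=["email", "first_name", "last_name"], keep="last"
    )
```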
OpenCon 2015 is the student and early career academic professional conference on Open Access, Open Education, and Open Data and will be held on November 14-16, 2015 in Brussels, Belgium. It is organized by the Right to Research Coalition, SPARC (The Scholarly Publishing and Academic Resources Coalition), and an Organizing Committee of students and early career researchers from around the world. The meeting will convene students and early career academic professionals from around the world and serve as a powerful catalyst for projects led by the next generation to advance OpenCon's three focus areas—Open Access, Open Education, and Open Data. A unique aspect of OpenCon is that attendance at the conference is by application only, and the majority of participants who apply are awarded travel scholarships to attend. This model creates a unique conference environment where the most dedicated and impactful advocates can attend, regardless of where in the world they live or their access to travel funding. The purpose of the application process is to conduct these selections fairly. This year we were overwhelmed by the quantity and quality of applications received, and we hope that by sharing this data, we can better understand the OpenCon community and the state of student and early career participation in the Open Access, Open Education, and Open Data movements.
For inquires about the OpenCon 2015 Application data, please contact Nicole Allen at nicole@sparc.arl.org.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files to run the small-dataset experiments used in the preprint "Self-Supervised Spatio-Temporal Representation Learning Of Satellite Image Time Series", available here. These .csv files enable generating balanced small datasets from the PASTIS dataset, and they are required to run the small-training-set experiments from the open source code ssl_ubarn. In the .csv file name selected_patches_fold_{FOLD}_nb_{NSITS}_seed_{SEED}.csv, FOLD is the PASTIS fold, NSITS is the number of selected patches, and SEED is the random seed used for the selection.
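A small Python helper sketch for building and reading one of these selection files; the argument values shown are placeholders:

```python
from pathlib import Path
import pandas as pd

def selection_file(fold: int, nsits: int, seed: int, root: str = ".") -> Path:
    """Build the path of a balanced-subset selection file."""
    return Path(root) / f"selected_patches_fold_{fold}_nb_{nsits}_seed_{seed}.csv"

# Example with placeholder values: fold 1, 100 patches, seed 0.
path = selection_file(fold=1, nsits=100, seed=0)
patches = pd.read_csv(path)
print(len(patches), "selected patches")
```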
http://opendatacommons.org/licenses/dbcl/1.0/
Packet Capture (PCAP) files of the UNSW-NB15 and CIC-IDS2017 datasets are processed and labelled using the accompanying CSV files. Each packet is labelled by comparing eight distinct features: Source IP, Destination IP, Source Port, Destination Port, Starting time, Ending time, Protocol and Time to live. The dimensions of the dataset are N×1504. All columns of the dataset are integers, so the dataset can be used directly in machine learning models. Details of the whole processing and transformation pipeline are provided in the following GitHub repo:
https://github.com/Yasir-ali-farrukh/Payload-Byte
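As an illustration, a minimal Python sketch that loads such a labelled table and fits a baseline classifier; the file name and the label column name are assumptions, so check them against the Payload-Byte output:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed file and label-column names; verify against the Payload-Byte output.
data = pd.read_csv("payload_byte_unsw_nb15.csv")
X = data.drop(columns=["label"])
y = data["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
clf.fit(X_train, y_train)
print("hold-out accuracy:", clf.score(X_test, y_test))
```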
You can use the tool available at the above-mentioned GitHub repo to generate the labelled dataset from scratch. All details of the processing and transformation are provided in the following paper:
@article{Payload,
author = "Yasir Ali Farrukh and Irfan Khan and Syed Wali and David Bierbrauer and Nathaniel Bastian",
title = "{Payload-Byte: A Tool for Extracting and Labeling Packet Capture Files of Modern Network Intrusion Detection Datasets}",
year = "2022",
month = "9",
url = "https://www.techrxiv.org/articles/preprint/Payload-Byte_A_Tool_for_Extracting_and_Labeling_Packet_Capture_Files_of_Modern_Network_Intrusion_Detection_Datasets/20714221",
doi = "10.36227/techrxiv.20714221.v1"
}
If you are using our tool or dataset, kindly cite our related paper, which outlines the details of the tool and its processing.
Base milk frother dataset
The unprocessed milk frother dataset consists of 1089 images denoted as Page-X.png, where X is the sample ID. The dataset also includes a csv file named sketch_drawing.csv with an Image_ID field, which denotes the id of the image, and a text field, which denotes the text description of the image. The other fields in the dataset do not matter for the purposes of this project.
Augmented milk frother dataset
After running the sketch2prototype framework on the dataset, we obtain an augmented dataset. Each sample has its own directory with the following contents:
- Directory of 4 generated images
- dalle_response.json - a log of the prompts used to generate the images
- original.png - the original milk frother sketch used to generate the 4 images
Additionally, sketch_drawing.csv is in the folder.
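A hedged Python sketch for iterating over the augmented dataset; the root folder name and the per-sample directory naming are assumptions:

```python
import json
from pathlib import Path

# Assumed layout: one directory per sample inside an "augmented" root,
# each holding original.png, dalle_response.json and 4 generated images.
root = Path("augmented")
for sample_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    prompts = json.loads((sample_dir / "dalle_response.json").read_text())
    generated = [p for p in sample_dir.glob("*.png") if p.name != "original.png"]
    print(sample_dir.name, len(generated), "generated images")
```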
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains raw data and all scripts for the in situ sequencing processing pipeline (Mats Nilsson Lab, Stockholm University) and downstream analysis applied in the manuscript “Spatial and temporal localization of immune transcripts defines hallmarks and diversity in the tuberculosis granuloma”.
Raw images: All 16-bit tiff images are for 12 weeks_section1. The file name format is baseX_cY_ORG, where X indicates the hybridization round (1-4) and Y indicates the channel:
1: DAPI
2: FITC (T)
3: Cy3 (G)
4: TexR (C)
5: Cy5 (A)
6: AF750 (Anchor primer)
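A small Python sketch for recovering round and channel from the tiff file names following the baseX_cY_ORG convention above; the file extension and any suffix after ORG are assumptions:

```python
import re
from pathlib import Path

# Channel index -> fluorophore / base call, as documented above.
CHANNELS = {1: "DAPI", 2: "FITC (T)", 3: "Cy3 (G)",
            4: "TexR (C)", 5: "Cy5 (A)", 6: "AF750 (anchor)"}

pattern = re.compile(r"base(\d+)_c(\d+)_ORG")

for tif in Path("12weeks_section1").glob("*.tif"):
    m = pattern.search(tif.stem)
    if m:
        rnd, ch = int(m.group(1)), int(m.group(2))
        print(tif.name, "-> round", rnd, ",", CHANNELS.get(ch, "unknown"))
```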
Raw data in folders:
Lung csv files
Bacteria csv files
DAPI for plotting
HE folder
Scripts folder:
Matlab scripts
Cellprofiler pipelines
Identification and plotting of transcripts:
For all MATLAB scripts: download the “Matlab scripts” folder and add lib to the MATLAB path. Apart from MATLAB, no additional MathWorks product is required. Tested on R2017b.
InSituSequencing.m is the top-level script that processes sequencing images into a positional visualization of decoded transcripts. Use the raw images for 12weeks_section1 as input images (others are available on request). After Tiling in "InSituSequencing.m", process the tiled images in the CellProfiler pipeline “Blob identification”. Run decode and threshold in "InSituSequencing.m" to generate csv files containing the position and intensity of each identified signal. csv files for all lung scans (3 per time point) are in the “lung csv files” folder and can be plotted on DAPI images (10% of original size) found in the “DAPI for plotting” folder using Plotting global in "InSituSequencing.m". High-resolution H&E scans of the in situ-sequenced lungs (one lung section per time point) are in the “HE folder” at 50% of original size. For all images, 1 pixel corresponds to 0.325 mm.
Identification and plotting of transcripts in given proximity to bacteria:
Use the cellprofiler pipeline “Bacteria identification” instead of “Blob identification” to identify signal in indicated distances from identified bacteria. The folder “bacteria csv files” contains identified signals in the indicated distances to identified bacteria. Input images are available on request.
Downstream analysis (Matlab Scripts folder)
DensityEstimation.m was used to display not absolute reads but a kernel density estimation of a certain gene in a 2log scale.
ROI_draw_onImage.m was applied to extract reads from annotated regions. Pictures of annotations can be found in the manuscript supplementary figure S1.
HexbinClustering.m performed an unsupervised clustering (k-means) of spatial data with a given number of distinct clusters in a given radius.
Tables 1-3 contain sequences of the specific primers, padlock probes and detection oligos used.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The DIAMAS project investigates Institutional Publishing Service Providers (IPSP) in the broadest sense, with a special focus on those publishing initiatives that do not charge fees to authors or readers. To collect information on Institutional Publishing in the ERA, a survey was conducted among IPSPs between March-May 2024. This dataset contains aggregated data from the 685 valid responses to the DIAMAS survey on Institutional Publishing.
The dataset supplements D2.3 Final IPSP landscape Report Institutional Publishing in the ERA: results from the DIAMAS survey.
The data
Basic aggregate tabular data
Full individual survey responses are not being shared to prevent the easy identification of respondents (in line with conditions set out in the survey questionnaire). This dataset contains full tables with aggregate data for all questions from the survey, with the exception of free-text responses, from all 685 survey respondents. This includes, per question, overall totals and percentages for the answers given, as well as the breakdown by both IPSP types: institutional publishers (IPs) and service providers (SPs). Tables at country level have not been shared, as cell values often turned out to be too low, which could allow potential identification of respondents. The data is available in csv and docx formats, with csv files grouped and packaged into ZIP files. Metadata describing data type, question type, as well as question response rate, is available in csv format. The R code used to generate the aggregate tables is made available as well.
Files included in this dataset
survey_questions_data_description.csv - metadata describing data type, question type, as well as question response rate per survey question.
tables_raw_all.zip - raw tables (csv format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option. Zip file contains 180 csv files.
tables_raw_IP.zip - as tables_raw_all.zip, for responses from institutional publishers (IP) only. Zip file contains 180 csv files.
tables_raw_SP.zip - as tables_raw_all.zip, for responses from service providers (SP) only. Zip file contains 170 csv files.
tables_formatted_all.docx - formatted tables (docx format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option.
tables_formatted_IP.docx - as tables_formatted_all.docx, for responses from institutional publishers (IP) only.
tables_formatted_SP.docx - as tables_formatted_all.docx, for responses from service providers (SP) only.
DIAMAS_Tables_single.R - R script used to generate raw tables with aggregated data for all single response questions
DIAMAS_Tables_multiple.R - R script used to generate raw tables with aggregated data for all multiple response questions
DIAMAS_Tables_layout.R - R script used to generate document with formatted tables from raw tables with aggregated data
DIAMAS Survey on Institutional Publishing - data availability statement (pdf)
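For convenience, a minimal Python/pandas sketch that reads the per-question aggregate tables directly from one of the ZIP archives described above; the internal file names should be checked against the actual package:

```python
import zipfile
import pandas as pd

# Read every per-question aggregate table from the "all respondents" archive.
tables = {}
with zipfile.ZipFile("tables_raw_all.zip") as zf:
    for name in zf.namelist():
        if name.endswith(".csv"):
            with zf.open(name) as fh:
                tables[name] = pd.read_csv(fh)

print(len(tables), "tables loaded")  # expected: 180 csv files
```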
All data are made available under a CC0 license.
This data package is associated with the publication “On the Transferability of Residence Time Distributions in Two 10-km Long River Sections with Similar Hydromorphic Units” submitted to the Journal of Hydrology (Bao et al. 2024). Quantifying hydrologic exchange fluxes (HEFs) at the stream-groundwater interface, along with their residence time distributions (RTDs) in the subsurface, is crucial for managing water quality and ecosystem health in dynamic river corridors. However, directly simulating high-spatial resolution HEFs and RTDs can be a time-consuming process, particularly for watershed-scale modeling. Efficient surrogate models that link RTDs to hydromorphic units (HUs) may serve as alternatives for simulating RTDs in large-scale models. One common concern with these surrogate models, however, is the transferability of the relationship between the RTDs and HUs from one river corridor to another. To address this, we evaluated the HEFs and the resulting RTD-HU relationships for two 10-kilometer-long river corridors along the Columbia River, using a one-way coupled three-dimensional transient surface-subsurface water transport modeling framework that we previously developed. Applying this framework to the two river corridors with similar HUs allows for quantitative comparisons of HEFs and RTDs using both statistical tests and machine learning classification models. This data package includes the model input files and the simulation results data. This data package contains 10 folders. The modeling simulation results data are in the folders 100H_pt_data and 300area_pt_data, for the Hanford 100H and 300 Area study domains, respectively. The remaining eight folders contain the scripts and data to generate the manuscript figures. The file-level metadata file (Bao_2024_Residence_Time_Distribution_flmd.csv) includes a list of all files contained in this data package and descriptions for each. The data dictionary file (Bao_2024_Residence_Time_Distribution_dd.csv) includes column header definitions and units of all tabular files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Images metadata (file)
Image_metadata.xlsx is a file which specifies the experiment, days of growth, stress/media, and stressor concentration associated with each image file included in this project.
Images (zipped folder)
This folder contains all of the phenotyping images obtained in this study.
Sequenced mutants dataset (zipped folder)
Includes two items:
1) Sulfite phenotyping of haploid mutants of S. cerevisiae and S. paradoxus chosen as candidates for sequencing.
2) Copper phenotyping of haploid mutants of S. cerevisiae and S. paradoxus chosen as candidates for sequencing.
For sulfite, the files provided contain the following info:
Raw_data_positions_sulfite.txt = colony sizes at each position for each plate.
Raw_data_strains_sulfite.csv = the raw data processed to link the colony size measurements with a technical replicate of a particular strain. Sulfite concentrations of each plate can also be found in the rightmost column.
ANC_key_triplicate_sulfite.csv = links the numeric designations of the mutants to their ancestors.
positions_key_triplicate_sulfite.csv = links the positions on the plates to the numeric designations of the mutants.
YJF_key_triplicate_sulfite.csv = YJF designations for the mutants that were chosen for sequencing, linked to their numeric id in this experiment.
For copper, two files contain all of the information:
4_13_21_seqed_coppermutsreplicatedphenod3_ColonyData_AllPlates.txt contains all of the colony sizes for each position in the images.
Copper_design_YJFdesignations.csv specifies the YJF designations of each strain in each position.
Diploid dataset (zipped folder)
This dataset includes images and colony size measurements from several phenotyping experiments: copper phenotyping of diploid mutants of S. cerevisiae and S. paradoxus with elevated resistance; sulfite phenotyping of diploid mutants of S. cerevisiae and S. paradoxus with elevated resistance; and phenotyping of these mutants in permissive conditions.
The file diploid_colony_size_dataset.csv contains colony size measurements derived from the images in this item along with the collection metadata associated with each sample (relative size, color, recovery concentration, circularity, spontaneous/induced). Note the column "mutnumericid_techreps" in this file, which defines the positions that are technical replicates of the same mutant/strain.
Haploid dataset (zipped folder)
This dataset includes images and colony size measurements from several phenotyping experiments: copper phenotyping of haploid mutants of S. cerevisiae and S. paradoxus with elevated resistance; sulfite phenotyping of haploid mutants of S. cerevisiae and S. paradoxus with elevated resistance; and phenotyping of these mutants in permissive conditions.
The file haploid_colony_size_dataset.csv contains colony size measurements derived from the images in this item along with the collection metadata associated with each sample (relative size, color, recovery concentration, circularity, spontaneous/induced).
Processed data used to generate figures (zipped folder)
The following files contain the data used to generate the figures in the associated publication:
canavanine2.csv = mutation rates and standard deviations of those rates for the three concentrations of canavanine used for both species for each treatment (mutagenized and mock mutagenized).
copper2.csv = mutation rates and standard deviations for each copper concentration for both species for both treatments. Columns are added that were used to specify line connections and horizontal point offset in ggplot2.
copper3.csv = total mutation rates for copper for both species for both treatments. Includes a column used for horizontal offset in ggplot2.
hapcop.csv, dipcop.csv, hapsul.csv, dipsul.csv contain effect size data for all the non-escapee strains that were phenotyped for both species.
hapcopc.csv, dipcopc.csv, hapsulc.csv, dipsulc.csv contain costs data for all the non-escapee strains that were phenotyped for both species.
rc_da_cop.csv and rc_da_sul.csv contain delta AUC values and costs measurements for the sequenced mutants and contain columns to split the mutants by category.
Incidence.csv contains the incidence of the major mutant classes recovered in this study, split between species.
KSP1_muts.csv, PMA1_muts.csv, RTS1_muts.csv, REG1_muts.csv encode the position and identity of mutants recovered in this study such that they can be visualized as bar charts. Negative values are used for S. paradoxus counts.
YJF4464_CUP1.csv contains coverage data at the CUP1 locus for S. paradoxus copper mutant YJF4464.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"Wake Vision" is a large, high-quality dataset featuring over 6 million images, significantly exceeding the scale and diversity of current tinyML datasets (100x). The dataset contains images with annotations of whether each image contains a person. Additionally, the dataset incorporates a comprehensive fine-grained benchmark to assess fairness and robustness, covering perceived gender, perceived age, subject distance, lighting conditions, and depictions. This dataset hosted on Harvard Dataverse contains images, CSV files, and code to generate a Wake Vision TensorFlow Dataset. We publish the annotations of this dataset under a CC BY 4.0 license. All images in the dataset are from the Open Images v7 dataset, which are sourced images from Flickr and are listed as having a CC BY 2.0 license.
This dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset contains the data and scripts to generate the hydrological response variables for surface water in the Clarence Moreton subregion as reported in CLM261 (Gilfedder et al. 2016).
File CLM_AWRA_HRVs_flowchart.png shows the different files in this dataset and how they interact. The Python and R scripts were written by the BA modelling team to read, combine and analyse, as detailed below, the source datasets CLM AWRA model, CLM groundwater model V1 and CLM16swg Surface water gauging station data within the Clarence Moreton Basin, to create the hydrological response variables for surface water as reported in CLM 2.6.1 (Gilfedder et al. 2016).
R-script HRV_SWGW_CLM.R reads, for each model simulation, the outputs from the surface water model in netcdf format from file Qtot.nc (dataset CLM AWRA model) and the outputs from the groundwater model, flux_change.csv (dataset CLM groundwater model V1) and creates a set of files in subfolder /Output for each GaugeNr and simulation Year:
CLM_GaugeNr_Year_all.csv and CLM_GaugeNR_Year_baseline.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for baseline conditions
CLM_GaugeNr_Year_CRDP.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for CRDP conditions (=AWRA streamflow - MODFLOW change in SW-GW flux)
CLM_GaugeNr_Year_minMax.csv: minimum and maximum of HRVs over all 5000 simulations
Python script CLM_collate_DoE_Predictions.py collates that information into following files, for each HRV and each maxtype (absolute maximum (amax), relative maximum (pmax) and time of absolute maximum change (tmax)):
CLM_AWRA_HRV_maxtyp_DoE_Predictions: for each simulation and each gauge_nr, the maxtyp of the HRV over the prediction period (2012 to 2102)
CLM_AWRA_HRV_DoE_Observations: for each simulation and each gauge_nr, the HRV for the years that observations are available
CLM_AWRA_HRV_Observations: summary statistics of each HRV and the observed value (based on data set CLM16swg Surface water gauging station data within the Clarence Moreton Basin)
CLM_AWRA_HRV_maxtyp_Predictions: summary statistics of each HRV
R-script CLM_CreateObjectiveFunction.R calculates for each HRV the objective function value for all simulations and stores it in CLM_AWRA_HRV_ss.csv. This file is used by python script CLM_AWRA_SI.py to generate figure CLM-2615-002-SI.png (sensitivity indices).
The AWRA objective function is combined with the overall objective function from the groundwater model in dataset CLM Modflow Uncertainty Analysis (CLM_MF_DoE_ObjFun.csv) into csv file CLM_AWRA_HRV_oo.csv. This file is used to select behavioural simulations in python script CLM-2615-001-top10.py. This script uses files CLM_NodeOrder.csv and BA_Visualisation.py to create the figures CLM-2616-001-HRV_10pct.png.
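For orientation only, a hedged Python/pandas sketch of the combination and selection step described above; the column names, the join key and the behavioural-selection rule are assumptions inferred from the file names, not the actual CLM-2615-001-top10.py logic:

```python
import pandas as pd

# Combine the AWRA objective function with the groundwater-model objective
# function into one table, then keep the best 10% of simulations as
# "behavioural". Column names and join key are illustrative only.
awra = pd.read_csv("CLM_AWRA_HRV_ss.csv")    # per-simulation AWRA objective fn.
gw = pd.read_csv("CLM_MF_DoE_ObjFun.csv")    # per-simulation MODFLOW objective fn.

oo = awra.merge(gw, on="simulation_id", suffixes=("_awra", "_gw"))
oo["objective_total"] = oo["objective_awra"] + oo["objective_gw"]
oo.to_csv("CLM_AWRA_HRV_oo.csv", index=False)

behavioural = oo.nsmallest(int(0.10 * len(oo)), "objective_total")
print(len(behavioural), "behavioural simulations retained")
```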
Bioregional Assessment Programme (2016) CLM AWRA HRVs Uncertainty Analysis. Bioregional Assessment Derived Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/e51a513d-fde7-44ba-830c-07563a7b2402.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Qld 100K mapsheets - Mount Lindsay
Derived From Qld 100K mapsheets - Helidon
Derived From Qld 100K mapsheets - Ipswich
Derived From CLM - Woogaroo Subgroup extent
Derived From CLM - Interpolated surfaces of Alluvium depth
Derived From CLM - Extent of Logan and Albert river alluvial systems
Derived From CLM - Bore allocations NSW v02
Derived From CLM - Bore allocations NSW
Derived From CLM - Bore assignments NSW and QLD summary tables
Derived From CLM - Geology NSW & Qld combined v02
Derived From CLM - Orara-Bungawalbin bedrock
Derived From CLM16gwl NSW Office of Water_GW licence extract linked to spatial locations_CLM_v3_13032014
Derived From CLM groundwater model hydraulic property data
Derived From CLM - Koukandowie FM bedrock
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From NSW Office of Water - National Groundwater Information System 20140701
Derived From CLM - Gatton Sandstone extent
Derived From CLM16gwl NSW Office of Water, GW licence extract linked to spatial locations in CLM v2 28022014
Derived From Bioregional Assessment areas v03
Derived From NSW Geological Survey - geological units DRAFT line work.
Derived From Mean Annual Climate Data of Australia 1981 to 2012
Derived From CLM Preliminary Assessment Extent Definition & Report( CLM PAE)
Derived From Qld 100K mapsheets - Caboolture
Derived From CLM - AWRA Calibration Gauges SubCatchments
Derived From CLM - NSW Office of Water Gauge Data for Tweed, Richmond & Clarence rivers. Extract 20140901
Derived From Qld 100k mapsheets - Murwillumbah
Derived From AHGFContractedCatchment - V2.1 - Bremer-Warrill
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From QLD Current Exploration Permits for Minerals (EPM) in Queensland 6/3/2013
Derived From Pilot points for prediction interpolation of layer 1 in CLM groundwater model
Derived From CLM - Bore water level NSW
Derived From Climate model 0.05x0.05 cells and cell centroids
Derived From CLM - New South Wales Department of Trade and Investment 3D geological model layers
Derived From CLM - Metgasco 3D geological model formation top grids
Derived From State Transmissivity Estimates for Hydrogeology Cross-Cutting Project
Derived From CLM - Extent of Bremer river and Warrill creek alluvial systems
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From QLD Department of Natural Resources and Mining Groundwater Database Extract 20131111
Derived From Qld 100K mapsheets - Esk
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores and NGIS v4 28072014
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From CLM - Qld Surface Geology Mapsheets
Derived From NSW Office of Water Pump Test dataset
Derived From [CLM -
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data provided by NARI (Institut de Recherche de l'École Navale) contains AIS kinematic messages from vessels sailing in the Atlantic Ocean around the port of Brest, Brittany, France, and spans the period from 1 October 2015 to 31 March 2016.
Raw AIS messages - file: nari_dynamic.csv, 19,035,630 records.
After deduplication of original AIS messages, this dataset yielded 18,495,677 point locations (kinematic AIS messages only), which was used as input for creating trajectory synopses.
Attribute "MMSI" in the original data is used in processing as the identifier of each vessel (e.g., "244670495").