Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is How to save tax 1999 edition. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is This is how I save my life : from California to India, a true story of finding everything when you are willing to try anything. It features 7 columns including author, publication date, language, and book publisher.
(1) dataandpathway_eisner.R, dataandpathway_bordbar.R, dataandpathway_taware.R and dataandpathway_almutawa.R: functions and code to clean the real data sets and obtain the annotation databases, which are saved as .RData files in the subfolders Eisner, Bordbar, Taware and Al-Mutawa, respectively.
(2) FWER_excess.R: functions to show the inflation of FWER when integrating multiple annotation databases and to generate Table 1.
(3) data_info.R: code to obtain Table 2 and Table 3.
(4) rejections_perdataset.R and triangulartable.R: functions to generate Table 4. The running time of rejections_perdataset.R is around 7 hours, so we save the corresponding results as res_eisner.RData, res_bordbar.RData, res_taware.RData and res_almutawa.RData in the subfolders Eisner, Bordbar, Taware and Al-Mutawa, respectively.
(5) pathwaysizerank.R: code for generating Figure 4 based on res_eisner.RData from (4).
(6) iterationandtime_plot.R: code for generating Figure 5 based on the “Al-Mutawa” data. This code is very time-consuming (nearly 5 days), so we save the corresponding results and plot them in the main manuscript with pgfplots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
S.A.V.E is a dataset for object detection tasks - it contains Drowning Swimming annotations for 3,841 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
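One common way to pull a Roboflow-hosted dataset into a project is the roboflow Python package. The sketch below is illustrative only; the API key, workspace, project slug, version number, and export format are placeholders, not this dataset's actual identifiers (check the dataset page for the real values).

```python
# Sketch of downloading a Roboflow-hosted dataset with the roboflow package.
# The API key, workspace, project slug, version, and export format are
# placeholders -- replace them with the values shown on the dataset page.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("save-drowning-detection")
dataset = project.version(1).download("coco")  # e.g. a COCO-format export

print(dataset.location)  # local folder containing images and annotations
```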
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
3D Print Save is a dataset for object detection tasks - it contains Spaghetti annotations for 286 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Save is a framework for implementing highly available network-accessible services. Save consists of a command-line utility and a small set of extensions for the existing Mon monitoring utility. Mon is a flexible command scheduler that has the ability to take various actions (called 'alerts') depending on the exit conditions of the periodic commands (called 'monitors') it executes. Save provides a set of monitors and alerts that execute within the Mon scheduler.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Save The Great Barrier Reef is a dataset for object detection tasks - it contains Starfish annotations for 8,332 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Reach across time to save our planet. It features 7 columns including author, publication date, language, and book publisher.
Testing dataset creation. Specifically testing this and the "notes" field.
https://spdx.org/licenses/CC0-1.0.html
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), which automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early-cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.
Methods
This serves as an overview of the analysis performed on the PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.
For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.
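As a conceptual illustration of index-based demultiplexing, the sketch below sorts CCS reads into one fastq per index primer. This is not the chunked_demux pipeline itself; the index sequences and file names are hypothetical, and a real implementation would allow mismatches and check both read ends.

```python
# Conceptual sketch of demultiplexing CCS reads by index primer -- not the
# chunked_demux pipeline; index sequences and file names are hypothetical.
from Bio import SeqIO  # Biopython

index_primers = {"idx01": "ACGTACGT", "idx02": "TTGGCCAA"}  # hypothetical
writers = {name: open(f"{name}.fastq", "w") for name in index_primers}

for rec in SeqIO.parse("ccs_reads.fastq", "fastq"):
    for name, primer in index_primers.items():
        if str(rec.seq).startswith(primer):  # naive exact-prefix match
            SeqIO.write(rec, writers[name], "fastq")
            break

for handle in writers.values():
    handle.close()
```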
The demultiplexed read collections from the chunked_demux pipeline, or the CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer); the pipeline uses these to further demultiplex the reads by sample and to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.
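The UMI-family idea behind these tables can be pictured with a rough sketch: group reads by their UMI and rank the families by read count, so that the most-supported (rank 1) family can be kept. This is only an illustration; the UMI length, its position in the read, and the file name are assumptions, not the pipeline's actual conventions.

```python
# Rough sketch of tallying UMI families from demultiplexed reads.
# UMI length/position and the input file name are assumptions for illustration.
from collections import Counter
from Bio import SeqIO

UMI_LENGTH = 8  # hypothetical UMI length at the start of each read

family_counts = Counter()
for rec in SeqIO.parse("sample01.fastq", "fastq"):
    umi = str(rec.seq)[:UMI_LENGTH]
    family_counts[umi] += 1

# Rank UMI families by read count; small families are typically suspected of
# arising from PCR or sequencing errors in the UMI itself.
for rank, (umi, count) in enumerate(family_counts.most_common(), start=1):
    print(rank, umi, count)
```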
The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each contains an .Rmd file of the same name, which is used to collect, summarize, and analyze the data. All of this code was written and executed in RStudio to track notes and summarize results.
Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI sequences and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample passing various criteria. These are used to help create Table 2 and serve as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.
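The comparison step can be pictured with a short sketch. This is not the repository's compare_seqs.py, just an illustration that assumes two fasta files (hypothetical names sumi.fasta and dumi.fasta) whose records share sequence IDs.

```python
# Illustrative sketch only -- not the repository's compare_seqs.py.
# Assumes two fasta files (hypothetical names) whose records share IDs.
from Bio import SeqIO  # Biopython

sumi = {rec.id: str(rec.seq) for rec in SeqIO.parse("sumi.fasta", "fasta")}
dumi = {rec.id: str(rec.seq) for rec in SeqIO.parse("dumi.fasta", "fasta")}

# Report IDs present in both collections whose sequences differ.
for seq_id in sorted(sumi.keys() & dumi.keys()):
    if sumi[seq_id] != dumi[seq_id]:
        print(f"{seq_id}: sUMI and dUMI consensus sequences differ")
```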
To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.
Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file for each sample was extracted from all of the tagged.tar.gz files; these were combined and used to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd.
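As a rough illustration of that combination step (a sketch only: the extracted folder layout and the added "sample" column are assumptions for bookkeeping, not part of the original .Rmd):

```python
# Sketch of combining per-sample dUMI_ranked.csv files into one table.
# The extracted folder layout and the added "sample" column are assumptions.
from pathlib import Path
import pandas as pd

frames = []
for csv_path in sorted(Path("tagged_extracted").glob("*/dUMI_ranked.csv")):
    df = pd.read_csv(csv_path)
    df["sample"] = csv_path.parent.name  # record which sample the rows came from
    frames.append(df)

dumi_df = pd.concat(frames, ignore_index=True)
dumi_df.to_csv("dUMI_df.csv", index=False)
```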
Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For more details, please refer to our paper: Nihal, R. A., et al. "UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios." ICPR 2024 (accepted); arXiv preprint, 2024. See also the GitHub repo: https://github.com/Ragib-Amin-Nihal/C2A
We encourage users to cite this paper when using the dataset for their research or applications.
The C2A (Combination to Application) Dataset is a resource designed to advance human detection in disaster scenarios using UAV imagery. This dataset addresses a critical gap in the field of computer vision and disaster response by providing a large-scale, diverse collection of synthetic images that combine real disaster scenes with human poses.
Context: In the wake of natural disasters and emergencies, rapid and accurate human detection is crucial for effective search and rescue operations. UAVs (Unmanned Aerial Vehicles) have emerged as powerful tools in these scenarios, but their effectiveness is limited by the lack of specialized datasets for training AI models. The C2A dataset aims to bridge this gap, enabling the development of more robust and accurate human detection systems for disaster response.
Sources: The C2A dataset is a synthetic combination of two primary sources:
1. Disaster Backgrounds: Sourced from the AIDER (Aerial Image Dataset for Emergency Response Applications) dataset, providing authentic disaster scene imagery.
2. Human Poses: Derived from the LSP/MPII-MPHB (Multiple Poses Human Body) dataset, offering a wide range of human body positions.
Key Features:
- 10,215 high-resolution images
- Over 360,000 annotated human instances
- 5 human pose categories: Bent, Kneeling, Lying, Sitting, and Upright
- 4 disaster scenario types: Fire/Smoke, Flood, Collapsed Building/Rubble, and Traffic Accidents
- Image resolutions ranging from 123x152 to 5184x3456 pixels
- Bounding box annotations for each human instance
Inspiration: This dataset was inspired by the pressing need to improve the capabilities of AI-assisted search and rescue operations. By providing a diverse and challenging set of images that closely mimic real-world disaster scenarios, we aim to:
1. Enhance the accuracy of human detection algorithms in complex environments
2. Improve the generalization of models across various disaster types and human poses
3. Accelerate the development of AI systems that can assist first responders and save lives
Applications: The C2A dataset is designed for researchers and practitioners in:
- Computer Vision and Machine Learning
- Disaster Response and Emergency Management
- UAV/Drone Technology
- Search and Rescue Operations
- Humanitarian Aid and Crisis Response
We hope this dataset will inspire innovative approaches to human detection in challenging environments and contribute to the development of technologies that can make a real difference in disaster response efforts.
A demo for saving data from a Space to a Dataset. The goal is to provide reusable snippets of code.
- Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads
- Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/
- JSON dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json
- Image dataset: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image
- Image (zipped) dataset: …
See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-json.
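A minimal sketch of the scheduled-upload pattern described in the documentation linked above, assuming a hypothetical dataset repo ID and local folder (see the linked guide and Space for the full, supported examples):

```python
# Minimal sketch of periodically saving Space data to a dataset repo, following
# the scheduled-uploads guide; the repo_id and local paths are hypothetical.
import json
from pathlib import Path
from uuid import uuid4

from huggingface_hub import CommitScheduler

data_dir = Path("json_data")
data_dir.mkdir(exist_ok=True)
data_file = data_dir / "entries.jsonl"

# Push the local folder to the dataset repo every 5 minutes in a background thread.
scheduler = CommitScheduler(
    repo_id="your-username/example-space-to-dataset-json",  # hypothetical repo
    repo_type="dataset",
    folder_path=data_dir,
    every=5,  # minutes
)

def save_entry(entry: dict) -> None:
    # Take the scheduler's lock so we never write while an upload is in progress.
    with scheduler.lock:
        with data_file.open("a") as f:
            f.write(json.dumps({"id": uuid4().hex, **entry}) + "\n")

save_entry({"greeting": "hello"})
```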
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset contains 20,000 records of X-ray data labeled as "Normal" or "Tuberculosis". It is specifically created for training and evaluating classification models in the field of medical image analysis. The dataset aims to aid in building machine learning and deep learning models for detecting tuberculosis from X-ray data.
Tuberculosis (TB) is a highly infectious disease that primarily affects the lungs. Accurate detection of TB using chest X-rays can significantly enhance medical diagnostics. However, real-world datasets are often scarce or restricted due to privacy concerns. This synthetic dataset bridges that gap by providing simulated patient data while maintaining realistic distributions and patterns commonly observed in TB cases.
| Column Name | Description |
|---|---|
| Patient_ID | Unique ID for each patient (e.g., PID000001) |
| Age | Age of the patient (in years) |
| Gender | Gender of the patient (Male/Female) |
| Chest_Pain | Presence of chest pain (Yes/No) |
| Cough_Severity | Severity of cough (Scale: 0-9) |
| Breathlessness | Severity of breathlessness (Scale: 0-4) |
| Fatigue | Level of fatigue experienced (Scale: 0-9) |
| Weight_Loss | Weight loss (in kg) |
| Fever | Level of fever (Mild, Moderate, High) |
| Night_Sweats | Whether night sweats are present (Yes/No) |
| Sputum_Production | Level of sputum production (Low, Medium, High) |
| Blood_in_Sputum | Presence of blood in sputum (Yes/No) |
| Smoking_History | Smoking status (Never, Former, Current) |
| Previous_TB_History | Previous tuberculosis history (Yes/No) |
| Class | Target variable indicating the condition (Normal, Tuberculosis) |
The dataset was generated using Python with the following libraries:
- Pandas: To create and save the dataset as a CSV file
- NumPy: To generate random numbers and simulate realistic data
- Random Seed: Set to ensure reproducibility
The target variable "Class" has a 70-30 distribution between Normal and Tuberculosis cases. The data is randomly generated with realistic patterns that mimic typical TB symptoms and demographic distributions.
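A minimal sketch of how such a synthetic table can be produced with NumPy and Pandas under a fixed seed. It mirrors a few of the columns above and the stated 70-30 class split, but it is only an illustration, not the original generation script; the seed value and value ranges are assumptions.

```python
# Illustrative sketch of generating a synthetic TB screening table with a fixed
# seed; columns mirror the table above, but this is not the original script.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed for reproducibility (assumed value)
n = 20_000

df = pd.DataFrame({
    "Patient_ID": [f"PID{i:06d}" for i in range(1, n + 1)],
    "Age": rng.integers(1, 91, n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Cough_Severity": rng.integers(0, 10, n),
    "Breathlessness": rng.integers(0, 5, n),
    "Fever": rng.choice(["Mild", "Moderate", "High"], n),
    # Target with the stated 70-30 Normal/Tuberculosis split.
    "Class": rng.choice(["Normal", "Tuberculosis"], n, p=[0.7, 0.3]),
})

df.to_csv("synthetic_tb_dataset.csv", index=False)
print(df["Class"].value_counts(normalize=True))
```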
This dataset is intended for:
- Machine Learning and Deep Learning classification tasks
- Data exploration and feature analysis
- Model evaluation and comparison
- Educational and research purposes
This synthetic dataset is open for educational and research use. Please credit the creator if used in any public or academic work.
This dataset was generated as a synthetic alternative to real-world data to help developers and researchers practice building and fine-tuning classification models without the constraints of sensitive patient data.
This dataset was created by Sneha Ramesh
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Noor Saeed
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is How are you going to save yourself. It features 7 columns including author, publication date, language, and book publisher.
This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Can a Country Save Too Much? The Case of Norway, PIIE Policy Brief 18-7.
If you use the data, please cite as: Gagnon, Joseph E. (2018). Can a Country Save Too Much? The Case of Norway. PIIE Policy Brief 18-7. Peterson Institute for International Economics.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Waste detection in the desert using a pre-trained YOLO11 model.
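A minimal inference sketch with the Ultralytics package, assuming a hypothetical weights file and image path; this card does not specify the actual checkpoint or training setup.

```python
# Sketch of running a pre-trained YOLO11 detector on a desert image; the
# weights file and image path are hypothetical placeholders.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")           # or a fine-tuned waste-detection checkpoint
results = model("desert_scene.jpg")  # run inference on a single image

for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, float(box.conf))
```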
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
guanguan99/qatuples_filtered-save dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset includes the following data reported in the PTI paper (link). These datasets can be read and processed using the provided notebooks (link) with the waveorder package (link). The zarr arrays (which live one level below Col_x in the zarr files) can also be visualized with the Python image viewer napari; you will need the ome-zarr plugin in napari and can then drag a zarr array into the napari viewer.
1. Anisotropic_target_small.zip includes two zarr files that save the raw intensity images and processed physical properties of the small anisotropic target (double line-scan, 300-fs pulse duration):
   - Anisotropic_target_small_raw.zarr: array size in the format of (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 96, 300, 300)
   - Anisotropic_target_small_processed.zarr: (Pos0 - Stitched_f_tensor) array size in the format of (T, C, Z, Y, X) = (1, 9, 96, 300, 300); (Pos1 - Stitched_physical) array size in the format of (T, C, Z, Y, X) = (1, 5, 96, 300, 300)
2. Anisotropic_target_raw.zip includes the raw intensity images of another anisotropic target (single line-scan, 500-fs pulse duration):
   - data: 9 x 96 (pattern x z-slices) raw intensity images (TIFF) of the target with size of (2048, 2448) -> 4 channels of (1024, 1224)
   - bg/data: 9 (pattern) raw intensity images (TIFF) of the background with size of (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file that contains calibration curves of the polarization channels for this dataset
3. Anisotropic_target_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the anisotropic target (single line-scan, 500-fs pulse duration):
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format of (T, C, Z, Y, X) = (1, 9, 96, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format of (T, C, Z, Y, X) = (1, 5, 96, 700, 700) (cropping the star target region)
4. Mouse_brain_aco_raw.zip includes the raw intensity images of the mouse brain section at the aco region:
   - data: 9 x 96 (pattern x z-slices) raw intensity images (TIFF) of the mouse brain section with size of (2048, 2448) -> 4 channels of (1024, 1224)
   - bg/data: 9 (pattern) raw intensity images (TIFF) of the background with size of (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file that contains calibration curves of the polarization channels for this dataset
5. Mouse_brain_aco_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the mouse brain section at the aco region:
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format of (T, C, Z, Y, X) = (1, 9, 96, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format of (T, C, Z, Y, X) = (1, 5, 96, 1024, 1224)
6. Cardiomyocytes_(condition)_raw.zip includes two zarr files that save the raw PTI intensity images and the deconvolved fluorescence images of the cardiomyocytes with the specified (condition):
   - Cardiomyocytes_(condition)_raw.zarr: (Pos0) raw intensity images with array size in the format of (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 32, 1024, 1224); (Pos1) background intensity images with array size in the format of (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 1, 1024, 1224)
   - Cardiomyocytes_(condition)_fluor_decon.zarr: deconvolved fluorescence images with array size in the format of (T, C, Z, Y, X) = (1, 3, 32, 1024, 1224)
7. Cardiomyocytes_(condition)_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the cardiomyocytes with the specified (condition):
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format of (T, C, Z, Y, X) = (1, 9, 32, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format of (T, C, Z, Y, X) = (1, 5, 32, 1024, 1224)
8. cardiac_tissue_H_and_E_processed.zip and Human_uterus_section_H_and_E_raw.zip include the raw PTI intensity and H&E images of the cardiac tissue and human uterus section:
   - data: 10 x 40 (pattern x z-slices) raw intensity images (TIFF) of the target with size of (2048, 2448) -> 4 channels of (1024, 1224); the last channel is for images acquired with the LCD turned off (the light leakage needs to be subtracted from the data)
   - bg/data: 10 (pattern) raw intensity images (TIFF) of the background with size of (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file that contains calibration curves of the polarization channels for this dataset
   - fluor: 3 x 40 (RGB x z-slices) raw H&E intensity images (TIFF) of the sample with size of (2048, 2448)
   - fluor_bg: 3 (RGB) raw H&E intensity images (TIFF) of the background with size of (2048, 2448)
9. cardiac_tissue_H_and_E_processed.zip and Human_uterus_section_H_and_E_processed.zip include three zarr files that save the processed scattering potential tensor components, the processed physical properties, and the white-balanced H&E intensities of the cardiac tissue and human uterus section:
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format of (T, C, Z, Y, X) = (1, 9, 40, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format of (T, C, Z, Y, X) = (1, 5, 40, 1024, 1224)
   - H_and_E.zarr: (H_and_E) array size in the format of (T, C, Z, Y, X) = (1, 3, 40, 1024, 1224)
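A small sketch of inspecting one of these OME-zarr stores in Python. The store path and group name below are taken from the listing above as examples, and the exact group path inside a store may differ; this is not the code from the provided notebooks.

```python
# Sketch of opening one of the processed zarr stores and inspecting an array.
# Store path and group name are examples from the listing; check the printed
# tree for the exact hierarchy in your copy of the data.
import zarr

store = zarr.open("uPTI_physical.zarr", mode="r")
print(store.tree())                # show the group/array hierarchy

arr = store["Stitched_physical"]   # (T, C, Z, Y, X) array per the listing; the
                                   # exact group path may differ, see the tree
print(arr.shape, arr.dtype)

volume = arr[0, 0]                 # load one channel's Z-stack into memory
print(volume.shape)
```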
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is How to save tax 1999 edition. It features 7 columns including author, publication date, language, and book publisher.