Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is How to save tax 1999 edition. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
S.A.V.E is a dataset for object detection tasks - it contains Drowning Swimming annotations for 3,841 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Viet Thinh Le
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the books is How to save inheritance tax. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
https://spdx.org/licenses/CC0-1.0.html
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimization to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced by SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), which automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created by PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early-cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.
Methods
This serves as an overview of the analysis performed on the PacBio sequence data summarized in Analysis Flowchart.pdf, which was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.
For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.
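As a purely illustrative sketch of the index-based routing this pipeline performs (not the chunked_demux code itself; the index sequences and file names below are hypothetical), demultiplexing FASTQ reads by an index primer prefix looks roughly like this in Python:

```python
# Hypothetical sketch of index demultiplexing; chunked_demux itself is on GitHub.
index_primers = {"ACGTACGT": "IndexA", "TGCATGCA": "IndexB"}  # from a config file

outputs = {name: open(f"{name}.fastq", "w") for name in index_primers.values()}
with open("ccs_reads.fastq") as fh:  # hypothetical CCS read file
    while True:
        record = [fh.readline() for _ in range(4)]  # FASTQ records are 4 lines
        if not record[0]:  # end of file
            break
        for primer, name in index_primers.items():
            if record[1].startswith(primer):  # index primer at the 5' end
                outputs[name].writelines(record)
                break

for handle in outputs.values():
    handle.close()
```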
The demultiplexed read collections from the chunked_demux pipeline, or the CCS read files from the datasets which were not indexed (M1567, M004, M005), were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer); the pipeline further demultiplexes the reads by sample and prepares data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz), as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and in the README.md file on GitHub.
The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each contains an .Rmd file of the same name, which is used to collect, summarize, and analyze the data. All of this code was written and executed in RStudio to track notes and summarize results.
Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample passing various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that differ between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.
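As a rough sketch of the kind of comparison compare_seqs.py performs (this is not the script itself; the input file names are assumptions), matched sequences can be paired by record ID and flagged when they differ:

```python
# Hedged sketch: pair sUMI and dUMI consensus sequences by ID and flag mismatches.
def read_fasta(path):
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}

sumi = read_fasta("all_sUMI.fasta")  # assumed file names for the two collections
dumi = read_fasta("all_dUMI.fasta")

discordant = sorted(k for k in sumi.keys() & dumi.keys() if sumi[k] != dumi[k])
print(f"{len(discordant)} discordant sUMI/dUMI pairs")
```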
To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed and, for each family with discordant sUMI and dUMI sequences, the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family, as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.
Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd.
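A minimal pandas sketch of that combination step might look like the following (the extraction layout and column handling are assumptions, not the .Rmd code itself):

```python
# Hedged sketch: combine per-sample dUMI_ranked.csv tables into one dUMI_df.csv.
import glob
import pandas as pd

frames = []
for path in glob.glob("tagged_extracted/*/dUMI_ranked.csv"):  # assumed layout
    df = pd.read_csv(path)
    df["sample"] = path.split("/")[-2]  # tag each row with its source sample
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv("dUMI_df.csv", index=False)
```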
Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
Save is a framework for implementing highly available network-accessible services. Save consists of a command-line utility and a small set of extensions for the existing Mon monitoring utility. Mon is a flexible command scheduler that has the ability to take various actions (called 'alerts') depending on the exit conditions of the periodic commands (called 'monitors') it executes. Save provides a set of monitors and alerts that execute within the Mon scheduler.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Save Reel is a dataset for object detection tasks - it contains Save Reel Btn annotations for 410 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
3D Print Save is a dataset for object detection tasks - it contains Spaghetti annotations for 286 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
SAVE Melanoma is a dataset for object detection tasks - it contains Ssss annotations for 2,690 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset includes the following data reported in the PTI paper (link). These datasets can be read and processed using the provided notebooks (link) with the waveorder package (link). The zarr arrays (which live one level below Col_x in the zarr files) can also be visualized with the python image viewer napari: you will need the ome-zarr plugin in napari, then drag the zarr array into the napari viewer.

1. Anisotropic_target_small.zip includes two zarr files that save the raw intensity images and processed physical properties of the small anisotropic target (double line-scan, 300-fs pulse duration):
   - Anisotropic_target_small_raw.zarr: array size in the format (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 96, 300, 300)
   - Anisotropic_target_small_processed.zarr: (Pos0 - Stitched_f_tensor) array size in the format (T, C, Z, Y, X) = (1, 9, 96, 300, 300); (Pos1 - Stitched_physical) array size in the format (T, C, Z, Y, X) = (1, 5, 96, 300, 300)
2. Anisotropic_target_raw.zip includes the raw intensity images of another anisotropic target (single line-scan, 500-fs pulse duration):
   - data: 9 x 96 (pattern x z-slices) raw intensity images (TIFF) of the target with size (2048, 2448) -> 4 channels of (1024, 1224)
   - bg/data: 9 (pattern) raw intensity images (TIFF) of the background with size (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file containing calibration curves of the polarization channels for this dataset
3. Anisotropic_target_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the anisotropic target (single line-scan, 500-fs pulse duration):
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format (T, C, Z, Y, X) = (1, 9, 96, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format (T, C, Z, Y, X) = (1, 5, 96, 700, 700) (cropped to the star target region)
4. Mouse_brain_aco_raw.zip includes the raw intensity images of the mouse brain section at the aco region:
   - data: 9 x 96 (pattern x z-slices) raw intensity images (TIFF) of the mouse brain section with size (2048, 2448) -> 4 channels of (1024, 1224)
   - bg/data: 9 (pattern) raw intensity images (TIFF) of the background with size (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file containing calibration curves of the polarization channels for this dataset
5. Mouse_brain_aco_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the mouse brain section at the aco region:
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format (T, C, Z, Y, X) = (1, 9, 96, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format (T, C, Z, Y, X) = (1, 5, 96, 1024, 1224)
6. Cardiomyocytes_(condition)_raw.zip includes two zarr files that save the raw PTI intensity images and the deconvolved fluorescence images of the cardiomyocytes with the specified (condition):
   - Cardiomyocytes_(condition)_raw.zarr: (Pos0) raw intensity images with array size in the format (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 32, 1024, 1224); (Pos1) background intensity images with array size in the format (PolChannel, IllumChannel, Z, Y, X) = (4, 9, 1, 1024, 1224)
   - Cardiomyocytes_(condition)_fluor_decon.zarr: deconvolved fluorescence images with array size in the format (T, C, Z, Y, X) = (1, 3, 32, 1024, 1224)
7. Cardiomyocytes_(condition)_processed.zip includes two zarr files that save the processed scattering potential tensor components and the processed physical properties of the cardiomyocytes with the specified (condition):
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format (T, C, Z, Y, X) = (1, 9, 32, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format (T, C, Z, Y, X) = (1, 5, 32, 1024, 1224)
8. cardiac_tissue_H_and_E_processed.zip and Human_uterus_section_H_and_E_raw.zip include the raw PTI intensity and H&E images of the cardiac tissue and human uterus section:
   - data: 10 x 40 (pattern x z-slices) raw intensity images (TIFF) of the target with size (2048, 2448) -> 4 channels of (1024, 1224); the last channel holds images acquired with the LCD turned off (the light leakage needed to be subtracted from the data)
   - bg/data: 10 (pattern) raw intensity images (TIFF) of the background with size (2048, 2448) -> 4 channels of (1024, 1224)
   - cali_images.pckl: pickle file containing calibration curves of the polarization channels for this dataset
   - fluor: 3 x 40 (RGB x z-slices) raw H&E intensity images (TIFF) of the sample with size (2048, 2448)
   - fluor_bg: 3 (RGB) raw H&E intensity images (TIFF) of the background with size (2048, 2448)
9. cardiac_tissue_H_and_E_processed.zip and Human_uterus_section_H_and_E_processed.zip include three zarr files that save the processed scattering potential tensor components, the processed physical properties, and the white-balanced H&E intensities of the cardiac tissue and human uterus section:
   - uPTI_stitched.zarr: (Stitched_f_tensor) array size in the format (T, C, Z, Y, X) = (1, 9, 40, 1024, 1224)
   - uPTI_physical.zarr: (Stitched_physical) array size in the format (T, C, Z, Y, X) = (1, 5, 40, 1024, 1224)
   - H_and_E.zarr: (H_and_E) array size in the format (T, C, Z, Y, X) = (1, 3, 40, 1024, 1224)
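As an alternative to drag-and-drop, a minimal Python sketch along these lines should open one of the processed arrays in napari (assuming the zarr and napari packages are installed; the path and group name are illustrative and depend on which archive you extracted):

```python
# Minimal sketch: load a processed zarr group and view it in napari.
import numpy as np
import zarr
import napari

group = zarr.open("uPTI_physical.zarr", mode="r")
arr = group["Stitched_physical"]  # (T, C, Z, Y, X), e.g. (1, 5, 96, 700, 700)
vol = np.asarray(arr)[0]          # drop the singleton time axis -> (C, Z, Y, X)

viewer = napari.view_image(vol, channel_axis=0)  # one layer per physical property
napari.run()
```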
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is How are you going to save yourself. It features 7 columns including author, publication date, language, and book publisher.
A movie based on the 1987 flooding event in Central Texas.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For more details, please refer to our paper (Nihal, R. A., et al., "UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios," ICPR 2024 (accepted), arXiv preprint, 2024) and the GitHub repo: https://github.com/Ragib-Amin-Nihal/C2A
We encourage users to cite this paper when using the dataset for their research or applications.
The C2A (Combination to Application) Dataset is a resource designed to advance human detection in disaster scenarios using UAV imagery. This dataset addresses a critical gap in the field of computer vision and disaster response by providing a large-scale, diverse collection of synthetic images that combine real disaster scenes with human poses.
Context: In the wake of natural disasters and emergencies, rapid and accurate human detection is crucial for effective search and rescue operations. UAVs (Unmanned Aerial Vehicles) have emerged as powerful tools in these scenarios, but their effectiveness is limited by the lack of specialized datasets for training AI models. The C2A dataset aims to bridge this gap, enabling the development of more robust and accurate human detection systems for disaster response.
Sources: The C2A dataset is a synthetic combination of two primary sources:
1. Disaster backgrounds: sourced from the AIDER (Aerial Image Dataset for Emergency Response Applications) dataset, providing authentic disaster scene imagery.
2. Human poses: derived from the LSP/MPII-MPHB (Multiple Poses Human Body) dataset, offering a wide range of human body positions.
Key Features:
- 10,215 high-resolution images
- Over 360,000 annotated human instances
- 5 human pose categories: Bent, Kneeling, Lying, Sitting, and Upright
- 4 disaster scenario types: Fire/Smoke, Flood, Collapsed Building/Rubble, and Traffic Accidents
- Image resolutions ranging from 123x152 to 5184x3456 pixels
- Bounding box annotations for each human instance
Inspiration: This dataset was inspired by the pressing need to improve the capabilities of AI-assisted search and rescue operations. By providing a diverse and challenging set of images that closely mimic real-world disaster scenarios, we aim to:
1. Enhance the accuracy of human detection algorithms in complex environments
2. Improve the generalization of models across various disaster types and human poses
3. Accelerate the development of AI systems that can assist first responders and save lives
Applications: The C2A dataset is designed for researchers and practitioners in:
- Computer Vision and Machine Learning
- Disaster Response and Emergency Management
- UAV/Drone Technology
- Search and Rescue Operations
- Humanitarian Aid and Crisis Response
We hope this dataset will inspire innovative approaches to human detection in challenging environments and contribute to the development of technologies that can make a real difference in disaster response efforts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Save The Great Barrier Reef is a dataset for object detection tasks - it contains Starfish annotations for 8,332 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Waste detection in the desert using a pre-trained YOLO11 model.
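A minimal inference sketch, assuming the ultralytics package (the weights file and image path below are illustrative):

```python
# Hedged sketch: run a pre-trained YOLO11 detector on a desert scene.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # swap in weights fine-tuned for waste detection
results = model.predict("desert_scene.jpg", conf=0.25)  # illustrative image path
for box in results[0].boxes:
    print(box.cls, box.conf, box.xyxy)  # class id, confidence, corner coordinates
```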
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.
It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.
Example Image:
![Example Image](https://i.imgur.com/sZT516a.png)
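To give a concrete sense of how images like this could be composed, here is a rough Pillow sketch under stated assumptions (the font files and stand-in text are placeholders, not the actual generation code):

```python
# Hedged sketch: compose a 512x512 canvas with 1-5 random text strings.
import random
from PIL import Image, ImageDraw, ImageFont

CANVAS = 512
FONTS = ["DejaVuSans.ttf", "DejaVuSerif.ttf"]  # assumed locally available fonts
LINES = ["Shall I compare thee", "to a summer's day"]  # stand-in poetry corpus

img = Image.new("RGB", (CANVAS, CANVAS), "white")
draw = ImageDraw.Draw(img)
for _ in range(random.randint(1, 5)):
    font = ImageFont.truetype(random.choice(FONTS), size=random.randint(16, 40))
    color = tuple(random.randint(0, 200) for _ in range(3))
    xy = (random.randint(0, CANVAS - 200), random.randint(0, CANVAS - 40))
    draw.text(xy, random.choice(LINES), fill=color, font=font)
img.save("synthetic_ocr_sample.png")
```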
A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.
Alternatively, you could try your hand using this as a neural font identification dataset. Nvidia, amongst others, have had success with this task.
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers using Roboflow's workflow write roughly 50% less code, automate annotation quality assurance, save training time, and increase model reproducibility.
This dataset was created by Meng-Hsuan Liu
A demo for saving data from a Space to a Dataset. The goal is to provide reusable snippets of code.
Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads
Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/
JSON dataset: https://huggingface.co/datasets/Wauplin/example-commit-scheduler-json
Image dataset: https://huggingface.co/datasets/Wauplin/example-commit-scheduler-image
Image (zipped) dataset: … See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image-zip.
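The core of the demo is the CommitScheduler from huggingface_hub (documented in the guide linked above); a minimal sketch of the pattern, with an illustrative repo id:

```python
# Minimal sketch of the scheduled-upload pattern from the linked guide.
import json
from pathlib import Path
from huggingface_hub import CommitScheduler

folder = Path("data")
folder.mkdir(exist_ok=True)

scheduler = CommitScheduler(
    repo_id="username/example-commit-scheduler-json",  # illustrative repo id
    repo_type="dataset",
    folder_path=folder,
    every=10,  # minutes between background commits
)

# Files written under `folder` are picked up by the next scheduled commit.
with scheduler.lock:  # avoid writing while an upload is in progress
    with (folder / "log.json").open("a") as fh:
        fh.write(json.dumps({"event": "demo"}) + "\n")
```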