bongo2112/mulokoziepk-dreambooth-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Timerkhanov Yuriy
Released under CC0: Public Domain
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
What this collection is: A curated, binary-classified dataset of grayscale (1-band) image chips, each 400 x 400 pixels in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world. The chips show clear open ocean, look-alikes (wind or biogenic features), and oil slicks.
This binary dataset contains chips labelled as:
- "0" for chips not containing any oil features (look-alikes or clean seas)
- "1" for those containing oil features.
This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.
Why: This dataset can be used for training, validation, and/or testing of machine learning algorithms, including deep learning, for the detection of oil features in SAR imagery. Directly applicable to algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1), it may also be suitable for the development of detection algorithms for other SAR satellite sensors.
Overview of this dataset: the total number of chips (both classes) is N = 5,630.
- Class 0: 3,725 chips
- Class 1: 1,905 chips
Further information and description is found in the ReadMe file provided (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt)
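Because the two classes are imbalanced, it can help to check the class shares and derive per-class weights before training. The following is a minimal sketch that uses only the chip counts stated above; the inverse-frequency ("balanced") weighting scheme is an assumption chosen for illustration and is not prescribed by the dataset authors.

```python
# Chip counts per class, taken from the dataset overview.
counts = {0: 3725, 1: 1905}

total = sum(counts.values())

# Share of each class in the whole dataset.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: total / (number_of_classes * class_count).
# This weighting scheme is an illustrative assumption, not part of the dataset.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in sorted(counts):
    print(
        f"class {label}: {counts[label]} chips, "
        f"{shares[label]:.1%} of data, weight {weights[label]:.3f}"
    )
```

With these weights, both classes contribute the same total weight to a loss function, which is one common way to counter a bias towards the majority class.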
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and does not contain any missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
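The description does not say which standardization was applied. A minimal sketch, assuming the common z-score convention (zero mean and unit variance per feature), could look as follows; the function name and the toy values are hypothetical and not taken from the dataset.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each feature (column) across all samples (rows)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A constant feature (zero spread) is mapped to 0.0 to avoid dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up toy values: 4 samples x 2 features (not taken from the dataset).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
scaled = standardize_columns(toy)
```

After this step every feature has the same scale, so distance-based methods such as hierarchical clustering are not dominated by features with large raw values.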
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure, and area under the ROC curve (AUC) metrics.
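To make the gain ratio criterion concrete, the sketch below computes it for a single binary threshold split: the information gain of the split divided by the split information. This is the textbook definition written in plain Python; it is not Orange's implementation, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels, threshold):
    """Gain ratio of the binary split: feature <= threshold vs. feature > threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # The split separates nothing.
    total = len(labels)
    p_left, p_right = len(left) / total, len(right) / total
    # Information gain: entropy before the split minus weighted entropy after it.
    info_gain = entropy(labels) - (p_left * entropy(left) + p_right * entropy(right))
    # Split information penalizes very uneven partitions.
    split_info = -(p_left * log2(p_left) + p_right * log2(p_right))
    return info_gain / split_info


# Made-up toy feature and labels (not taken from the dataset).
values = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
labels = ["control"] * 4 + ["treated"] * 4

print(gain_ratio(values, labels, threshold=0.5))
print(gain_ratio(values, labels, threshold=0.15))
```

In this toy example the threshold 0.5 separates the two classes perfectly and scores higher than the threshold 0.15, which isolates a single sample.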
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
andersonbcdefg/openai-moderation-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://spdx.org/licenses/MIT.html
The authors introduced DeepFish Dataset as a benchmark suite accompanied by a vast dataset tailored for training and evaluating various computer vision tasks. This dataset comprises roughly 40,000 images captured underwater across 20 distinct habitats in the tropical waters of Australia. Initially, the dataset solely featured classification labels. However, recognizing the need for a more comprehensive fish analysis benchmark, the authors augmented it by collecting segmentation labels. These labels empower models to autonomously monitor fish populations, pinpoint their locations, and estimate their sizes, thereby enhancing the dataset's utility for diverse analytical purposes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Home Objects is a dataset for object detection tasks - it contains Objects annotations for 4,467 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The U.S. Geological Survey has developed a National Elevation Database (NED). The NED is a seamless mosaic of best-available elevation data. The 7.5-minute elevation data for the conterminous United States are the primary initial source data. In addition to the availability of complete 7.5-minute data, efficient processing methods were developed to filter production artifacts in the existing data, convert to the NAD83 datum, edge-match, and fill slivers of missing data at quadrangle seams. One of the effects of the NED processing steps is a much-improved base of elevation data for calculating slope and hydrologic derivatives.
The Amazon review dataset is used for multi-source domain adaptation. It contains review texts and ratings of purchased products. Products are grouped into categories. Following [66, 51], we perform a tf-idf transformation and select the top 1,000 most frequent words. Ratings are used as the target labels.
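The preprocessing described above can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline of the cited works: whitespace tokenization, length-normalized term frequency, and the `log(N / df) + 1` form of the inverse document frequency are all choices made here, and the function name and sample reviews are hypothetical.

```python
from collections import Counter
from math import log


def tfidf_top_k(documents, k=1000):
    """Return (vocabulary, tf-idf vectors) restricted to the k most frequent words."""
    # Assumed tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]

    # Vocabulary: the k most frequent words over the whole corpus.
    corpus_counts = Counter(word for doc in tokenized for word in doc)
    vocab = [word for word, _ in corpus_counts.most_common(k)]

    # Document frequency: number of documents containing each word.
    n_docs = len(tokenized)
    doc_freq = Counter(word for doc in tokenized for word in set(doc))
    idf = {word: log(n_docs / doc_freq[word]) + 1.0 for word in vocab}

    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        length = len(doc) or 1  # Guard against empty documents.
        vectors.append([tf[word] / length * idf[word] for word in vocab])
    return vocab, vectors


# Made-up reviews for illustration only.
reviews = ["great phone great battery", "bad battery", "great book"]
vocab, vectors = tfidf_top_k(reviews, k=3)
```

Each review becomes a fixed-length vector over the selected vocabulary, which can then be paired with its rating as the target label.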
The HH-RLHF dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
- Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
- Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
- Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
- Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
- Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
- Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
- Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
- AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Autorace is a dataset for object detection tasks - it contains Sign annotations for 442 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Reorganized version of Wild-Heart/Disney-VideoGeneration-Dataset. This is needed for Mochi-1 fine-tuning.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Layout planning is centrally important in the field of architecture and urban design. Among the various basic units carrying urban functions, residential community plays a vital part for supporting human life. Therefore, the layout planning of residential community has always been of concern, and has attracted particular attention since the advent of deep learning that facilitates the automated layout generation and spatial pattern recognition. However, the research circles generally suffer from the insufficiency of residential community layout benchmark or high-quality datasets, which hampers the future exploration of data-driven methods for residential community layout planning. The lack of datasets is largely due to the difficulties of large-scale real-world residential data acquisition and long-term expert screening. In order to address the issues and advance a benchmark dataset for various intelligent spatial design and analysis applications in the development of smart city, we introduce Residential Community Layout Planning (ReCo) Dataset, which is the first and largest open-source vector dataset related to real-world community to date. ReCo Dataset is presented in multiple data formats with 37,646 residential community layout plans, covering 598,728 residential buildings with height information. ReCo can be conveniently adapted for residential community layout related urban design tasks, e.g., generative layout design, morphological pattern recognition and spatial evaluation. To validate the utility of ReCo in automated residential community layout planning, two Generative Adversarial Network (GAN) based generative models are further applied to the dataset. We expect ReCo Dataset to inspire more creative and practical work in intelligent design and beyond.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations.
This time series dataset includes viral COVID-19 laboratory test [Polymerase chain reaction (PCR)] results from over 1,000 U.S. laboratories and testing locations including commercial and reference laboratories, public health laboratories, hospital laboratories, and other testing locations. Data are reported to state and jurisdictional health departments in accordance with applicable state or local law and in accordance with the Coronavirus Aid, Relief, and Economic Security (CARES) Act (CARES Act Section 18115).
Data are provisional and subject to change.
Data presented here is representative of diagnostic specimens being tested - not individual people - and excludes serology tests where possible. Data presented might not represent the most current counts for the most recent 3 days due to the time it takes to report testing information. The data may also not include results from all potential testing sites within the jurisdiction (e.g., non-laboratory or point of care test sites) and therefore reflect the majority, but not all, of COVID-19 testing being conducted in the United States.
Sources: CDC COVID-19 Electronic Laboratory Reporting (CELR), Commercial Laboratories, State Public Health Labs, In-House Hospital Labs
Data for each state is sourced from either data submitted directly by the state health department via COVID-19 electronic laboratory reporting (CELR), or a combination of commercial labs, public health labs, and in-house hospital labs. Data is taken from CELR for states that either submit line level data or submit aggregate counts which do not include serology tests.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The records in this dataset are general marine and coastal records of different taxonomic groups submitted to the National Biodiversity Data Centre.
In this document, comprehensive datasets are presented to advance research on information security breaches. The datasets include data on disclosed information security breaches affecting S&P500 companies between 2020 and 2023, collected through manual search of the Internet. Overall, the datasets include 504 companies, with detailed information security breach and financial data available for 97 firms that experienced a disclosed information security breach. This document describes the datasets in detail, explains the data collection procedure, and shows the initial versions of the datasets.
Contact at Tilburg University: Francesco Lelli
Data files: 6 raw Microsoft Excel files (.xls)
Supplemental material: Data_Publication_Package.pdf
A detailed description of the data has been released in the following preprint: [Preprint in progress]
Structure of the data package: The folder contains the 6 .xls documents and the data publication package. The link to the preprint describing the dataset is in the description of the dataset itself. The six .xls documents are also present in their preferred file format, csv (see Notes for further explanation).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
*********************************
Disclaimer: PLEASE DO NOT USE THIS VERSION (Data has been made unavailable)
Another version, version 2.0.0, of this data set is available here, which contains the original data (+ details on the data set). The version published on the current page contains EEG data that has been pre-processed with a data-driven artifact removal procedure on a trial-by-trial basis. This can introduce slight differences across trials that can possibly be exploited by classifiers. Therefore, for proper benchmarking of auditory attention decoding algorithms that can learn to use such cues, it is recommended to NOT use this pre-processed data set, and instead use the unprocessed version 1.1.0 (the original data).
*********************************
This version of the data set was used in
Das, N., Vanthornhout, J., Francart, T., & Bertrand, A. “Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research.” NeuroImage, Volume 204, 116211, Jan. 2020
https://doi.org/10.1016/j.neuroimage.2019.116211
The results and conclusions in the paper are not affected by the trial-by-trial application of the artifact removal procedure.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Winnetka by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Winnetka across both sexes and to determine which sex constitutes the majority.
Key observations
The male population is a slight majority, at 50.34% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Winnetka Population by Race & Ethnicity.
In this paper, we introduce a victim dataset for the RoboCup Rescue competitions. RoboCup Rescue robots have to collect points in several disciplines, e.g., a search task in which an area is surveyed for a simulated victim (a baby doll).
When a robot comes across a victim, a heat detector cannot fully prove that it is a living being and not just some other heat-emitting object. Further investigation is necessary, so that face detection can confirm the existence of a victim. Many face detection approaches can be found in the literature, and they are mainly used for human face recognition. These cannot be applied directly to victim faces, which, in the case of the RoboCup Rescue competitions, are typically dolls. We therefore present the results of standard approaches and develop our own approach based on bag-of-visual-words (BoVW).
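The description gives no details of the authors' BoVW pipeline, so the following is only a generic sketch of the core idea: each local image descriptor is assigned to its nearest entry in a codebook of "visual words", and the image is then represented by the normalized histogram of those assignments. The codebook, descriptors, and function names below are hypothetical toy values; in practice the codebook is usually learned by clustering descriptors from training images.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    distances = [
        sum((d - c) ** 2 for d, c in zip(descriptor, word)) for word in codebook
    ]
    return distances.index(min(distances))


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word assignments for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts) or 1  # Guard against images without descriptors.
    return [count / total for count in counts]


# Hypothetical 2-D codebook with three visual words.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Hypothetical local descriptors extracted from one image.
descriptors = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.1, 0.9]]

hist = bovw_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram can be fed to any standard classifier to decide whether an image region contains a victim face.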