https://choosealicense.com/licenses/openrail/
JackLiuAngel/alfred-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
The US-4 is a dataset of Ultrasound (US) images. It is a video-based image dataset that contains over 23,000 high-resolution images from four US video sub-datasets, where two sub-datasets are newly collected by experienced doctors for this dataset.
themex1380/sp-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Waste Classification and Recycling: Industries or municipal bodies could employ this model to automatically sort waste into paper or plastic categories, facilitating more efficient recycling processes.
Environmental Protection: Various organizations or government departments might use the model for capturing and monitoring plastic waste in public areas or natural environments, helping to measure pollution levels.
Retail and Supermarkets: It could be integrated into self-service checkout systems to automatically identify the difference between plastic and paper packaging, allowing for potential pricing differences or recycling initiatives.
Education and Research: Teachers, students, and researchers can use it as a practical tool for exploring machine learning or environmental sciences and promoting the importance of waste separation.
Smart Home Integration: The model could be integrated into a smart home system to guide residents in sorting their trash accurately and educating them on recycling.
The SciTail dataset is an entailment dataset created from multiple-choice science exams and web sentences. Each question and its correct answer choice are converted into an assertive statement to form the hypothesis. Information retrieval is used to obtain relevant text from a large corpus of web sentences, and these sentences are used as the premise P. The annotation of each premise-hypothesis pair as supports (entails) or not (neutral) is crowdsourced to create the SciTail dataset. The dataset contains 27,026 examples: 10,101 with the entails label and 16,925 with the neutral label.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('sci_tail', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
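As a quick check on the class balance quoted in the SciTail description, the label counts can be turned into proportions. This is a minimal sketch that uses only the figures stated above; the variable names are illustrative and are not part of tensorflow_datasets.

```python
# Label counts as quoted in the SciTail description.
label_counts = {"entails": 10_101, "neutral": 16_925}

# Total number of examples implied by the two counts.
total = sum(label_counts.values())

# Share of each label as a fraction of all examples.
label_share = {label: count / total for label, count in label_counts.items()}

for label, share in label_share.items():
    print(f"{label}: {share:.1%}")
```

The two counts sum to the stated total of 27,026, so the neutral class makes up a little under two thirds of the data.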
CMU CoNaLa, the Code/Natural Language Challenge dataset, is a joint project of the Carnegie Mellon University NeuLab and Strudel labs. Its purpose is to test the generation of code snippets from natural language. The data comes from StackOverflow questions. There are 2379 training and 500 test examples that were manually annotated. Every example has a natural language intent and its corresponding Python snippet. In addition to the manually annotated dataset, there are also 598,237 mined intent-snippet pairs. These examples are similar to the hand-annotated ones except that each carries a probability that the pair is valid.
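Because each mined pair carries a validity probability, a common first step is to keep only the pairs above a chosen threshold. The sketch below assumes each record is a dictionary with `intent`, `snippet`, and `prob` keys; those field names, the helper `filter_mined_pairs`, the threshold, and the sample records are all illustrative assumptions, not details confirmed by the description above.

```python
from typing import Dict, List


def filter_mined_pairs(pairs: List[Dict], min_prob: float) -> List[Dict]:
    """Keep mined pairs whose validity probability is at least min_prob."""
    return [p for p in pairs if p["prob"] >= min_prob]


# Made-up records for illustration only; not taken from the dataset.
mined = [
    {"intent": "reverse a list", "snippet": "xs[::-1]", "prob": 0.92},
    {"intent": "open a file", "snippet": "print('hi')", "prob": 0.08},
    {"intent": "join strings", "snippet": "', '.join(xs)", "prob": 0.61},
]

# The 0.5 threshold is an arbitrary choice for this example.
confident = filter_mined_pairs(mined, min_prob=0.5)
print([p["intent"] for p in confident])
```

A higher threshold trades coverage for cleaner training pairs; the right value depends on the downstream task.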
The dataset contains, by Census block-group, the variables used in the analysis and the resulting ranks and scores derived as described in the manuscript text. This dataset is associated with the following publication: Almeter, A., A. Tashie, A. Proctor, T. McAlexander, D. Browning, C. Rudder, L. Jackson, and R. Araujo. A Needs-Driven, Multi-Objective Approach to Allocate Urban Ecosystem Services from 10,000 Trees. Sustainability. MDPI AG, Basel, SWITZERLAND, 10(12): 4488, (2018).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Dalle3 1 Million+ High Quality Captions
Alt name: Human Preference Synthetic Dataset
Example grids for landscapes, cats, creatures, and fantasy are also available.
Description:
This dataset comprises AI-generated images sourced from various websites and individuals, primarily focusing on Dalle 3 content, along with contributions from other AI systems of sufficient quality like Stable Diffusion and Midjourney (MJ v5 and above). As users… See the full description on the dataset page: https://huggingface.co/datasets/ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions.
All-City event calendar - ARCHIVED For the new LA City Events dataset (refreshed daily), see https://data.lacity.org/A-Prosperous-City/LA-City-Events/rx9t-fp7k
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Prasad V Patil
Released under CC0: Public Domain
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Figo
Released under Apache 2.0
This repository contains programming data collected from 15 students during November and December of 2019 at Bielefeld University. Students were asked to implement gradient descent. Note that this data set contains only source code snapshots and neither timestamps nor personal information. All students programmed in a web environment, which is also contained in this repository.
This dataset was created by Leo Arruda
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The UF Fish Collection, dating to 1917, contains 214,205 lots and 2,300,803 specimens. Included are representatives of 8,250 species from 400 families. The collection includes 93 primary types and approximately 1,600 lots of secondary types representing 563 species. Also in the collection are 5,825 specimens of disarticulated and articulated skeletons representing 875 species. Especially notable are historic collections of large and important marine fishes as well as rapidly growing collections of freshwater fishes from Southeast Asia. In 2006, the museum expanded its program to archive frozen tissue samples with a newly established UF Genetic Resources Collection. Tissues of fishes are stored in -20ºC freezers and number 4,150 samples of 900 species. All specimens and tissues are databased online and available for loan.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Minneola by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Minneola across both sexes and to determine which sex constitutes the majority.
Key observations
Males form a slight majority, making up 52.37% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Minneola Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains annotated pictures of animals (like wild pigs and deer) from trail cameras in East Texas.
You can use this dataset and the detection API to create computer vision applications for hunting, monitoring animal population health, counting deer sightings, and more!
Automatically filter through hours of trail cam footage to find the times/frames when wild game is caught on camera.
This dataset contains shapefile boundaries for CA State, counties and places from the US Census Bureau's 2023 MAF/TIGER database. Current geography in the 2023 TIGER/Line Shapefiles generally reflects the boundaries of governmental units in effect as of January 1, 2023.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The University of California, Santa Barbara (UCSB) Herbarium has approximately 120,000 herbarium specimens of vascular plants, lichens, bryophytes, and marine macroalgae. The herbarium is housed at the Cheadle Center for Biodiversity and Ecological Restoration on the campus of UCSB. The vascular plant collection consists mainly of specimens from Santa Barbara County, including the northern Channel Islands, with additional collections from San Luis Obispo, Kern, and Ventura Counties, the southern Sierra Nevada region, southern California, and northern Mexico. Special collections include the J. R. Haller pine collection (5,000 specimens), with emphasis on population-level sampling of many western North American pine species, and the Cornelius H. Muller oak collection, with ca. 7,000 specimens from the USA and Mexico. Also conserved in the herbarium are ca. 69,000 slide preparations and spirit collections of Vernon I. Cheadle and Katherine Esau. There are 43 type specimens of plants and marine macroalgae. Incorporated collections include the Santa Rosa Island Reserve (SCIR) herbarium (1,500) and the marine macroalgae of the Santa Barbara Museum of Natural History (1,035), which contains some of the earliest collections of California seaweeds. Greg Wahlert is the current collections manager. Taxonomy and nomenclature follow the second edition of the Jepson Manual (Baldwin et al., 2012). Financial assistance with digitization efforts is provided in part by the UCSB Coastal Fund.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our Signing in the Wild dataset consists of various videos harvested from YouTube containing people signing in various sign languages and doing so in diverse settings, environments, under complex signer and camera motion, and even group signing. This dataset is intended to be used for sign language detection. For the negative set, we created two classes of videos, labelled ‘speaking’ and ‘other’. Our motivation for the ‘speaking’ class is that speech is often accompanied by hand gestures (gesticulation), which can be easily confused with signing. Signing can be discriminated by its linguistic nature, i.e., its distinct phonological, morphological and categorical (discrete) structures, while gesticulation tends to be more spontaneous, idiosyncratic and analogue in nature. For the ‘other’ class, we looked for distractors to both ‘signing’ and ‘speaking’, i.e., videos containing hand movements that are quite similar to signing/gesticulation and thus might confuse a classifier. Examples include: miming, hand exercises, various manual activities like playing instruments, painting, writing, yoga and martial arts, sports like table tennis, etc. Also included are activities similar to speech, like people laughing, clapping, nodding, listening to other speakers, etc. A total of 1120 videos are included in our dataset, each video contributing the first 6.6 minutes, resulting in 2000 frames per video when sampled at 5Hz. We have 1.45 million video frames in total. Our videos are untrimmed, i.e., a video can contain multiple activities, background scenes, scene cuts, and other actions done by the same or different actors. Thus the videos are unconstrained both spatially and also temporally. This is in line with recent trends in video action recognition [8], and unlike ASLR datasets where trimmed videos are the norm. 
In particular, several videos in our dataset contain all 3 classes (occasionally with temporal overlap), and sometimes the same person alternating between signing and speaking. We performed manual groundtruthing at video frame level. Since action boundaries can be inherently fuzzy, we consider a short temporal context (10 frames) surrounding the frame to be labelled in order to decide on its class label. We also adopt certain spatial guidelines, e.g. mouth movements must be visible for action ‘speaking’, thus eliminating distant views and when the speaker turns his/her back to the camera. Ambiguous cases are left unlabelled. We annotate video segments that do not contain signing or speaking as ‘other’, including opening/closing credits, title screens, scene transitions, animations, background scenes, etc. If you find this dataset useful, please cite the following paper: Mark Borg, Kenneth P. Camilleri, "Sign Language Detection "In The Wild" With Recurrent Neural Networks", ICASSP 2019. Any comments, suggestions, feedback are welcome: mborg2005 gmail com
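The frame-count arithmetic and the temporal context used for labelling in the Signing in the Wild description can be sketched as follows. The helper names are hypothetical, and centring the context window on the frame being labelled is an assumption; the description only says that a short context of 10 frames surrounding the frame is considered.

```python
def sampled_frames(duration_minutes: float, rate_hz: float) -> int:
    """Number of frames obtained by sampling a clip at a fixed rate."""
    return round(duration_minutes * 60 * rate_hz)


def context_window(frame_idx: int, n_frames: int, context: int = 10) -> range:
    """Frame indices around frame_idx, clipped to the video bounds.

    Assumption: the window is centred, with context // 2 frames on each side.
    """
    half = context // 2
    start = max(0, frame_idx - half)
    stop = min(n_frames, frame_idx + half + 1)
    return range(start, stop)


# 6.6 minutes sampled at 5 Hz, as stated in the description.
frames = sampled_frames(6.6, 5)
print(frames)

# Context for a frame near the start of the video is clipped at index 0.
print(list(context_window(2, frames)))
```

With these inputs the exact count comes to 1980 frames, which is consistent with the figure of about 2000 frames per video quoted above.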