U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
California State Income Limits reflect updated median income and household income levels for acutely low-, extremely low-, very low-, low- and moderate-income households for California’s 58 counties (required by Health and Safety Code Section 50093). These income limits apply to State and local affordable housing programs statutorily linked to HUD income limits and differ from income limits applicable to other specific federal, State, or local programs.
Dataset used in World Bank Policy Research Working Paper #2876, published in World Bank Economic Review, No. 1, 2005, pp. 21-44.
The effects of globalization on income distribution in rich and poor countries are a matter of controversy. While international trade theory in its most abstract formulation implies that increased trade and foreign investment should make income distribution more equal in poor countries and less equal in rich countries, finding these effects has proved elusive. The author presents another attempt to discern the effects of globalization by using data from household budget surveys and looking at the impact of openness and foreign direct investment on relative income shares of low and high deciles. The author finds some evidence that at very low average income levels, it is the rich who benefit from openness. As income levels rise to those of countries such as Chile, Colombia, or the Czech Republic, the situation changes, and it is the relative income of the poor and the middle class that rises compared with the rich. It seems that openness makes income distribution worse before making it better - or, put differently, that the effect of openness on a country's income distribution depends on the country's initial income level.
Aggregate data [agg]
Data Description
The CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.
Data Generation Procedures
The data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:
- A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100 Hz.
- A ZED stereo camera capturing 1080p images at 25-30 fps.
- A synchronized computer acting as a data hub, receiving IMU data and storing images in real time.
- A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.
Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.
Temporal and Spatial Scope
- The dataset contains a total of 472.03 minutes of recorded data.
- The IMU sensors operate at 100 Hz, while the stereo camera captures images at 25-30 Hz.
- Data was collected from 12 participants, each performing all 19 activities multiple times.
- The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.
Dataset Components
The dataset is organized into JSON and PNG files, structured hierarchically.
IMU Data: stored in JSON files, containing:
- Samsung Linear Acceleration Sensor (X, Y, Z values, 100 Hz)
- LSM6DSO Gyroscope (X, Y, Z values, 100 Hz)
- Samsung Rotation Vector (X, Y, Z, W quaternion values, 100 Hz)
- Samsung HR Sensor (heart rate, 1 Hz)
- OPT3007 Light Sensor (ambient light levels, 5 Hz)
Stereo Camera Images: high-resolution 1920×1080 PNG files from left and right cameras.
Synchronization: each IMU data record and image is timestamped for precise alignment.
Data Structure
The dataset is divided into continuous and instantaneous activities:
- Continuous activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.
- Instantaneous activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.
The dataset is structured as:
/continuous/subject_id/activity_name/
  /camera_a/ → Left camera images
  /camera_b/ → Right camera images
  /sensors/ → JSON files with IMU data
/instantaneous/subject_id/activity_name/repetition_id/
  /camera_a/
  /camera_b/
  /sensors/
Data Quality & Missing Data
- The smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.
- Synchronization latency between the smartwatch and the computer is negligible.
- Not all IMU samples have corresponding images due to the different recording rates.
- Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.
Error Ranges & Limitations
- Sensor data may contain noise due to minor hand movements.
- The heart rate sensor operates at 1 Hz, limiting its temporal resolution.
- Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.
File Formats & Software Compatibility
- IMU data is stored in JSON format, readable with Python's json library.
- Images are in PNG format, compatible with all standard image processing tools.
- Recommended libraries for data analysis: numpy, pandas, scikit-learn, tensorflow, pytorch; visualization: matplotlib, seaborn; deep learning: Keras, PyTorch.
Potential Applications
- Development of activity recognition models in educational settings.
- Study of student engagement based on movement patterns.
- Investigation of sensor fusion techniques combining visual and IMU data.
This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.
Citation
If you find this project helpful for your research, please cite our work using the following BibTeX entry:
@misc{marquezcarpintero2025caddiinclassactivitydetection,
  title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso},
  year={2025},
  eprint={2503.02853},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.02853},
}
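For orientation, the sketch below walks one continuous-activity folder following the layout above. It is a minimal example and not part of the dataset distribution: the subject/activity folder names and the JSON record layout are assumptions, since the exact schema ships with the data itself.

import json
from pathlib import Path

# Minimal sketch: browse one continuous-activity recording.
# The subject/activity names and the JSON record layout are assumptions;
# adjust them to the actual files in the release.
root = Path("continuous/S01/typing")

left_images = sorted((root / "camera_a").glob("*.png"))   # left camera frames
right_images = sorted((root / "camera_b").glob("*.png"))  # right camera frames

imu_records = []
for sensor_file in sorted((root / "sensors").glob("*.json")):
    with open(sensor_file) as f:
        imu_records.extend(json.load(f))  # assumes each file holds a list of timestamped samples

print(len(left_images), "left frames,", len(right_images), "right frames,",
      len(imu_records), "IMU records")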
Low income cut-offs (LICOs) before and after tax by community size and family size, in current dollars, annual.
Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm; however, the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1,250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the layer's data dictionary.
What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.
How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
- Income Below Poverty: households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
- Unemployment: measured as the percentage of unemployed persons in the civilian labor force
- Housing Cost Burdened: homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
- Renter Cost Burdened: renters who spend more than 30% of their income on rent
- No Health Insurance: those without private health insurance, Medicare, Medicaid, or any other plan or program
- No Vehicle Access: households without automobile, van, or truck access
- High School Education or Less: those whose highest level of educational attainment is a high school diploma, equivalency, or less
- Limited English Ability: those whose ability to speak English is "Less Than Well"
- People of Color: those who identify as anything other than Non-Hispanic White
- Disability: households with one or more physical or cognitive disabilities
- Age: groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.
How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
- High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
- Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
- Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
- Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
- Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
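As a concrete illustration of the combination scheme (percentile normalization, equal-weight mean aggregation, quintile classes), the sketch below computes a TEPI-style score with pandas. The indicator-to-sub-index grouping shown is hypothetical and collapses the thematic-index tier for brevity; the authoritative groupings are defined in the layer's data dictionary.

import pandas as pd

# Hypothetical grouping of indicators into Tier-2 sub-indices (illustrative only).
SUB_INDICES = {
    "sub_index_1": ["income_below_poverty", "unemployment"],
    "sub_index_2": ["housing_cost_burdened", "renter_cost_burdened"],
}
CLASS_LABELS = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]

def percentile_rescale(col):
    # Percentile normalization: re-scale an indicator to a common 0-100 scale.
    return col.rank(pct=True) * 100

def build_tepi(df):
    scaled = df.apply(percentile_rescale)
    # Mean aggregation with equal weighting: indicators -> sub-indices -> overall score.
    tiers = pd.DataFrame({name: scaled[cols].mean(axis=1)
                          for name, cols in SUB_INDICES.items()})
    overall = tiers.mean(axis=1).rank(pct=True) * 100
    # Divide the resulting percentile score into the five 20% classes.
    classes = pd.cut(overall, bins=[0, 20, 40, 60, 80, 100],
                     labels=CLASS_LABELS, include_lowest=True)
    return overall, classes

For a DataFrame with one row per census feature and one column per indicator, build_tepi returns each feature's overall percentile score and its class label.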
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the layer's data dictionary.
Note: This layer is symbolized to display the percentile distribution of the Limited Resources Sub-Index. However, it includes all data for each indicator and sub-index within the citywide census tracts TEPI.
What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.
How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
- Income Below Poverty: households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
- Unemployment: measured as the percentage of unemployed persons in the civilian labor force
- Housing Cost Burdened: homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
- Renter Cost Burdened: renters who spend more than 30% of their income on rent
- No Health Insurance: those without private health insurance, Medicare, Medicaid, or any other plan or program
- No Vehicle Access: households without automobile, van, or truck access
- High School Education or Less: those whose highest level of educational attainment is a high school diploma, equivalency, or less
- Limited English Ability: those whose ability to speak English is "Less Than Well"
- People of Color: those who identify as anything other than Non-Hispanic White
- Disability: households with one or more physical or cognitive disabilities
- Age: groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.
How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting.
The resulting dataset is then divided into the five classes, where:
- High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
- Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
- Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
- Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
- Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Income Share Held by Highest 20% data was reported at 46.900 % in 2016. This records an increase from the previous figure of 46.400 % for 2013. United States US: Income Share Held by Highest 20% data is updated yearly and has a median of 46.000 % over the period from Dec 1979 to 2016, with 11 observations. The data reached an all-time high of 46.900 % in 2016 and a record low of 41.200 % in 1979. United States US: Income Share Held by Highest 20% data remains in active status in CEIC and is reported by the World Bank. The data is categorized under Global Database's United States – Table US.World Bank.WDI: Poverty. Percentage share of income or consumption is the share that accrues to subgroups of population indicated by deciles or quintiles. Percentage shares by quintile may not sum to 100 because of rounding. Source: World Bank, Development Research Group. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are from the Luxembourg Income Study database. For more information and methodology, please see PovcalNet (http://iresearch.worldbank.org/PovcalNet/index.htm). The World Bank's internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than 1,600 household surveys across 164 countries in six regions and 25 other high-income countries (industrialized economies). While income distribution data are published for all countries with data available, poverty data are published only for low- and middle-income countries, countries eligible to receive loans from the World Bank (such as Chile), and recently graduated countries (such as Estonia). See PovcalNet (http://iresearch.worldbank.org/PovcalNet/WhatIsNew.aspx) for definitions of geographical regions and industrialized countries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RAV Network information periodically changes with additions or removal of data, and users should confirm that information is current and accurate. The RAV Network Road Tables and RAV Mapping Tool can be found on the Main Roads Western Australia website: https://www.mainroads.wa.gov.au/heavy-vehicles/
Main Roads Open Data: Restricted Access Networks: https://portal-mainroads.opendata.arcgis.com/pages/hvs-networks
Update Frequency: Weekly
Spatial Coverage: Western Australia
Legal
You are accessing this data pursuant to a Creative Commons (Attribution) Licence which has a disclaimer of warranties and limitation of liability. You accept that the data provided pursuant to the Licence is subject to changes. The Main Roads WA website is the official and current source of RAV Network data. Pursuant to section 3 of the Licence you are provided with the following notice to be included when you Share the Licenced Material and when you Share your Adapted Material: The Commissioner of Main Roads is the creator and owner of the data and Licenced Material, which is accessed pursuant to a Creative Commons (Attribution) Licence, which has a disclaimer of warranties and limitation of liability. The Main Roads WA website is the official and current source of RAV Network data.
Licensing: https://creativecommons.org/licenses/by/4.0/legalcode
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The following fruits, vegetables and nuts are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).
The dataset has 5 major branches:
-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.
-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.
-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.
-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.
-The _3_body_problem_ branch, where the Training and Test folders contain different varieties of 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.
Mihai Oltean, Fruits-360 dataset, 2017-
100x100 branch:
Total number of images: 138704.
Training set size: 103993 images.
Test set size: 34711 images.
Number of classes: 206 (fruits, vegetables, nuts and seeds).
Image size: 100x100 pixels.
Original-size branch:
Total number of images: 58363.
Training set size: 29222 images.
Validation set size: 14614 images.
Test set size: 14527 images.
Number of classes: 90 (fruits, vegetables, nuts and seeds).
Image size: various (the original captured size).
3-body-problem branch:
Total number of images: 47033.
Training set size: 34800 images.
Test set size: 12233 images.
Number of classes: 3 (Apples, Cherries, Tomatoes).
Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.
Image size: 100x100 pixels.
Multi branch:
Number of classes: 26 (fruits, vegetables, nuts and seeds).
Number of images: 150.
Filename format in the 100x100 branch:
image_index_100.jpg (e.g. 31_100.jpg) or
r_image_index_100.jpg (e.g. r_31_100.jpg) or
r?_image_index_100.jpg (e.g. r2_31_100.jpg)
where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).
Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.
Filename format in the original-size branch:
r?_image_index.jpg (e.g. r2_31.jpg)
where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.
The names of the image files in the original-size branch do NOT contain the "_100" suffix anymore, which makes it easy to distinguish the original-size branch from the 100x100 branch.
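The two naming conventions can be parsed mechanically. The short helper below is illustrative only (it is not part of the dataset); it reads off the optional rotation prefix and the image index from either branch's filenames.

import re

# Filenames look like "31_100.jpg", "r_31_100.jpg", "r2_31_100.jpg" (100x100 branch)
# or "r2_31.jpg" (original-size branch).
PATTERN_100 = re.compile(r"^(?:(r\d?)_)?(\d+)_100\.jpg$")
PATTERN_ORIGINAL = re.compile(r"^(?:(r\d?)_)?(\d+)\.jpg$")

def parse_name(filename):
    match = PATTERN_100.match(filename) or PATTERN_ORIGINAL.match(filename)
    rotation_prefix, image_index = match.group(1), int(match.group(2))
    return rotation_prefix, image_index

for name in ("31_100.jpg", "r_31_100.jpg", "r2_31_100.jpg", "r2_31.jpg"):
    print(name, "->", parse_name(name))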
In the multi branch, each file's name is the concatenation of the names of the fruits inside that picture.
The Fruits-360 dataset can be downloaded from:
Kaggle https://www.kaggle.com/moltean/fruits
GitHub https://github.com/fruits-360
Fruits and vegetables were mounted on the shaft of a low-speed motor (3 rpm) and a short, 20-second movie was recorded.
A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.
Behind the fruits, we placed a white sheet of paper as a background.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This multi-subject and multi-session EEG dataset for modelling human visual object recognition (MSS) contains:
More details about the dataset are described as follows.
32 participants were recruited from college students in Beijing; 4 were female and 28 were male, with an age range of 21-33 years. A total of 100 sessions were conducted. Participants were paid and gave written informed consent. The study was conducted under the approval of the ethical committee of the Institute of Automation at the Chinese Academy of Sciences, with the approval number IA21-2410-020201.
After every 50 sequences, there was a break for the participants to rest. Each rapid serial sequence lasted approximately 7.5 seconds, starting with a 750ms blank screen with a white fixation cross, followed by 20 or 21 images presented at 5 Hz with a 50% duty cycle. The sequence ended with another 750ms blank screen.
After the rapid serial sequence, there was a 2-second interval during which participants were instructed to blink and then report whether a special image appeared in the sequence using a keyboard. During each run, 20 sequences were randomly inserted with additional special images at random positions. The special images are logos for brain-computer interfaces.
Each image was displayed for 1 second and was followed by 11 choice boxes (1 correct class box, 9 random class boxes, and 1 reject box). Participants were required to select the correct class of the displayed image using a mouse to increase their engagement. After the selection, a white fixation cross was displayed for 1 second in the centre of the screen to remind participants to pay attention to the upcoming task.
The stimuli are from two image databases, ImageNet and PASCAL. The final set consists of 10,000 images, with 500 images for each class.
In the derivatives/annotations folder, there is additional information about MSS:
The EEG signals were pre-processed using the MNE package, version 1.3.1, with Python 3.9.16. The data was sampled at a rate of 1,000 Hz, with a bandpass filter applied between 0.1 and 100 Hz. A notch filter was used to remove the 50 Hz power-line frequency. Epochs were created for each trial ranging from 0 to 500 ms relative to stimulus onset. No further preprocessing or artefact correction methods were applied in the technical validation. However, researchers may want to consider widely used preprocessing steps such as baseline correction or eye movement correction. After the preprocessing, each session resulted in two matrices: an RSVP EEG data matrix of shape (8,000 image conditions × 122 EEG channels × 125 EEG time points) and a low-speed EEG data matrix of shape (400 image conditions × 122 EEG channels × 125 EEG time points).
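For reference, a minimal MNE-Python sketch of the preprocessing described above follows. The file name and the annotation-based event extraction are placeholders, not taken from the dataset documentation.

import mne

# Sketch of the described pipeline: band-pass 0.1-100 Hz, 50 Hz notch,
# epochs from 0 to 500 ms relative to stimulus onset, no further correction.
raw = mne.io.read_raw("sub-01_ses-01_eeg.vhdr", preload=True)  # hypothetical file name/format
raw.filter(l_freq=0.1, h_freq=100.0)    # band-pass 0.1-100 Hz
raw.notch_filter(freqs=50.0)            # remove 50 Hz line noise

events, event_id = mne.events_from_annotations(raw)  # placeholder event handling
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=0.0, tmax=0.5, baseline=None, preload=True)
data = epochs.get_data()  # shape: (n_trials, n_channels, n_time_points)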
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and Tracking
Description
SoloFace is a custom dataset derived from the COCO-Faces and Visual Wake Word datasets, specifically designed for single-face detection tasks in resource-constrained environments. This dataset is ideal for developing machine learning models for embedded AI applications, such as TinyML, which operate on low-power devices. Each image either contains a single human face or no face, with corresponding labels providing class information and bounding box coordinates for face detection. The dataset includes data augmentation to ensure robustness across diverse conditions, such as variations in lighting, scale, and orientation.
Dataset Structure
The dataset is organized into three subsets: train, test, and val. Each subset contains:
- images/: .jpg image files.
- labels/: .json label files with filenames matching the images.
Label Format
Each .json label file includes:
- image: name of the corresponding image file.
- class: 1 if a face is present, 0 otherwise.
- bbox: normalized bounding box coordinates [top_left_x, top_left_y, bottom_right_x, bottom_right_y]. If no face is present, the bounding box is set to [0.0, 0.0, 0.01, 0.01].
Statistics
Original Dataset:
After Data Augmentation:
Class Distribution:
Data Augmentation Details
To improve model robustness, the following augmentation techniques were applied to the training set:
Each augmentation preserved bounding box consistency with the transformed images.
Usage
This dataset supports the following use cases:
Loading the Dataset
unzip soloface-detection-dataset.zip
soloface-detection-dataset/
├── train/
│ ├── images/
│ ├── labels/
├── test/
│ ├── images/
│ ├── labels/
├── val/
│ ├── images/
│ ├── labels/
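A minimal example of reading one image/label pair is given below, assuming the layout above after unzipping. The field names come from the Label Format section; the split choice and use of Pillow are illustrative only.

import json
from pathlib import Path
from PIL import Image

root = Path("soloface-detection-dataset/train")
label_path = next((root / "labels").glob("*.json"))   # pick any label file
label = json.loads(label_path.read_text())

img = Image.open(root / "images" / label["image"])
width, height = img.size
if label["class"] == 1:
    # bbox is normalized [top_left_x, top_left_y, bottom_right_x, bottom_right_y]
    x1, y1, x2, y2 = label["bbox"]
    print("face box in pixels:", (x1 * width, y1 * height, x2 * width, y2 * height))
else:
    print("no face in", label["image"])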
License
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
For more details, visit the CC BY 4.0 License.
Contact
For inquiries or collaborations, please contact:
sahabidyut999@gmail.com
study.riya1792@gmail.com
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets of articles and their associated quality assessment rating from the English Wikipedia. Each dataset is self-contained, as it also includes all content (wiki markup) associated with a given revision. The datasets have been split into a 90% training set and 10% test set using a stratified random sampling strategy. The 2017 dataset is the preferred dataset to use; it contains 32,460 articles and was gathered on 2017/09/10. The 2015 dataset is maintained for historic reference and contains 30,272 articles gathered on 2015/02/05. The articles were sampled from six of English Wikipedia's seven assessment classes, with the exception of the Featured Article class, which contains all (2015 dataset) or almost all (2017 dataset) articles in that class at the time. Articles are assumed to belong to the highest quality class they are rated as, and article history has been mined to find the appropriate revision associated with a given quality rating. Due to the low usage of A-class articles, this class is not part of the datasets. For more details, see "The Success and Failure of Quality Improvement Projects in Peer Production Communities" by Warncke-Wang et al. (CSCW 2015), linked below. These datasets have been used to train the wikiclass Python machine-learning library, also linked below.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/B9TEWM
This dataset contains replication files for "The Fading American Dream: Trends in Absolute Income Mobility Since 1940" by Raj Chetty, David Grusky, Maximilian Hell, Nathaniel Hendren, Robert Manduca, and Jimmy Narang. For more information, see https://opportunityinsights.org/paper/the-fading-american-dream/. A summary of the related publication follows. One of the defining features of the “American Dream” is the ideal that children have a higher standard of living than their parents. We assess whether the U.S. is living up to this ideal by estimating rates of “absolute income mobility” – the fraction of children who earn more than their parents – since 1940. We measure absolute mobility by comparing children’s household incomes at age 30 (adjusted for inflation using the Consumer Price Index) with their parents’ household incomes at age 30. We find that rates of absolute mobility have fallen from approximately 90% for children born in 1940 to 50% for children born in the 1980s. Absolute income mobility has fallen across the entire income distribution, with the largest declines for families in the middle class. These findings are unaffected by using alternative price indices to adjust for inflation, accounting for taxes and transfers, measuring income at later ages, and adjusting for changes in household size. Absolute mobility fell in all 50 states, although the rate of decline varied, with the largest declines concentrated in states in the industrial Midwest, such as Michigan and Illinois. The decline in absolute mobility is especially steep – from 95% for children born in 1940 to 41% for children born in 1984 – when we compare the sons’ earnings to their fathers’ earnings. Why have rates of upward income mobility fallen so sharply over the past half-century? There have been two important trends that have affected the incomes of children born in the 1980s relative to those born in the 1940s and 1950s: lower Gross Domestic Product (GDP) growth rates and greater inequality in the distribution of growth. We find that most of the decline in absolute mobility is driven by the more unequal distribution of economic growth rather than the slowdown in aggregate growth rates. When we simulate an economy that restores GDP growth to the levels experienced in the 1940s and 1950s but distributes that growth across income groups as it is distributed today, absolute mobility only increases to 62%. In contrast, maintaining GDP at its current level but distributing it more broadly across income groups – as it was distributed for children born in the 1940s – would increase absolute mobility to 80%, thereby reversing more than two-thirds of the decline in absolute mobility. These findings show that higher growth rates alone are insufficient to restore absolute mobility to the levels experienced in mid-century America. Under the current distribution of GDP, we would need real GDP growth rates above 6% per year to return to rates of absolute mobility in the 1940s. Intuitively, because a large fraction of GDP goes to a small fraction of high-income households today, higher GDP growth does not substantially increase the number of children who earn more than their parents. Of course, this does not mean that GDP growth does not matter: changing the distribution of growth naturally has smaller effects on absolute mobility when there is very little growth to be distributed. The key point is that increasing absolute mobility substantially would require more broad-based economic growth.
We conclude that absolute mobility has declined sharply in America over the past half-century primarily because of the growth in inequality. If one wants to revive the “American Dream” of high rates of absolute mobility, one must have an interest in growth that is shared more broadly across the income distribution.
In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
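The paper's sparse variant is not reproduced here, but the benchmark it compares against, the standard one-class ν-SVM, is available in scikit-learn. The toy data below is synthetic and only illustrates how ν trades off training errors against the number of support vectors.

import numpy as np
from sklearn.svm import OneClassSVM

# Standard one-class nu-SVM baseline (the benchmark referred to above),
# not the authors' sparse variant. nu upper-bounds the fraction of training
# errors and lower-bounds the fraction of support vectors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                      # synthetic "normal" samples
X_test = np.vstack([rng.normal(size=(50, 2)),            # inliers
                    rng.uniform(-6, 6, size=(10, 2))])   # injected outliers

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
pred = clf.predict(X_test)                               # +1 = inlier, -1 = outlier
print("support vectors:", clf.support_vectors_.shape[0])
print("flagged outliers:", int((pred == -1).sum()))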
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
This dataset provides an additional "Grazing Potential" land use class to the previously published U.S. Geological Survey (USGS) National Water-Quality Assessment (NAWQA) Wall-to-Wall Anthropogenic Land-use Trends (NWALT) product (Falcone, 2015, USGS Data Series 948). As with the NWALT, the dataset consists of five national 60-m land use grids, for the years 1974, 1982, 1992, 2002, and 2012. The only change to the dataset is that, for every year, some pixels which are class 50 "Low-use" in the NWALT are reclassified to a new class 46 "Grazing Potential Expanded". The purpose of the re-classification is to identify areas which are likely to have had at least some grazing activity, based on agreement of historical land cover/use datasets, and which are not already captured as another land use class by the original NWALT. A pixel was reclassified if it: would otherwise be in class 50 (Low Use); is in an Agriculture or Grazed class in Marschner and Anderson (1967); is in an Agriculture or Rangeland class in the 1970s-era GIRAS data; and is in the Grassland/Herbaceous class (71) in the NLCD 2011, with no restrictions on proximity to water or on slope. Falcone, J.A., 2015, U.S. conterminous wall-to-wall anthropogenic land use trends (NWALT), 1974–2012: U.S. Geological Survey Data Series 948, 33 p. plus appendixes 3–6 as separate files, http://dx.doi.org/10.3133/ds948. Marschner, F.J. and Anderson, J.R., 1967, Major land uses in the United States, U.S. Geological Survey, http://water.usgs.gov/GIS/metadata/usgswrd/XML/na70_landuse.xml
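The rule amounts to a simple raster mask intersection. The sketch below is conceptual only: the array names, file paths, and the encoding of the ancillary layers as boolean masks are assumptions, not part of the data release.

import numpy as np

# Conceptual reclassification: NWALT class 50 ("Low-use") pixels become
# class 46 ("Grazing Potential Expanded") where all three ancillary sources
# agree. Inputs are assumed to be co-registered 60-m grids.
nwalt = np.load("nwalt_2012.npy")                                 # NWALT class codes (hypothetical file)
marschner_ag_grazed = np.load("marschner_mask.npy").astype(bool)  # Ag/Grazed in Marschner & Anderson (1967)
giras_ag_rangeland = np.load("giras_mask.npy").astype(bool)       # Ag/Rangeland in 1970s-era GIRAS
nlcd_grassland = np.load("nlcd_2011.npy") == 71                   # NLCD 2011 Grassland/Herbaceous

reclassified = nwalt.copy()
candidates = (nwalt == 50) & marschner_ag_grazed & giras_ag_rangeland & nlcd_grassland
reclassified[candidates] = 46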
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Title Object Detection Model for Identification of Poisonous Plants in the Chesapeake Bay Watershed by Shameer Rao
Model Overview Time outside is crucial for our health, but there are risks in the great outdoors. One of the most significant issues we may encounter is poisonous plants. There isn't a singular rule to recognize them; they are not all bright red, nor do they all have three leaves. An encounter with a harmful plant could cause rashes, itching, and swelling. This object detection model will identify the four most common poisonous plants in the Chesapeake Bay Watershed, with the intended audience being the U.S. states within the Chesapeake Bay area (Delaware, Maryland, New York, Pennsylvania, Virginia, and West Virginia) and the District of Columbia. The model will have five classes: Giant Hogweed (Heracleum mantegazzianum), Poison Hemlock (Conium maculatum), Spotted Water Hemlock (Cicuta maculata), Mayapple (Podophyllum peltatum), and a null class.
Model Structure Roboflow will be used to create the model, with five classes: Giant Hogweed (Heracleum mantegazzianum), Poison Hemlock (Conium maculatum), Spotted Water Hemlock (Cicuta maculata), Mayapple (Podophyllum peltatum), and a null class to store nonessential results. I have chosen Roboflow for its in-depth analytics and various optimization tools. The smart annotation tool is one of its best features and speeds up the annotation workflow. Additionally, having more familiarity with Roboflow compared to Google's Teachable Machine is also an advantage.
Data Collection Plan Each class will be trained with 100 images taken during the daytime, containing as little background noise as possible, and focused on most parts of the plants. All photos must be in JPEG or PNG format. Image size will not be an eliminating factor, but all images will be backed up on Google Drive in case the pictures need to be cropped or edited. With these parameters set, I hope the rules either eliminate or reduce bias within this model. Additionally, the images will be collected from the iNaturalist, CDC, NPS, and MD DNR websites.
Minimum Viable Product The object detection model should reach 40.0% mAP, 50.0% precision, and 40.0% recall to be considered a success, with each class at a 50% accuracy rate as well. Although my initial benchmark is low, I aim to reach this threshold in the first or second iteration of the model. Upon reaching this threshold, the final milestone should increase to 65.0% mAP, 75.0% precision, and 60.0% recall. These milestones should be feasible, as I reached 67.2% mAP, 76.0% precision, and 61.1% recall on my second iteration of the Shark Tooth Model. I expect the Giant Hogweed (Heracleum mantegazzianum) and Spotted Water Hemlock (Cicuta maculata) classes to have a lower accuracy rate due to their similarity in features. Additionally, I expect Mayapple (Podophyllum peltatum) to perform the best, as it has more distinct features than the other three classes.
Use cases for this project:
Ecological Conservation: Conservationists and ecologists in the Chesapeake Bay Watershed can use the object detection model to monitor and track the spread of these poisonous plant species. By detecting their presence in various ecosystems, specialists can take appropriate measures to control their growth and prevent damage to native species.
Public Health and Safety: Local governments and parks departments can utilize this model to identify and remove poisonous plants from public spaces such as parks, hiking trails, and playgrounds. This would reduce the risk of accidental exposure to these plants, ensuring a safer outdoor environment for the community.
Agricultural Management: Farmers and landowners in the Chesapeake Bay Watershed can use the computer vision model to detect the presence of poisonous plants on their property. This would help them avoid cultivating or accidentally spreading these toxic invaders, safeguarding their crops and livestock from possible harm.
Botanical Research: Researchers studying the ecology of the Chesapeake Bay Watershed can use the object detection model to conduct large-scale surveys of poisonous plant populations in the region. This data would provide valuable information on the distribution, abundance, and interactions between these toxic species and the surrounding environment.
Environmental Education: Educators can incorporate the object detection model into educational programs to teach students and the public about poisonous plants found in the Chesapeake Bay Watershed. This would raise awareness of these hazardous species, fostering a better understanding of local ecosystems and promoting responsible outdoor behaviors.
COMPASS-XP is a dataset of matched photographic and X-ray images of single objects, made available for use in Machine Learning & Computer Vision research, in particular in the context of transport security. Objects are imaged in multiple poses, and accompanied by metadata including labels for whether we consider the object to be dangerous in the context of aviation. Object classes overlap with those in the popular ImageNet Large Scale Visual Recognition Challenge class set and the WordNet lexical database, and identifiers for shared classes in both schemes are also provided.
Hardware Configuration
Photographs were captured with a Sony DSC-W800 compact digital camera. X-ray scans were obtained using a Gilardoni FEP ME 536 mailroom X-ray machine, distributed in the UK by Todd Research under the name TR50. The scanner is dual energy and generates several image outputs:
• Low: Raw 8-bit greyscale data from the scanner’s low energy X-ray channel.
• High: Raw 8-bit greyscale data from the scanner’s high energy X-ray channel.
• Density: 8-bit greyscale data representing inferred material density computed from the two channels.
• Grey: RGB PNG image representing a combination of both low and high energy channels with some appearance improvements. Although nominally greyscale, the image does include subtle duotone-style colouration.
• Colour: RGB PNG image with a false-colour palette representing material density.
In practice the grey and colour versions are probably most useful, but for completeness the dataset includes all variants for each scan.
Data Files
Image files are supplied in six subdirectories, corresponding to the five X-ray image variants above plus photos. X-rays are provided in PNG format, while photos are JPEG. Each scan is identified by a numeric index, which is also used to name the file, padded with leading zeros to always be 4 digits long.
Scan metadata is provided in the accompanying tab-delimited text file, meta.txt. This includes the following columns (a short loading sketch follows the column list):
• basename: The zero-padded identifier for the scan. All six image type variants for the same class-instance-pose have the same basename. X-ray files are named basename.png while photos are basename.jpg.
• class: The object class in the scan.
• instance: An integer identifying the object instance. Instances start at 1 for each class.
• pose: An integer identifying the object pose. Poses start at 1 for each instance.
• scan tray: Either A, indicating that the pose was imaged in a weighted tray, or N indicating it was not.
• dangerous: Whether the object was considered dangerous (True/False).
• IN id: Numeric index of the object class in the ILSVRC list of 1000 classes, or empty if the class isn’t present there.
• WN id: WordNet identifier for the object class, or empty if the class isn't present in WordNet.
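The sketch below reads the metadata with pandas. It assumes meta.txt sits alongside the image subdirectories and that the column headers match the names listed above (the exact spelling in the file, e.g. of "scan tray", may differ), and the subdirectory names used to build paths are assumptions.

import pandas as pd

# Read the tab-delimited scan metadata and locate files for one scan.
meta = pd.read_csv("meta.txt", sep="\t")
print(len(meta), "scans")
print(meta["dangerous"].value_counts())        # True/False counts

row = meta.iloc[0]
basename = f"{int(row['basename']):04d}"       # identifiers are zero-padded to 4 digits
print("colour X-ray:", f"colour/{basename}.png")   # subdirectory name assumed
print("photograph:", f"photo/{basename}.jpg")      # subdirectory name assumed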
License
The COMPASS-XP dataset was acquired as part of a research project funded by the UK Government Future Aviation Security Solutions programme. Both the images and their metadata are licensed under the Creative Commons Attribution 4.0 International License and may be freely used for research and commercial purposes, including derivative works, providing the source is acknowledged.
COMPASS-XP Dataset Authors
Lewis D. Griffin*, Matthew Caldwell, Jerone T. A. Andrews
Computational Security Science Group, UCL
* l.griffin@cs.ucl.ac.uk
The Exclusively Dark (ExDARK) dataset is a collection of 7,363 low-light images, captured in conditions ranging from very low-light environments to twilight (i.e., 10 different conditions), with 12 object classes (similar to PASCAL VOC) annotated both at the image class level and with local object bounding boxes.