U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
California State Income Limits reflect updated median income and household income levels for acutely low-, extremely low-, very low-, low- and moderate-income households for California’s 58 counties (required by Health and Safety Code Section 50093). These income limits apply to State and local affordable housing programs statutorily linked to HUD income limits and differ from income limits applicable to other specific federal, State, or local programs.
Dataset used in World Bank Policy Research Working Paper #2876, published in World Bank Economic Review, No. 1, 2005, pp. 21-44.
The effects of globalization on income distribution in rich and poor countries are a matter of controversy. While international trade theory in its most abstract formulation implies that increased trade and foreign investment should make income distribution more equal in poor countries and less equal in rich countries, finding these effects has proved elusive. The author presents another attempt to discern the effects of globalization by using data from household budget surveys and looking at the impact of openness and foreign direct investment on relative income shares of low and high deciles. The author finds some evidence that at very low average income levels, it is the rich who benefit from openness. As income levels rise to those of countries such as Chile, Colombia, or the Czech Republic, the situation changes, and it is the relative income of the poor and the middle class that rises compared with the rich. It seems that openness makes income distribution worse before making it better - or, put differently, that the effect of openness on a country's income distribution depends on the country's initial income level.
Aggregate data [agg]
Data Description
The CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.
Data Generation Procedures
The data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:
- A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100 Hz.
- A ZED stereo camera capturing 1080p images at 25-30 fps.
- A synchronized computer acting as a data hub, receiving IMU data and storing images in real time.
- A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.
Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.
Temporal and Spatial Scope
- The dataset contains a total of 472.03 minutes of recorded data.
- The IMU sensors operate at 100 Hz, while the stereo camera captures images at 25-30 Hz.
- Data was collected from 12 participants, each performing all 19 activities multiple times.
- The geographical scope of data collection was Alicante, Spain, under controlled indoor conditions.
Dataset Components
The dataset is organized into JSON and PNG files, structured hierarchically.
IMU Data: stored in JSON files, containing:
- Samsung Linear Acceleration Sensor (X, Y, Z values, 100 Hz)
- LSM6DSO Gyroscope (X, Y, Z values, 100 Hz)
- Samsung Rotation Vector (X, Y, Z, W quaternion values, 100 Hz)
- Samsung HR Sensor (heart rate, 1 Hz)
- OPT3007 Light Sensor (ambient light levels, 5 Hz)
Stereo Camera Images: high-resolution 1920×1080 PNG files from left and right cameras.
Synchronization: each IMU data record and image is timestamped for precise alignment.
Data Structure
The dataset is divided into continuous and instantaneous activities:
- Continuous activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.
- Instantaneous activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.
The dataset is structured as:
/continuous/subject_id/activity_name/
  /camera_a/ → Left camera images
  /camera_b/ → Right camera images
  /sensors/ → JSON files with IMU data
/instantaneous/subject_id/activity_name/repetition_id/
  /camera_a/
  /camera_b/
  /sensors/
Data Quality & Missing Data
- The smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.
- Synchronization latency between the smartwatch and the computer is negligible.
- Not all IMU samples have corresponding images due to the different recording rates.
- Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.
Error Ranges & Limitations
- Sensor data may contain noise due to minor hand movements.
- The heart rate sensor operates at 1 Hz, limiting its temporal resolution.
- Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.
File Formats & Software Compatibility
- IMU data is stored in JSON format, readable with Python's json library.
- Images are in PNG format, compatible with all standard image processing tools.
- Recommended libraries for data analysis: numpy, pandas, scikit-learn, tensorflow, pytorch; visualization: matplotlib, seaborn; deep learning: Keras, PyTorch.
Potential Applications
- Development of activity recognition models in educational settings.
- Study of student engagement based on movement patterns.
- Investigation of sensor fusion techniques combining visual and IMU data.
This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.
Citation
If you find this project helpful for your research, please cite our work using the following BibTeX entry:
@misc{marquezcarpintero2025caddiinclassactivitydetection,
  title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso},
  year={2025},
  eprint={2503.02853},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.02853},
}
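For orientation, the sketch below walks one continuous-activity folder following the layout above. It is a minimal example and not part of the dataset distribution: the subject/activity folder names and the JSON record layout are assumptions, since the exact schema ships with the data itself.

import json
from pathlib import Path

# Minimal sketch: browse one continuous-activity recording.
# The subject/activity names and the JSON record layout are assumptions;
# adjust them to the actual files in the release.
root = Path("continuous/S01/typing")

left_images = sorted((root / "camera_a").glob("*.png"))   # left camera frames
right_images = sorted((root / "camera_b").glob("*.png"))  # right camera frames

imu_records = []
for sensor_file in sorted((root / "sensors").glob("*.json")):
    with open(sensor_file) as f:
        imu_records.extend(json.load(f))  # assumes each file holds a list of timestamped samples

print(len(left_images), "left frames,", len(right_images), "right frames,",
      len(imu_records), "IMU records")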
Low income cut-offs (LICOs) before and after tax by community size and family size, in current dollars, annual.
Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered to develop V1.0 of the NABat machine-learning algorithm; however, the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary acoustic and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (or those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1,250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the layer's data dictionary.
What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.
How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
- Income Below Poverty: households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
- Unemployment: measured as the percentage of unemployed persons in the civilian labor force
- Housing Cost Burdened: homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
- Renter Cost Burdened: renters who spend more than 30% of their income on rent
- No Health Insurance: those without private health insurance, Medicare, Medicaid, or any other plan or program
- No Vehicle Access: households without automobile, van, or truck access
- High School Education or Less: those whose highest level of educational attainment is a high school diploma, equivalency, or less
- Limited English Ability: those whose ability to speak English is "Less Than Well"
- People of Color: those who identify as anything other than Non-Hispanic White
- Disability: households with one or more physical or cognitive disabilities
- Age: groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.
How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
- High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
- Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
- Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
- Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
- Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
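As a concrete illustration of the combination scheme (percentile normalization, equal-weight mean aggregation, quintile classes), the sketch below computes a TEPI-style score with pandas. The indicator-to-sub-index grouping shown is hypothetical and collapses the thematic-index tier for brevity; the authoritative groupings are defined in the layer's data dictionary.

import pandas as pd

# Hypothetical grouping of indicators into Tier-2 sub-indices (illustrative only).
SUB_INDICES = {
    "sub_index_1": ["income_below_poverty", "unemployment"],
    "sub_index_2": ["housing_cost_burdened", "renter_cost_burdened"],
}
CLASS_LABELS = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]

def percentile_rescale(col):
    # Percentile normalization: re-scale an indicator to a common 0-100 scale.
    return col.rank(pct=True) * 100

def build_tepi(df):
    scaled = df.apply(percentile_rescale)
    # Mean aggregation with equal weighting: indicators -> sub-indices -> overall score.
    tiers = pd.DataFrame({name: scaled[cols].mean(axis=1)
                          for name, cols in SUB_INDICES.items()})
    overall = tiers.mean(axis=1).rank(pct=True) * 100
    # Divide the resulting percentile score into the five 20% classes.
    classes = pd.cut(overall, bins=[0, 20, 40, 60, 80, 100],
                     labels=CLASS_LABELS, include_lowest=True)
    return overall, classes

For a DataFrame with one row per census feature and one column per indicator, build_tepi returns each feature's overall percentile score and its class label.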
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the layer's data dictionary.
Note: This layer is symbolized to display the percentile distribution of the Limited Resources Sub-Index. However, it includes all data for each indicator and sub-index within the citywide census tracts TEPI.
What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.
How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
- Income Below Poverty: households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
- Unemployment: measured as the percentage of unemployed persons in the civilian labor force
- Housing Cost Burdened: homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
- Renter Cost Burdened: renters who spend more than 30% of their income on rent
- No Health Insurance: those without private health insurance, Medicare, Medicaid, or any other plan or program
- No Vehicle Access: households without automobile, van, or truck access
- High School Education or Less: those whose highest level of educational attainment is a high school diploma, equivalency, or less
- Limited English Ability: those whose ability to speak English is "Less Than Well"
- People of Color: those who identify as anything other than Non-Hispanic White
- Disability: households with one or more physical or cognitive disabilities
- Age: groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.
How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting.
The resulting dataset is then divided into the five classes, where:
- High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
- Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
- Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
- Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
- Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Income Share Held by Highest 20% data was reported at 46.900 % in 2016. This records an increase from the previous figure of 46.400 % for 2013. United States US: Income Share Held by Highest 20% data is updated yearly and has a median of 46.000 % over the period from Dec 1979 to 2016, with 11 observations. The data reached an all-time high of 46.900 % in 2016 and a record low of 41.200 % in 1979. United States US: Income Share Held by Highest 20% data remains in active status in CEIC and is reported by the World Bank. The data is categorized under Global Database's United States – Table US.World Bank.WDI: Poverty. Percentage share of income or consumption is the share that accrues to subgroups of population indicated by deciles or quintiles. Percentage shares by quintile may not sum to 100 because of rounding. Source: World Bank, Development Research Group. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are from the Luxembourg Income Study database. For more information and methodology, please see PovcalNet (http://iresearch.worldbank.org/PovcalNet/index.htm). The World Bank's internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than 1,600 household surveys across 164 countries in six regions and 25 other high-income countries (industrialized economies). While income distribution data are published for all countries with data available, poverty data are published only for low- and middle-income countries, countries eligible to receive loans from the World Bank (such as Chile), and recently graduated countries (such as Estonia). See PovcalNet (http://iresearch.worldbank.org/PovcalNet/WhatIsNew.aspx) for definitions of geographical regions and industrialized countries.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RAV Network information periodically changes with additions or removal of data, and users should confirm that information is current and accurate. The RAV Network Road Tables and RAV Mapping Tool can be found on the Main Roads Western Australia website: https://www.mainroads.wa.gov.au/heavy-vehicles/
Main Roads Open Data: Restricted Access Networks: https://portal-mainroads.opendata.arcgis.com/pages/hvs-networks
Update Frequency: Weekly
Spatial Coverage: Western Australia
Legal
You are accessing this data pursuant to a Creative Commons (Attribution) Licence which has a disclaimer of warranties and limitation of liability. You accept that the data provided pursuant to the Licence is subject to changes. The Main Roads WA website is the official and current source of RAV Network data. Pursuant to section 3 of the Licence you are provided with the following notice to be included when you Share the Licenced Material and when you Share your Adapted Material: The Commissioner of Main Roads is the creator and owner of the data and Licenced Material, which is accessed pursuant to a Creative Commons (Attribution) Licence, which has a disclaimer of warranties and limitation of liability. The Main Roads WA website is the official and current source of RAV Network data.
Licensing: https://creativecommons.org/licenses/by/4.0/legalcode
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The following fruits, vegetables and nuts are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).
The dataset has 5 major branches:
-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.
-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.
-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.
-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.
-The _3_body_problem_ branch, where the Training and Test folders contain different varieties of 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.
Mihai Oltean, Fruits-360 dataset, 2017-
100x100 branch:
Total number of images: 138704.
Training set size: 103993 images.
Test set size: 34711 images.
Number of classes: 206 (fruits, vegetables, nuts and seeds).
Image size: 100x100 pixels.
Original-size branch:
Total number of images: 58363.
Training set size: 29222 images.
Validation set size: 14614 images.
Test set size: 14527 images.
Number of classes: 90 (fruits, vegetables, nuts and seeds).
Image size: various (the original captured size).
3-body-problem branch:
Total number of images: 47033.
Training set size: 34800 images.
Test set size: 12233 images.
Number of classes: 3 (Apples, Cherries, Tomatoes).
Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.
Image size: 100x100 pixels.
Multi branch:
Number of classes: 26 (fruits, vegetables, nuts and seeds).
Number of images: 150.
Filename format in the 100x100 branch:
image_index_100.jpg (e.g. 31_100.jpg) or
r_image_index_100.jpg (e.g. r_31_100.jpg) or
r?_image_index_100.jpg (e.g. r2_31_100.jpg)
where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).
Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.
Filename format in the original-size branch:
r?_image_index.jpg (e.g. r2_31.jpg)
where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.
The names of the image files in the original-size branch do NOT contain the "_100" suffix anymore, which makes it easy to distinguish the original-size branch from the 100x100 branch.
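The two naming conventions can be parsed mechanically. The short helper below is illustrative only (it is not part of the dataset); it reads off the optional rotation prefix and the image index from either branch's filenames.

import re

# Filenames look like "31_100.jpg", "r_31_100.jpg", "r2_31_100.jpg" (100x100 branch)
# or "r2_31.jpg" (original-size branch).
PATTERN_100 = re.compile(r"^(?:(r\d?)_)?(\d+)_100\.jpg$")
PATTERN_ORIGINAL = re.compile(r"^(?:(r\d?)_)?(\d+)\.jpg$")

def parse_name(filename):
    match = PATTERN_100.match(filename) or PATTERN_ORIGINAL.match(filename)
    rotation_prefix, image_index = match.group(1), int(match.group(2))
    return rotation_prefix, image_index

for name in ("31_100.jpg", "r_31_100.jpg", "r2_31_100.jpg", "r2_31.jpg"):
    print(name, "->", parse_name(name))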
In the multi branch, each file's name is the concatenation of the names of the fruits inside that picture.
The Fruits-360 dataset can be downloaded from:
Kaggle https://www.kaggle.com/moltean/fruits
GitHub https://github.com/fruits-360
Fruits and vegetables were mounted on the shaft of a low-speed motor (3 rpm) and a short, 20-second movie was recorded.
A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.
Behind the fruits, we placed a white sheet of paper as a background.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This multi-subject and multi-session EEG dataset for modelling human visual object recognition (MSS) contains:
More details about the dataset are described as follows.
32 participants were recruited from college students in Beijing; 4 were female and 28 were male, with an age range of 21-33 years. A total of 100 sessions were conducted. Participants were paid and gave written informed consent. The study was conducted under the approval of the ethical committee of the Institute of Automation at the Chinese Academy of Sciences, with the approval number IA21-2410-020201.
After every 50 sequences, there was a break for the participants to rest. Each rapid serial sequence lasted approximately 7.5 seconds, starting with a 750ms blank screen with a white fixation cross, followed by 20 or 21 images presented at 5 Hz with a 50% duty cycle. The sequence ended with another 750ms blank screen.
After the rapid serial sequence, there was a 2-second interval during which participants were instructed to blink and then report whether a special image appeared in the sequence using a keyboard. During each run, 20 sequences were randomly inserted with additional special images at random positions. The special images are logos for brain-computer interfaces.
Each image was displayed for 1 second and was followed by 11 choice boxes (1 correct class box, 9 random class boxes, and 1 reject box). Participants were required to select the correct class of the displayed image using a mouse to increase their engagement. After the selection, a white fixation cross was displayed for 1 second in the centre of the screen to remind participants to pay attention to the upcoming task.
The stimuli are from two image databases, ImageNet and PASCAL. The final set consists of 10,000 images, with 500 images for each class.
In the derivatives/annotations folder, there is additional information about MSS:
The EEG signals were pre-processed using the MNE package, version 1.3.1, with Python 3.9.16. The data was sampled at a rate of 1,000 Hz, with a bandpass filter applied between 0.1 and 100 Hz. A notch filter was used to remove the 50 Hz power-line frequency. Epochs were created for each trial ranging from 0 to 500 ms relative to stimulus onset. No further preprocessing or artefact correction methods were applied in the technical validation. However, researchers may want to consider widely used preprocessing steps such as baseline correction or eye movement correction. After the preprocessing, each session resulted in two matrices: an RSVP EEG data matrix of shape (8,000 image conditions × 122 EEG channels × 125 EEG time points) and a low-speed EEG data matrix of shape (400 image conditions × 122 EEG channels × 125 EEG time points).
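For reference, a minimal MNE-Python sketch of the preprocessing described above follows. The file name and the annotation-based event extraction are placeholders, not taken from the dataset documentation.

import mne

# Sketch of the described pipeline: band-pass 0.1-100 Hz, 50 Hz notch,
# epochs from 0 to 500 ms relative to stimulus onset, no further correction.
raw = mne.io.read_raw("sub-01_ses-01_eeg.vhdr", preload=True)  # hypothetical file name/format
raw.filter(l_freq=0.1, h_freq=100.0)    # band-pass 0.1-100 Hz
raw.notch_filter(freqs=50.0)            # remove 50 Hz line noise

events, event_id = mne.events_from_annotations(raw)  # placeholder event handling
epochs = mne.Epochs(raw, events, event_id=event_id,
                    tmin=0.0, tmax=0.5, baseline=None, preload=True)
data = epochs.get_data()  # shape: (n_trials, n_channels, n_time_points)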
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and Tracking
Description
SoloFace is a custom dataset derived from the COCO-Faces and Visual Wake Word datasets, specifically designed for single-face detection tasks in resource-constrained environments. This dataset is ideal for developing machine learning models for embedded AI applications, such as TinyML, which operate on low-power devices. Each image either contains a single human face or no face, with corresponding labels providing class information and bounding box coordinates for face detection. The dataset includes data augmentation to ensure robustness across diverse conditions, such as variations in lighting, scale, and orientation.
Dataset Structure
The dataset is organized into three subsets: train, test, and val. Each subset contains:
- images/: .jpg image files.
- labels/: .json label files with filenames matching the images.
Label Format
Each .json label file includes:
- image: name of the corresponding image file.
- class: 1 if a face is present, 0 otherwise.
- bbox: normalized bounding box coordinates [top_left_x, top_left_y, bottom_right_x, bottom_right_y]. If no face is present, the bounding box is set to [0.0, 0.0, 0.01, 0.01].
Statistics
Original Dataset:
After Data Augmentation:
Class Distribution:
Data Augmentation Details
To improve model robustness, the following augmentation techniques were applied to the training set:
Each augmentation preserved bounding box consistency with the transformed images.
Usage
This dataset supports the following use cases:
Loading the Dataset
unzip soloface-detection-dataset.zip
soloface-detection-dataset/
├── train/
│ ├── images/
│ ├── labels/
├── test/
│ ├── images/
│ ├── labels/
├── val/
│ ├── images/
│ ├── labels/
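A minimal example of reading one image/label pair is given below, assuming the layout above after unzipping. The field names come from the Label Format section; the split choice and use of Pillow are illustrative only.

import json
from pathlib import Path
from PIL import Image

root = Path("soloface-detection-dataset/train")
label_path = next((root / "labels").glob("*.json"))   # pick any label file
label = json.loads(label_path.read_text())

img = Image.open(root / "images" / label["image"])
width, height = img.size
if label["class"] == 1:
    # bbox is normalized [top_left_x, top_left_y, bottom_right_x, bottom_right_y]
    x1, y1, x2, y2 = label["bbox"]
    print("face box in pixels:", (x1 * width, y1 * height, x2 * width, y2 * height))
else:
    print("no face in", label["image"])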
License
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
For more details, visit the CC BY 4.0 License.
Contact
For inquiries or collaborations, please contact:
sahabidyut999@gmail.com
study.riya1792@gmail.com
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets of articles and their associated quality assessment rating from the English Wikipedia. Each dataset is self-contained, as it also includes all content (wiki markup) associated with a given revision. The datasets have been split into a 90% training set and 10% test set using a stratified random sampling strategy. The 2017 dataset is the preferred dataset to use; it contains 32,460 articles and was gathered on 2017/09/10. The 2015 dataset is maintained for historic reference and contains 30,272 articles gathered on 2015/02/05. The articles were sampled from six of English Wikipedia's seven assessment classes, with the exception of the Featured Article class, which contains all (2015 dataset) or almost all (2017 dataset) articles in that class at the time. Articles are assumed to belong to the highest quality class they are rated as, and article history has been mined to find the appropriate revision associated with a given quality rating. Due to the low usage of A-class articles, this class is not part of the datasets. For more details, see "The Success and Failure of Quality Improvement Projects in Peer Production Communities" by Warncke-Wang et al. (CSCW 2015), linked below. These datasets have been used to train the wikiclass Python machine-learning library, also linked below.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/B9TEWM
This dataset contains replication files for "The Fading American Dream: Trends in Absolute Income Mobility Since 1940" by Raj Chetty, David Grusky, Maximilian Hell, Nathaniel Hendren, Robert Manduca, and Jimmy Narang. For more information, see https://opportunityinsights.org/paper/the-fading-american-dream/. A summary of the related publication follows. One of the defining features of the “American Dream” is the ideal that children have a higher standard of living than their parents. We assess whether the U.S. is living up to this ideal by estimating rates of “absolute income mobility” – the fraction of children who earn more than their parents – since 1940. We measure absolute mobility by comparing children’s household incomes at age 30 (adjusted for inflation using the Consumer Price Index) with their parents’ household incomes at age 30. We find that rates of absolute mobility have fallen from approximately 90% for children born in 1940 to 50% for children born in the 1980s. Absolute income mobility has fallen across the entire income distribution, with the largest declines for families in the middle class. These findings are unaffected by using alternative price indices to adjust for inflation, accounting for taxes and transfers, measuring income at later ages, and adjusting for changes in household size. Absolute mobility fell in all 50 states, although the rate of decline varied, with the largest declines concentrated in states in the industrial Midwest, such as Michigan and Illinois. The decline in absolute mobility is especially steep – from 95% for children born in 1940 to 41% for children born in 1984 – when we compare the sons’ earnings to their fathers’ earnings. Why have rates of upward income mobility fallen so sharply over the past half-century? There have been two important trends that have affected the incomes of children born in the 1980s relative to those born in the 1940s and 1950s: lower Gross Domestic Product (GDP) growth rates and greater inequality in the distribution of growth. We find that most of the decline in absolute mobility is driven by the more unequal distribution of economic growth rather than the slowdown in aggregate growth rates. When we simulate an economy that restores GDP growth to the levels experienced in the 1940s and 1950s but distributes that growth across income groups as it is distributed today, absolute mobility only increases to 62%. In contrast, maintaining GDP at its current level but distributing it more broadly across income groups – as it was distributed for children born in the 1940s – would increase absolute mobility to 80%, thereby reversing more than two-thirds of the decline in absolute mobility. These findings show that higher growth rates alone are insufficient to restore absolute mobility to the levels experienced in mid-century America. Under the current distribution of GDP, we would need real GDP growth rates above 6% per year to return to rates of absolute mobility in the 1940s. Intuitively, because a large fraction of GDP goes to a small fraction of high-income households today, higher GDP growth does not substantially increase the number of children who earn more than their parents. Of course, this does not mean that GDP growth does not matter: changing the distribution of growth naturally has smaller effects on absolute mobility when there is very little growth to be distributed. The key point is that increasing absolute mobility substantially would require more broad-based economic growth.
We conclude that absolute mobility has declined sharply in America over the past half-century primarily because of the growth in inequality. If one wants to revive the “American Dream” of high rates of absolute mobility, one must have an interest in growth that is shared more broadly across the income distribution.
In this paper we propose an innovative learning algorithm - a variation of the one-class ν-Support Vector Machines (SVMs) learning algorithm - to produce sparser solutions with much reduced computational complexity. The proposed technique returns an approximate solution, nearly as good as the solution set obtained by the classical approach, by minimizing the original risk function along with a regularization term. We introduce a bi-criterion optimization that helps guide the search towards the optimal set in much reduced time. The outcome of the proposed learning technique was compared with the benchmark one-class Support Vector Machines algorithm, which more often leads to solutions with redundant support vectors. Throughout the analysis, the problem size for both optimization routines was kept consistent. We have tested the proposed algorithm on a variety of data sources under different conditions to demonstrate its effectiveness. In all cases the proposed algorithm closely preserves the accuracy of standard one-class ν-SVMs while reducing both training time and test time by several factors.
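The paper's sparse variant is not reproduced here, but the benchmark it compares against, the standard one-class ν-SVM, is available in scikit-learn. The toy data below is synthetic and only illustrates how ν trades off training errors against the number of support vectors.

import numpy as np
from sklearn.svm import OneClassSVM

# Standard one-class nu-SVM baseline (the benchmark referred to above),
# not the authors' sparse variant. nu upper-bounds the fraction of training
# errors and lower-bounds the fraction of support vectors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))                      # synthetic "normal" samples
X_test = np.vstack([rng.normal(size=(50, 2)),            # inliers
                    rng.uniform(-6, 6, size=(10, 2))])   # injected outliers

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)
pred = clf.predict(X_test)                               # +1 = inlier, -1 = outlier
print("support vectors:", clf.support_vectors_.shape[0])
print("flagged outliers:", int((pred == -1).sum()))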
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
This dataset provides an additional "Grazing Potential" land use class to the previously published U.S. Geological Survey (USGS) National Water-Quality Assessment (NAWQA) Wall-to-Wall Anthropogenic Land-use Trends (NWALT) product (Falcone, 2015, USGS Data Series 948). As with the NWALT, the dataset consists of five national 60-m land use grids, for the years 1974, 1982, 1992, 2002, and 2012. The only change to the dataset is that, for every year, some pixels which are class 50 "Low-use" in the NWALT are reclassified to a new class 46 "Grazing Potential Expanded". The purpose of the re-classification is to identify areas which are likely to have had at least some grazing activity, based on agreement of historical land cover/use datasets, and which are not already captured as another land use class by the original NWALT. A pixel was reclassified if it: would otherwise be in class 50 (Low Use); is in an Agriculture or Grazed class in Marschner and Anderson (1967); is in an Agriculture or Rangeland class in the 1970s-era GIRAS data; and is in the Grassland/Herbaceous class (71) in the NLCD 2011, with no restrictions on proximity to water or on slope. Falcone, J.A., 2015, U.S. conterminous wall-to-wall anthropogenic land use trends (NWALT), 1974–2012: U.S. Geological Survey Data Series 948, 33 p. plus appendixes 3–6 as separate files, http://dx.doi.org/10.3133/ds948. Marschner, F.J. and Anderson, J.R., 1967, Major land uses in the United States, U.S. Geological Survey, http://water.usgs.gov/GIS/metadata/usgswrd/XML/na70_landuse.xml
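The rule amounts to a simple raster mask intersection. The sketch below is conceptual only: the array names, file paths, and the encoding of the ancillary layers as boolean masks are assumptions, not part of the data release.

import numpy as np

# Conceptual reclassification: NWALT class 50 ("Low-use") pixels become
# class 46 ("Grazing Potential Expanded") where all three ancillary sources
# agree. Inputs are assumed to be co-registered 60-m grids.
nwalt = np.load("nwalt_2012.npy")                                 # NWALT class codes (hypothetical file)
marschner_ag_grazed = np.load("marschner_mask.npy").astype(bool)  # Ag/Grazed in Marschner & Anderson (1967)
giras_ag_rangeland = np.load("giras_mask.npy").astype(bool)       # Ag/Rangeland in 1970s-era GIRAS
nlcd_grassland = np.load("nlcd_2011.npy") == 71                   # NLCD 2011 Grassland/Herbaceous

reclassified = nwalt.copy()
candidates = (nwalt == 50) & marschner_ag_grazed & giras_ag_rangeland & nlcd_grassland
reclassified[candidates] = 46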
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Title Object Detection Model for Identification of Poisonous Plants in the Chesapeake Bay Watershed by Shameer Rao
Model Overview Time outside is crucial for our health, but there are risks in the great outdoors. One of the most significant issues we may encounter is poisonous plants. There isn't a singular rule to recognize them; they are not all bright red, nor do they all have three leaves. An encounter with a harmful plant could cause rashes, itching, and swelling. This object detection model will identify the four most common poisonous plants in the Chesapeake Bay Watershed, with the intended audience being the U.S. states within the Chesapeake Bay area (Delaware, Maryland, New York, Pennsylvania, Virginia, and West Virginia) and the District of Columbia. The model will have five classes: Giant Hogweed (Heracleum mantegazzianum), Poison Hemlock (Conium maculatum), Spotted Water Hemlock (Cicuta maculata), Mayapple (Podophyllum peltatum), and a null class.
Model Structure Roboflow will be used to create the model, with five classes: Giant Hogweed (Heracleum mantegazzianum), Poison Hemlock (Conium maculatum), Spotted Water Hemlock (Cicuta maculata), Mayapple (Podophyllum peltatum), and a null class to store nonessential results. I have chosen Roboflow for its in-depth analytics and various optimization tools. The smart annotation tool is one of its best features and speeds up the annotation workflow. Additionally, having more familiarity with Roboflow compared to Google's Teachable Machine is also an advantage.
Data Collection Plan Each class will be trained with 100 images taken during the daytime, containing as little background noise as possible, and focused on most parts of the plants. All photos must be in JPEG or PNG format. Image size will not be an eliminating factor, but all images will be backed up on Google Drive in case the pictures need to be cropped or edited. With these parameters set, I hope the rules either eliminate or reduce bias within this model. Additionally, the images will be collected from the iNaturalist, CDC, NPS, and MD DNR websites.
Minimum Viable Product The object detection model should reach 40.0% mAP, 50.0% precision, and 40.0% recall to be considered a success, with each class at a 50% accuracy rate as well. Although my initial benchmark is low, I aim to reach this threshold in the first or second iteration of the model. Upon reaching this threshold, the final milestone should increase to 65.0% mAP, 75.0% precision, and 60.0% recall. These milestones should be feasible, as I reached 67.2% mAP, 76.0% precision, and 61.1% recall on my second iteration of the Shark Tooth Model. I expect the Giant Hogweed (Heracleum mantegazzianum) and Spotted Water Hemlock (Cicuta maculata) classes to have a lower accuracy rate due to their similarity in features. Additionally, I expect Mayapple (Podophyllum peltatum) to perform the best, as it has more distinct features than the other three classes.
Use cases for this project:
Ecological Conservation: Conservationists and ecologists in the Chesapeake Bay Watershed can use the object detection model to monitor and track the spread of these poisonous plant species. By detecting their presence in various ecosystems, specialists can take appropriate measures to control their growth and prevent damage to native species.
Public Health and Safety: Local governments and parks departments can utilize this model to identify and remove poisonous plants from public spaces such as parks, hiking trails, and playgrounds. This would reduce the risk of accidental exposure to these plants, ensuring a safer outdoor environment for the community.
Agricultural Management: Farmers and landowners in the Chesapeake Bay Watershed can use the computer vision model to detect the presence of poisonous plants on their property. This would help them avoid cultivating or accidentally spreading these toxic invaders, safeguarding their crops and livestock from possible harm.
Botanical Research: Researchers studying the ecology of the Chesapeake Bay Watershed can use the object detection model to conduct large-scale surveys of poisonous plant populations in the region. This data would provide valuable information on the distribution, abundance, and interactions between these toxic species and the surrounding environment.
Environmental Education: Educators can incorporate the object detection model into educational programs to teach students and the public about poisonous plants found in the Chesapeake Bay Watershed. This would raise awareness of these hazardous species, fostering a better understanding of local ecosystems and promoting responsible outdoor behaviors.
COMPASS-XP is a dataset of matched photographic and X-ray images of single objects, made available for use in Machine Learning & Computer Vision research, in particular in the context of transport security. Objects are imaged in multiple poses, and accompanied by metadata including labels for whether we consider the object to be dangerous in the context of aviation. Object classes overlap with those in the popular ImageNet Large Scale Visual Recognition Challenge class set and the WordNet lexical database, and identifiers for shared classes in both schemes are also provided.
Hardware Configuration
Photographs were captured with a Sony DSC-W800 compact digital camera. X-ray scans were obtained using a Gilardoni FEP ME 536 mailroom X-ray machine, distributed in the UK by Todd Research under the name TR50. The scanner is dual energy and generates several image outputs:
• Low: Raw 8-bit greyscale data from the scanner’s low energy X-ray channel.
• High: Raw 8-bit greyscale data from the scanner’s high energy X-ray channel.
• Density: 8-bit greyscale data representing inferred material density computed from the two channels.
• Grey: RGB PNG image representing a combination of both low and high energy channels with some appearance improvements. Although nominally greyscale, the image does include subtle duotone-style colouration.
• Colour: RGB PNG image with a false-colour palette representing material density.
In practice the grey and colour versions are probably most useful, but for completeness the dataset includes all variants for each scan.
Data Files
Image files are supplied in six subdirectories, corresponding to the five X-ray image variants above plus photos. X-rays are provided in PNG format, while photos are JPEG. Each scan is identified by a numeric index, which is also used to name the file, padded with leading zeros to always be 4 digits long.
Scan metadata is provided in the accompanying tab-delimited text file, meta.txt. This includes the following columns (a short loading sketch follows the column list):
• basename: The zero-padded identifier for the scan. All six image type variants for the same class-instance-pose have the same basename. X-ray files are named basename.png while photos are basename.jpg.
• class: The object class in the scan.
• instance: An integer identifying the object instance. Instances start at 1 for each class.
• pose: An integer identifying the object pose. Poses start at 1 for each instance.
• scan tray: Either A, indicating that the pose was imaged in a weighted tray, or N indicating it was not.
• dangerous: Whether the object was considered dangerous (True/False).
• IN id: Numeric index of the object class in the ILSVRC list of 1000 classes, or empty if the class isn’t present there.
• WN id: WordNet identifier for the object class, or empty if the class isn't present in WordNet.
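The sketch below reads the metadata with pandas. It assumes meta.txt sits alongside the image subdirectories and that the column headers match the names listed above (the exact spelling in the file, e.g. of "scan tray", may differ), and the subdirectory names used to build paths are assumptions.

import pandas as pd

# Read the tab-delimited scan metadata and locate files for one scan.
meta = pd.read_csv("meta.txt", sep="\t")
print(len(meta), "scans")
print(meta["dangerous"].value_counts())        # True/False counts

row = meta.iloc[0]
basename = f"{int(row['basename']):04d}"       # identifiers are zero-padded to 4 digits
print("colour X-ray:", f"colour/{basename}.png")   # subdirectory name assumed
print("photograph:", f"photo/{basename}.jpg")      # subdirectory name assumed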
License
The COMPASS-XP dataset was acquired as part of a research project funded by the UK Government Future Aviation Security Solutions programme. Both the images and their metadata are licensed under the Creative Commons Attribution 4.0 International License and may be freely used for research and commercial purposes, including derivative works, providing the source is acknowledged.
COMPASS-XP Dataset Authors
Lewis D. Griffin*, Matthew Caldwell, Jerone T. A. Andrews
Computational Security Science Group, UCL
* l.griffin@cs.ucl.ac.uk
The Exclusively Dark (ExDARK) dataset is a collection of 7,363 low-light images, captured in conditions ranging from very low-light environments to twilight (i.e., 10 different conditions), with 12 object classes (similar to PASCAL VOC) annotated both at the image class level and with local object bounding boxes.