Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Middle Inlet, Wisconsin, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Chart: Mean household income by quintiles in Middle Inlet, Wisconsin (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Middle Inlet town median household income. You can refer to it here.
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values, regardless of the selected geography; for example, the number of Nova Scotians in the top 1% will be calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table, namely market, total, and after-tax income, both with and without capital gains.
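As a minimal illustration of the threshold logic just described (a hedged sketch: the column names and toy values are assumptions, not the table's actual microdata):

```python
# Count the tax filers of one province whose income exceeds a NATIONAL
# percentile threshold, per the definition above. Toy data throughout.
import pandas as pd

filers = pd.DataFrame({
    "province": ["NS", "NS", "ON", "ON", "NS"],
    "total_income": [310_000, 48_000, 95_000, 520_000, 260_000],
})

# The 99% threshold is computed over ALL filers nationally,
# regardless of the geography selected afterwards.
national_p99 = filers["total_income"].quantile(0.99)

# Number of tax-filing Nova Scotians in the national top 1%.
ns_top1 = ((filers["province"] == "NS")
           & (filers["total_income"] > national_p99)).sum()
print(national_p99, ns_top1)
```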
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Eugene, OR, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Eugene median household income. You can refer to it here.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Action recognition has received increasing attention from the computer vision and machine learning communities in recent decades. Over that time, the recognition task has evolved from single-view recordings under controlled laboratory environments to unconstrained environments (i.e., surveillance environments or user-generated videos). Furthermore, recent work has focused on other aspects of the action recognition problem, such as cross-view classification, cross-domain learning, multi-modality learning, and action localization. Despite the large variety of studies, we observed limited work exploring the open-set and open-view classification problems, which are genuinely inherent properties of action recognition. In other words, a well-designed algorithm should robustly identify an unfamiliar action as "unknown" and achieve similar performance across sensors with similar fields of view. The Multi-Camera Action Dataset (MCAD) is designed to evaluate the open-view classification problem under a surveillance environment.
In our multi-camera action dataset, unlike common action datasets, we use a total of five cameras of two types (Static and PTZ) to record actions. In particular, there are three Static cameras (Cam04, Cam05, and Cam06) with a fish-eye effect and two Pan-Tilt-Zoom (PTZ) cameras (PTZ04 and PTZ06). The Static cameras have a resolution of 1280×960 pixels, while the PTZ cameras have a resolution of 704×576 pixels and a smaller field of view. Moreover, we do not control the illumination; we even recorded under two contrasting conditions (daytime and nighttime), which makes our dataset more challenging than many datasets with strongly controlled illumination. The distribution of the cameras is shown in the picture on the right.
We identified 18 single-person daily actions, with and without objects, inherited from the KTH, IXMAS, and TRECVID datasets, among others. The list and definitions of the actions are shown in the table. These actions can be divided into four types: micro actions without objects (action IDs 01, 02, 05) and with objects (action IDs 10, 11, 12, 13), and intense actions without objects (action IDs 03, 04, 06, 07, 08, 09) and with objects (action IDs 14, 15, 16, 17, 18). We recruited a total of 20 human subjects. Each subject repeats each action 8 times (4 times during the day and 4 times in the evening) under one camera. In the recording process, we use five cameras to record each action sample separately. During the recording stage we told subjects only the action name, so that they could perform the action freely according to their own habits, provided they performed it within the field of view of the current camera. This makes our dataset much closer to reality. As a result, there is high intra-class variation among different action samples, as shown in the picture of action samples.
URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html
Resources:
How to Cite:
Please cite the following paper if you use the MCAD dataset in your work (papers, articles, reports, books, software, etc):
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GameOfLife dataset is an algorithmically generated dataset based on John Horton Conway's Game of Life. Conway's Game of Life follows a strict set of rules at each "generation" (simulation step), where cells alternate between a dead and alive state based on the number of surrounding alive cells. These rules can be found on the Game of Life's Wikipedia page. This dataset is one of the three hidden datasets used by the 2025 NAS Unseen-Data Challenge at AutoML.

The goal of this dataset is to predict the number of cells alive in the next generation. This task is relatively simple, if a bit tedious, for a human to do, and should theoretically be simple for machine learning algorithms. Each cell's state is calculated based on the number of alive neighbours in the previous step. Effectively, for every cell we only need to look at the surrounding eight cells (a 3×3 square, minus the centre), which means all information for each cell can be found from a 3×3 convolution, a very common kernel size. The dataset was used to make sure that participants' approaches could handle simple tasks along with the more complicated ones, ensuring they did not overcomplicate their submissions.

There are 70,000 images in the dataset, where each image is a randomly generated starting configuration of the Game of Life with a random level of density (number of initial alive cells). The data is stored in a channels-first format with a shape of (n, 1, 10, 10), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing). There are 25 classes in this dataset, where the label (0..24) represents the number of alive cells in the next generation, and images are evenly distributed by class across the dataset (2,800 each: 2,000, 400, and 400 for training, validation, and testing respectively). We limit the data to 25 classes despite a theoretical range of 0-100, because the higher counts are increasingly unlikely to occur and would take much longer to balance. Excluding 0, the lower counts also become increasingly unlikely, though more likely than the higher ones; we wanted to prevent gaps and therefore limited the labels to 25 contiguous classes.

NumPy (.npy) files can be opened through the NumPy Python library, using the numpy.load() function, by passing the path to the file as a parameter. The metadata file contains some basic information about the datasets, and can be opened in many text editors such as vim, nano, Notepad++, Notepad, etc.
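A minimal sketch of the description above, assuming the arrays ship as .npy files (the file names here are placeholders) and that cells beyond the 10×10 edge count as dead:

```python
# Load the channels-first arrays and re-derive one label with a 3x3
# convolution, per the standard Game of Life rules described above.
import numpy as np
from scipy.signal import convolve2d

x_train = np.load("train_x.npy")  # assumed name; shape (50000, 1, 10, 10)
y_train = np.load("train_y.npy")  # assumed name; labels 0..24

kernel = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])
board = x_train[0, 0]
# boundary="fill" treats off-grid cells as dead (an assumption).
neighbours = convolve2d(board, kernel, mode="same", boundary="fill")
next_gen = (neighbours == 3) | ((board == 1) & (neighbours == 2))

print(int(next_gen.sum()), int(y_train[0]))  # next-gen alive count vs. label
```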
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the layer's data dictionary.

What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
• Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
• Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
• Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
• Renter Cost Burdened: Renters who spend more than 30% of their income on rent
• No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
• No Vehicle Access: Households without automobile, van, or truck access
• High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
• Limited English Ability: Those whose ability to speak English is "Less Than Well"
• People of Color: Those who identify as anything other than Non-Hispanic White
• Disability: Households with one or more physical or cognitive disabilities
• Age: Groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then into the Thematic Indices, and then into the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
• High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
• Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
• Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
• Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
• Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
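A minimal sketch of the scoring steps just described, with made-up indicator values (this is not the City of Tucson's code):

```python
# Percentile-normalize indicators to 0-100, combine with equal-weight
# mean aggregation, then bin the result into the five 20-point classes.
import numpy as np

def percentile_normalize(values):
    """Rescale a variable to 0-100 by each feature's percentile rank."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.argsort().argsort() / (len(values) - 1)

# Two toy indicators for five features (e.g., census block groups).
poverty = percentile_normalize([8.0, 21.5, 3.2, 15.0, 30.1])
no_vehicle = percentile_normalize([2.0, 9.5, 1.0, 4.5, 12.0])

# Equal-weight mean aggregation. (The real index first averages into
# Tier-2 Sub-Indices, then Thematic Indices, then the overall TEPI.)
tepi = np.mean([poverty, no_vehicle], axis=0)

labels = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]
classes = [labels[min(int(score // 20), 4)] for score in tepi]
print(tepi, classes)
```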
The National Hydrography Dataset Plus High Resolution (NHDPlus High Resolution) maps the lakes, ponds, streams, rivers, and other surface waters of the United States. Created by the US Geological Survey, NHDPlus High Resolution provides mean annual flow and velocity estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus High Resolution dataset, see the User's Guide for the National Hydrography Dataset Plus (NHDPlus) High Resolution.

Dataset Summary
Phenomenon Mapped: Surface waters and related features of the United States and associated territories
Geographic Extent: The contiguous United States, Hawaii, portions of Alaska, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands, and American Samoa
Projection: Web Mercator Auxiliary Sphere
Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
Source: USGS
Update Frequency: Annual
Publication Date: July 2022

This layer was symbolized in the ArcGIS Map Viewer; while the features will draw in the Classic Map Viewer, the advanced symbology will not. Prior to publication, the network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original dataset. No-data values -9999 and -9998 were converted to Null values.

What can you do with this layer?
Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

ArcGIS Online
• Add this layer to a map in the Map Viewer. The layer or a map containing it can be used in an application.
• Change the layer's transparency and set its visibility range.
• Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
• Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
• Change the layer's style and symbology.
• Add labels and set their properties.
• Customize the pop-up.
• Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

ArcGIS Pro
• Add this layer to a 2D or 3D map.
• Use as an input to geoprocessing. For example, copy features allows you to select and then export portions of the data to a new feature class.
• Change the symbology and the attribute field used to symbolize the data.
• Open the table and make interactive selections with the map.
• Modify the pop-ups.
• Apply definition queries to create subsets of the layer.

This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

Questions?
Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
• Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
• Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
• Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
• Renter Cost Burdened: Renters who spend more than 30% of their income on rent
• No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
• No Vehicle Access: Households without automobile, van, or truck access
• High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
• Limited English Ability: Those whose ability to speak English is "Less Than Well"
• People of Color: Those who identify as anything other than Non-Hispanic White
• Disability: Households with one or more physical or cognitive disabilities
• Age: Groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then into the Thematic Indices, and then into the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
• High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
• Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
• Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
• Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
• Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
https://cdla.dev/permissive-1-0/
This dataset contains high-resolution aerial imagery from the USDA NAIP program [1], high-resolution land cover labels from the Chesapeake Conservancy, low-resolution land cover labels from the USGS NLCD 2011 dataset, low-resolution multi-spectral imagery from Landsat 8, and high-resolution building footprint masks from Microsoft Bing, formatted to accelerate machine learning research into land cover mapping. The Chesapeake Conservancy spent over 10 months and $1.3 million creating a consistent six-class land cover dataset covering the Chesapeake Bay watershed. While the purpose of the Conservancy's mapping effort was to create land cover data for use in conservation efforts, the same data can be used to train machine learning models that can be applied over even wider areas. The organization of this dataset (detailed below) will allow users to easily test questions related to this problem of geographic generalization, i.e., how to train machine learning models that generalize beyond their training region. For example, this dataset can be used to directly estimate how well a model trained on data from Maryland generalizes over the remainder of the Chesapeake Bay.
Families of tax filers; Single-earner and dual-earner census families by number of children (final T1 Family File; T1FF).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This curated dataset offers a valuable resource for deep learning applications in drug discovery and repositioning domains. It contains 5,350 high-resolution images systematically categorized into pharmacological classes and molecular targets. The pharmacological classes encompass antifungals, antivirals, corticosteroids, diuretics, and non-steroidal anti-inflammatory drugs (NSAIDs), while the molecular targets emphasize Alzheimer's disease-related enzymes, including acetylcholinesterase, butyrylcholinesterase, and beta-secretase 1. The dataset was meticulously compiled using data from well-established databases, including DrugBank, ChEMBL, and DUD-E, ensuring diversity and quality in the compounds selected for training. Active compounds (true positives) were sourced from DrugBank and ChEMBL, while decoy compounds (true negatives) were generated using the DUD-E protocol. The decoy compounds are designed to match the physicochemical properties of active compounds while lacking binding affinity, creating a robust benchmark for machine learning evaluation. The balanced structure of the dataset, with equal representation of true positive and decoy compounds, enhances its suitability for binary and multi-class classification tasks. The collection of compounds is diverse and of high quality, thus supporting a wide range of deep learning tasks, including pharmacological class prediction, virtual screening, and molecular target identification. This ultimately advances computational approaches in drug discovery.
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, in text data it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what kind of circumstances. During classification, the high and sparse dimensionality of text data has also been considered. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process, providing better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
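To make the pseudo-label idea concrete, here is a hedged sketch that treats each distinct combination of class labels as one pseudo label (a label-powerset-style transformation); this illustrates only the label-combination idea, not the paper's pseudo-LSC algorithm:

```python
# Each distinct label subset becomes one pseudo label, so correlated
# labels (here: sports+politics) are modeled jointly. Toy data.
from collections import Counter

doc_labels = [
    {"sports", "politics"},
    {"sports"},
    {"sports", "politics"},
    {"tech"},
]

pseudo = Counter(frozenset(labels) for labels in doc_labels)
pseudo_ids = {combo: i for i, combo in enumerate(pseudo)}

# Multi-label targets reduced to single pseudo-label targets.
y_pseudo = [pseudo_ids[frozenset(labels)] for labels in doc_labels]
print(pseudo_ids, y_pseudo)
```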
What is the Tucson Equity Tree Index (TETI)?
The Tucson Equity Tree Index (TETI) is a tool that describes the distribution of tree canopy as it relates to social vulnerability. It categorizes the dataset into 5 classes that represent the differing prioritization needs for improving tree canopy coverage: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for tree planting prioritization, as they tend to have more existing tree canopy and less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for tree planting prioritization, as they tend to have the least existing tree canopy and the highest proportions of socially vulnerable demographics.

How is tree canopy measured?
To measure tree canopy, the TETI calculates a Gap Score using the difference between the goal canopy coverage per feature (15%) and the percent of existing tree canopy coverage. The existing tree canopy coverage was calculated from PAG's 2015 tree canopy dataset, which can be viewed here.

How is social vulnerability measured?
The TETI incorporates the Tucson Equity Priority Index (TEPI), which examines the proportion of vulnerability per feature using 11 demographic indicators:
• % of households below poverty
• % unemployed
• % with no personal vehicle access
• % with no health insurance
• % that experience renter cost-burden
• % that experience homeowner cost-burden
• % that are a vulnerable age (children under 18 and seniors 65 and older)
• % of households with 1 or more disabilities
• % people of color
• % whose highest educational attainment is high school or less
• % with low English proficiency
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

How are the variables combined?
The two variables (tree canopy gap score and vulnerability score) for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined using the mean aggregation method, weighting the tree canopy gap score and the vulnerability score equally. The resulting dataset is then divided into the five classes, where:
• High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable and have the largest tree canopy gap scores. These areas require the most focused attention.
• Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability and tree canopy gap scores compared to the median. While not the highest, these areas are more vulnerable and in need of tree canopy than a majority of the dataset and should be considered for targeted interventions.
• Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability and canopy gap scores. These areas may show a balanced mix of high and low vulnerability and canopy indicators. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
• Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable and have more tree canopy than most but may still exhibit certain vulnerable characteristics or tree canopy needs. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
• Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable and have the greatest existing tree canopy, making them the most resilient compared to all other features in the dataset.

How is the TETI different from the Tucson Tree Equity Score index?
The TETI is designed to be the updated version of the Tucson Tree Equity Score index in that:
• The demographic variables of vulnerability use the most current data and reflect how the City of Tucson as a whole is defining social vulnerability.
• This version of the TETI uses census block groups instead of Tucson neighborhoods, which results in areas that are more comparable to one another in population size and an overall index that is more comparable to other existing indices (the Tucson Equity Priority Index (TEPI), the Climate and Economic Justice Screening Tool (CEJST), the American Forests Tree Equity Score National Explorer, etc.).
• The classification method for the TETI results in equal ranges for each classification based on relative need for prioritization, whereas the Tucson Tree Equity Index has a more arbitrary classification scheme and does not have equal classification ranges.
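A minimal sketch of the Gap Score and equal-weight combination just described, with made-up values (not the City's actual code); the result would then be binned into the same five 20-point classes as above:

```python
# Gap Score = goal canopy (15%) minus existing canopy, so features
# further below the goal rank higher in need. Both variables are
# percentile-normalized to 0-100 and averaged with equal weights.
import numpy as np

def percentile_normalize(values):
    values = np.asarray(values, dtype=float)
    return 100.0 * values.argsort().argsort() / (len(values) - 1)

existing_canopy_pct = np.array([4.0, 12.5, 18.0, 7.5, 1.0])  # % cover
gap_score = 15.0 - existing_canopy_pct                       # 15% goal
vulnerability = np.array([62.0, 35.0, 10.0, 48.0, 91.0])     # TEPI-style

teti = (percentile_normalize(gap_score)
        + percentile_normalize(vulnerability)) / 2.0
print(teti)
```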
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-res. images
These Residual-UNet model data are based on the DeepGlobe dataset
Models have been created using Segmentation Gym* using the following dataset**: https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset
Image size used by model: 512 x 512 x 3 pixels
classes: 1. urban 2. agricultural 3. rangeland 4. forest 5. water 6. bare 7. unknown
File descriptions
For each model, there are 5 files with the same root name:
'.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.
'.h5' weights file: this is the file that was created by the Segmentation Gym* function train_model.py. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function seg_images_in_folder.py. Models may be ensembled.
'_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and dataset that the model is based upon. There is some redundancy between this file and the config file (described above), which contains the instructions for model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model; as such, it is considered part of the model.
'_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function train_model.py
'.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training (a subset of the data inside the .npz file). It is created by the Segmentation Gym function train_model.py
Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU
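A minimal sketch of inspecting these files with standard Python tools (the root name here is a placeholder, and the .npz keys are assumptions based on the descriptions above, i.e., training and validation losses and metrics):

```python
# Open one model's config file and training-history archive.
import json
import numpy as np

root = "deepglobe_7class_resunet"  # hypothetical root name

with open(root + ".json") as f:    # Segmentation Gym config file
    config = json.load(f)
print(sorted(config.keys()))       # model/data/training settings

history = np.load(root + "_model_history.npz")
for key in history.files:          # e.g., per-epoch loss and IoU arrays
    print(key, history[key].shape)
```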
References
*Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332. See: https://github.com/Doodleverse/segmentation_gym
**Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D. and Raskar, R., 2018. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 172-181).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Goldenhar Syndrome Craniofacial Image Dataset (Goldenhar-CFID) is a high-resolution dataset designed for the automated detection and classification of craniofacial abnormalities associated with Goldenhar Syndrome (GS). It comprises 4,483 images, categorized into seven distinct classes of craniofacial deformities. This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

Dataset Characteristics
Total Images: 4,483
Number of Classes: 7
Image Format: JPG
Image Resolution: 640 x 640 pixels
Annotation: Each image is manually labeled and verified by medical experts
Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

Categories and Annotations
The dataset includes images categorized into seven craniofacial deformities:
1. Cleft Lip and Palate – Congenital anomaly where the upper lip and/or palate fails to develop properly.
2. Epibulbar Dermoid Tumor – Benign growth on the eye's surface, typically at the cornea-sclera junction.
3. Eyelid Coloboma – Defect characterized by a partial or complete absence of eyelid tissue.
4. Facial Asymmetry – Uneven development of facial structures.
5. Malocclusion – Misalignment of the teeth and jaws.
6. Microtia – Underdeveloped or absent outer ear.
7. Vertebral Abnormality – Irregular development of spinal vertebrae.

Dataset Structure and Splitting
The dataset consists of four main subdirectories:
Original – Contains 547 raw images.
Unaugmented Balanced – Contains 210 images per class.
Augmented Unbalanced – Includes 4,483 images with augmentation.
Augmented Balanced – Contains 756 images per class.
The dataset is split into:
Training Set: 80%
Validation Set: 10%
Test Set: 10%
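As a small sketch of working with this layout (the root path and the one-folder-per-class structure inside each subdirectory are assumptions, not documented specifics):

```python
# Count JPG images per class folder in one subset; for the
# "Augmented Balanced" subset we would expect 756 per class.
from pathlib import Path

root = Path("Goldenhar-CFID/Augmented Balanced")  # hypothetical path
counts = {d.name: sum(1 for _ in d.glob("*.jpg"))
          for d in root.iterdir() if d.is_dir()}
print(counts)
```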
*This dataset is authored by ESRI and is being shared as a direct link to the feature service by Pend Oreille County. NHD is a primary hydrologic reference used by our organization.

The National Hydrography Dataset Plus High Resolution (NHDPlus High Resolution) maps the lakes, ponds, streams, rivers, and other surface waters of the United States. Created by the US Geological Survey, NHDPlus High Resolution provides mean annual flow and velocity estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus High Resolution dataset, see the User's Guide for the National Hydrography Dataset Plus (NHDPlus) High Resolution.

Dataset Summary
Phenomenon Mapped: Surface waters and related features of the United States and associated territories
Coordinate System: Web Mercator Auxiliary Sphere
Extent: The contiguous United States, Hawaii, portions of Alaska, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands, and American Samoa
Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
Source: USGS
Publication Date: July 2022

This layer was symbolized in the ArcGIS Map Viewer; while the features will draw in the Classic Map Viewer, the advanced symbology will not. Prior to publication, the network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original dataset. No-data values -9999 and -9998 were converted to Null values.

What can you do with this Feature Layer?
Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

ArcGIS Online
• Add this layer to a map in the Map Viewer. The layer or a map containing it can be used in an application.
• Change the layer's transparency and set its visibility range.
• Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
• Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
• Change the layer's style and symbology.
• Add labels and set their properties.
• Customize the pop-up.
• Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

ArcGIS Pro
• Add this layer to a 2D or 3D map.
• Use as an input to geoprocessing. For example, copy features allows you to select and then export portions of the data to a new feature class.
• Change the symbology and the attribute field used to symbolize the data.
• Open the table and make interactive selections with the map.
• Modify the pop-ups.
• Apply definition queries to create subsets of the layer.

This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This is the largest gastrointestinal image dataset, generously provided by Simula Research Laboratory in Norway. You can read their research paper here, published in Nature's Scientific Data (cited below).
In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. The class each image belongs to corresponds to the folder it is stored in (e.g., the 'polyp' folder contains all polyp images, the 'barretts' folder contains all images of Barrett's esophagus, etc.). Each class folder is located in a subfolder describing the type of finding, which in turn is located in a folder describing whether it is a lower GI or upper GI finding. The number of images per class is not balanced, which is a general challenge in the medical field, since some findings occur more often than others. This adds an additional challenge for researchers, because methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
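A minimal sketch of navigating that nested layout (tract / finding type / class / image; the extraction path is a placeholder):

```python
# Tally images per class folder; class imbalance is expected, since
# some findings are far rarer than others.
from collections import Counter
from pathlib import Path

root = Path("hyper-kvasir/images")  # hypothetical extraction path
counts = Counter(p.parent.name for p in root.rglob("*.jpg"))

for cls, n in counts.most_common():
    print(f"{cls}: {n}")
```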
The data was collected during real gastroscopy and colonoscopy examinations at a hospital in Norway and partly labeled by experienced gastrointestinal endoscopists.
Use Cases
"Artificial intelligence is currently a hot topic in medicine. The fact that medical data is often sparse and hard to obtain due to legal restrictions and lack of medical personnel to perform the cumbersome and tedious labeling of the data, leads to technical limitations. In this respect, we share the Hyper-Kvasir dataset, which is the largest image and video dataset from the gastrointestinal tract available today."
"We have used the labeled data to research the classification and segmentation of GI findings using both computer vision and ML approaches to potentially be used in live and post-analysis of patient examinations. Areas of potential utilization are analysis, classification, segmentation, and retrieval of images and videos with particular findings or particular properties from the computer science area. The labeled data can also be used for teaching and training in medical education. Having expert gastroenterologists providing the ground truths over various findings, HyperKvasir provides a unique and diverse learning set for future clinicians. Moreover, the unlabeled data is well suited for semi-supervised and unsupervised methods, and, if even more ground truth data is needed, the users of the data can use their own local medical experts to provide the needed labels. Finally, the videos can in addition be used to simulate live endoscopies feeding the video into the system like it is captured directly from the endoscopes enable developers to do image classification."
Borgli, H., Thambawita, V., Smedsrud, P.H. et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 7, 283 (2020). https://doi.org/10.1038/s41597-020-00622-y
Using this Dataset
Hyper-Kvasir is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that in all documents and papers that use or refer to the Hyper-Kvasir dataset or report experimental results based on the dataset, a reference to the related article needs to be added: PREPRINT: https://osf.io/mkzcq/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
https://doi.org/10.17026/fp39-0x58
This dataset belongs to the following dissertation: Barend Johannes Geyser (2017), "The model for the accompaniment of seekers with a Christian background into silence in their quest for wholeness," Radboud University. Data gathering took place by means of phenomenological interviews, observations, and field notes made during the interviews, as well as video-stimulated recall. The interview transcripts are written in the South African language. This dataset contains the interview transcripts. The researcher decided to select participants who were starting the second half of life, thus from 40 to 55 years of age. The participants all have a Christian background and were all living in the northern suburbs of Johannesburg, which means that they are from the socio-economic middle class and upper middle class. Three women and five men were interviewed. The interviews involved the conscious selection of certain participants; in this instance, the participants are seekers who ask for accompaniment into silence. They are all Christian seekers on a quest for wholeness, investigating the possibility of the practice of silence as an aid in their quest. They all attempted to practice silence in some way or other for at least three years. In addition to the eight interview transcripts, a readme text is added to explain the context of the dataset.
The U.S. Geological Survey (USGS) has generated land surface form classes for the contiguous United States. These land surface form classes were created as part of an effort to map standardized, terrestrial ecosystems for the nation using a classification developed by NatureServe (Comer and others, 2003). Ecosystem distributions were modeled using a biophysical stratification approach developed for South America (Sayre and others, 2008) and now being implemented globally (Sayre and others, 2007). Land surface forms strongly influence the differentiation and distribution of terrestrial ecosystems, and are one of the key input layers in the ecosystem delineation process. The methodology used to produce these land surface form classes was developed by the Missouri Resource Assessment Partnership (MoRAP). MoRAP made modifications to Hammond's (1964a, 1964b) land surface form classification, which allowed the use of 30-meter source data and a 1 km2 window for neighborhood analysis (True 2002, True and others, 2000). While Hammond's methodology was based on three variables, slope, local relief, and profile type, MoRAP's methodology uses only slope and local relief (True 2002). Slope is classified as gently sloping or not gently sloping using a slope threshold of 8%, local relief is classified into five classes (0-15m, 15-30m, 30-90m, 90-150m, and >150m), and eight landform classes (flat plains, smooth plains, irregular plains, escarpments, low hills, hills, breaks, and low mountains) were derived by combining slope class and local relief. The USGS implementation of the MoRAP methodology was executed using the USGS 30-meter National Elevation Dataset (NED) and an existing USGS slope dataset. In this implementation, a new land surface form class, the high mountains/deep canyons class, was identified by using an additional local relief class (> 400m). The drainage channels class was derived independently from the other land surface form classes. This class was derived using two of Andrew Weiss's slope position classes, "valley" and "lower slope" (Weiss 2001, Jenness 2006). The USGS implemented Weiss's algorithm using the 30-meter NED and a 1 km2 neighborhood analysis window. The resultant drainage channel class was combined into the final land surface forms dataset.
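A minimal sketch of the binning rules summarized above (toy values; the published workflow runs over the 30-meter NED with a 1 km2 neighborhood window, and the lookup that combines the two classes into the eight named landform classes is not reproduced in this summary):

```python
# Implement only the thresholds given in the text: the 8% slope
# threshold and the five local-relief classes.
import numpy as np

slope_pct = np.array([2.0, 5.0, 12.0, 30.0])        # % slope per cell
local_relief_m = np.array([10.0, 60.0, 120.0, 300.0])

# Slope: gently sloping vs. not gently sloping (8% threshold).
gently_sloping = slope_pct <= 8.0

# Local relief: five classes (0-15 m, 15-30 m, 30-90 m, 90-150 m, >150 m).
relief_class = np.digitize(local_relief_m, [15, 30, 90, 150])

# The landform class would then be read from a slope-by-relief lookup
# table (flat plains ... low mountains), which is assumed, not shown.
print(gently_sloping, relief_class)
```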