Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Amherst, New York, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Amherst town median household income. You can refer to it here.
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
.pkl or .npz. Some example classes include:
- Animals: beaver, dolphin, otter, elephant, snake.
- Plants: apple, orange, mushroom, palm tree, pine tree.
- Vehicles: bicycle, bus, motorcycle, train, rocket.
- Everyday Objects: clock, keyboard, lamp, table, chair.
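Given that the archives ship as .pkl or .npz, a minimal loading sketch might look like the following; the file names here are hypothetical placeholders:

```python
import pickle
import numpy as np

# Hypothetical file names; substitute the actual archives from the download.
with open("train_batch.pkl", "rb") as f:
    batch = pickle.load(f)  # typically a dict of arrays / label lists
print(type(batch))

arrays = np.load("train_data.npz")  # archive of named numpy arrays
for name in arrays.files:
    print(name, arrays[name].shape)
```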
For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset with the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

How is social vulnerability measured?
The TEPI examines the proportion of vulnerability per feature using 11 demographic indicators:
- Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
- Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
- Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
- Renter Cost Burdened: Renters who spend more than 30% of their income on rent
- No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
- No Vehicle Access: Households without automobile, van, or truck access
- High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
- Limited English Ability: Those whose ability to speak English is "Less Than Well"
- People of Color: Those who identify as anything other than Non-Hispanic White
- Disability: Households with one or more physical or cognitive disabilities
- Age: Groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)

An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two Thematic Indices are further divided into five sub-indices called Tier-2 Sub-Indices, each containing 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale, with values between 0 and 100. The variables are combined first into each of the five Tier-2 Sub-Indices, then into the Thematic Indices, and then into the overall TEPI, using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
- High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
- Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability than the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
- Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
- Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
- Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
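The aggregation scheme just described (percentile rescaling to 0-100, equal-weight mean aggregation up the hierarchy, then division into five 20% classes) can be sketched in a few lines of pandas; the indicator and sub-index names below are illustrative assumptions, and the thematic-index step is collapsed for brevity:

```python
import pandas as pd

def percentile_rescale(s: pd.Series) -> pd.Series:
    # Percentile normalization: map each value to its percentile rank on a 0-100 scale.
    return s.rank(pct=True) * 100

# Hypothetical indicator columns grouped into Tier-2 Sub-Indices (names are illustrative).
SUB_INDICES = {
    "economic_hardship": ["pct_poverty", "pct_unemployed"],
    "housing_burden": ["pct_owner_burdened", "pct_renter_burdened"],
}

def tepi(df: pd.DataFrame) -> pd.DataFrame:
    scaled = df.apply(percentile_rescale)
    # Equal-weight mean aggregation: indicators -> sub-indices -> overall index.
    subs = pd.DataFrame({name: scaled[cols].mean(axis=1)
                         for name, cols in SUB_INDICES.items()})
    overall = subs.mean(axis=1)
    # Divide the result into the five 20% classes by rank.
    labels = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]
    classes = pd.qcut(overall.rank(method="first"), 5, labels=labels)
    return pd.DataFrame({"TEPI": overall, "class": classes})
```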
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in West Virginia per the most current US Census data, including information on rank and average income.
This dataset contains real data for 5,000 records collected from a private learning provider. The dataset includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.
Columns:
1. Student_ID: Unique identifier for each student.
2. First_Name: Student's first name.
3. Last_Name: Student's last name.
4. Email: Contact email (can be anonymized).
5. Gender: Male, Female, Other.
6. Age: The age of the student.
7. Department: Student's department (e.g., CS, Engineering, Business).
8. Attendance (%): Attendance percentage (0-100%).
9. Midterm_Score: Midterm exam score (out of 100).
10. Final_Score: Final exam score (out of 100).
11. Assignments_Avg: Average score of all assignments (out of 100).
12. Quizzes_Avg: Average quiz scores (out of 100).
13. Participation_Score: Score based on class participation (0-10).
14. Projects_Score: Project evaluation score (out of 100).
15. Total_Score: Weighted sum of all grades.
16. Grade: Letter grade (A, B, C, D, F).
17. Study_Hours_per_Week: Average study hours per week.
18. Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No).
19. Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No).
20. Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD).
21. Family_Income_Level: Low, Medium, High.
22. Stress_Level (1-10): Self-reported stress level (1: Low, 10: High).
23. Sleep_Hours_per_Night: Average hours of sleep per night.
Note that Attendance is either not part of Total_Score or carries only minimal weight.
Calculating the weighted sum:

Total_Score = a·Midterm_Score + b·Final_Score + c·Assignments_Avg + d·Quizzes_Avg + e·Participation_Score + f·Projects_Score

where the weights a-f correspond to the percentages in the table below:
| Component | Weight (%) |
|---|---|
| Midterm | 15% |
| Final | 25% |
| Assignments Avg | 15% |
| Quizzes Avg | 10% |
| Participation | 5% |
| Projects Score | 30% |
| Total | 100% |
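As a quick check, the weighted sum can be recomputed directly from the listed columns; a minimal pandas sketch, assuming the column names given above:

```python
import pandas as pd

# Weights a-f from the table above, expressed as fractions.
WEIGHTS = {
    "Midterm_Score": 0.15,
    "Final_Score": 0.25,
    "Assignments_Avg": 0.15,
    "Quizzes_Avg": 0.10,
    "Participation_Score": 0.05,
    "Projects_Score": 0.30,
}

df = pd.read_csv("Students_Grading_Dataset_Biased.csv")
# Note: Participation_Score is on a 0-10 scale; if Total_Score assumes all
# components are out of 100, it may need rescaling (e.g., * 10) first.
recomputed = sum(weight * df[col] for col, weight in WEIGHTS.items())
print((recomputed - df["Total_Score"]).abs().describe())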
The dataset contains:
- Missing values (nulls) in some records (e.g., Attendance, Assignments, or Parent_Education_Level).
- Bias in some data (e.g., in grading: students with high attendance get slightly better grades).
- Imbalanced distributions (e.g., some departments have more students than others).
Note:
- The dataset is real, but I included some bias to create a greater challenge for my students.
- Some columns have been masked at the data owner's request.
- "Students_Grading_Dataset_Biased.csv" contains the biased dataset; "Students Performance Dataset" contains the masked dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Winchester, VA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Winchester median household income. You can refer to it here.
The National Hydrography Dataset Plus High Resolution (NHDPlus High Resolution) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US Geological Survey, NHDPlus High Resolution provides mean annual flow and velocity estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus High Resolution dataset, see the User's Guide for the National Hydrography Dataset Plus (NHDPlus) High Resolution.

Dataset Summary
- Phenomenon Mapped: Surface waters and related features of the United States and associated territories
- Geographic Extent: The contiguous United States, Hawaii, portions of Alaska, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands, and American Samoa
- Projection: Web Mercator Auxiliary Sphere
- Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
- Source: USGS
- Update Frequency: Annual
- Publication Date: July 2022

This layer was symbolized in the ArcGIS Map Viewer; while the features will draw in the Classic Map Viewer, the advanced symbology will not. Prior to publication, the network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original dataset. No-data values -9999 and -9998 were converted to Null values.

What can you do with this layer?
Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

ArcGIS Online
- Add this layer to a map in the map viewer. The layer or a map containing it can be used in an application.
- Change the layer's transparency and set its visibility range.
- Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
- Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
- Change the layer's style and symbology.
- Add labels and set their properties.
- Customize the pop-up.
- Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

ArcGIS Pro
- Add this layer to a 2D or 3D map.
- Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class.
- Change the symbology and the attribute field used to symbolize the data.
- Open the table and make interactive selections with the map.
- Modify the pop-ups.
- Apply definition queries to create subsets of the layer.

This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

Questions?
Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
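For scripted workflows in ArcGIS Pro, the same kind of attribute filter can be expressed with arcpy; a minimal sketch, where the dataset path and the mean-annual-flow field name are hypothetical placeholders (check the layer's attribute table for the actual field):

```python
import arcpy

# Hypothetical path and field name; substitute the actual flowline layer
# and the mean annual flow field from the attribute table.
flowlines = r"C:\data\NHDPlus_HR.gdb\NHDFlowline"
big_rivers = arcpy.management.MakeFeatureLayer(
    flowlines, "big_rivers", where_clause="MeanAnnualFlow > 1000"
)
print(arcpy.management.GetCount(big_rivers))  # number of larger streams/rivers
```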
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Action recognition has received increasing attention from the computer vision and machine learning communities over the last few decades. The recognition task has evolved from single-view recordings under controlled laboratory environments to unconstrained environments (i.e., surveillance environments or user-generated videos). Furthermore, recent work has focused on other aspects of the action recognition problem, such as cross-view classification, cross-domain learning, multi-modality learning, and action localization. Despite the large variety of studies, we observed limited work exploring the open-set and open-view classification problem, which is an inherent property of action recognition. In other words, a well-designed algorithm should robustly identify an unfamiliar action as "unknown" and achieve similar performance across sensors with similar fields of view. The Multi-Camera Action Dataset (MCAD) is designed to evaluate the open-view classification problem in a surveillance environment.
In our multi-camera action dataset, unlike common action datasets, we use a total of five cameras of two types (Static and PTZ) to record actions. There are three static cameras (Cam04, Cam05, and Cam06) with a fish-eye effect and two Pan-Tilt-Zoom (PTZ) cameras (PTZ04 and PTZ06). The static cameras have a resolution of 1280×960 pixels, while the PTZ cameras have a resolution of 704×576 pixels and a smaller field of view. Moreover, we do not control the illumination environment; we even recorded under two contrasting conditions (daytime and nighttime), which makes our dataset more challenging than many datasets with strongly controlled illumination. The distribution of the cameras is shown in the picture on the right.
We identified 18 single-person daily actions, with and without objects, inherited from the KTH, IXMAS, and TRECVID datasets, among others. The list and definitions of the actions are shown in the table. These actions can be divided into four types: micro actions without objects (action IDs 01, 02, 05) and with objects (action IDs 10, 11, 12, 13), and intense actions without objects (action IDs 03, 04, 06, 07, 08, 09) and with objects (action IDs 14, 15, 16, 17, 18). We recruited a total of 20 human subjects. Each candidate repeated each action 8 times (4 times during the day and 4 times in the evening) under each camera. In the recording process, we used five cameras to record each action sample separately. During the recording stage we only told candidates the action name, and they could perform the action freely according to their own habits, provided they performed it within the field of view of the current camera. This makes our dataset much closer to reality. As a result, there is high intra-class variation among different action samples, as shown in the picture of action samples.
URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html
Resources:
How to Cite:
Please cite the following paper if you use the MCAD dataset in your work (papers, articles, reports, books, software, etc.):
With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, it is most prevalent in text data. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. During classification, the high and sparse dimensionality of text data has also been considered. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
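The pseudo-label idea, treating each observed combination of class labels as a class in its own right (in the spirit of a label-powerset transformation), can be illustrated with a small sketch; this is a generic illustration of the concept, not the authors' pseudo-LSC implementation:

```python
from collections import Counter

# Each instance carries a subset of class labels (multi-label data).
y = [{"sports"}, {"sports", "politics"}, {"tech"}, {"sports", "politics"}]

# Pseudo labels: each distinct label combination becomes one class.
pseudo = [tuple(sorted(labels)) for labels in y]
print(Counter(pseudo).most_common())
# [(('politics', 'sports'), 2), (('sports',), 1), (('tech',), 1)]
```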
NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) Global Land Cover Mapping and Estimation (GLanCE) annual 30 meter (m) Version 1 data product provides global land cover and land cover change data derived from Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI). These maps provide the user community with land cover type, land cover change, metrics characterizing the magnitude and seasonality of greenness of each pixel, and the magnitude of change. GLanCE data products are provided on a set of seven continental grids that use Lambert Azimuthal Equal Area projections parameterized to minimize distortion for each continent. Currently, North America, South America, Europe, and Oceania are available. This dataset is useful for a wide range of applications, including ecosystem, climate, and hydrologic modeling; monitoring the response of terrestrial ecosystems to climate change; carbon accounting; and land management. The GLanCE data product provides seven layers: the land cover class, the estimated day of year of change, an integer identifier for the class in the previous year, the median and amplitude of the Enhanced Vegetation Index (EVI2) in the year, the rate of change in EVI2, and the change in EVI2 median from the previous year to the current year. A low-resolution browse image representing EVI2 amplitude is also available for each granule.

Known Issues
- Version 1.0 of the data set does not include Quality Assurance, Leaf Type, or Leaf Phenology. These layers are populated with fill values and will be included in future releases of the data product.
- Science Data Set (SDS) values may be missing, or of lower quality, in years when land cover change occurs. This issue is a by-product of the fact that Continuous Change Detection and Classification (CCDC) does not fit models or provide synthetic reflectance values during short periods of time between time segments.
- The accuracy of mapping results varies by land cover class and geography. Specifically, distinguishing between shrubs and herbaceous cover is challenging at high latitudes and in arid and semi-arid regions. Hence, the accuracy of shrub cover, herbaceous cover, and to some degree bare cover, is lower than for other classes.
- Due to the combined effects of large solar zenith angles, short growing seasons, and lower availability of high-resolution imagery to support training data, the representation of land cover at high latitudes in the GLanCE product is lower than in mid latitudes.
- Shadows and large variation in local zenith angles decrease the accuracy of the GLanCE product in regions with complex topography, especially at high latitudes.
- Mapping results may include artifacts from variation in data density in overlap zones between Landsat scenes relative to mapping results in non-overlap zones.
- Regions with low observation density due to cloud cover, especially in the tropics, and/or poor data density (e.g., Alaska, Siberia, West Africa) have lower map quality.
- Artifacts from the Landsat 7 Scan Line Corrector failure are occasionally evident in the GLanCE map product.
- High proportions of missing data in regions with snow and ice at high elevations result in missing data in the GLanCE SDSs.
- The GLanCE data product tends to modestly overpredict developed land cover in arid regions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This is the largest gastrointestinal dataset, generously provided by Simula Research Laboratory in Norway.
You can read their research paper, published in Nature Scientific Data, here.
In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. The class each image belongs to corresponds to the folder it is stored in (e.g., the 'polyp' folder contains all polyp images, the 'barretts' folder contains all images of Barrett's esophagus, etc.). Each class folder is located in a subfolder describing the type of finding, which in turn is located in a folder describing whether it is a lower GI or upper GI finding. The number of images per class is not balanced, which is a general challenge in the medical field due to the fact that some findings occur more often than others. This adds an additional challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
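Given that folder layout (finding-type subfolders holding one folder per class), a minimal sketch for tallying the class imbalance; the root path is a placeholder:

```python
from pathlib import Path
from collections import Counter

root = Path("hyper-kvasir/images")  # placeholder path to the images folder

# Each image's class is the name of its immediate parent folder.
counts = Counter(p.parent.name for p in root.rglob("*.jpg"))
for cls, n in counts.most_common():
    print(f"{cls}: {n}")
```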
The data was collected during real gastroscopy and colonoscopy examinations at a hospital in Norway and partly labeled by experienced gastrointestinal endoscopists.
Use Cases
"Artificial intelligence is currently a hot topic in medicine. The fact that medical data is often sparse and hard to obtain due to legal restrictions and lack of medical personnel to perform the cumbersome and tedious labeling of the data, leads to technical limitations. In this respect, we share the Hyper-Kvasir dataset, which is the largest image and video dataset from the gastrointestinal tract available today."
"We have used the labeled data to research the classification and segmentation of GI findings using both computer vision and ML approaches to potentially be used in live and post-analysis of patient examinations. Areas of potential utilization are analysis, classification, segmentation, and retrieval of images and videos with particular findings or particular properties from the computer science area. The labeled data can also be used for teaching and training in medical education. Having expert gastroenterologists providing the ground truths over various findings, HyperKvasir provides a unique and diverse learning set for future clinicians. Moreover, the unlabeled data is well suited for semi-supervised and unsupervised methods, and, if even more ground truth data is needed, the users of the data can use their own local medical experts to provide the needed labels. Finally, the videos can in addition be used to simulate live endoscopies feeding the video into the system like it is captured directly from the endoscopes enable developers to do image classification."
Borgli, H., Thambawita, V., Smedsrud, P.H. et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 7, 283 (2020). https://doi.org/10.1038/s41597-020-00622-y
Using this Dataset
Hyper-Kvasir is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that in all documents and papers that use or refer to the Hyper-Kvasir dataset or report experimental results based on the dataset, a reference to the related article needs to be added: PREPRINT: https://osf.io/mkzcq/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset corresponding to the Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle
This dataset belongs to the seismic waveform tomography of the Central and Eastern Mediterranean by Blom, Gokhberg and Fichtner, Solid Earth (Discussions), 2019. Seismic tomography is an inverse problem where the internal elastic structure of the Earth (the upper ~500 km) is determined from seismograms (the vibrations of the Earth as a result of earthquakes, as recorded by seismometers at the Earth's surface). This inverse problem is cast as an optimisation where the misfit between observed and synthetic seismograms is minimised: waveform tomography (often referred to as full waveform inversion or FWI). Synthetic seismograms are produced by simulating the elastic wavefield of earthquakes within the Earth. The optimisation problem is solved by iterative, deterministic, gradient-based inversion. Gradients are computed using the adjoint method, which requires one forward wavefield simulation and one adjoint wavefield simulation per earthquake used in the project.
The inversion was carried out over several frequency bands, starting with the longest periods and progressively including a broader frequency band. Within each frequency band, ~10-20 iterations were carried out, totalling around one hundred iterations. Synthetic seismograms and iteration information are stored for a subset of iterations, notably those where human interaction (i.e., the selection of events / data windows) took place.
Here, we describe:
Contents of this package
How to set up the data package
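# install the LASIF helper scripts into the lasif_ext conda environment's site-packages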
tar -xf ~/Downloads/LASIF_scripts.tar -C [/path/to/conda/environments]/lasif_ext/lib/python2.7/site-packages/
# make project directory
mkdir CEMed_project_Blometal
cd CEMed_project_Blometal
# extract data tarballs into it
tar -xf ~/Downloads/EMed_full.complete.tar
tar -xf ~/Downloads/EMed_window_kernels.tar
tar -xf ~/Downloads/MODEL_FILES.tar
# make scripts directory and extract scripts into it
mkdir conda_stuff
tar -xf ~/Downloads/SCRIPTS.tar -C conda_stuff
# make data analysis directory
mkdir data_analysis
cd data_analysis
# extract analysis tools
tar -xf ~/Downloads/NPY_FILES.tar
tar -xf ~/Downloads/FIGURE_SCRIPTS.tar
tar -xf ~/Downloads/figs_png.tar
Now the project should be ready for inspection. The following things can be done, for example:
conda activate lasif_ext
cd CEMed_project_Blometal
jupyter notebook
This should open up a browser tab that shows the directory structure. Navigate to data_analysis/FIGURE_scripts and click on one of the .ipynb files to open it. If you press 'Kernel' > 'Restart kernel and run all' at the top, all cells will be launched automatically. This should work out of the box.
References:
Blom, N., Gokhberg, A., and Fichtner, A.: Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle, Solid Earth Discuss., https://doi.org/10.5194/se-2019-152, in review, 2019.
Gokhberg, A., Fichtner, A., 2016. Full-waveform inversion on heterogeneous HPC systems. Comp. & Geosci. 89, 260-268. https://doi.org/10.1016/j.cageo.2015.12.013
Krischer, L., Fichtner, A., Zukauskaitė, S., and Igel, H. (2015), Large‐Scale Seismic Inversion Framework, Seismological Research Letters, 86(4), 1198–1207. doi:10.1785/0220140248
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-res. images
These Residual-UNet model data are based on the DeepGlobe dataset
Models have been created using Segmentation Gym* using the following dataset**: https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset
Image size used by model: 512 x 512 x 3 pixels
classes: 1. urban 2. agricultural 3. rangeland 4. forest 5. water 6. bare 7. unknown
File descriptions
For each model, there are 5 files with the same root name:
'.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.
'.h5' weights file: this is the file that was created by the Segmentation Gym* function train_model.py. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function seg_images_in_folder.py. Models may be ensembled.
'_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and dataset that the model is based upon. There is some redundancy between this file and the config file (described above) that contains the instructions for the model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model and as such is considered part of the model.
'_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function train_model.py
'.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training. A subset of data inside the .npz file. It is created by the Segmentation Gym function train_model.py
Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU
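To inspect a '_model_history.npz' file, one can iterate over the arrays it stores; a minimal sketch, where the filename is a placeholder and the arrays are assumed to be 1-D per-epoch series:

```python
import numpy as np

# Placeholder filename; point this at an actual '_model_history.npz' file.
history = np.load("resunet_deepglobe_model_history.npz")
for key in history.files:
    arr = history[key]
    # Assumes each entry is a 1-D per-epoch series of losses/metrics.
    print(f"{key}: {arr.shape[0]} epochs, final value {float(arr[-1]):.4f}")
```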
References *Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
**Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D. and Raskar, R., 2018. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 172-181).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This STN-PLAD (STN Power Line Assets Dataset) dataset was generated to aid power line companies in detecting high-voltage power line towers in order to avoid putting workers at risk. The dataset contains 5 annotated object classes: Stockbridge damper, spacer, transmission tower, tower plate, and insulator. Each object class has an average of 18.1 annotated instances across the 133 power line images. The images were taken from various angles, resolutions, backgrounds, and illumination conditions. All of the images were captured with a high-resolution UAV. The dataset is suitable for use with popular deep object detection methods.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Goldenhar Syndrome Craniofacial Image Dataset (Goldenhar-CFID) is a high-resolution dataset designed for the automated detection and classification of craniofacial abnormalities associated with Goldenhar Syndrome (GS). It comprises 4,483 images, categorized into seven distinct classes of craniofacial deformities. This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

Dataset Characteristics:
- Total Images: 4,483
- Number of Classes: 7
- Image Format: JPG
- Image Resolution: 640 x 640 pixels
- Annotation: Each image is manually labeled and verified by medical experts
- Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
- Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

Categories and Annotations
The dataset includes images categorized into seven craniofacial deformities:
1. Cleft Lip and Palate – Congenital anomaly where the upper lip and/or palate fails to develop properly.
2. Epibulbar Dermoid Tumor – Benign growth on the eye's surface, typically at the cornea-sclera junction.
3. Eyelid Coloboma – Defect characterized by a partial or complete absence of eyelid tissue.
4. Facial Asymmetry – Uneven development of facial structures.
5. Malocclusion – Misalignment of the teeth and jaws.
6. Microtia – Underdeveloped or absent outer ear.
7. Vertebral Abnormality – Irregular development of spinal vertebrae.

Dataset Structure and Splitting
The dataset consists of four main subdirectories:
- Original – Contains 547 raw images.
- Unaugmented Balanced – Contains 210 images per class.
- Augmented Unbalanced – Includes 4,483 images with augmentation.
- Augmented Balanced – Contains 756 images per class.

The dataset is split into:
- Training Set: 80%
- Validation Set: 10%
- Test Set: 10%
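The listed augmentations map naturally onto standard image-augmentation pipelines; a minimal torchvision sketch, where the parameter values are illustrative assumptions rather than the dataset's actual settings:

```python
from torchvision import transforms

# Illustrative parameters only; the dataset's actual augmentation
# settings are not published in the description above.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                 # rotation
    transforms.RandomAffine(degrees=0, scale=(0.9, 1.1)),  # scaling
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast
    transforms.RandomHorizontalFlip(p=0.5),                # flipping
    transforms.Resize((640, 640)),
    transforms.ToTensor(),
])
```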
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaysia, particularly Pahang, experiences devastating floods annually, causing significant damage. The objective of the research was to create a flood susceptibility map for the designated area by employing an Ensemble Machine Learning (EML) algorithm based on a geographic information system (GIS). By analyzing nine key factors from a geospatial database, a flood susceptibility map was created with ArcGIS software (ESRI ArcGIS Pro v3.0.1 x64). The Random Forest (RF) model was employed in this study to categorize the study area into distinct flood susceptibility classes. The feature selection (FS) method was used to rank the flood-influencing factors. To validate the flood susceptibility models, standard statistical measures and the Area Under the Curve (AUC) were employed. The FS ranking demonstrated that the primary contributors to flooding in the study region are rainfall and elevation, with slope, geology, curvature, flow accumulation, flow direction, distance from the river, and land use/land cover (LULC) patterns ranking subsequently. The 'very high' and 'high' classes made up 37.1% and 26.3% of the total area, respectively. The flood vulnerability assessment of Pahang found that the eastern, southern, and central regions were at high risk of flooding due to intense precipitation, low-lying topography with steep inclines, proximity to the shoreline and rivers, and abundant flooded vegetation, crops, urban areas, bare ground, and rangeland. Conversely, areas with dense tree canopies or forests were less susceptible to flooding in this research area. The ROC analysis demonstrated strong performance on the validation datasets, with an AUC value of >0.73 and accuracy scores exceeding 0.71. Research on flood susceptibility mapping can enhance risk reduction strategies and improve flood management in vulnerable areas. Technological advancements and expertise provide opportunities for more sophisticated methods, leading to better prepared and more resilient communities.
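The modelling pipeline described above (a Random Forest classifier over nine conditioning factors, feature-importance ranking, and AUC validation) can be sketched generically with scikit-learn; the factor names follow the paper's list, but the code is an illustration, not the authors' implementation:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

FACTORS = ["rainfall", "elevation", "slope", "geology", "curvature",
           "flow_accumulation", "flow_direction", "distance_from_river", "lulc"]

def rank_and_validate(X, y):
    # X: samples x 9 conditioning factors; y: 1 = flooded, 0 = not flooded.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    # Feature-importance ranking, analogous to the paper's FS ranking step.
    ranking = sorted(zip(FACTORS, rf.feature_importances_), key=lambda t: -t[1])
    # AUC validation on the held-out set.
    auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
    return ranking, auc
```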
https://doi.org/10.17026/fp39-0x58
This dataset belongs to the following dissertation: Barend Johannes Geyser (2017), The model for the accompaniment of seekers with a Christian background into silence in their quest for wholeness, Radboud University. Data gathering took place by means of phenomenological interviews, observations and field notes made during the interviews, as well as video-stimulated recall. The interview transcripts are written in a South African language. This dataset contains the interview transcripts. The researcher decided to select participants who were starting the second half of life, thus from 40 to 55 years of age. The participants all have a Christian background and were all living in the northern suburbs of Johannesburg, which means that they are from the socio-economic middle class and upper middle class. Three women and five men were interviewed. The interviews involved the conscious selection of certain participants: in this instance, seekers who ask for accompaniment into silence. They are all Christian seekers on a quest for wholeness, investigating the possibility of the practice of silence as an aid in their quest. They had all attempted to practice silence in some way for at least three years. In addition to the eight interview transcripts, a read-me text is included to explain the context of the dataset.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the English Language Image Captioning Dataset! This is a collection of images with associated text captions to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.
This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.
Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.
Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:
The Image Captioning Dataset serves various applications across different domains:
https://www.incomebyzipcode.com/terms#TERMS
A dataset listing the richest zip codes in New Jersey per the most current US Census data, including information on rank and average income.