60 datasets found
  1. Income Distribution by Quintile: Mean Household Income in Amherst, New York...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Amherst, New York // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/amherst-ny-median-household-income/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Amherst, New York
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Amherst, New York, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $18,852, while the mean income of the highest quintile (the 20% of households with the highest income) is $296,153. The highest quintile therefore earns roughly 16 times as much as the lowest quintile.
    • Top 5%: The mean household income of the wealthiest households (top 5%) is $495,426, which is 167.29% of the highest-quintile mean and 2,627.98% of the lowest-quintile mean.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels described above.
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars, for the given income level.
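
    Using these two columns, the disparity figures in the key observations above can be reproduced. The snippet below is a minimal sketch assuming the CSV has been downloaded locally; the filename is hypothetical, but the column names are the ones documented above.

      import pandas as pd

      # Hypothetical local filename; download the CSV from the dataset page first.
      df = pd.read_csv("amherst_quintile_income.csv")

      # Index mean income by income level using the documented column names.
      income = df.set_index("Income Level")["Mean Household Income"]

      lowest = income["Lowest Quintile"]
      highest = income["Highest Quintile"]
      top5 = income["Top 5 Percent"]

      print(f"Highest/lowest quintile ratio: {highest / lowest:.1f}x")
      print(f"Top 5% vs highest quintile:    {top5 / highest:.2%}")
      print(f"Top 5% vs lowest quintile:     {top5 / lowest:.2%}")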

    Good to know

    Margin of Error

    Data in the dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Amherst town median household income, which you can refer to here.

  2. Income of individuals by age group, sex and income source, Canada, provinces...

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1more
    Updated May 1, 2025
    Cite
    Government of Canada, Statistics Canada (2025). Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas [Dataset]. http://doi.org/10.25318/1110023901-eng
    Explore at:
    Dataset updated
    May 1, 2025
    Dataset provided by
    Government of Canada: http://www.gg.ca/
    Statistics Canada: https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.

  3. cifar-100-python

    • kaggle.com
    zip
    Updated Dec 26, 2024
    Cite
    ThanhTan (2024). cifar-100-python [Dataset]. https://www.kaggle.com/datasets/duongthanhtan/cifar-100-python
    Explore at:
    Available download formats: zip (168517675 bytes)
    Dataset updated
    Dec 26, 2024
    Authors
    ThanhTan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CIFAR-100 Dataset

    1. Overview

    • CIFAR-100 is an extension of the CIFAR-10 dataset, with more classes and finer-grained categorization.
    • It contains 100 classes, making it more challenging than CIFAR-10, which has only 10 classes.
    • Each image in CIFAR-100 is labeled with both a fine label (specific category) and a coarse label (broader category, such as animals or vehicles).

    2. Dataset Details

    • Number of Images: 60,000 color images in total.
      • 50,000 for training.
      • 10,000 for testing.
    • Image Size: Each image is a small 32x32 pixel RGB (color) image.
    • Classes: 100 classes, grouped into 20 superclasses.
      • Each superclass contains 5 related classes.

    3. Fine and Coarse Labels

    • Fine Labels: The dataset has specific categories, such as 'apple', 'bicycle', 'rose', etc.
    • Coarse Labels: These are broader categories, like 'fruit', 'flower', 'vehicle', etc.

    4. Applications

    • Image Classification: Used for training models to classify images into their respective categories.
    • Feature Extraction: Useful for benchmarking feature extraction techniques in computer vision.
    • Transfer Learning: Often used to pre-train models for other similar tasks.
    • Deep Learning Research: Commonly used to test architectures like CNNs (Convolutional Neural Networks).

    5. Challenges

    • The images are very small (32x32 pixels), making it harder for models to learn intricate details.
    • High class count (100) increases classification complexity.
    • Intra-class variability and inter-class similarity make it a challenging dataset for classification.

    6. File Format

    • The dataset is usually available in Python-friendly formats like .pkl or .npz.
    • It can also be downloaded and loaded using frameworks like TensorFlow or PyTorch.
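
    As a quick illustration of the last point, the following is a minimal sketch of loading CIFAR-100 through torchvision; using torchvision is an assumption about tooling, since the Kaggle archive itself ships the original pickled Python batches.

      import torchvision
      import torchvision.transforms as T

      # Downloads the CIFAR-100 archive and exposes images with their fine labels.
      train_set = torchvision.datasets.CIFAR100(
          root="./data", train=True, download=True, transform=T.ToTensor()
      )
      test_set = torchvision.datasets.CIFAR100(
          root="./data", train=False, download=True, transform=T.ToTensor()
      )

      image, fine_label = train_set[0]
      print(image.shape)                     # torch.Size([3, 32, 32])
      print(len(train_set), len(test_set))   # 50000 10000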

    7. Example Classes

    Some example classes include:

    • Animals: beaver, dolphin, otter, elephant, snake
    • Plants: apple, orange, mushroom, palm tree, pine tree
    • Vehicles: bicycle, bus, motorcycle, train, rocket
    • Everyday Objects: clock, keyboard, lamp, table, chair

  4. Tucson Equity Priority Index (TEPI): Ward 1 Census Block Groups

    • teds.tucsonaz.gov
    Updated Feb 4, 2025
    Cite
    City of Tucson (2025). Tucson Equity Priority Index (TEPI): Ward 1 Census Block Groups [Dataset]. https://teds.tucsonaz.gov/datasets/tucson-equity-priority-index-tepi-ward-1-census-block-groups/explore
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset authored and provided by
    City of Tucson
    Area covered
    Description

    For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

    What is the Tucson Equity Priority Index (TEPI)?

    The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

    How is social vulnerability measured?

    The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:

    • Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
    • Unemployment: The percentage of unemployed persons in the civilian labor force
    • Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
    • Renter Cost Burdened: Renters who spend more than 30% of their income on rent
    • No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
    • No Vehicle Access: Households without automobile, van, or truck access
    • High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
    • Limited English Ability: Those whose ability to speak English is "Less Than Well"
    • People of Color: Those who identify as anything other than Non-Hispanic White
    • Disability: Households with one or more physical or cognitive disabilities
    • Age: Groups that tend to have higher levels of vulnerability, including children (below 18) and seniors (65 and older)

    An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

    How are the variables combined?

    These indicators are divided into two main categories called Thematic Indices: Economic and Personal Characteristics. The two Thematic Indices are further divided into five sub-indices called Tier-2 Sub-Indices, each containing 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same 0-100 scale. The variables are then combined, first into each of the five Tier-2 Sub-Indices, then into the Thematic Indices, and finally into the overall TEPI, using the mean aggregation method with equal weighting. The resulting dataset is then divided into the five classes:

    • High Vulnerability (80-100%): The top classification, covering the highest 20% of regions, which are the most socially vulnerable. These areas require the most focused attention.
    • Moderate-High Vulnerability (60-80%): The upper-middle classification, covering areas with higher levels of vulnerability than the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
    • Moderate Vulnerability (40-60%): The middle or median quintile, covering areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability; detailed examination of specific indicators is recommended to understand their nuanced needs.
    • Low-Moderate Vulnerability (20-40%): The lower-middle classification, covering areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with lower values predominating.
    • Low Vulnerability (0-20%): The bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
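
    The percentile normalization and equal-weight mean aggregation described above can be sketched as follows. This is an illustrative sketch only: the column names and the grouping of indicators into a sub-index are hypothetical placeholders, not the city's actual schema.

      import pandas as pd

      # Hypothetical block-group table: one row per feature, one column per indicator (raw proportions).
      df = pd.DataFrame({
          "pct_poverty":       [0.12, 0.31, 0.05, 0.22],
          "pct_unemployed":    [0.04, 0.09, 0.03, 0.07],
          "pct_cost_burdened": [0.28, 0.41, 0.18, 0.35],
      })

      # Percentile normalization: re-scale every indicator to 0-100 by its rank.
      normalized = df.rank(pct=True) * 100

      # Equal-weight mean aggregation into one (illustrative) sub-index, then an overall index.
      sub_index = normalized[["pct_poverty", "pct_unemployed"]].mean(axis=1)
      overall = pd.concat([sub_index, normalized["pct_cost_burdened"]], axis=1).mean(axis=1)

      # Classify into the five 20-point classes described above.
      labels = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]
      classes = pd.cut(overall, bins=[0, 20, 40, 60, 80, 100], labels=labels, include_lowest=True)
      print(pd.DataFrame({"TEPI": overall.round(1), "class": classes}))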

  5. Richest Zip Codes in West Virginia

    • incomebyzipcode.com
    Updated Dec 18, 2024
    Cite
    Cubit Planning, Inc. (2024). Richest Zip Codes in West Virginia [Dataset]. https://www.incomebyzipcode.com/westvirginia
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset authored and provided by
    Cubit Planning, Inc.
    License

    https://www.incomebyzipcode.com/terms#TERMS

    Area covered
    West Virginia
    Description

    A dataset listing the richest zip codes in West Virginia per the most current US Census data, including information on rank and average income.

  6. Student Performance & Behavior Dataset

    • kaggle.com
    zip
    Updated May 28, 2025
    Cite
    Mahmoud Elhemaly (2025). Student Performance & Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/mahmoudelhemaly/students-grading-dataset
    Explore at:
    Available download formats: zip (1020509 bytes)
    Dataset updated
    May 28, 2025
    Authors
    Mahmoud Elhemaly
    Description

    Student Performance & Behavior Dataset

    This dataset contains 5,000 real records collected from a private learning provider. The dataset includes key attributes necessary for exploring patterns, correlations, and insights related to academic performance.

    Columns:

    01. Student_ID: Unique identifier for each student.
    02. First_Name: Student’s first name.
    03. Last_Name: Student’s last name.
    04. Email: Contact email (can be anonymized).
    05. Gender: Male, Female, Other.
    06. Age: The age of the student.
    07. Department: Student's department (e.g., CS, Engineering, Business).
    08. Attendance (%): Attendance percentage (0-100%).
    09. Midterm_Score: Midterm exam score (out of 100).
    10. Final_Score: Final exam score (out of 100).
    11. Assignments_Avg: Average score of all assignments (out of 100).
    12. Quizzes_Avg: Average quiz scores (out of 100).
    13. Participation_Score: Score based on class participation (0-10).
    14. Projects_Score: Project evaluation score (out of 100).
    15. Total_Score: Weighted sum of all grades.
    16. Grade: Letter grade (A, B, C, D, F).
    17. Study_Hours_per_Week: Average study hours per week.
    18. Extracurricular_Activities: Whether the student participates in extracurriculars (Yes/No).
    19. Internet_Access_at_Home: Does the student have access to the internet at home? (Yes/No).
    20. Parent_Education_Level: Highest education level of parents (None, High School, Bachelor's, Master's, PhD).
    21. Family_Income_Level: Low, Medium, High.
    22. Stress_Level (1-10): Self-reported stress level (1: Low, 10: High).
    23. Sleep_Hours_per_Night: Average hours of sleep per night.

    Attendance is either not part of the Total_Score or carries only minimal weight.

    Calculating the weighted sum: Total_Score = a⋅Midterm + b⋅Final + c⋅Assignments + d⋅Quizzes + e⋅Participation + f⋅Projects

    Component          Weight (%)
    Midterm            15%
    Final              25%
    Assignments Avg    15%
    Quizzes Avg        10%
    Participation       5%
    Projects Score     30%
    Total             100%
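
    A minimal sketch of the weighted sum above, using the column names listed earlier in this description and the weights from the table; how the 0-10 participation score enters the total is not spelled out, so the result should be checked against the provided Total_Score column.

      import pandas as pd

      WEIGHTS = {
          "Midterm_Score": 0.15,
          "Final_Score": 0.25,
          "Assignments_Avg": 0.15,
          "Quizzes_Avg": 0.10,
          "Participation_Score": 0.05,   # note: participation is scored 0-10, not 0-100
          "Projects_Score": 0.30,
      }

      # Filename as given in the note below.
      df = pd.read_csv("Students_Grading_Dataset_Biased.csv")

      # Weighted sum of the grade components; compare against the dataset's Total_Score.
      df["Total_Score_check"] = sum(df[col] * w for col, w in WEIGHTS.items())
      print(df[["Total_Score", "Total_Score_check"]].head())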

    Dataset contains:

    • Missing values (nulls) in some records (e.g., Attendance, Assignments, or Parent Education Level).
    • Bias in some data (e.g., grading: students with high attendance get slightly better grades).
    • Imbalanced distributions (e.g., some departments have more students than others).

    Note: The dataset is real, but I included some bias to create a greater challenge for my students. Some columns have been masked at the data owner's request. "Students_Grading_Dataset_Biased.csv" contains the biased dataset; "Students Performance Dataset" contains the masked dataset.

  7. Income Distribution by Quintile: Mean Household Income in Winchester, VA //...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Winchester, VA // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/winchester-va-median-household-income/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Winchester, Virginia
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Winchester, VA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $14,125, while the mean income of the highest quintile (the 20% of households with the highest income) is $215,015. The highest quintile therefore earns roughly 15 times as much as the lowest quintile.
    • Top 5%: The mean household income of the wealthiest households (top 5%) is $344,621, which is 160.28% of the highest-quintile mean and 2,439.79% of the lowest-quintile mean.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels described above.
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars, for the given income level.

    Good to know

    Margin of Error

    Data in the dataset are estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Winchester median household income, which you can refer to here.

  8. National Hydrography Dataset Plus High Resolution

    • sal-urichmond.hub.arcgis.com
    • oregonwaterdata.org
    • +1more
    Updated Mar 16, 2023
    Cite
    Esri (2023). National Hydrography Dataset Plus High Resolution [Dataset]. https://sal-urichmond.hub.arcgis.com/maps/f1f45a3ba37a4f03a5f48d7454e4b654
    Explore at:
    Dataset updated
    Mar 16, 2023
    Dataset authored and provided by
    Esri: http://esri.com/
    Area covered
    Description

    The National Hydrography Dataset Plus High Resolution (NHDPlus High Resolution) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US Geological Survey, NHDPlus High Resolution provides mean annual flow and velocity estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus High Resolution dataset see the User’s Guide for the National Hydrography Dataset Plus (NHDPlus) High Resolution.

    Dataset Summary

    • Phenomenon Mapped: Surface waters and related features of the United States and associated territories
    • Geographic Extent: The contiguous United States, Hawaii, portions of Alaska, Puerto Rico, Guam, US Virgin Islands, Northern Marianas Islands, and American Samoa
    • Projection: Web Mercator Auxiliary Sphere
    • Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
    • Source: USGS
    • Update Frequency: Annual
    • Publication Date: July 2022

    This layer was symbolized in the ArcGIS Map Viewer; while the features will draw in the Classic Map Viewer, the advanced symbology will not. Prior to publication, the network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original dataset. No-data values -9999 and -9998 were converted to Null values.

    What can you do with this layer?

    Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

    ArcGIS Online

    • Add this layer to a map in the map viewer. The layer, or a map containing it, can be used in an application.
    • Change the layer’s transparency and set its visibility range.
    • Open the layer’s attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table, and show selected records allows you to view the selected records in the table.
    • Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
    • Change the layer’s style and symbology.
    • Add labels and set their properties.
    • Customize the pop-up.
    • Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

    ArcGIS Pro

    • Add this layer to a 2D or 3D map.
    • Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class.
    • Change the symbology and the attribute field used to symbolize the data.
    • Open the table and make interactive selections with the map.
    • Modify the pop-ups.
    • Apply definition queries to create subsets of the layer.

    This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

    Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.

  9. Data from: Multi-Camera Action Dataset (MCAD)

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip +2
    Updated Jan 24, 2020
    Cite
    Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli (2020). Multi-Camera Action Dataset (MCAD) [Dataset]. http://doi.org/10.5281/zenodo.884592
    Explore at:
    Available download formats: application/gzip, json, txt
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Action recognition has received increasing attention from the computer vision and machine learning communities in recent decades. Over that time, the recognition task has evolved from single-view recordings under controlled laboratory conditions to unconstrained environments (i.e., surveillance settings or user-generated videos). Furthermore, recent work has focused on other aspects of the action recognition problem, such as cross-view classification, cross-domain learning, multi-modality learning, and action localization. Despite this large variety of studies, we observed limited work exploring the open-set and open-view classification problem, which is a genuine inherent property of action recognition. In other words, a well-designed algorithm should robustly identify an unfamiliar action as “unknown” and achieve similar performance across sensors with similar fields of view. The Multi-Camera Action Dataset (MCAD) is designed to evaluate the open-view classification problem under surveillance conditions.

    In our multi-camera action dataset, unlike common action datasets, we use a total of five cameras, which can be divided into two types (Static and PTZ), to record actions. In particular, there are three Static cameras (Cam04, Cam05, and Cam06) with a fisheye effect and two Pan-Tilt-Zoom (PTZ) cameras (PTZ04 and PTZ06). The Static cameras have a resolution of 1280×960 pixels, while the PTZ cameras have a resolution of 704×576 pixels and a smaller field of view than the Static cameras. Moreover, we do not control the illumination environment; we even recorded under two contrasting conditions (daytime and nighttime), which makes our dataset more challenging than many datasets with strongly controlled illumination. The distribution of the cameras is shown in the picture on the right.

    We identified 18 single-person daily actions, with and without objects, inherited from the KTH, IXMAS, and TRECVID datasets, among others. The list and definition of the actions are shown in the table. These actions can be divided into four types: micro actions without an object (action IDs 01, 02, 05) and with an object (10, 11, 12, 13), and intense actions without an object (03, 04, 06, 07, 08, 09) and with an object (14, 15, 16, 17, 18). We recruited a total of 20 human subjects. Each subject repeats each action 8 times (4 times during the day and 4 times in the evening) under one camera. In the recording process, we use five cameras to record each action sample separately. During the recording stage we just tell subjects the action name, and they can perform the action freely in their own way, provided they stay within the field of view of the current camera. This makes our dataset much closer to reality. As a result, there is high intra-class variation among different action samples, as shown in the picture of action samples.

    URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html

    Resources:

    • IDXXXX.mp4.tar.gz contains video data for each individual
    • boundingbox.tar.gz contains person bounding box for all videos
    • protocol.json contains the evaluation protocol
    • img_list.txt contains the download URLs for the images version of the video data
    • idt_list.txt contains the download URLs for the improved Dense Trajectory feature
    • stip_list.txt contains the download URLs for the STIP feature
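
    As a how-to illustration, the snippet below is a minimal sketch of unpacking a subject archive and reading the evaluation protocol, assuming the files listed above have already been downloaded into the working directory (ID0001 is a hypothetical subject ID following the IDXXXX naming pattern).

      import json
      import tarfile

      # Extract one subject's videos and the bounding boxes (filenames follow the list above).
      for archive in ("ID0001.mp4.tar.gz", "boundingbox.tar.gz"):
          with tarfile.open(archive, "r:gz") as tar:
              tar.extractall("mcad")

      # The evaluation protocol is plain JSON; inspect its top-level structure.
      with open("protocol.json") as f:
          protocol = json.load(f)
      print(type(protocol), list(protocol)[:5] if isinstance(protocol, dict) else len(protocol))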

    How to Cite:

    Please cite the following paper if you use the MCAD dataset in your work (papers, articles, reports, books, software, etc):

    • Wenhui Liu, Yongkang Wong, An-An Liu, Yang Li, Yu-Ting Su, Mohan Kankanhalli
      Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking
      IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.
      http://doi.org/10.1109/WACV.2017.28
  10. Pseudo-Label Generation for Multi-Label Text Classification

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://catalog.data.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, in text data it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. During classification, the high and sparse dimensionality of text data has also been considered. Although we are proposing and evaluating a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
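
    The pseudo-LSC algorithm itself is not reproduced here, but the basic notion of a pseudo label (a distinct combination of existing class labels treated as a single new label) can be illustrated with a toy sketch; the data and variable names below are invented for the example and do not come from the paper.

      import numpy as np

      # Toy multi-label indicator matrix: 5 documents x 3 class labels.
      Y = np.array([
          [1, 0, 1],
          [1, 0, 1],
          [0, 1, 0],
          [1, 1, 0],
          [0, 1, 0],
      ])

      # A pseudo label corresponds to one observed combination of the original labels.
      combos, pseudo_labels = np.unique(Y, axis=0, return_inverse=True)
      print(combos)         # each row is one observed label combination
      print(pseudo_labels)  # one pseudo-label id per document, usable by a single-label classifier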

  11. Global Land Cover Mapping and Estimation Yearly 30 m V001 - Dataset - NASA...

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Global Land Cover Mapping and Estimation Yearly 30 m V001 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/global-land-cover-mapping-and-estimation-yearly-30-m-v001-6db80
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) Global Land Cover Mapping and Estimation (GLanCE) annual 30 meter (m) Version 1 data product provides global land cover and land cover change data derived from Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI). These maps provide the user community with land cover type, land cover change, metrics characterizing the magnitude and seasonality of greenness of each pixel, and the magnitude of change. GLanCE data products will be provided using a set of seven continental grids that use Lambert Azimuthal Equal Area projections parameterized to minimize distortion for each continent. Currently, North America, South America, Europe, and Oceania are available. This dataset is useful for a wide range of applications, including ecosystem, climate, and hydrologic modeling; monitoring the response of terrestrial ecosystems to climate change; carbon accounting; and land management. The GLanCE data product provides seven layers: the land cover class, the estimated day of year of change, an integer identifier for the class in the previous year, the median and amplitude of the Enhanced Vegetation Index (EVI2) in the year, the rate of change in EVI2, and the change in EVI2 median from the previous year to the current year. A low-resolution browse image representing EVI2 amplitude is also available for each granule.

    Known Issues

    • Version 1.0 of the data set does not include Quality Assurance, Leaf Type, or Leaf Phenology; these layers are populated with fill values and will be included in future releases of the data product.
    • Science Data Set (SDS) values may be missing, or of lower quality, in years when land cover change occurs. This is a by-product of the fact that Continuous Change Detection and Classification (CCDC) does not fit models or provide synthetic reflectance values during short periods of time between time segments.
    • The accuracy of mapping results varies by land cover class and geography. Specifically, distinguishing between shrubs and herbaceous cover is challenging at high latitudes and in arid and semi-arid regions. Hence, the accuracy of shrub cover, herbaceous cover, and to some degree bare cover, is lower than for other classes.
    • Due to the combined effects of large solar zenith angles, short growing seasons, and lower availability of high-resolution imagery to support training data, the representation of land cover at high latitudes in the GLanCE product is of lower quality than at mid latitudes.
    • Shadows and large variation in local zenith angles decrease the accuracy of the GLanCE product in regions with complex topography, especially at high latitudes.
    • Mapping results may include artifacts from variation in data density in overlap zones between Landsat scenes relative to mapping results in non-overlap zones.
    • Regions with low observation density due to cloud cover, especially in the tropics, and/or poor data density (e.g., Alaska, Siberia, West Africa) have lower map quality.
    • Artifacts from the Landsat 7 Scan Line Corrector failure are occasionally evident in the GLanCE map product. High proportions of missing data in regions with snow and ice at high elevations result in missing data in the GLanCE SDSs.
    • The GLanCE data product tends to modestly overpredict developed land cover in arid regions.

  12. Hyper Kvasir Dataset

    • universe.roboflow.com
    zip
    Updated Oct 7, 2025
    Cite
    Simula (2025). Hyper Kvasir Dataset [Dataset]. https://universe.roboflow.com/simula/hyper-kvasir/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    Simula
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    GI Tract
    Description

    Overview

    This is the largest gastrointestinal dataset, generously provided by Simula Research Laboratory in Norway.

    You can read their research paper here in Nature

    In total, the dataset contains 10,662 labeled images stored in JPEG format. The images can be found in the images folder. The class each image belongs to corresponds to the folder it is stored in (e.g., the ’polyp’ folder contains all polyp images, the ’barretts’ folder contains all images of Barrett’s esophagus, etc.). Each class folder is located in a subfolder describing the type of finding, which in turn is located in a folder describing whether it is a lower GI or upper GI finding. The number of images per class is not balanced, which is a general challenge in the medical field due to the fact that some findings occur more often than others. This adds an additional challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
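
    Since each image's class is given by the folder it sits in, per-class counts can be gathered with a short directory walk. This is a minimal sketch assuming the archive has been extracted into a local 'hyper-kvasir' directory; that path is an assumption, while the nested folder layout is as described above.

      from collections import Counter
      from pathlib import Path

      root = Path("hyper-kvasir")  # assumed extraction directory

      # The class of each image is the name of its immediate parent folder,
      # nested under finding-type and upper/lower-GI folders as described above.
      counts = Counter(p.parent.name for p in root.rglob("*.jpg"))

      for cls, n in counts.most_common():
          print(f"{cls:30s} {n}")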

    The data is collected during real gastro- and colonoscopy examinations at a Hospital in Norway and partly labeled by experienced gastrointestinal endoscopists.

    Use Cases

    "Artificial intelligence is currently a hot topic in medicine. The fact that medical data is often sparse and hard to obtain due to legal restrictions and lack of medical personnel to perform the cumbersome and tedious labeling of the data, leads to technical limitations. In this respect, we share the Hyper-Kvasir dataset, which is the largest image and video dataset from the gastrointestinal tract available today."

    "We have used the labeled data to research the classification and segmentation of GI findings using both computer vision and ML approaches to potentially be used in live and post-analysis of patient examinations. Areas of potential utilization are analysis, classification, segmentation, and retrieval of images and videos with particular findings or particular properties from the computer science area. The labeled data can also be used for teaching and training in medical education. Having expert gastroenterologists providing the ground truths over various findings, HyperKvasir provides a unique and diverse learning set for future clinicians. Moreover, the unlabeled data is well suited for semi-supervised and unsupervised methods, and, if even more ground truth data is needed, the users of the data can use their own local medical experts to provide the needed labels. Finally, the videos can in addition be used to simulate live endoscopies feeding the video into the system like it is captured directly from the endoscopes enable developers to do image classification."

    Borgli, H., Thambawita, V., Smedsrud, P.H. et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 7, 283 (2020). https://doi.org/10.1038/s41597-020-00622-y

    Using this Dataset

    Hyper-Kvasir is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that in all documents and papers that use or refer to the Hyper-Kvasir dataset or report experimental results based on the dataset, a reference to the related article needs to be added: PREPRINT: https://osf.io/mkzcq/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.

  13. Dataset for Seismic waveform tomography of the Central and Eastern...

    • zenodo.org
    • data.niaid.nih.gov
    bin, tar, zip
    Updated Mar 30, 2020
    Cite
    Nienke Alexandra Blom; Alexey Gokhberg; Andreas Fichtner (2020). Dataset for Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle [Dataset]. http://doi.org/10.5281/zenodo.3538039
    Explore at:
    Available download formats: tar, zip, bin
    Dataset updated
    Mar 30, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Nienke Alexandra Blom; Alexey Gokhberg; Andreas Fichtner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset corresponding to the Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle

    This dataset belongs to the seismic waveform tomography of the Central and Eastern Mediterranean by Blom, Gokhberg and Fichtner, Solid Earth (Discussions), 2019. Seismic tomography is an inverse problem where the internal elastic structure of the Earth (the upper ~500 km) is determined from seismograms (the vibrations of the Earth as a result of earthquakes, as recorded by seismometers at the Earth's surface). This inverse problem is cast as an optimisation where the misfit between observed and synthetic seismograms is minimised: waveform tomography (often referred to as full waveform inversion or FWI). Synthetic seismograms are produced by simulating the elastic wavefield of earthquakes within the Earth. The optimisation problem is solved by iterative, deterministic, gradient-based inversion. Gradients are computed using the adjoint method, which requires one forward wavefield simulation and one adjoint wavefield simulation per earthquake used in the project.

    The inversion was carried out over several frequency bands, starting with the longest periods and including a progressively broader frequency band. Within each frequency band, ~10-20 iterations were carried out, totalling about a hundred iterations. Synthetic seismograms and iteration information are stored for a subset of iterations, notably those where human interaction (i.e. the selection of events / data windows) took place.

    Here, we describe:

    • The contents of this package
    • How to set up the package so that all the data can be accessed and used, and the figures reproduced.

    Contents of this package

    • Data that was used for the seismic waveform inversion: raw and processed seismograms, station information, earthquake information, as well as the window selection (designating the parts of the data that were actually used at each stage in the inversion) and synthetic seismograms produced during various stages of the inversion. This information is gathered in the LASIF project "EMed_full.complete.tar".
    • Models and misfit development across the iterations, as well as models relating to model testing, as carried out after the inversion. This information is gathered in the tarball "MODEL_FILES.tar". Model files are both given in the ses3d ascii format (text file drho, dvsv, dvsh, dvp and block_x, block_y, block_z) and in bundled .vtu format. Conversion to .vtu was done using the tools in SCRIPTS. These vtu files can be viewed using Paraview.
    • information on the tools and code that was used to do the inversion:
      • ses3d: a seismic wave propagation spectral element code in spherical coordinates. This will run both forward and adjoint simulations. This is available publicly through the developers on https://cos.ethz.ch/software/production/ses3d.html. See Gokhberg & Fichtner, 2016.
      • LASIF: a waveform inversion workflow managing package, where we have made small adaptations to make it suitable for our workflow. The original package is available via www.lasif.net and on github (see Krischer et al, 2015), the modified version is added to this package as 'LASIF-master.zip'.
      • LASIF_scripts: bespoke scripts in order to interact with the LASIF project and generate different types of analyses and plots that are used in the publication. This is included in the tarball 'LASIF_scripts.tar'
      • SCRIPTS: containing some modified tools that were originally written for ses3d, as well as some additional tools - notably to interact with models converted to the VTK format. This is included in the tarball 'SCRIPTS.tar'
      • A description of the conda environment named lasif_ext (which is used for all the data analysis), in the form of the yml file 'lasif_ext.yml'
    • An additional LASIF project which is used just to compute sensitivity kernels for different windows within the same trace: 'EMed_window_kernels.tar'. This is used as an example in one of the manuscript figures.

    How to set up the data package

    1. Download the entire data package. We will assume it is located in `~/Downloads/`.
    2. Get miniconda or anaconda if you don't have it.
    3. Install LASIF. This can be done using the instructions from the LASIF website, but with a few adaptations, which are detailed in the lasif_ext.yml file. This amounts to the following:
      1. Add the channel conda-forge to your standard channels
      2. Name the environment "lasif_ext"
      3. Manually replace the files in the LASIF source directory with those in LASIF-master.zip.
      4. Install the specific version of pyqt=4.11.
      5. Install the additional packages jupyter, vtk=7.0.0, pandas=0.23.4 (these are the ones that work for me).
    4. Extract the LASIF_scripts.tar to the site-packages directory of your conda environment:
      tar -xf ~/Downloads/LASIF_scripts.tar -C [/path/to/conda/environments]/lasif_ext/lib/python2.7/site-packages/
    5. Make a project directory and extract all needed packages into it:
      # make project directory
      mkdir CEMed_project_Blometal
      cd CEMed_project_Blometal
      
      # extract data tarballs into it
      tar -xf ~/Downloads/EMed_full.complete.tar
      tar -xf ~/Downloads/EMed_window_kernels.tar
      tar -xf ~/Downloads/MODEL_FILES.tar
      
      # make scripts directory and extract scripts into it
      mkdir conda_stuff
      tar -xf ~/Downloads/SCRIPTS.tar -C conda_stuff
      
      # make data analysis directory
      mkdir data_analysis
      cd data_analysis
      
      # extract analysis tools
      tar -xf ~/Downloads/NPY_FILES.tar
      tar -xf ~/Downloads/FIGURE_SCRIPTS.tar
      tar -xf ~/Downloads/figs_png.tar

    Now the project should be ready for inspection. The following things can be done, for example:

    • Reproduce the figures in the manuscript. All scripts for this are located in CEMed_project_Blometal/data_analysis/FIGURE_SCRIPTS/.
      conda activate lasif_ext
      cd CEMed_project_Blometal
      jupyter notebook

      This should open up a browser tab that shows the directory structure. Navigate to data_analysis/FIGURE_scripts and click on one of the .ipynb files to open it. If you press 'Kernel' > 'Restart kernel and run all' at the top, all cells will be launched automatically. This should work out of the box.

    • Interact with the lasif project. For this, refer to the LASIF website. Note that above jupyter notebooks do so extensively, using the lasif communicator.
    • Build additional analysis tools, using the tools supplied in SCRIPTS and LASIF_scripts.

    References:

    • Blom, N., Gokhberg, A., and Fichtner, A.: Seismic waveform tomography of the Central and Eastern Mediterranean upper mantle, Solid Earth Discuss., https://doi.org/10.5194/se-2019-152, in review, 2019.

    • Gokhberg, A., Fichtner, A., 2016. Full-waveform inversion on heterogeneous HPC systems. Comp. & Geosci. 89, 260-268. https://doi.org/10.1016/j.cageo.2015.12.013

    • Krischer, L., Fichtner, A., Zukauskaitė, S., and Igel, H. (2015), Large‐Scale Seismic Inversion Framework, Seismological Research Letters, 86(4), 1198–1207. doi:10.1785/0220140248

  14. Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Jul 12, 2024
    Cite
    Buscombe, Daniel (2024). Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-res. images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7576897
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Marda Science LLC
    Authors
    Buscombe, Daniel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Doodleverse/Segmentation Zoo/Seg2Map Res-UNet models for DeepGlobe/7-class segmentation of RGB 512x512 high-res. images

    These Residual-UNet model data are based on the DeepGlobe dataset

    Models have been created using Segmentation Gym* using the following dataset**: https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset

    Image size used by model: 512 x 512 x 3 pixels

    classes: 1. urban 2. agricultural 3. rangeland 4. forest 5. water 6. bare 7. unknown

    File descriptions

    For each model, there are 5 files with the same root name:

    1. '.json' config file: this is the file that was used by Segmentation Gym* to create the weights file. It contains instructions for how to make the model and the data it used, as well as instructions for how to use the model for prediction. It is a handy wee thing and mastering it means mastering the entire Doodleverse.

    2. '.h5' weights file: this is the file that was created by the Segmentation Gym* function train_model.py. It contains the trained model's parameter weights. It can be called by the Segmentation Gym* function seg_images_in_folder.py. Models may be ensembled.

    3. '_modelcard.json' model card file: this is a json file containing fields that collectively describe the model origins, training choices, and the dataset that the model is based upon. There is some redundancy between this file and the config file (described above), which contains the instructions for the model training and implementation. The model card file is not used by the program, but it is important metadata, so it should be kept with the other files that collectively make up the model; as such it is considered part of the model.

    4. '_model_history.npz' model training history file: this numpy archive file contains numpy arrays describing the training and validation losses and metrics. It is created by the Segmentation Gym function train_model.py

    5. '.png' model training loss and mean IoU plot: this png file contains plots of training and validation losses and mean IoU scores during model training. A subset of data inside the .npz file. It is created by the Segmentation Gym function train_model.py

    Additionally, BEST_MODEL.txt contains the name of the model with the best validation loss and mean IoU
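
    As an illustration of working with the '_model_history.npz' files described above, the sketch below simply enumerates whatever arrays are stored rather than assuming their names; the filename is a placeholder.

      import numpy as np
      import matplotlib.pyplot as plt

      history = np.load("example_model_history.npz")  # hypothetical filename

      # Plot every stored training/validation curve under its own key.
      for key in history.files:
          plt.plot(history[key], label=key)
      plt.xlabel("epoch")
      plt.legend()
      plt.savefig("training_history.png")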

    References *Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym

    **Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D. and Raskar, R., 2018. Deepglobe 2018: A challenge to parse the earth through satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 172-181).

  15. STN-PLAD

    • datasets.activeloop.ai
    • datasetninja.com
    deeplake
    Updated Feb 3, 2022
    Cite
    André Luiz (2022). STN-PLAD [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/stn-plad-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Feb 3, 2022
    Authors
    André Luiz
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The STN-PLAD (STN Power Line Assets Dataset) dataset was generated to aid power line companies in detecting high-voltage power line towers in order to avoid putting workers at risk. The dataset contains 5 annotated object classes: Stockbridge damper, spacer, transmission tower, tower plate, and insulator. There are, on average, 18.1 annotated instances per image across the 133 power line images. The images vary in angle, resolution, background, and illumination, and all were captured by a high-resolution UAV. This dataset can be used with popular deep learning object detection methods.

  16. Goldenhar-CFID: A Novel Dataset for Craniofacial Anomaly Detection in...

    • data.mendeley.com
    Updated Feb 28, 2025
    Cite
    Israt Jahan (2025). Goldenhar-CFID: A Novel Dataset for Craniofacial Anomaly Detection in Goldenhar Syndrome [Dataset]. http://doi.org/10.17632/ffsthxyp4d.3
    Explore at:
    Dataset updated
    Feb 28, 2025
    Authors
    Israt Jahan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Goldenhar Syndrome Craniofacial Image Dataset (Goldenhar-CFID) is a high-resolution dataset designed for the automated detection and classification of craniofacial abnormalities associated with Goldenhar Syndrome (GS). It comprises 4,483 images, categorized into seven distinct classes of craniofacial deformities. This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

    Dataset Characteristics

    • Total Images: 4,483
    • Number of Classes: 7
    • Image Format: JPG
    • Image Resolution: 640 x 640 pixels
    • Annotation: Each image is manually labeled and verified by medical experts
    • Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
    • Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

    Categories and Annotations

    The dataset includes images categorized into seven craniofacial deformities:

    • Cleft Lip and Palate – Congenital anomaly where the upper lip and/or palate fails to develop properly.
    • Epibulbar Dermoid Tumor – Benign growth on the eye’s surface, typically at the cornea-sclera junction.
    • Eyelid Coloboma – Defect characterized by a partial or complete absence of eyelid tissue.
    • Facial Asymmetry – Uneven development of facial structures.
    • Malocclusion – Misalignment of the teeth and jaws.
    • Microtia – Underdeveloped or absent outer ear.
    • Vertebral Abnormality – Irregular development of spinal vertebrae.

    Dataset Structure and Splitting

    The dataset consists of four main subdirectories:

    • Original – Contains 547 raw images.
    • Unaugmented Balanced – Contains 210 images per class.
    • Augmented Unbalanced – Includes 4,483 images with augmentation.
    • Augmented Balanced – Contains 756 images per class.

    The dataset is split into:

    • Training Set: 80%
    • Validation Set: 10%
    • Test Set: 10%
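
    As an illustration only, the sketch below reproduces an 80/10/10 split with PyTorch; the directory path and the assumption that class folders sit directly under the 'Augmented Balanced' subdirectory are hypothetical and not confirmed by the dataset description.

      import torch
      from torchvision import datasets, transforms

      # Placeholder path; assumes one subfolder per class under this directory.
      data = datasets.ImageFolder("Goldenhar-CFID/Augmented Balanced",
                                  transform=transforms.ToTensor())

      n = len(data)
      n_train, n_val = int(0.8 * n), int(0.1 * n)
      n_test = n - n_train - n_val

      # Fixed seed so the split is reproducible across runs.
      train_set, val_set, test_set = torch.utils.data.random_split(
          data, [n_train, n_val, n_test], generator=torch.Generator().manual_seed(0)
      )
      print(len(train_set), len(val_set), len(test_set))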

  17. S2 Dataset -

    • plos.figshare.com
    xlsx
    Updated Nov 7, 2024
    Cite
    Tahmina Afrose Keya; Siventhiran S. Balakrishnan; Maheswaran Solayappan; Saravana Selvan Dheena Dhayalan; Sreeramanan Subramaniam; Low Jun An; Anthony Leela; Kevin Fernandez; Prahan Kumar; A. Lokeshmaran; Abhijit Vinodrao Boratne; Mohd Tajuddin Abdullah (2024). S2 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0310435.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tahmina Afrose Keya; Siventhiran S. Balakrishnan; Maheswaran Solayappan; Saravana Selvan Dheena Dhayalan; Sreeramanan Subramaniam; Low Jun An; Anthony Leela; Kevin Fernandez; Prahan Kumar; A. Lokeshmaran; Abhijit Vinodrao Boratne; Mohd Tajuddin Abdullah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Malaysia, particularly Pahang, experiences devastating floods annually, causing significant damage. The objective of the research was to create a flood susceptibility map for the designated area by employing an Ensemble Machine Learning (EML) algorithm based on a geographic information system (GIS). By analyzing nine key factors from a geospatial database, a flood susceptibility map was created with ArcGIS software (ESRI ArcGIS Pro v3.0.1 x64). The Random Forest (RF) model was employed in this study to categorize the study area into distinct flood susceptibility classes. A feature selection (FS) method was used to rank the flood-influencing factors. To validate the flood susceptibility models, standard statistical measures and the Area Under the Curve (AUC) were employed. The FS ranking demonstrated that the primary contributors to flooding in the study region are rainfall and elevation, with slope, geology, curvature, flow accumulation, flow direction, distance from the river, and land use/land cover (LULC) patterns ranking subsequently. The 'very high' and 'high' susceptibility classes made up 37.1% and 26.3% of the total area, respectively. The flood vulnerability assessment of Pahang found that the eastern, southern, and central regions were at high risk of flooding due to intense precipitation, low-lying topography with steep inclines, proximity to the shoreline and rivers, and abundant flooded vegetation, crops, urban areas, bare ground, and rangeland. Conversely, areas with dense tree canopies or forests were less susceptible to flooding in this research area. The ROC analysis demonstrated strong performance on the validation datasets, with an AUC value of >0.73 and accuracy scores exceeding 0.71. Research on flood susceptibility mapping can enhance risk reduction strategies and improve flood management in vulnerable areas. Technological advancements and expertise provide opportunities for more sophisticated methods, leading to better prepared and more resilient communities.
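
    The workflow described above (Random Forest classification over nine conditioning factors, feature-importance ranking, and AUC/accuracy validation) can be sketched in a few lines of scikit-learn. This is an illustrative reconstruction, not the authors' code; the file name, column names, and split parameters are assumptions.

      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import roc_auc_score, accuracy_score

      # The nine conditioning factors named in the description; categorical
      # factors (e.g. geology, LULC) are assumed to be numerically encoded.
      factors = ["rainfall", "elevation", "slope", "geology", "curvature",
                 "flow_accumulation", "flow_direction", "distance_from_river", "lulc"]

      df = pd.read_csv("s2_dataset.csv")        # assumed file name and layout
      X, y = df[factors], df["flood"]           # 1 = flooded, 0 = non-flooded

      X_tr, X_te, y_tr, y_te = train_test_split(
          X, y, test_size=0.3, stratify=y, random_state=42)

      rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_tr, y_tr)

      # Rank the conditioning factors, analogous to the FS ranking in the study.
      ranking = sorted(zip(factors, rf.feature_importances_), key=lambda t: -t[1])
      print(ranking)

      # Validate with AUC and accuracy, as reported above.
      proba = rf.predict_proba(X_te)[:, 1]
      print("AUC:", roc_auc_score(y_te, proba),
            "accuracy:", accuracy_score(y_te, rf.predict(X_te)))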

  18. D

    Data from: Dataset corresponding to 'The model for the accompaniment of...

    • ssh.datastations.nl
    • narcis.nl
    pdf, zip
    Updated Jul 11, 2017
    Cite
    B.J. Geyser; C.A.M. Hermans; B.J. Geyser; C.A.M. Hermans (2017). Dataset corresponding to 'The model for the accompaniment of seekers into silence in their quest for wholeness' [Dataset]. http://doi.org/10.17026/DANS-XMG-KNM8
    Explore at:
    Available download formats: pdf(67935), zip(22005), pdf(67856), pdf(76140), pdf(60021), pdf(57263), pdf(52444), pdf(67827), pdf(60949), pdf(55298)
    Dataset updated
    Jul 11, 2017
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    B.J. Geyser; C.A.M. Hermans; B.J. Geyser; C.A.M. Hermans
    License

    https://doi.org/10.17026/fp39-0x58

    Description

    This dataset belongs to the following dissertation: Barend Johannes Geyser (2017). The model for the accompaniment of seekers with a Christian background into silence in their quest for wholeness. Radboud University. Data gathering took place by means of phenomenological interviews, observations and field notes made during the interviews, as well as video-stimulated recall. The interview transcripts are written in a South African language. This dataset contains the interview transcripts. The researcher selected participants who were starting the second half of life, thus from 40 to 55 years of age. The participants are all from a Christian background and were all living in the Northern suburbs of Johannesburg, which means that they are from the socio-economic middle class and upper middle class. Three women and five men were interviewed. The interviews involved the conscious selection of certain participants: in this instance, seekers who ask for accompaniment into silence. They are all Christian seekers on a quest for wholeness, investigating the possibility of the practice of silence as an aid in their quest. They all attempted to practice silence in some way or other for at least three years. In addition to the eight interview transcripts, a readme text is included to explain the context of the dataset.

  19. F

    English Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/english-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English Language Image Captioning Dataset: a collection of images with associated text captions intended to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is designed to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    Sources: Images are sourced from public databases and proprietary collections.
    Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    Copyright: All selected images are free from copyright restrictions, allowing for unrestricted use in research and development.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    Caption Details:
    Human Generated: Each caption is written by native English speakers.
    Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
    Contextual Relevance: Captions are written with attention to the visual content of each image, including the objects, scenes, actions, and settings depicted.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    Image File Name
    Category
    Caption
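
    Assuming the metadata is delivered as a simple CSV with the three columns listed above (the actual delivery format and column headers may differ), a minimal sketch for pairing images with their captions looks like this:

      import csv
      from collections import defaultdict

      # Assumed file name and column headers, taken from the field list above.
      pairs = []
      with open("metadata.csv", newline="", encoding="utf-8") as f:
          for row in csv.DictReader(f):
              pairs.append((row["Image File Name"], row["Category"], row["Caption"]))

      # Quick sanity check: average caption length (in words) per category.
      lengths = defaultdict(list)
      for _, category, caption in pairs:
          lengths[category].append(len(caption.split()))
      print({c: sum(v) / len(v) for c, v in lengths.items()})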

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    Training Image Captioning Models: Provides high-quality data for training and fine-tuning generative AI models to generate accurate and contextually relevant captions.

  20. i

    Richest Zip Codes in New Jersey

    • incomebyzipcode.com
    Updated Dec 18, 2024
    + more versions
    Cite
    Cubit Planning, Inc. (2024). Richest Zip Codes in New Jersey [Dataset]. https://www.incomebyzipcode.com/newjersey
    Explore at:
    Dataset updated
    Dec 18, 2024
    Dataset authored and provided by
    Cubit Planning, Inc.
    License

    https://www.incomebyzipcode.com/terms#TERMS

    Area covered
    New Jersey
    Description

    A dataset listing the richest zip codes in New Jersey per the most current US Census data, including information on rank and average income.
