26 datasets found
  1. Income Distribution by Quintile: Mean Household Income in Lancaster County,...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Lancaster County, PA // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/lancaster-county-pa-median-household-income/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lancaster County, Pennsylvania
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across the income quintiles (listed above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series using current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Lancaster County, PA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $22,178, while the mean income of the highest quintile (the 20% of households with the highest income) is $255,448. This indicates that the top earners earn roughly 12 times as much as the lowest earners.
    • Top 5%: The mean household income of the wealthiest population (top 5%) is $448,457, which is 175.56% of the highest quintile's mean and 2022.08% of the lowest quintile's.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels (as listed above).
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars, for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Lancaster County median household income. You can refer to it here
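The ratios quoted in the key observations above can be recomputed from the dataset's two columns. A minimal sketch, using the Lancaster County figures quoted above; the dictionary stands in for the CSV's "Income Level" / "Mean Household Income" columns:

```python
# Figures quoted above for Lancaster County, PA (2023 inflation-adjusted dollars),
# keyed by the "Income Level" column values.
mean_income = {
    "Lowest Quintile": 22_178,
    "Highest Quintile": 255_448,
    "Top 5 Percent": 448_457,
}

# Highest-to-lowest quintile ratio (the "roughly 12 times" observation).
disparity_ratio = mean_income["Highest Quintile"] / mean_income["Lowest Quintile"]

# Top 5% expressed as a percentage of the other two groups.
pct_of_highest = mean_income["Top 5 Percent"] / mean_income["Highest Quintile"] * 100
pct_of_lowest = mean_income["Top 5 Percent"] / mean_income["Lowest Quintile"] * 100

print(round(disparity_ratio, 1))  # ~11.5, i.e. roughly 12x
print(round(pct_of_highest, 2))   # ~175.56
print(round(pct_of_lowest, 2))    # ~2022.08
```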

  2. Income Distribution by Quintile: Mean Household Income in Deptford Township,...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Deptford Township, New Jersey // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/deptford-township-nj-median-household-income/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Deptford, New Jersey
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across the income quintiles (listed above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series using current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Deptford Township, New Jersey, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $21,507, while the mean income of the highest quintile (the 20% of households with the highest income) is $257,345. This indicates that the top earners earn roughly 12 times as much as the lowest earners.
    • Top 5%: The mean household income of the wealthiest population (top 5%) is $384,780, which is 149.52% of the highest quintile's mean and 1789.09% of the lowest quintile's.
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels (as listed above).
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars, for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Deptford Township median household income. You can refer to it here

  3. Demographics of Upper-Middle Class Citizens in Gachibowli, Hyderabad, India

    • data.mendeley.com
    Updated Dec 15, 2019
    Cite
    Praagna Shrikrishna Sriram (2019). Demographics of Upper-Middle Class Citizens in Gachibowli, Hyderabad, India [Dataset]. http://doi.org/10.17632/k55rb6zk3v.1
    Explore at:
    Dataset updated
    Dec 15, 2019
    Authors
    Praagna Shrikrishna Sriram
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gachibowli, Hyderabad, India
    Description

    This dataset highlights the demographics of upper-middle-class people living in Gachibowli, Hyderabad, India, and attempts, through various methods of statistical analysis, to establish relationships among several of these demographic details.

  4. High income tax filers in Canada

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1more
    Updated Oct 28, 2024
    + more versions
    Cite
    Government of Canada, Statistics Canada (2024). High income tax filers in Canada [Dataset]. http://doi.org/10.25318/1110005501-eng
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values regardless of the selected geography; for example, the number of Nova Scotians in the top 1% is calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table, namely market, total, and after-tax income, each with and without capital gains.
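The national-threshold rule described above can be sketched with toy numbers. This is a minimal illustration, not values from the table: the threshold and the incomes below are invented.

```python
# Hypothetical national 99% income threshold (illustrative value, not from the table).
national_p99_threshold = 270_000

# Toy incomes for a handful of Nova Scotian tax filers.
nova_scotia_incomes = [45_000, 310_000, 98_000, 275_000, 120_000]

# A filer counts toward the province's top-1% tally only if their income
# exceeds the *national* threshold, regardless of the selected geography.
top_1pct_count = sum(income > national_p99_threshold for income in nova_scotia_incomes)
print(top_1pct_count)  # 2
```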

  5. Income Distribution by Quintile: Mean Household Income in Middle Inlet,...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Middle Inlet, Wisconsin [Dataset]. https://www.neilsberg.com/research/datasets/94c785c2-7479-11ee-949f-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Middle Inlet, Wisconsin
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across the income quintiles (listed above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series using current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Middle Inlet, Wisconsin, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $21,360, while the mean income of the highest quintile (the 20% of households with the highest income) is $162,915. This indicates that the top earners earn roughly 8 times as much as the lowest earners.
    • Top 5%: The mean household income of the wealthiest population (top 5%) is $282,509, which is 173.41% of the highest quintile's mean and 1322.61% of the lowest quintile's.

    Mean household income by quintiles in Middle Inlet, Wisconsin (in 2022 inflation-adjusted dollars)

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels (as listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars, for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Middle Inlet town median household income. You can refer to it here

  6. National Hydrography Dataset Plus High Resolution

    • hub.arcgis.com
    • oregonwaterdata.org
    Updated Mar 15, 2023
    + more versions
    Cite
    Esri (2023). National Hydrography Dataset Plus High Resolution [Dataset]. https://hub.arcgis.com/maps/f1f45a3ba37a4f03a5f48d7454e4b654
    Explore at:
    Dataset updated
    Mar 15, 2023
    Dataset authored and provided by
    Esri (http://esri.com/)
    Area covered
    Description

    The National Hydrography Dataset Plus High Resolution (NHDPlus High Resolution) maps the lakes, ponds, streams, rivers, and other surface waters of the United States. Created by the US Geological Survey, NHDPlus High Resolution provides mean annual flow and velocity estimates for rivers and streams. Additional attributes provide connections between features, facilitating complicated analyses. For more information on the NHDPlus High Resolution dataset, see the User’s Guide for the National Hydrography Dataset Plus (NHDPlus) High Resolution.

    Dataset Summary

    • Phenomenon Mapped: Surface waters and related features of the United States and associated territories
    • Geographic Extent: The contiguous United States, Hawaii, portions of Alaska, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands, and American Samoa
    • Projection: Web Mercator Auxiliary Sphere
    • Visible Scale: Visible at all scales, but the layer draws best at scales larger than 1:1,000,000
    • Source: USGS
    • Update Frequency: Annual
    • Publication Date: July 2022

    This layer was symbolized in the ArcGIS Map Viewer; while the features will draw in the Classic Map Viewer, the advanced symbology will not. Prior to publication, the network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the Area and Waterbody feature classes were merged under a single schema. Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original dataset. No-data values -9999 and -9998 were converted to Null values.

    What can you do with this layer?

    Feature layers work throughout the ArcGIS system. Generally your workflow with feature layers will begin in ArcGIS Online or ArcGIS Pro. Below are just a few of the things you can do with a feature service in Online and Pro.

    ArcGIS Online

    • Add this layer to a map in the Map Viewer. The layer, or a map containing it, can be used in an application.
    • Change the layer's transparency and set its visibility range.
    • Open the layer's attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection lets you zoom to features selected in the map or table, and show selected records lets you view the selected records in the table.
    • Apply filters. For example, you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
    • Change the layer's style and symbology.
    • Add labels and set their properties.
    • Customize the pop-up.
    • Use it as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams, and the extract data tool can be used to create copies of portions of the data.

    ArcGIS Pro

    • Add this layer to a 2D or 3D map.
    • Use it as an input to geoprocessing. For example, copy features lets you select and then export portions of the data to a new feature class.
    • Change the symbology and the attribute field used to symbolize the data.
    • Open the table and make interactive selections with the map.
    • Modify the pop-ups.
    • Apply definition queries to create subsets of the layer.

    This layer is part of the ArcGIS Living Atlas of the World, which provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.

    Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.

  7. College Enrollment, Credit Attainment and Remediation of High School...

    • catalog.data.gov
    • data.ct.gov
    • +1more
    Updated Sep 2, 2023
    + more versions
    Cite
    data.ct.gov (2023). College Enrollment, Credit Attainment and Remediation of High School Graduates by School [Dataset]. https://catalog.data.gov/dataset/college-enrollment-credit-attainment-and-remediation-of-high-school-graduates-by-school
    Explore at:
    Dataset updated
    Sep 2, 2023
    Dataset provided by
    data.ct.gov
    Description

    The data here is from the report entitled Trends in Enrollment, Credit Attainment, and Remediation at Connecticut Public Universities and Community Colleges: Results from P20WIN for the High School Graduating Classes of 2010 through 2016. The report answers three questions:

    1. Enrollment: What percentage of the graduating class enrolled in a Connecticut public university or community college (UCONN, the four Connecticut State Universities, and 12 Connecticut community colleges) within 16 months of graduation?
    2. Credit Attainment: What percentage of those who enrolled in a Connecticut public university or community college within 16 months of graduation earned at least one year’s worth of credits (24 or more) within two years of enrollment?
    3. Remediation: What percentage of those who enrolled in one of the four Connecticut State Universities or one of the 12 community colleges within 16 months of graduation took a remedial course within two years of enrollment?

    Notes on the data:

    • School Credit: % Earning 24 Credits is a subset of the % Enrolled in 16 Months.
    • School Remediation: % Enrolled in Remediation is a subset of the % Enrolled in 16 Months.

  8. VAE-Encoded-Faces-HQ

    • kaggle.com
    Updated Jun 4, 2024
    Cite
    Z Andy Supotco (2024). VAE-Encoded-Faces-HQ [Dataset]. https://www.kaggle.com/datasets/zandysupotco/sd-vae-ft-ema-f8-256-faces6-enc
    Explore at:
    Available download formats: Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Z Andy Supotco
    Description

    This dataset contains 250k high-resolution 256×256 px human, anime, and animal faces encoded by "sd-vae-ema-f8" from huggingface/diffusers (https://github.com/huggingface/diffusers/) and saved as ".pt" files (e.g. "afhq_x.pt" contains a torch.Tensor of shape [15.8k, 32, 32, 4] with dtype float32; "afhq_cls.pt" is the dataset label, a LongTensor of shape [15.8k]). Each original image is 256×256×3 and is encoded to 32×32×4. The original images come from 6 different datasets on Kaggle. Motivated by personal interests, I created this dataset for class-conditional image generation with Latent Diffusion Models (LDMs).

    Contents

    I chose the following HQ datasets on Kaggle based on personal appetite.

    | Dataset Label | Dataset Name | Dataset Size | Description | URL |
    | --- | --- | --- | --- | --- |
    | 0 | AFHQ | 15.8k | Cat, dog and wild animal faces | https://www.kaggle.com/datasets/dimensi0n/afhq-512/data |
    | 1 | FFHQ | 70.0k | Human faces | https://www.kaggle.com/datasets/xhlulu/flickrfaceshq-dataset-nvidia-resized-256px |
    | 2 | CelebA-HQ | 30.0k | Celebrity faces | https://www.kaggle.com/datasets/denislukovnikov/celebahq256-images-only |
    | 3 | FaceAttributes | 24.0k | Human faces | https://www.kaggle.com/datasets/mantasu/face-attributes-grouped |
    | 4 | AnimeGAN | 25.7k | Anime faces generated by StyleGAN-2 | https://www.kaggle.com/datasets/prasoonkottarathil/gananime-lite |
    | 5 | AnimeFaces | 92.2k | Anime faces | https://www.kaggle.com/datasets/scribbless/another-anime-face-dataset |

    I found it hard for my LDM to learn the samples in AFHQ and FaceAttributes, but it behaves reasonably well on the other datasets.

    Encoding and Decoding Pipeline

    The images are first downsampled to 256 px (the datasets above provide original images at either 256 px or 512 px), then normalized (img = img / 127.5 - 1) before being encoded by the sd-vae-ema-f8 encoder. The output latent code is shaped [batch_size, 32, 32, 4]; the std of the latent code is ~4.5 and the mean is <0.5.

    ```
    import torch
    from diffusers.models import AutoencoderKL

    # this is a model with 34M encoder params and 49M decoder params
    model_name = "stabilityai/sd-vae-ft-ema"
    vae_model = AutoencoderKL.from_pretrained(model_name)
    vae_model.eval().requires_grad_(False)

    def encode(normalized_images: torch.Tensor, mode=True):
        dist = vae_model.encode(normalized_images).latent_dist
        return dist.mode() if mode else dist.sample()

    def decode(latent_code: torch.Tensor):
        return vae_model.decode(latent_code).sample
    ```

    It took about 45 min on a P100 GPU on Kaggle to encode these 250k images (with a batch size of 32, which didn't fully take advantage of the GPU's 16 GB of VRAM!).
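As a quick, self-contained check (not part of the original description), the img / 127.5 - 1 normalization maps 8-bit pixel values into the [-1, 1] range the VAE expects:

```python
# The pre-encoding normalization used above: [0, 255] -> [-1, 1].
def normalize(pixel_value: float) -> float:
    return pixel_value / 127.5 - 1

print(normalize(0), normalize(127.5), normalize(255))  # -1.0 0.0 1.0
```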

  9. TissueNet: detect lesions in uterine cervix specimens - Open data set

    • data.gouv.fr
    csv, tiff
    Updated Mar 11, 2025
    Cite
    Société Française de Pathologie (2025). TissueNet: detect lesions in uterine cervix specimens - Open data set [Dataset]. https://www.data.gouv.fr/en/datasets/tissuenet-detect-lesions-in-uterine-cervix-specimens-open-data-set/
    Explore at:
    Available download formats: tiff, csv
    Dataset updated
    Mar 11, 2025
    Dataset authored and provided by
    Société Française de Pathologie
    License

    Licence Ouverte / Open Licence: https://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description
    1. Purpose of the database:

    This database was collected in order to organize the TissueNet Data Challenge. It consists of high-resolution images of microscopic slides created from cervical biopsies and surgical specimens. Competitors were also given slide metadata as well as annotations for the training set that outlined some (but not necessarily all) of the lesions present on a slide. The database shared on data.gouv.fr is a portion of the database used for the data challenge: it contains the slides coming from the pathology centers that agreed to share their data openly, and includes 1272 microscopic slides of uterine cervical tissue from medical centers across France. The slides are distributed in the following datasets:

    • diagnosed_biopsies = 443 slides
    • diagnosed_conizations = 21 slides
    • annotated_biopsies = 295 slides
    • undiagnosed_biopsies = 217 slides
    • undiagnosed_conizations = 296 slides

    Diagnosed_biopsies & diagnosed_conizations: Pathologists have labeled each slide according to four classes of lesion severity as classified by the World Health Organization (5th edition):

    • 0: benign (normal or subnormal)
    • 1: low malignant potential (low grade squamous intraepithelial lesion)
    • 2: high malignant potential (high grade squamous intraepithelial lesion)
    • 3: invasive cancer (invasive squamous carcinoma)

    The label refers to the class of the most severe lesion on the slide (at the slide level, not the annotation level).

    Fully_annotated_biopsies: Pathologists have labeled and annotated these images to point out regions that represent lesions. When working with the annotations, keep the following points in mind:

    • The annotated regions do not necessarily include all lesioned tissue in the slide. An unannotated region is not necessarily normal tissue.
    • The whole-image class label and the annotation class labels do not necessarily match. The annotated regions may be the image's labeled class or below. For instance, an image labeled as a class 2 lesion could have annotations representing class 0, 1, or 2. At least some of the annotated regions will represent the most severe (labeled) class. All annotations on a slide with label 0 will be normal tissue.
    • The lesion may fall entirely within the annotated square, or may extend beyond the annotation boundaries.
    • All annotations are a fixed size of 300x300 micrometers. As images have different resolutions in pixels/micrometer, annotations will have different dimensions in terms of pixels.
    • When using the geometries, it is important to know the origin of the coordinate system. Image-processing software may assume the image origin is either the bottom left or the top left. The WKT shapes provided as annotations (geometry column in train_annotations.csv) are relative to the bottom left being the origin (0, 0).

    Undiagnosed_biopsies & undiagnosed_conizations: There are no labels or corresponding annotations for these images.

    All images are standardized in pyramidal TIF format and compressed using JPEG Q=75. The pyramidal TIF format maintains a sufficient level of detail for pathologists to perform diagnoses while enabling smaller file sizes and easier loading with actively developed Python libraries such as PyVips.

    2. Context of creation of the database:

    This database was created as part of the TissueNet Data Challenge. The challenge began in 2019, when the French Society of Pathology (SFP) and the Health Data Hub (HDH) decided to build a challenge using a data bank of whole slide images (WSIs). Nineteen public and private pathology departments across France contributed more than 5,000 WSIs as data for the challenge. These slides are often difficult for pathologists themselves to diagnose, and expert eyes may be required. All labeled images included in the challenge were reviewed twice by expert pathologists.

    3. Target:

    Data challenges are global competitions aimed at solving specific problems within a given time frame using highly anonymized data. These challenges are intended for data scientists (researchers, industry practitioners, students, etc.) from all around the world. The objective of the challenge was to classify each image according to the most severe category of epithelial lesion present in the sample, using the four severity classes defined above.

    4. Results obtained from the database:

    In the TissueNet competition, participants were tasked with building machine learning models that could predict the most severe lesions in each digital biopsy slide. Participants also needed to submit code for executing their solution on test data in the cloud, ensuring that the model could run fast enough on this large-scale data to be useful in practice. This setup rewards models that perform well on unseen images and brings these innovations one step closer to impact. The global performance of each algorithm was evaluated according to a custom metric devised by a panel of expert pathologists. The score for each prediction equals 1 minus the error, where the error weighting for misclassification was set by expert consensus within the scientific council, as defined in the error table of misclassification. The total error is the average error across all predictions. Note that the metric is symmetric, e.g., predicting class 3 when it is actually class 0 produces the same error as predicting class 0 when it is actually class 3.

    The winning solutions used clever approaches to prioritize the parts of each slide to analyze further, and built computer vision pipelines to determine the most appropriate diagnosis for the selected tissue. Models were scored not just on their accuracy, but also on the impact of their errors (with a large penalty for mistakes that have worse consequences in practice). The top-performing model achieved over 76% accuracy in predicting the exact severity label of each slide across 4 ranked classes, including 95% accuracy for the most severe class of cancerous tissue. In addition, the top 3 solutions achieved >98% on-or-adjacent accuracy, meaning they reduced the more costly misclassifications that erred by more than one class to less than 2% of the 1,500+ slide test set! All prize-winning solutions are available under an open source license for ongoing use and learning. For more details, see the winning models on GitHub.

    5. Other information:

    Here are some resources you can use in order to work with the data:

    • OpenSlide supports all native whole slide image formats, including .mrxs (MIRAX), .svs (Aperio), and .ndpi (Hamamatsu).
    • PyVips is a Python binding for libvips, a low-level library for working with large images. PyVips can be used to read and manipulate the pyramidal TIF formats.
    • Cytomine allows you to display and explore native whole slide images and pyramidal TIF formats in a web browser. It also supports adding annotations and executing scripts from inside Cytomine or from any computing server using the dedicated Cytomine Python client. Cytomine can be installed locally or on any Linux server. The Cytomine GitHub repository includes examples of Python scripts demonstrating how to interact with your Cytomine instance, as well as examples of ready-to-use machine learning scripts (all S_ prefixed repos, such as S_CellDetect_Stardist_HE_ROI).

    Here are a few papers and tutorials about machine learning with WSIs that you may find helpful:

    • Can AI predict epithelial lesion categories via automated analysis of cervical biopsies: The TissueNet challenge?
    • The first data challenge of the French Society of Pathology: An international competition in 2020, a research tool in A.I. for the future?
    • Whole slide image preprocessing in Python
    • Assessment of Machine Learning of Breast Pathology Structures for Automated Differentiation of Breast Cancer and High-Risk Proliferative Lesions
    • Using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies
    • Histologic tissue components provide major cues for machine learning-based prostate cancer detection and grading on prostatectomy specimens
    • Assessment of Machine Learning Detection of Environmental Enteropathy and Celiac Disease in Children

    6. Licences:

    • Creative Commons Attribution (CC BY 3.0)
    • Licence Ouverte / Open Licence 2.0 (Etalab 2.0)

    7. User form:

    The purpose of the user form is to track who (in terms of individuals and institutions) is using the data, and potentially for what purposes. The form is not restrictive, in the sense that access requests will never be denied.

    8. Cite:

    For any reuse of this database, use the DOI provided below: https://doi.org/10.60597/eaqa-k904
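Since the WKT annotation geometries described above use a bottom-left origin while most image libraries assume top-left, coordinates may need a vertical flip. A minimal sketch with pure Python; the polygon and image height are toy values, and real geometries come from train_annotations.csv:

```python
# Convert bottom-left-origin (x, y) pairs to top-left origin by reflecting
# y about the image height (here in pixels; toy value).
def flip_y(coords, image_height_px):
    """Reflect (x, y) pairs vertically: bottom-left origin -> top-left origin."""
    return [(x, image_height_px - y) for x, y in coords]

square = [(10, 10), (310, 10), (310, 310), (10, 310)]  # toy annotation outline
print(flip_y(square, image_height_px=1000))
# [(10, 990), (310, 990), (310, 690), (10, 690)]
```

Note that because annotations are fixed at 300x300 micrometers, the pixel extent of each square depends on the slide's resolution in pixels/micrometer.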
  10. Network Slicing

    • kaggle.com
    Updated Aug 8, 2022
    Cite
    Puspak Meher (2022). Network Slicing [Dataset]. https://www.kaggle.com/datasets/puspakmeher/networkslicing/code
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 8, 2022
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Puspak Meher
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Overview

    Cellular communications, especially with the advent of 5G mobile networks, demand stringent adherence to high-reliability standards, ultra-low latency, increased capacity, enhanced security, and high-speed user connectivity. To fulfill these requirements, mobile operators require a programmable solution capable of supporting multiple independent tenants on a single physical infrastructure. The advent of 5G networks facilitates end-to-end resource allocation through Network Slicing (NS), which allows for the division of the network into distinct virtual slices.

    Network slicing in 5G stands as a pivotal feature for next-generation wireless networks, delivering substantial benefits to both mobile operators and businesses. Developing a Machine Learning (ML) model is crucial for accurately predicting the optimal network slice based on key device parameters. Such a model also plays a vital role in managing network load balancing and addressing network slice failures.

    Dataset Characteristics and Target Classes

    The dataset is structured to support the development of an ML model that can classify the optimal network slice based on device parameters. The target output comprises three distinct classes:

    1. Enhanced Mobile Broadband (eMBB):

      • Focuses on high-bandwidth and high-speed data transmission.
      • Facilitates activities such as high-definition video streaming, online gaming, and immersive media experiences.
    2. Ultra-Reliable Low Latency Communication (URLLC):

      • Emphasizes extremely reliable and low-latency connections.
      • Supports critical applications like autonomous vehicles, industrial automation, and remote surgery.
    3. Massive Machine Type Communication (mMTC):

      • Aims to support a massive number of connected devices.
      • Enables efficient communication between Internet of Things (IoT) devices, smart cities, and sensor networks.

    File name: deepslice_data.csv

    Data Attributes (Column Descriptions)

    • Device ID: Unique identifier for each device.
    • Connection Type: Specifies the type of connection (e.g., LTE, 5G).
    • Latency Requirements (ms): The maximum allowable latency for the device's operation.
    • Bandwidth Requirements (Mbps): The bandwidth needed for optimal device performance.
    • Reliability (%): The required reliability level for the device's connection.
    • Data Rate (Mbps): The data rate the device can handle.
    • Device Type: Categorizes the device (e.g., smartphone, IoT sensor).
    • Mobility (Low/Medium/High): Indicates the mobility level of the device.
    • Battery Life (hours): Expected battery life of the device.
    • Application Type: The primary application for the device's connection (e.g., video streaming, industrial control).

    Class Distribution

    The dataset includes labeled instances categorized into the three target classes: eMBB, URLLC, and mMTC. Each instance corresponds to a specific device configuration and its optimal network slice.

    Application and Relevance

    Network slicing in 5G is instrumental in provisioning tailored network services for specific use cases, ensuring optimal performance, resource utilization, and user experiences based on the requirements of eMBB, URLLC, and mMTC applications. This dataset is invaluable for researchers and practitioners aiming to design and implement ML models for network slice prediction, thereby enhancing the operational efficiency and reliability of 5G networks.

    Conclusion

    This dataset is meticulously curated to facilitate the development of ML models for predicting the optimal 5G network slice. It encompasses a comprehensive set of attributes and target classes, ensuring that it meets the highest standards required for advanced research and practical applications in the field of cellular communications and network management.
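Before training a full ML model, the three target classes can be illustrated with a rule-of-thumb mapping from device requirements to a slice. The thresholds and function name below are assumptions for demonstration only, not derived from deepslice_data.csv; a trained classifier would learn these boundaries from the data:

```python
# Hypothetical heuristic: map coarse device requirements to a 5G network slice.
# Thresholds are illustrative assumptions, not values from the dataset.

def suggest_slice(latency_ms: float, bandwidth_mbps: float) -> str:
    """Return 'URLLC', 'eMBB', or 'mMTC' from coarse requirements."""
    if latency_ms <= 10:        # ultra-low latency dominates (e.g. remote surgery)
        return "URLLC"
    if bandwidth_mbps >= 50:    # high-throughput use cases (e.g. HD streaming)
        return "eMBB"
    return "mMTC"               # many low-rate devices (e.g. IoT sensors)

print(suggest_slice(5, 1))      # URLLC
print(suggest_slice(30, 100))   # eMBB
print(suggest_slice(100, 0.1))  # mMTC
```

A real model replaces these hand-written thresholds with boundaries learned from the labeled instances, which also lets it weigh reliability, mobility, and device type jointly.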

  11. Dataset Direct Download Service (WFS): Upper Aa PPRN Hazard Zone

    • gimi9.com
    • data.europa.eu
    + more versions
    Cite
    Dataset Direct Download Service (WFS): Upper Aa PPRN Hazard Zone [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-76ecbdf2-b366-4a06-a38d-ef3994afa9f8/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Area exposed to one or more hazards represented on the hazard map used for risk analysis of the RPP. The hazard map is the result of the study of hazards, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of the hazard. The allocation of a hazard level at a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For multi-hazard PPRNs, each zone is usually identified on the hazard map by a code for each hazard to which it is exposed. All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way), as they are always considered to be subject to hazard (in case of breakage or inadequacy of the structure).
    The hazard zones may be classified as compiled data insofar as they result from a synthesis using several sources of calculated, modelled, or observed hazard data. These source data are not covered by this object class but by another standard dealing with the knowledge of hazards. Some areas within the study area are considered "no or insignificant hazard zones": areas where the hazard has been studied and is nil. These areas are not included in the object class and do not have to be represented as hazard zones. However, in the case of natural RPPs, regulatory zoning may classify certain areas not exposed to hazard as prescription areas (see definition of the PPR class).

  12. Simple download service (Atom) of the dataset: Upper Lilies PPRN Hazard Zone...

    • gimi9.com
    • data.europa.eu
    + more versions
    Cite
    Simple download service (Atom) of the dataset: Upper Lilies PPRN Hazard Zone [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-b38e7744-ae85-473d-bc95-e84de2666a8f/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Area exposed to one or more hazards represented on the hazard map used for risk analysis of the RPP. The hazard map is the result of the study of hazards, the objective of which is to assess the intensity of each hazard at any point in the study area. The evaluation method is specific to each hazard type. It leads to the delimitation of a set of areas on the study perimeter constituting a zoning graduated according to the level of the hazard. The allocation of a hazard level at a given point in the territory takes into account the probability of occurrence of the dangerous phenomenon and its degree of intensity. For multi-hazard PPRNs, each zone is usually identified on the hazard map by a code for each hazard to which it is exposed. All hazard areas shown on the hazard map are included. Areas protected by protective structures must be represented (possibly in a specific way), as they are always considered to be subject to hazard (in case of breakage or inadequacy of the structure).
    The hazard zones may be classified as compiled data insofar as they result from a synthesis using several sources of calculated, modelled, or observed hazard data. These source data are not covered by this object class but by another standard dealing with the knowledge of hazards. Some areas within the study area are considered "no or insignificant hazard zones": areas where the hazard has been studied and is nil. These areas are not included in the object class and do not have to be represented as hazard zones. However, in the case of natural RPPs, regulatory zoning may classify certain areas not exposed to hazard as prescription areas (see definition of the PPR class).

  13. Income Distribution by Quintile: Mean Household Income in St. Tammany...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in St. Tammany Parish, LA // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/st-tammany-parish-la-median-household-income/
    Explore at:
    json, csv (available download formats)
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Louisiana, St. Tammany Parish
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in St. Tammany Parish, LA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (20% of households with the lowest income) is 18,444, while the mean income for the highest quintile (20% of households with the highest income) is 263,439. This indicates that the top earners earn about 14 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest population (top 5%) is 451,482, which is 71.38% higher than the highest quintile mean and 2,347.85% higher than the lowest quintile mean.
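The disparity figures in these observations follow from simple ratios of the published quintile means (St. Tammany Parish values, in 2023 inflation-adjusted dollars); a quick check:

```python
# Quintile-disparity ratios from the mean incomes reported above.
lowest_quintile = 18_444    # mean income, lowest 20% of households
highest_quintile = 263_439  # mean income, highest 20% of households
top_5_percent = 451_482     # mean income, wealthiest 5% of households

print(round(highest_quintile / lowest_quintile, 1))  # ~14.3x: top vs bottom quintile
print(round(top_5_percent / highest_quintile, 2))    # ~1.71x: top 5% vs highest quintile
print(round(top_5_percent / lowest_quintile, 1))     # ~24.5x: top 5% vs lowest quintile
```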
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column showcases the income levels (As mentioned above).
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for St. Tammany Parish median household income. You can refer to the same here

  14. Tucson Equity Priority Index (TEPI): Ward 1 Census Block Groups

    • teds.tucsonaz.gov
    Updated Feb 4, 2025
    + more versions
    Cite
    City of Tucson (2025). Tucson Equity Priority Index (TEPI): Ward 1 Census Block Groups [Dataset]. https://teds.tucsonaz.gov/datasets/tucson-equity-priority-index-tepi-ward-1-census-block-groups/about
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset authored and provided by
    City of Tucson
    Area covered
    Description

    For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

    What is the Tucson Equity Priority Index (TEPI)? The TEPI is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

    How is social vulnerability measured? The TEPI examines the proportion of vulnerability per feature using 11 demographic indicators:
    • Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
    • Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
    • Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
    • Renter Cost Burdened: Renters who spend more than 30% of their income on rent
    • No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
    • No Vehicle Access: Households without automobile, van, or truck access
    • High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
    • Limited English Ability: Those whose ability to speak English is "Less Than Well"
    • People of Color: Those who identify as anything other than Non-Hispanic White
    • Disability: Households with one or more physical or cognitive disabilities
    • Age: Groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
    An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

    How are the variables combined? These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
    • High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
    • Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
    • Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
    • Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
    • Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
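The combination procedure described above (percentile normalization to a 0-100 scale, equal-weight mean aggregation, then binning into five named classes) can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not the official TEPI pipeline; the function names are made up:

```python
# Sketch of TEPI-style aggregation: percentile-normalize each indicator,
# average with equal weights, then bin into the five vulnerability classes.

def percentile_normalize(values):
    """Rescale a column to 0-100 by percentile rank within the column."""
    order = sorted(values)
    n = len(values)
    return [100.0 * order.index(v) / (n - 1) for v in values]

def tepi_score(indicator_columns):
    """Equal-weight mean of percentile-normalized indicator columns."""
    normalized = [percentile_normalize(col) for col in indicator_columns]
    n_rows = len(indicator_columns[0])
    return [sum(col[i] for col in normalized) / len(normalized) for i in range(n_rows)]

def tepi_class(score):
    """Map a 0-100 TEPI score to its vulnerability class."""
    for lower, name in [(80, "High"), (60, "Moderate-High"),
                        (40, "Moderate"), (20, "Low-Moderate")]:
        if score >= lower:
            return name
    return "Low"

print(tepi_class(85))  # High
```

In the real index the columns are first averaged within Tier-2 Sub-Indices and Thematic Indices before the overall mean; the sketch collapses those intermediate levels for brevity.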

  15. The National Longitudinal Study of the High School Class of 1972

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Aug 13, 2023
    Cite
    National Center for Education Statistics (NCES) (2023). The National Longitudinal Study of the High School Class of 1972 [Dataset]. https://catalog.data.gov/dataset/the-national-longitudinal-study-of-the-high-school-class-of-1972-682a8
    Explore at:
    Dataset updated
    Aug 13, 2023
    Dataset provided by
    National Center for Education Statistics: https://nces.ed.gov/
    Description

    The National Longitudinal Study of the High School Class of 1972 (NLS-72) is part of the Secondary Longitudinal Studies (SLS) program; program data are available since 1972 at https://nces.ed.gov/pubsearch/getpubcats.asp?sid=021. NLS-72 (https://nces.ed.gov/surveys/nls72/index.asp) is a longitudinal survey that follows high school seniors through five follow-ups in 1973, 1974, 1976, 1979, and 1986. The study was conducted using a nationally representative sample of 1972 high school seniors. Key statistics produced from NLS-72 are students' educational aspirations and attainment, family formation, and occupations.

  16. Data from: UltraCortex: Submillimeter Ultra-High Field 9.4T Brain MR Image...

    • openneuro.org
    Updated Jun 3, 2024
    Cite
    Lucas Mahler; Julius Steiglechner; Benjamin Bender; Tobias Lindig; Dana Ramadan; Jonas Bause; Florian Birk; Rahel Heule; Edyta Charyasz; Michael Erb; Vinod Jangir Kumar; Gisela E Hagberg; Pascal Martin; Gabriele Lohmann; Klaus Scheffler (2024). UltraCortex: Submillimeter Ultra-High Field 9.4T Brain MR Image Collection and Manual Cortical Segmentations [Dataset]. http://doi.org/10.18112/openneuro.ds005216.v1.0.0
    Explore at:
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    OpenNeuro: https://openneuro.org/
    Authors
    Lucas Mahler; Julius Steiglechner; Benjamin Bender; Tobias Lindig; Dana Ramadan; Jonas Bause; Florian Birk; Rahel Heule; Edyta Charyasz; Michael Erb; Vinod Jangir Kumar; Gisela E Hagberg; Pascal Martin; Gabriele Lohmann; Klaus Scheffler
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    UltraCortex: Submillimeter Ultra-High Field 9.4 T1 Brain MR Image Collection and Manual Cortical Segmentations

    Overview

    Welcome to the UltraCortex repository, which hosts a unique collection of ultra-high field (9.4 Tesla) MRI data of the human brain. This dataset includes detailed structural images and high-quality manual segmentations, making it an invaluable resource for researchers in neuroimaging and computational neuroscience.

    Dataset Contents

    • Structural MR Images: 86 T1-weighted images with resolutions of 0.6 to 0.8 mm.
    • Manual Segmentations: Precise segmentations of 12 brains into gray and white matter compartments.
    • Validation: Segmentations independently validated by two expert neuroradiologists.

    Purpose

    The UltraCortex dataset aims to:

    • Facilitate the development and validation of new algorithms for analyzing ultra-high field MRI data.
    • Serve as a benchmark for existing neuroimaging methods.
    • Provide a rich resource for educational purposes and detailed studies of brain anatomy.

    Data Acquisition

    • Scanner: 9.4 T whole-body MRI scanner (Siemens Healthineers).
    • Sequences: MP-RAGE and MP2RAGE sequences.
    • Participants: 78 healthy adult volunteers (28 females, 50 males; age range: 20-53 years).

    Data Processing

    • Formats: Data is provided in NIfTI format.
    • Anonymization: Images have been anonymized and stripped of all revealing metadata.
    • Segmentation: Manual segmentations were created using ITK-Snap and validated by expert neuroradiologists.

    Usage Notes

    • Data License: The dataset has been marked as dedicated to the public domain.
    • Formats: Data is provided in NIfTI and plain text formats for compatibility with various analysis tools.
    • Open-Source Software: Data processing was performed using open-source software to enhance reproducibility.

    Accessing the Data

    The dataset is publicly available on the OpenNeuro repository:

    Citation

    If you use this dataset, please cite:

    Mahler, L., Steiglechner, J. et al. (2024). UltraCortex: Submillimeter Ultra-High Field 9.4 T1 Brain MR Image Collection and Manual Cortical Segmentations
    

    Contact

    For any questions or further information, please contact the corresponding authors:

    Thank you for using the UltraCortex dataset. We hope it facilitates your research and contributes to advancements in neuroimaging.

  17. Data from: Pseudo-Label Generation for Multi-Label Text Classification

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Pseudo-Label Generation for Multi-Label Text Classification [Dataset]. https://catalog.data.gov/dataset/pseudo-label-generation-for-multi-label-text-classification
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    With the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. In order to build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can be found in non-text (e.g., numeric) data as well; however, in text data it is most prevalent. It also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although we are proposing and evaluating a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and show the effectiveness of our approach.
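The core idea of pseudo labels as combinations of existing class labels can be illustrated with a small label-powerset-style encoding. This sketch is an assumption-laden illustration of the concept, not the paper's pseudo-LSC algorithm, and the function name is made up:

```python
# Illustrative pseudo-label generation: each distinct *combination* of class
# labels observed in the multi-label data becomes one integer pseudo-label,
# turning an MLC problem into a single-label one over label combinations.

def make_pseudo_labels(label_sets):
    """Map each distinct frozenset of labels to an integer pseudo-label."""
    mapping = {}   # frozenset of labels -> pseudo-label id
    pseudo = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in mapping:
            mapping[key] = len(mapping)
        pseudo.append(mapping[key])
    return pseudo, mapping

y = [{"sports"}, {"sports", "politics"}, {"sports"}, {"tech"}]
codes, mapping = make_pseudo_labels(y)
print(codes)  # [0, 1, 0, 2]
```

A classifier trained on these pseudo-labels implicitly captures label correlations, since co-occurring labels share one pseudo-class; the cost is that the number of pseudo-classes grows with the number of distinct label combinations.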

  18. Tucson Equity Priority Index (TEPI): Pima County Block Groups

    • tucson-equity-data-hub-cotgis.hub.arcgis.com
    • teds.tucsonaz.gov
    Updated Jul 23, 2024
    Cite
    City of Tucson (2024). Tucson Equity Priority Index (TEPI): Pima County Block Groups [Dataset]. https://tucson-equity-data-hub-cotgis.hub.arcgis.com/datasets/tucson-equity-priority-index-tepi-pima-county-block-groups
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    City of Tucson
    Area covered
    Description

    For detailed information, visit the Tucson Equity Priority Index StoryMap. Download the Data Dictionary.

    What is the Tucson Equity Priority Index (TEPI)? The TEPI is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), High (80-100). Each class represents 20% of the dataset's features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.

    How is social vulnerability measured? The TEPI examines the proportion of vulnerability per feature using 11 demographic indicators:
    • Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
    • Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
    • Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
    • Renter Cost Burdened: Renters who spend more than 30% of their income on rent
    • No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
    • No Vehicle Access: Households without automobile, van, or truck access
    • High School Education or Less: Those whose highest level of educational attainment is a high school diploma, equivalency, or less
    • Limited English Ability: Those whose ability to speak English is "Less Than Well"
    • People of Color: Those who identify as anything other than Non-Hispanic White
    • Disability: Households with one or more physical or cognitive disabilities
    • Age: Groups that tend to have higher levels of vulnerability, including children (those below 18) and seniors (those 65 and older)
    An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.

    How are the variables combined? These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100. The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
    • High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
    • Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
    • Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
    • Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
    • Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.

  19. High Resolution Land Cover Classification - USA

    • sdiinnovation-geoplatform.hub.arcgis.com
    • hub.arcgis.com
    • +2more
    Updated Dec 7, 2021
    Cite
    Esri (2021). High Resolution Land Cover Classification - USA [Dataset]. https://sdiinnovation-geoplatform.hub.arcgis.com/content/a10f46a8071a4318bcc085dae26d7ee4
    Explore at:
    Dataset updated
    Dec 7, 2021
    Dataset authored and provided by
    Esri: http://esri.com/
    Area covered
    United States
    Description

    Land cover describes the surface of the earth. Land cover maps are useful in urban planning, resource management, change detection, agriculture, and a variety of other applications in which information related to the earth's surface is required. Land cover classification is a complex exercise and is hard to capture using traditional means. Deep learning models are highly capable of learning these complex semantics and can produce superior results.

    Using the model: Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.

    Fine-tuning the model: This model can be fine-tuned using the Train Deep Learning Model tool. Follow the guide to fine-tune this model.

    Input: 8-bit, 3-band high-resolution (80-100 cm) imagery.

    Output: Classified raster with the same classes as in the Chesapeake Bay Landcover dataset (2013/2014). By default, the output raster contains 9 classes. A simpler classification with 6 classes can be performed by setting the 'detailed_classes' model argument to false. Note: the output classified raster will not contain the 'Aberdeen Proving Ground' class. Find class descriptions here.

    Applicable geographies: This model is applicable in the United States and is expected to produce best results in the Chesapeake Bay region.

    Model architecture: This model uses the UNet model architecture implemented in ArcGIS API for Python.

    Accuracy metrics: This model has an overall accuracy of 86.5% for classification into 9 land cover classes and 87.86% for 6 classes.
    The table below summarizes the precision, recall, and F1-score of the model on the validation dataset for classification into 9 land cover classes:

    Class               | Precision | Recall  | F1 Score
    Water               | 0.93614   | 0.93046 | 0.93329
    Wetlands            | 0.81659   | 0.75905 | 0.78677
    Tree Canopy         | 0.90477   | 0.93143 | 0.91791
    Shrubland           | 0.51625   | 0.18643 | 0.27394
    Low Vegetation      | 0.85977   | 0.86676 | 0.86325
    Barren              | 0.67165   | 0.50922 | 0.57927
    Structures          | 0.8051    | 0.84887 | 0.82641
    Impervious Surfaces | 0.73532   | 0.68556 | 0.70957
    Impervious Roads    | 0.76281   | 0.81238 | 0.78682

    The table below summarizes the precision, recall, and F1-score of the model on the validation dataset for classification into 6 land cover classes:

    Class                  | Precision | Recall | F1 Score
    Water                  | 0.95      | 0.94   | 0.95
    Tree Canopy and Shrubs | 0.91      | 0.92   | 0.92
    Low Vegetation         | 0.85      | 0.85   | 0.85
    Barren                 | 0.79      | 0.69   | 0.74
    Impervious Surfaces    | 0.84      | 0.84   | 0.84
    Impervious Roads       | 0.82      | 0.83   | 0.82

    Training data
    This model has been trained on the Chesapeake Bay high-resolution 2013/2014 NAIP Landcover dataset (produced by Chesapeake Conservancy with their partners, the University of Vermont Spatial Analysis Lab (UVM SAL) and Worldview Solutions, Inc. (WSI)) and other high-resolution imagery. Find more information about the dataset here.

    Sample results
    Here are a few results from the model.
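The per-class F1 scores in the tables above are the harmonic mean of the corresponding precision and recall, so each row can be sanity-checked directly. A minimal sketch in plain Python (no ArcGIS dependency assumed):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Check the Water row of the 9-class table for internal consistency
p, r = 0.93614, 0.93046
print(round(f1_score(p, r), 5))  # 0.93329, matching the table's F1 Score column
```

The same check can be applied to any row of either table.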

  20. N

    Income Distribution by Quintile: Mean Household Income in Sands Point, NY //...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Sands Point, NY // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/sands-point-ny-median-household-income/
    Explore at:
    json, csv (available download formats)
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York, Sands Point
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Sands Point, NY, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $62,342, while the mean income of the highest quintile (the 20% of households with the highest income) is $1,206,232. The top earners thus earn roughly 19 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest households (the top 5%) is $1,779,703, which is 147.54% of the mean for the highest quintile and 2854.74% of the mean for the lowest quintile.
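The ratios quoted above follow directly from the reported quintile means. A quick arithmetic check (figures copied from the bullet points; the variable names are illustrative):

```python
# Sands Point, NY mean household incomes (2023 inflation-adjusted dollars)
low, high, top5 = 62_342, 1_206_232, 1_779_703

print(round(high / low))            # 19: top-quintile mean vs. bottom-quintile mean
print(round(top5 / high * 100, 2))  # 147.54: top-5% mean as a percentage of the top-quintile mean
print(round(top5 / low * 100, 2))   # 2854.74: top-5% mean as a percentage of the bottom-quintile mean
```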
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels described above.
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars for the specific income level.
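The "2023 inflation-adjusted dollars" above come from rescaling nominal survey-year figures by a price-index ratio (the R-CPI-U-RS series, per the measurement technique). A minimal sketch of that rescaling; the index values below are placeholders for illustration, not actual R-CPI-U-RS figures:

```python
def to_2023_dollars(amount: float, index_survey_year: float, index_2023: float) -> float:
    """Rescale a nominal dollar amount to 2023 dollars via a price-index ratio."""
    return amount * (index_2023 / index_survey_year)

# Hypothetical index values, for illustration only
idx_2019, idx_2023 = 376.5, 444.3
print(round(to_2023_dollars(50_000, idx_2019, idx_2023)))
```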

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Sands Point median household income. You can refer to it here.

Cite
Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Lancaster County, PA // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/lancaster-county-pa-median-household-income/


Explore at:
json, csv (available download formats)
Dataset updated
Mar 3, 2025
Dataset authored and provided by
Neilsberg Research
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Lancaster County, Pennsylvania
Variables measured
Income Level, Mean Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the mean household income for each of the five quintiles in Lancaster County, PA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

Key observations

  • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $22,178, while the mean income of the highest quintile (the 20% of households with the highest income) is $255,448. The top earners thus earn roughly 12 times as much as the lowest earners.
  • Top 5%: The mean household income for the wealthiest households (the top 5%) is $448,457, which is 175.56% of the mean for the highest quintile and 2022.08% of the mean for the lowest quintile.
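The percentage figures above are straightforward ratios of the reported means; note that a value equal to 175.56% of another exceeds it by 75.56%, and a value equal to 2022.08% of another is about 20 times it. A quick check (figures copied from the bullet points):

```python
# Lancaster County, PA mean household incomes (2023 inflation-adjusted dollars)
low, high, top5 = 22_178, 255_448, 448_457

print(round(top5 / high * 100, 2))  # 175.56: top-5% mean as a percentage of the top-quintile mean
print(round(top5 / low * 100, 2))   # 2022.08: top-5% mean as a percentage of the bottom-quintile mean
```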
Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income Levels:

  • Lowest Quintile
  • Second Quintile
  • Third Quintile
  • Fourth Quintile
  • Highest Quintile
  • Top 5 Percent

Variables / Data Columns

  • Income Level: This column lists the income levels described above.
  • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars for the specific income level.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Lancaster County median household income. You can refer to it here.
