Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household income across different racial categories in United States. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median houshold income for diverse racial categories.
Key observations
Based on our analysis of the distribution of United States population by race & ethnicity, the population is predominantly White. This particular racial category constitutes the majority, accounting for 68.17% of the total residents in United States. Notably, the median household income for White households is $79,933. Interestingly, despite the White population being the most populous, it is worth noting that Asian households actually reports the highest median household income, with a median income of $106,954. This reveals that, while Whites may be the most numerous in United States, Asian households experience greater economic prosperity in terms of median household income.
https://i.neilsberg.com/ch/united-states-median-household-income-by-race.jpeg" alt="United States median household income diversity across racial categories">
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2022 1-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States median household income by race. You can refer the same here
https://en.wikipedia.org/wiki/Public_domainhttps://en.wikipedia.org/wiki/Public_domain
This dataset contains information about the demographics of all US cities and census-designated places with a population greater or equal to 65,000. This data comes from the US Census Bureau's 2015 American Community Survey. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.
Report on Demographic Data in New York City Public Schools, 2020-21Enrollment counts are based on the November 13 Audited Register for 2020. Categories with total enrollment values of zero were omitted. Pre-K data includes students in 3-K. Data on students with disabilities, English language learners, and student poverty status are as of March 19, 2021. Due to missing demographic information in rare cases and suppression rules, demographic categories do not always add up to total enrollment and/or citywide totals. NYC DOE "Eligible for free or reduced-price lunch” counts are based on the number of students with families who have qualified for free or reduced-price lunch or are eligible for Human Resources Administration (HRA) benefits. English Language Arts and Math state assessment results for students in grade 9 are not available for inclusion in this report, as the spring 2020 exams did not take place. Spring 2021 ELA and Math test results are not included in this report for K-8 students in 2020-21. Due to the COVID-19 pandemic’s complete transformation of New York City’s school system during the 2020-21 school year, and in accordance with New York State guidance, the 2021 ELA and Math assessments were optional for students to take. As a result, 21.6% of students in grades 3-8 took the English assessment in 2021 and 20.5% of students in grades 3-8 took the Math assessment. These participation rates are not representative of New York City students and schools and are not comparable to prior years, so results are not included in this report. Dual Language enrollment includes English Language Learners and non-English Language Learners. Dual Language data are based on data from STARS; as a result, school participation and student enrollment in Dual Language programs may differ from the data in this report. STARS course scheduling and grade management software applications provide a dynamic internal data system for school use; while standard course codes exist, data are not always consistent from school to school. This report does not include enrollment at District 75 & 79 programs. Students enrolled at Young Adult Borough Centers are represented in the 9-12 District data but not the 9-12 School data. “Prior Year” data included in Comparison tabs refers to data from 2019-20. “Year-to-Year Change” data included in Comparison tabs indicates whether the demographics of a school or special program have grown more or less similar to its district or attendance zone (or school, for special programs) since 2019-20. Year-to-year changes must have been at least 1 percentage point to qualify as “More Similar” or “Less Similar”; changes less than 1 percentage point are categorized as “No Change”. The admissions method tab contains information on the admissions methods used for elementary, middle, and high school programs during the Fall 2020 admissions process. Fall 2020 selection criteria are included for all programs with academic screens, including middle and high school programs. Selection criteria data is based on school-reported information. Fall 2020 Diversity in Admissions priorities is included for applicable middle and high school programs. Note that the data on each school’s demographics and performance includes all students of the given subgroup who were enrolled in the school on November 13, 2020. Some of these students may not have been admitted under the admissions method(s) shown, as some students may have enrolled in the school outside the centralized admissions process (via waitlist, over-the-counter, or transfer), and schools may have changed admissions methods over the past few years. Admissions methods are only reported for grades K-12. "3K and Pre-Kindergarten data are reported at the site level. See below for definitions of site types included in this report. Additionally, please note that this report excludes all students at District 75 sites, reflecting slightly lower enrollment than our total of 60,265 students
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household incomes over the past decade across various racial categories identified by the U.S. Census Bureau in State Center. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. It also showcases the annual income trends, between 2013 and 2023, providing insights into the economic shifts within diverse racial communities.The dataset can be utilized to gain insights into income disparities and variations across racial categories, aiding in data analysis and decision-making..
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State Center median household income by race. You can refer the same here
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.1 https://doi.org/10.5066/P92QM3NT. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee; however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas with over twenty-five attributes in nine feature classes to support data management, queries, web mapping services, and analyses. NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.1 https://doi.org/10.5066/P92QM3NT This PAD-US Version 2.0 dataset includes a variety of updates and changes from the previous Version 1.4 dataset. The following list summarizes major updates and changes: 1) Expanded database structure with new layers: the geodatabase feature class structure now includes nine feature classes separating fee owned lands, conservation (and other) easements, management designations overlapping fee lands, marine areas, proclamation boundaries and various 'Combined' feature classes (e.g. 'Fee' + 'Easement' + 'Designation' feature classes); 2) Major update of the Federal estate including data from 8 agencies, developed in collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/); 3) Major updates to 30 States and limited additions to 16 other States; 4) Integration of The Nature Conservancy's (TNC) Secured Lands geodatabase; 5) Integration of Ducks Unlimited's (DU) Conservation and Recreation Lands (CARL) database; 6) Integration of The Trust for Public Land's (TPL) Conservation Almanac database; 7) The Nature Conservancy (TNC) Lands database update: the national source of lands owned in fee or managed by TNC; 8) National Conservation Easement Database (NCED) update: complete update of non-sensitive (suitable for publication in the public domain) easements; 9) Complete National Marine Protected Areas (MPA) update: from the NOAA MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA; 10) First integration of Bureau of Energy Ocean Management (BOEM) managed marine lands: BOEM submitted Outer Continental Shelf Area lands managed for natural resources (minerals, oil and gas), a significant and new addition to PAD-US; 11) Fee boundary overlap assessment: topology overlaps in the PAD-US 2.0 'Fee' feature class have been identified and are available for user and data-steward reference (See Logical_Consistency_Report Section). For more information regarding the PAD-US dataset please visit, https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the “Data Manual for PAD-US” available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual .
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RRING Work Package 3 (WP3) objective was to clarify how Research Funding Organisations (RFOs) and Research Performing Organisations (RPOs) operated within region-specific research and innovation environments. It explored how they navigated the governance and regulatory frameworks for Responsible Research and Innovation (RRI), as well as offering their perspectives on the entities responsible for RRI-related policy and action in their locales.
This data set covers the global survey research part, which was designed to contextualise how RPOs and RFOs interacted within the research environment and with non-academic stakeholders. Countries were grouped according to the UNESCO regions of the world and key results per region are listed below. For a detailed analysis and further findings of the work completed under WP3 of the RRING project, please refer to the full deliverable document "State of the Art of RRI in the Five UNESCO World Regions" [link to be inserted].
European and North American States
Latin American and Caribbean States
Asian and Pacific States
Arab States
African States
Note: Please refer to the "RRING WP3 - Survey Data Documentation" document for detailed instructions on how to use this dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household incomes over the past decade across various racial categories identified by the U.S. Census Bureau in State College. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. It also showcases the annual income trends, between 2013 and 2023, providing insights into the economic shifts within diverse racial communities.The dataset can be utilized to gain insights into income disparities and variations across racial categories, aiding in data analysis and decision-making..
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State College median household income by race. You can refer the same here
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.0 https://doi.org/10.5066/P955KPLE. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. As this dataset is a compilation of many data sets; data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete, and depends on data management capacity in the state. For completeness estimates by state: http://www.protectedlands.net/partners. As the federal and state data are reasonably complete; focus is shifting to completing the inventory of local gov and voluntarily provided, private protected areas. The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA Feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov ) tracking the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us ) The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales. More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/.) Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships as well as see tips and ideas of appropriate uses of the data and how to parse out the data that you are looking for. For more information regarding the PAD-US dataset please visit, http://gapanalysis.usgs.gov/padus/. To find more data resources as well as view example analysis performed using PAD-US data visit, http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/ . For more information about data standards and how the data are aggregated please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/ .
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The State of Alabama contains the most diverse fish fauna of North America. The University of Alabama Ichthyological Collection (UAIC) documents this diversity and is one of the largest educational and research collections of fishes in the southeastern United States. This nationally and internationally recognized biological resource includes over one million preserved, skeletal, and frozen specimens, some dating back to the mid 1900's, and is the best single resource documenting past and present distributions and abundances of fishes in the State.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
This visual speech dataset contains 1000 videos in US English language each paired with a corresponding high-fidelity audio track. Each participant is answering a specific question in a video in an unscripted and spontaneous nature.
While recording each video extensive guidelines are kept in mind to maintain the quality and diversity.
The dataset provides comprehensive metadata for each video recording and participant:
Techsalerator’s News Event Data in North America offers a comprehensive and detailed dataset designed to provide businesses, analysts, journalists, and researchers with a thorough view of significant news events across North America. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, providing valuable insights into regional developments, economic shifts, political changes, and cultural events.
Key Features of the Dataset: Extensive Coverage:
The dataset aggregates news events from a wide array of sources, including company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse range of information from multiple reporting channels. Categorization of Events:
News events are categorized into various types such as business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors. Real-Time Updates:
The dataset is updated regularly to include the most current events, ensuring that users have access to up-to-date news and can stay informed about recent developments as they happen. Geographic Segmentation:
Events are tagged with their respective countries and territories within North America. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis. Event Details:
Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event. Historical Data:
The dataset includes historical news event data, enabling users to track trends and conduct comparative analysis over time. This feature supports longitudinal studies and provides insights into how news events evolve. Advanced Search and Filter Options:
Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information. North American Countries and Territories Covered: Countries: Canada Mexico United States Territories: American Samoa (U.S. territory) French Polynesia (French overseas collectivity; included for regional relevance) Guam (U.S. territory) New Caledonia (French special collectivity; included for regional relevance) Northern Mariana Islands (U.S. territory) Puerto Rico (U.S. territory) Saint Pierre and Miquelon (French overseas territory; geographically close to North America and included for regional comprehensiveness) Wallis and Futuna (French overseas collectivity; included for regional relevance) Benefits of the Dataset: Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis. Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities. Media and PR Monitoring: Journalists and PR professionals can track relevant news across North America, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively. Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to North American news and events. Techsalerator’s News Event Data in North America is a crucial resource for accessing and analyzing significant news events across the continent. By providing detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://ngda-cadastre-geoplatform.hub.arcgis.com/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g., 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. PAD-US provides a full inventory geodatabase, spatial analysis, statistics, data downloads, web services, poster maps, and data submissions included in efforts to track global progress toward biodiversity protection. PAD-US integrates spatial data to ensure public lands and other protected areas from all jurisdictions are represented. PAD-US version 4.0 includes new and updated data from the following data providers. All other data were transferred from previous versions of PAD-US. Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in regular PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. Revisions associated with the federal estate in this version include updates to the Federal estate (fee ownership parcels, easement interest, management designations, and proclamation boundaries), with authoritative data from 7 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), and the U.S. Forest Service (USFS). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/federal-lands-workgroup/ ). This includes improved the representation of boundaries and attributes for the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. Additionally, National Cemetery boundaries were added using geospatial boundary data provided by the U.S. Department of Veterans Affairs and NASA boundaries were added using data contained in the USGS National Boundary Dataset (NBD). State Updates - USGS is committed to building capacity in the state data steward network and the PAD-US Team to increase the frequency of state land and NGO partner updates, as resources allow. State Lands Workgroup ( https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/state-lands-workgroup ) is focused on improving protected land inventories in PAD-US, increase update efficiency, and facilitate local review. PAD-US 4.0 included updates and additions from the following seventeen states and territories: California (state, local, and nonprofit fee); Colorado (state, local, and nonprofit fee and easement); Georgia (state and local fee); Kentucky (state, local, and nonprofit fee and easement); Maine (state, local, and nonprofit fee and easement); Montana (state, local, and nonprofit fee); Nebraska (state fee); New Jersey (state, local, and nonprofit fee and easement); New York (state, local, and nonprofit fee and easement); North Carolina (state, local, and nonprofit fee); Pennsylvania (state, local, and nonprofit fee and easement); Puerto Rico (territory fee); Tennessee (land trust fee); Texas (state, local, and nonprofit fee); Virginia (state, local, and nonprofit fee); West Virginia (state, local, and nonprofit fee); and Wisconsin (state fee data). Additionally, the following datasets were incorporated from NGO data partners: Trust for Public Land (TPL) Parkserve (new fee and easement data); The Nature Conservancy (TNC) Lands (fee owned by TNC); TNC Northeast Secured Areas; Ducks Unlimited (land trust fee); and the National Conservation Easement Database (NCED). All state and NGO easement submissions are provided to NCED. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/programs/gap-analysis-project/science/protected-areas . For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/programs/gap-analysis-project/science/protected-areas . For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-manual . A version history of PAD-US updates is summarized below (See https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-history/ for more information): 1) First posted - April 2009 (Version 1.0 - available from the PAD-US: Team pad-us@usgs.gov). 2) Revised - May 2010 (Version 1.1 - available from the PAD-US: Team pad-us@usgs.gov). 3) Revised - April 2011 (Version 1.2 - available from the PAD-US: Team pad-us@usgs.gov). 4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD 5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ 6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE 7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT 8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B 9) Revised - April 2024 (Version 4.0) https://doi.org/10.5066/P96WBCHS Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
SNAPSHOT USA is an annual, multi-contributor camera trap survey of mammals across the United States. The growing SNAPSHOT USA dataset is intended for tracking the spatial and temporal responses of mammal populations to changes in land use, land cover, and climate. These data will be useful for exploring the drivers of spatial and temporal changes in relative abundance and distribution, as well as the impacts of species interactions on daily activity patterns. SNAPSHOT USA 2019–2023 contains 987,979 records of camera trap image sequence data and 9,694 records of camera trap deployment metadata. Data were collected across the United States of America in all 50 states, 12 ecoregions, and many ecosystems. Data were collected between August 1st and December 29th each year from 2019 to 2023. The dataset includes a wide range of taxa but is primarily focused on medium to large mammals. SNAPSHOT USA 2019–2023 comprises two .csv files. The original data can be found within the SNAPSHOT USA Initiative in the Wildlife Insights platform. Methods The first three annual SNAPSHOT USA surveys were coordinated by Roland Kays, Michael Cove, and William McShea. The 2019, 2020, and 2021 datasets are accessible for public use through the Supporting Information of their respective publications. Although the 2019 and 2020 surveys were originally processed and stored in eMammal (https://www.emammal.si.edu), all data are now housed in Wildlife Insights (WI) within the SNAPSHOT USA Initiative. The two most recent surveys, 2022 and 2023, were coordinated by the SNAPSHOT USA Survey Coordinator Brigit Rooney. This dataset represents the first publication of 2022 and 2023 SNAPSHOT USA data. The SNAPSHOT USA project developed a standard protocol in 2019 to survey mammals >100 g and large identifiable birds. Cameras are unbaited and set at approximately 50 cm height across an array of at least 7 cameras with a minimum distance of 200 m and a maximum of 5 km between them. The collection period for SNAPSHOT USA data is between September and October and the target minimum of camera trap-nights per array is 400. Some contributors to SNAPSHOT USA 2019–2023 started collecting data earlier or deployed cameras later based on locations or logistics, and we chose to include data from August 1st through December 29th each year in this dataset. The first two years of SNAPSHOT USA data incorporated an Expert Review Tool to verify the accuracy of every identification, as that was built in to the eMammal repository. This tool required SNAPSHOT USA project managers (Cove and Kays in 2019, with more taxon-specific reviewers in 2020) to review and confirm all species identifications, in an effort to minimize identification errors. As eMammal automatically grouped all uploaded images into “sequences” of images taken within 60 seconds of each other, by using the image timestamps, species identifications were made for individual sequences rather than images. These data have since been transferred to WI, where they underwent opportunistic review and correction by the SNAPSHOT USA Survey Coordinator. In contrast, SNAPSHOT USA 2021, 2022, and 2023 were managed and identified entirely in WI. All SNAPSHOT USA projects in this repository were created as “Sequence” projects, to enable the identification of sequences in the same manner as eMammal. Each 60-second sequence of images was classified to the narrowest taxonomic level possible by three iterations of validation. First, WI’s Artificial Intelligence algorithm suggested a taxonomic identification. This algorithm consists of a multiclass classification deep convolutional neural network model that uses pre-trained image embedding from Inception, a model used to identify objects. Second, each array’s Principal Investigator was responsible for validating the data, fixing Artificial Intelligence identification mistakes, and approving the data they contributed to the survey. Lastly, the SNAPSHOT USA Survey Coordinator quality-checked the deployment data and as many identified sequences as possible. This was a multistep process that began with checking the sequence metadata for obvious timestamp errors by organizing them chronologically in Microsoft Excel, and the deployment metadata for location errors by mapping their coordinates and looking for outliers. Next, the coordinator checked the sequence metadata for unlikely identifications, including species detections in places outside their known range, and verified their accuracy by viewing the images in WI. Finally, identifications for the most common species were verified by using the “Species” filter on WI to look for mistakes, one species at a time. When combining the five years of SNAPSHOT USA data to create SNAPSHOT USA 2019–2023, several aspects of the data were standardized to ensure consistency across all years. These were camera array names, camera location names, and taxonomy classifications. To match protocol requirements, all camera locations less than 5 km apart were classified as one array. This resulted in combining several arrays that were originally recorded under different names and ensuring that arrays in the same place maintained the same name each year. The camera location names were standardized by ensuring that all locations with geographic coordinates that were the same to four decimal places, in Decimal Degrees notation, had the same name. However, the original coordinates were retained in the dataset. Finally, all species taxonomy classifications for the 2019 and 2020 datasets (identified in eMammal) were standardized to match those used by WI. As part of this process, all subspecies of mammals in the dataset were changed to species level (e.g., Florida black bear (Ursus americanus floridanus) became American black bear (Ursus americanus)). For mammal taxonomy classifications, WI uses a combination of the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (2023; https://iucnredlist.org) and the American Society of Mammalogists Mammal Diversity Database (2024; https://www.mammaldiversity.org). For bird species, WI uses Birdlife International’s taxonomy classifications (2024; https://datazone.birdlife.org/species/search). The WI taxonomy is continually updated in response to public user suggestions and the taxonomy used in the SNAPSHOT USA 2019–2023 dataset reflects the WI taxonomy used in June 2024.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
There are several works based on Natural Language Processing on newspaper reports. Mining opinions from headlines [ 1 ] using Standford NLP and SVM by Rameshbhaiet. Al.compared several algorithms on a small and large dataset. Rubinet. al., in their paper [ 2 ], created a mechanism to differentiate fake news from real ones by building a set of characteristics of news according to their types. The purpose was to contribute to the low resource data available for training machine learning algorithms. Doumitet. al.in [ 3 ] have implemented LDA, a topic modeling approach to study bias present in online news media.
However, there are not many NLP research invested in studying COVID-19. Most applications include classification of chest X-rays and CT-scans to detect presence of pneumonia in lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus[ 5 ][ 6 ][ 7 ] and replicating its structure to fight and find a vaccine. This research is crucial in battling the pandemic. The few NLP based research publications are sentiment classification of online tweets by Samuel et el [ 8 ] to understand fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodaret. al.[ 9 ]. NKK dataset is the first study on a comparatively larger dataset of a newspaper report on COVID-19, which contributed to the virus’s awareness to the best of our knowledge.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper report from United States of America (USA) on COVID-19. The newspaper includes The Washington Post (USA) and StarTribune (USA). We have named it as “Covid-News-USA-NNK”. We also accumulated 50 online newspaper report from Bangladesh on the issue and named it “Covid-News-BD-NNK”. The newspaper includes The Daily Star (BD) and Prothom Alo (BD). All these newspapers are from the top provider and top read in the respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was suitable compared to automation to ensure the news were highly relevant to the subject. The newspaper online sites had dynamic content with advertisements in no particular order. Therefore there were high chances of online scrappers to collect inaccurate news reports. One of the challenges while collecting the data is the requirement of subscription. Each newspaper required $1 per subscriptions. Some criteria in collecting the news reports provided as guideline to the human data-collectors were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, economical genres are to be more prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
To collect these data we used a google form for USA and BD. We have two human editor to go through each entry to check any spam or troll entry.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as much unchanged as possible since changing sentence structures could result us in valuable information loss. While this was done with help of a script, we also assigned same human collectors to cross check for any presence of the above mentioned criteria.
The primary data statistics of the two dataset are shown in Table 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No of words per headline
7 to 20
No of words per body content
150 to 2100
Table 2: Covid-News-BD-NNK data statistics No of words per headline
10 to 20
No of words per body content
100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository in account name NKK^1. Here, we created two repositories USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We are regularly updating the CSV files and regenerating JSON using a py script. We provided a python script file for essential operation. We welcome all outside collaboration to enrich the dataset.
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP includes fraud detection on online media sites[ 10 ], using authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ] and machine translations used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in the algorithm solely for Natural Language Processing tasks. The two most trending ones are BERT[ 14 ], which uses bidirectional encoder-decoder architecture to create the transformer model, that can do near-perfect classification tasks and next-word predictions for next generations, and GPT-3 models released by OpenAI[ 15 ] that can generate texts almost human-like. However, these are all pre-trained models since they carry huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of text. One commonly used topic modeling is Latent Dirichlet Allocation or LDA[17].
Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figure 1 and 3 presents the word cloud of Covid-News-USA- NNK dataset by month from February to May. From the figures 1,2,3, we can point few information:
In February, both the news paper have talked about China and source of the outbreak.
StarTribune emphasized on Minnesota as the most concerned state. In April, it seemed to have been concerned more.
Both the newspaper talked about the virus impacting the economy, i.e, bank, elections, administrations, markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measurement: wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both the newspaper mentioned the outbreak in China in February, the weight of the spread in the United States are more highlighted through out March till May, displaying the critical impact caused by the virus.
We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’ , ’Infections’, ’Quarantined’, Lock-down’, ’Diagnosed’ etc from the news reports and created a number of cases for both the newspaper. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for the covid cases as it gradually rose from February. Both the newspaper clearly shows us that the rise in covid cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. Vader Sentiment scale ranges from -1(highly negative to 1(highly positive). There were some cases
where the sentiment scores of the headline and body contradicted each other,i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can assist us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us information about how a state or country is reacting to the pandemic. We used PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both the datasets are: ’China’, Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’ , ’Stock market’ , ’Jobs’ , ’Election’, ’Missteps’, ’Health’, ’Response’. Keywords extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household incomes over the past decade across various racial categories identified by the U.S. Census Bureau in State Line City. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. It also showcases the annual income trends, between 2013 and 2023, providing insights into the economic shifts within diverse racial communities.The dataset can be utilized to gain insights into income disparities and variations across racial categories, aiding in data analysis and decision-making..
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State Line City median household income by race. You can refer the same here
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT ), achieving goals to: 1) Annually update and improve spatial data representing the federal estate for PAD-US applications; 2) Update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) Automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1). Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/ ). 2) Improved the representation (boundaries and attributes) of the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. 3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describe the data type to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed. 4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resource Conservation Service source data into the PAD-US format to increase update efficiency. 5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4 as spatial data currently represents the complete parcel (about 10.54 million acres primarily in North Dakota and South Dakota). Only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field. State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in the PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. The PAD-US 3.0 included major updates to the following three states: 1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD. 2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward. 3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University, GIS Technical Center. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual . A version history of PAD-US updates is summarized below (See https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information): 1) First posted - April 2009 (Version 1.0 - available from the PAD-US: Team pad-us@usgs.gov). 2) Revised - May 2010 (Version 1.1 - available from the PAD-US: Team pad-us@usgs.gov). 3) Revised - April 2011 (Version 1.2 - available from the PAD-US: Team pad-us@usgs.gov). 4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD 5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ 6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE 7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT 8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
The State of Utah, including the Utah Automated Geographic Reference Center, Utah Geological Survey, and the Utah Division of Emergency Management, along with local and federal partners, including Salt Lake County and local cities, the Federal Emergency Management Agency, the U.S. Geological Survey, and the U.S. Environmental Protection Agency, have funded and collected over 8380 km2 (3236 mi2) of high-resolution (0.5 or 1 meter) Lidar data across the state since 2011, in support of a diverse set of flood mapping, geologic, transportation, infrastructure, solar energy, and vegetation projects. The datasets include point cloud, first return digital surface model (DSM), and bare-earth digital terrain/elevation model (DEM) data, along with appropriate metadata (XML, project tile indexes, and area completion reports). This 0.5-meter 2013-2014 Wasatch Front dataset includes most of the Salt Lake and Utah Valleys (Utah), and the Wasatch (Utah and Idaho), and West Valley fault zones (Utah). Other recently acquired State of Utah data include the 2011 Utah Geological Survey Lidar dataset covering Cedar and Parowan Valleys, the east shore/wetlands of Great Salt Lake, the Hurricane fault zone, the west half of Ogden Valley, North Ogden, and part of the Wasatch Plateau in Utah.
https://catalog.dvrpc.org/dvrpc_data_license.htmlhttps://catalog.dvrpc.org/dvrpc_data_license.html
This dataset contains data from the P.L. 94-171 2020 Census Redistricting Program. The 2020 Census Redistricting Data Program provides states the opportunity to delineate voting districts and to suggest census block boundaries for use in the 2020 Census redistricting data tabulations (Public Law 94-171 Redistricting Data File). In addition, the Redistricting Data Program will periodically collect state legislative and congressional district boundaries if they are changed by the states. The program is also responsible for the effective delivery of the 2020 Census P.L. 94-171 Redistricting Data statutorily required by one year from Census Day. The program ensures continued dialogue with the states in regard to 2020 Census planning, thereby allowing states ample time for their planning, response, and participation. The U.S. Census Bureau will deliver the Public Law 94-171 redistricting data to all states by Sept. 30, 2021. COVID-19-related delays and prioritizing the delivery of the apportionment results delayed the Census Bureau’s original plan to deliver the redistricting data to the states by April 1, 2021.
Data in this dataset contains information on population, diversity, race, ethnicity, housing, household, vacancy rate for 2020 for various geographies (county, MCD, Philadelphia Planning Districts (referred to as county planning areas [CPAs] internally, Census designated places, tracts, block groups, and blocks)
For more information on the 2020 Census, visit https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html
PLEASE NOTE: 2020 Decennial Census data has had noise injected into it because of the Census's new Disclosure Avoidance System (DAS). This can mean that population counts and characteristics, especially when they are particularly small, may not exactly correspond to the data as collected. As such, caution should be exercised when examining areas with small counts. Ron Jarmin, acting director of the Census Bureau posted a discussion of the redistricting data, which outlines what to expect with the new DAS. For more details on accuracy you can read it here: https://www.census.gov/newsroom/blogs/director/2021/07/redistricting-data.html
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30 hours dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
The dataset comes with granular metadata for both speakers and recordings:
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
This dataset is a versatile resource for multiple English speech and language AI applications:
The United States Geological Survey (USGS) - Science Analytics and Synthesis (SAS) - Gap Analysis Project (GAP) manages the Protected Areas Database of the United States (PAD-US), an Arc10x geodatabase, that includes a full inventory of areas dedicated to the preservation of biological diversity and to other natural, recreation, historic, and cultural uses, managed for these purposes through legal or other effective means (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/protected-areas). The PAD-US is developed in partnership with many organizations, including coordination groups at the [U.S.] Federal level, lead organizations for each State, and a number of national and other non-governmental organizations whose work is closely related to the PAD-US. Learn more about the USGS PAD-US partners program here: www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards. The United Nations Environmental Program - World Conservation Monitoring Centre (UNEP-WCMC) tracks global progress toward biodiversity protection targets enacted by the Convention on Biological Diversity (CBD) through the World Database on Protected Areas (WDPA) and World Database on Other Effective Area-based Conservation Measures (WD-OECM) available at: www.protectedplanet.net. See the Aichi Target 11 dashboard (www.protectedplanet.net/en/thematic-areas/global-partnership-on-aichi-target-11) for official protection statistics recognized globally and developed for the CBD, or here for more information and statistics on the United States of America's protected areas: www.protectedplanet.net/country/USA. It is important to note statistics published by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas (MPA) Center (www.marineprotectedareas.noaa.gov/dataanalysis/mpainventory/) and the USGS-GAP (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-statistics-and-reports) differ from statistics published by the UNEP-WCMC as methods to remove overlapping designations differ slightly and U.S. Territories are reported separately by the UNEP-WCMC (e.g. The largest MPA, "Pacific Remote Islands Marine Monument" is attributed to the United States Minor Outlying Islands statistics). At the time of PAD-US 2.1 publication (USGS-GAP, 2020), NOAA reported 26% of U.S. marine waters (including the Great Lakes) as protected in an MPA that meets the International Union for Conservation of Nature (IUCN) definition of biodiversity protection (www.iucn.org/theme/protected-areas/about). USGS-GAP released PAD-US 3.0 Statistics and Reports in the summer of 2022. The relationship between the USGS, the NOAA, and the UNEP-WCMC is as follows: - USGS manages and publishes the full inventory of U.S. marine and terrestrial protected areas data in the PAD-US representing many values, developed in collaboration with a partnership network in the U.S. and; - USGS is the primary source of U.S. marine and terrestrial protected areas data for the WDPA, developed from a subset of the PAD-US in collaboration with the NOAA, other agencies and non-governmental organizations in the U.S., and the UNEP-WCMC and; - UNEP-WCMC is the authoritative source of global protected area statistics from the WDPA and WD-OECM and; - NOAA is the authoritative source of MPA data in the PAD-US and MPA statistics in the U.S. and; - USGS is the authoritative source of PAD-US statistics (including areas primarily managed for biodiversity, multiple uses including natural resource extraction, and public access). The PAD-US 3.0 Combined Marine, Fee, Designation, Easement feature class (GAP Status Code 1 and 2 only) is the source of protected areas data in this WDPA update. Tribal areas and military lands represented in the PAD-US Proclamation feature class as GAP Status Code 4 (no known mandate for biodiversity protection) are not included as spatial data to represent internal protected areas are not available at this time. The USGS submitted more than 51,000 protected areas from PAD-US 3.0, including all 50 U.S. States and 6 U.S. Territories, to the UNEP-WCMC for inclusion in the WDPA, available at www.protectedplanet.net. The NOAA is the sole source of MPAs in PAD-US and the National Conservation Easement Database (NCED, www.conservationeasement.us/) is the source of conservation easements. The USGS aggregates authoritative federal lands data directly from managing agencies for PAD-US (https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/federal-lands-workgroup), while a network of State data-stewards provide state, local government lands, and some land trust preserves. National nongovernmental organizations contribute spatial data directly (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards). The USGS translates the biodiversity focused subset of PAD-US into the WDPA schema (UNEP-WCMC, 2019) for efficient aggregation by the UNEP-WCMC. The USGS maintains WDPA Site Identifiers (WDPAID, WDPA_PID), a persistent identifier for each protected area, provided by UNEP-WCMC. Agency partners are encouraged to track WDPA Site Identifier values in source datasets to improve the efficiency and accuracy of PAD-US and WDPA updates. The IUCN protected areas in the U.S. are managed by thousands of agencies and organizations across the country and include over 51,000 designated sites such as National Parks, National Wildlife Refuges, National Monuments, Wilderness Areas, some State Parks, State Wildlife Management Areas, Local Nature Preserves, City Natural Areas, The Nature Conservancy and other Land Trust Preserves, and Conservation Easements. The boundaries of these protected places (some overlap) are represented as polygons in the PAD-US, along with informative descriptions such as Unit Name, Manager Name, and Designation Type. As the WDPA is a global dataset, their data standards (UNEP-WCMC 2019) require simplification to reduce the number of records included, focusing on the protected area site name and management authority as described in the Supplemental Information section in this metadata record. Given the numerous organizations involved, sites may be added or removed from the WDPA between PAD-US updates. These differences may reflect actual change in protected area status; however, they also reflect the dynamic nature of spatial data or Geographic Information Systems (GIS). Many agencies and non-governmental organizations are working to improve the accuracy of protected area boundaries, the consistency of attributes, and inventory completeness between PAD-US updates. In addition, USGS continually seeks partners to review and refine the assignment of conservation measures in the PAD-US.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household income across different racial categories in United States. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median houshold income for diverse racial categories.
Key observations
Based on our analysis of the distribution of United States population by race & ethnicity, the population is predominantly White. This particular racial category constitutes the majority, accounting for 68.17% of the total residents in United States. Notably, the median household income for White households is $79,933. Interestingly, despite the White population being the most populous, it is worth noting that Asian households actually reports the highest median household income, with a median income of $106,954. This reveals that, while Whites may be the most numerous in United States, Asian households experience greater economic prosperity in terms of median household income.
https://i.neilsberg.com/ch/united-states-median-household-income-by-race.jpeg" alt="United States median household income diversity across racial categories">
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2022 1-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States median household income by race. You can refer the same here