NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses.
This PAD-US Version 2.1 dataset includes a variety of updates and new data from the previous Version 2.0 dataset (USGS, 2018, https://doi.org/10.5066/P955KPLE), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme. Completing the PAD-US Inventory: 1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands). 2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH) representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey) addressed another major PAD-US data gap. 3) Integration of nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres.
Maintaining regular Federal updates: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/). 2) Complete National Marine Protected Areas (MPA) update from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA. Other changes: 1) PAD-US field name change - The "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'. 2) Additional field - The "Feature Class" (FeatClass) field, which describes the feature class the data originated from, was added to all layers within PAD-US 2.1 (it was included only in the "Combined" layers of PAD-US 2.0). 3) Categorical GAP Status Code default changes - National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3), in the absence of other information, to better represent biodiversity protection restrictions associated with the designation. The Bureau of Land Management Areas of Critical Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) as the areas are administratively protected, not permanent. More information is available upon request. 4) Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service).
5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data-steward (Census Bureau). Tribal designated statistical areas are purely geographic areas for providing Census statistics, with no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records. For more information regarding the PAD-US dataset, please visit https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation, please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual .
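The 'Access' to 'Pub_Access' rename listed under "Other changes" means that scripts written against PAD-US 2.0 may need to accept both field names. A minimal sketch, assuming attribute records have already been read into Python dictionaries; the helper name and the sample attribute values are hypothetical and are not part of any PAD-US tooling:

```python
def public_access(record):
    """Return the public-access value from a PAD-US attribute record.

    Checks the PAD-US 2.1 field name ('Pub_Access') first, then falls back
    to the PAD-US 2.0 field name ('Access'). Returns None if neither exists.
    """
    for field in ("Pub_Access", "Access"):
        if field in record:
            return record[field]
    return None


# Hypothetical records; the values are placeholders for illustration only.
record_v2_1 = {"Unit_Nm": "Example Park", "Pub_Access": "OA"}
record_v2_0 = {"Unit_Nm": "Example Refuge", "Access": "RA"}

access_new = public_access(record_v2_1)
access_old = public_access(record_v2_0)
access_missing = public_access({"Unit_Nm": "No access field"})
```

Checking the newer name first keeps the helper correct if a record happens to carry both fields.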
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This national, tract-level experienced racial segregation dataset uses data for over 66 million anonymized and opted-in devices in Cuebiq’s Spectus Clean Room data to estimate 15-minute time overlaps of device stays in 38.2m x 19.1m grids across the United States in 2022. We infer a probability distribution of racial backgrounds for each device given its home Census block group at the time of data collection, and calculate the probability of a diverse social contact during that space and time. These measures are then aggregated to the Census tract and across the whole time period in order to preserve privacy and develop a generalizable measure of the diversity of a place. We propose that this dataset is a better measurement of segregation and diversity as they are experienced, which we show diverges from standard measurements of segregation. Researchers can use the data to better understand the determinants of experienced segregation; beyond research, we suggest that policy makers can use it to understand the impacts of policies designed to encourage social mixing and access to opportunities, such as affordable housing and mixed-income housing.
For the purposes of enhanced privacy, home Census block groups were pre-calculated by the data provider, and all calculations are done at the Census tract level, restricted to tracts that have more than 20 unique devices over the period of analysis.
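As a rough illustration of the contact-diversity calculation described above, the sketch below assumes that each device carries a probability distribution over racial groups inferred from its home block group, that the two devices are independent, and that a contact counts as "diverse" when the devices belong to different groups. The function name, group labels, and sample probabilities are illustrative assumptions, not the dataset's documented method:

```python
def diverse_contact_probability(p, q):
    """Probability that two co-located devices belong to different groups.

    p and q map a group label to a probability for each device. Each is
    assumed to sum to 1, and the two devices are assumed independent.
    """
    # Probability that both devices fall in the same group.
    same = sum(prob * q.get(group, 0.0) for group, prob in p.items())
    return 1.0 - same


# Hypothetical home-block-group distributions for two overlapping devices.
device_a = {"white": 0.7, "black": 0.2, "asian": 0.1}
device_b = {"white": 0.5, "black": 0.5}

prob = diverse_contact_probability(device_a, device_b)
print(round(prob, 2))
```

Under these assumptions, a tract-level measure could be obtained by averaging such probabilities over all qualifying overlaps in the tract.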
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household income across different racial categories in State Line City. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be used to gain insights into economic disparities and trends and to explore the variations in median household income for diverse racial categories.
Key observations
Based on our analysis of the distribution of the State Line City population by race and ethnicity, the population is predominantly White. This racial category constitutes the majority, accounting for 89.80% of the total residents in State Line City. White households are both the largest group and the one with the highest median household income, at $64,167.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
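To make the margin-of-error caution concrete, the sketch below assumes the margin of error is the published ACS value, which the Census Bureau reports at the 90% confidence level, so the standard error is the margin divided by 1.645. The function name and the margin of 8,000 are hypothetical; only the $64,167 estimate comes from the text above:

```python
Z_90 = 1.645  # z-score for the 90% confidence level used for published ACS margins


def acs_interval(estimate, moe):
    """Return (lower bound, upper bound, standard error) for an ACS estimate."""
    standard_error = moe / Z_90
    return estimate - moe, estimate + moe, standard_error


# The estimate is from the text; the margin of error is a hypothetical value.
low, high, se = acs_interval(64167, 8000)
print(low, high, round(se, 2))
```

If two estimates have overlapping intervals, the difference between them may not be statistically meaningful.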
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State Line City median household income by race.
https://creativecommons.org/publicdomain/zero/1.0/
Cultural diversity in the U.S. has led to great variations in names and naming traditions, and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.
All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names
https://cloud.google.com/bigquery/public-data/usa-names
Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @dcp from Unsplash.
What are the most common names?
What are the most common female names?
Are there more female or male names?
Female names by a wide margin?
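The questions above can be explored with a simple aggregation. The sketch below assumes rows have already been exported from the dataset into Python dictionaries with `name`, `gender`, and `number` keys; the helper name and the sample rows are hypothetical and are not real counts:

```python
from collections import Counter


def top_names(rows, gender=None, n=3):
    """Return the n names with the largest total counts.

    rows: iterable of dicts with 'name', 'gender', and 'number' keys.
    gender: optional filter such as 'F' or 'M'; None keeps all rows.
    """
    totals = Counter()
    for row in rows:
        if gender is None or row["gender"] == gender:
            totals[row["name"]] += row["number"]
    return totals.most_common(n)


# Hypothetical sample rows for illustration only.
sample_rows = [
    {"name": "Mary", "gender": "F", "year": 1910, "number": 120},
    {"name": "Mary", "gender": "F", "year": 1911, "number": 130},
    {"name": "John", "gender": "M", "year": 1910, "number": 200},
    {"name": "Anna", "gender": "F", "year": 1910, "number": 90},
]

top_all = top_names(sample_rows)
top_female = top_names(sample_rows, gender="F", n=1)
```

Counting distinct names per gender, rather than summing `number`, would address the question of whether there are more female or male names.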
Report on Demographic Data in New York City Public Schools, 2020-21. Enrollment counts are based on the November 13 Audited Register for 2020. Categories with total enrollment values of zero were omitted. Pre-K data includes students in 3-K. Data on students with disabilities, English language learners, and student poverty status are as of March 19, 2021. Due to missing demographic information in rare cases and suppression rules, demographic categories do not always add up to total enrollment and/or citywide totals. NYC DOE “Eligible for free or reduced-price lunch” counts are based on the number of students with families who have qualified for free or reduced-price lunch or are eligible for Human Resources Administration (HRA) benefits. English Language Arts and Math state assessment results for students in grade 9 are not available for inclusion in this report, as the spring 2020 exams did not take place. Spring 2021 ELA and Math test results are not included in this report for K-8 students in 2020-21. Due to the COVID-19 pandemic’s complete transformation of New York City’s school system during the 2020-21 school year, and in accordance with New York State guidance, the 2021 ELA and Math assessments were optional for students to take. As a result, 21.6% of students in grades 3-8 took the English assessment in 2021 and 20.5% of students in grades 3-8 took the Math assessment. These participation rates are not representative of New York City students and schools and are not comparable to prior years, so results are not included in this report. Dual Language enrollment includes English Language Learners and non-English Language Learners. Dual Language data are based on data from STARS; as a result, school participation and student enrollment in Dual Language programs may differ from the data in this report.
STARS course scheduling and grade management software applications provide a dynamic internal data system for school use; while standard course codes exist, data are not always consistent from school to school. This report does not include enrollment at District 75 & 79 programs. Students enrolled at Young Adult Borough Centers are represented in the 9-12 District data but not the 9-12 School data. “Prior Year” data included in Comparison tabs refers to data from 2019-20. “Year-to-Year Change” data included in Comparison tabs indicates whether the demographics of a school or special program have grown more or less similar to its district or attendance zone (or school, for special programs) since 2019-20. Year-to-year changes must have been at least 1 percentage point to qualify as “More Similar” or “Less Similar”; changes less than 1 percentage point are categorized as “No Change”. The admissions method tab contains information on the admissions methods used for elementary, middle, and high school programs during the Fall 2020 admissions process. Fall 2020 selection criteria are included for all programs with academic screens, including middle and high school programs. Selection criteria data are based on school-reported information. Fall 2020 Diversity in Admissions priorities are included for applicable middle and high school programs. Note that the data on each school’s demographics and performance includes all students of the given subgroup who were enrolled in the school on November 13, 2020. Some of these students may not have been admitted under the admissions method(s) shown, as some students may have enrolled in the school outside the centralized admissions process (via waitlist, over-the-counter, or transfer), and schools may have changed admissions methods over the past few years. Admissions methods are only reported for grades K-12. 3K and Pre-Kindergarten data are reported at the site level.
See below for definitions of site types included in this report. Additionally, please note that this report excludes all students at District 75 sites, reflecting slightly lower enrollment than our total of 60,265 students.
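The 1-percentage-point rule for the “Year-to-Year Change” category can be sketched as follows. The sketch assumes that similarity is measured as the absolute gap, in percentage points, between a school's demographic share and its district's share; that definition, the function name, and the sample values are assumptions for illustration, since the report does not spell out the formula here:

```python
def year_to_year_change(prior_gap, current_gap, threshold=1.0):
    """Classify how a school's gap to its district changed since the prior year.

    prior_gap, current_gap: absolute percentage-point gaps (assumed definition).
    threshold: minimum change, in percentage points, needed to count as a change.
    """
    change = prior_gap - current_gap  # positive means the gap narrowed
    if change >= threshold:
        return "More Similar"
    if change <= -threshold:
        return "Less Similar"
    return "No Change"


# Hypothetical gaps for three schools.
narrowed = year_to_year_change(10.0, 8.5)
steady = year_to_year_change(5.0, 5.4)
widened = year_to_year_change(3.0, 6.0)
```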
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://ngda-cadastre-geoplatform.hub.arcgis.com/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g., 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. PAD-US provides a full inventory geodatabase, spatial analysis, statistics, data downloads, web services, poster maps, and data submissions included in efforts to track global progress toward biodiversity protection. PAD-US integrates spatial data to ensure public lands and other protected areas from all jurisdictions are represented. PAD-US version 4.0 includes new and updated data from the following data providers. All other data were transferred from previous versions of PAD-US. 
Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in regular PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. Revisions associated with the federal estate in this version include updates to the Federal estate (fee ownership parcels, easement interest, management designations, and proclamation boundaries), with authoritative data from 7 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), and the U.S. Forest Service (USFS). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/federal-lands-workgroup/ ). This includes improved representation of boundaries and attributes for the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. Additionally, National Cemetery boundaries were added using geospatial boundary data provided by the U.S. Department of Veterans Affairs, and NASA boundaries were added using data contained in the USGS National Boundary Dataset (NBD). State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land and NGO partner updates, as resources allow. The State Lands Workgroup ( https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/state-lands-workgroup ) is focused on improving protected land inventories in PAD-US, increasing update efficiency, and facilitating local review.
PAD-US 4.0 included updates and additions from the following seventeen states and territories: California (state, local, and nonprofit fee); Colorado (state, local, and nonprofit fee and easement); Georgia (state and local fee); Kentucky (state, local, and nonprofit fee and easement); Maine (state, local, and nonprofit fee and easement); Montana (state, local, and nonprofit fee); Nebraska (state fee); New Jersey (state, local, and nonprofit fee and easement); New York (state, local, and nonprofit fee and easement); North Carolina (state, local, and nonprofit fee); Pennsylvania (state, local, and nonprofit fee and easement); Puerto Rico (territory fee); Tennessee (land trust fee); Texas (state, local, and nonprofit fee); Virginia (state, local, and nonprofit fee); West Virginia (state, local, and nonprofit fee); and Wisconsin (state fee). Additionally, the following datasets were incorporated from NGO data partners: Trust for Public Land (TPL) Parkserve (new fee and easement data); The Nature Conservancy (TNC) Lands (fee owned by TNC); TNC Northeast Secured Areas; Ducks Unlimited (land trust fee); and the National Conservation Easement Database (NCED). All state and NGO easement submissions are provided to NCED. For more information regarding the PAD-US dataset, please visit https://www.usgs.gov/programs/gap-analysis-project/science/protected-areas . For more information about data aggregation, please review the PAD-US Data Manual available at https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-manual . A version history of PAD-US updates is summarized below (see https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-history/ for more information): 1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov).
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov). 3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov). 4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD 5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ 6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE 7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT 8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B 9) Revised - April 2024 (Version 4.0) https://doi.org/10.5066/P96WBCHS Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS, as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering insights into the diverse income landscapes within the United States. The dataset can be used to gain insights into gender-based income distribution within the United States population, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States median household income by race.
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. 
A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT ), achieving goals to: 1) Annually update and improve spatial data representing the federal estate for PAD-US applications; 2) Update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) Automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1). Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/ ). 
2) Improved the representation (boundaries and attributes) of the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. 3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describes the data type to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed. 4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resources Conservation Service source data into the PAD-US format to increase update efficiency. 5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4 because the spatial data currently represent the complete parcel (about 10.54 million acres, primarily in North Dakota and South Dakota), while only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field.
State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in the PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. The PAD-US 3.0 included major updates to the following three states: 1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD. 2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward. 3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University, GIS Technical Center. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual . 
A version history of PAD-US updates is summarized below (see https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information): 1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov). 2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov). 3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov). 4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD 5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ 6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE 7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT 8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS, as many changes reflect improvements to agency and organization GIS systems, or to conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the median household income across different racial categories in State College. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be used to gain insights into economic disparities and trends and to explore the variations in median household income across racial categories.
Key observations
Based on our analysis of the distribution of the State College population by race and ethnicity, the population is predominantly White: this racial category accounts for 80.12% of the total residents in State College. The median household income for White households is $50,296. Although the White population is the most populous, Some Other Race households report the highest median household income, at $60,333.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State College median household income by race. You can refer to the same here
https://en.wikipedia.org/wiki/Public_domain
This dataset contains information about the demographics of all US cities and census-designated places with a population greater than or equal to 65,000. This data comes from the US Census Bureau's 2015 American Community Survey. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Composed of hundreds of microbial species, the composition of the human gut microbiota can vary with chronic diseases underlying health disparities that disproportionally affect ethnic minorities. However, the influence of ethnicity on the gut microbiota remains largely unexplored and lacks reproducible generalizations across studies. By distilling associations between ethnicity and differences in two US-based 16S gut microbiota data sets including 1,673 individuals, we report 12 microbial genera and families that reproducibly vary by ethnicity. Interestingly, a majority of these microbial taxa, including the most heritable bacterial family, Christensenellaceae, overlap with genetically associated taxa and form co-occurring clusters linked by similar fermentative and methanogenic metabolic processes. These results demonstrate recurrent associations between specific taxa in the gut microbiota and ethnicity, providing hypotheses for examining specific members of the gut microbiota as mediators of health disparities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The RRING Work Package 3 (WP3) objective was to clarify how Research Funding Organisations (RFOs) and Research Performing Organisations (RPOs) operated within region-specific research and innovation environments. It explored how they navigated the governance and regulatory frameworks for Responsible Research and Innovation (RRI), as well as offering their perspectives on the entities responsible for RRI-related policy and action in their locales.
This data set covers the global survey research part, which was designed to contextualise how RPOs and RFOs interacted within the research environment and with non-academic stakeholders. Countries were grouped according to the UNESCO regions of the world and key results per region are listed below. For a detailed analysis and further findings of the work completed under WP3 of the RRING project, please refer to the full deliverable document "State of the Art of RRI in the Five UNESCO World Regions" [link to be inserted].
European and North American States
‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (92%), followed by diverse perspectives (88%), and gender equality (79%). Including ethnic minorities was the area which garnered the least attitudinal support (71%). Respondents took the most practical steps towards engaging with diverse perspectives (63%), and the least towards inclusion of ethnic minorities (24%).
‘Anticipative and reflective’: Respondents widely agreed (82%) with the importance of ensuring R&I work does not cause concerns for society, but only 37% confirmed they had taken practical steps to ensure this.
‘Open and transparent’: Vast majorities of respondents agreed on the importance of keeping R&I methods open and transparent (94%), with 65% also confirming they take practical steps to do this. An equally high number agreed on the importance of making the results of R&I work accessible to as wide a public as possible (94%), and 68% confirmed this through their reported actions. This indicated the smallest value-action gap of all RRI measures for respondents from European and North American countries. Attitudinal agreement on the importance of making data freely available to the public was lower (83%), as was the practical action aspect for this measure (45%).
‘Responsive and adaptive to change’: Most respondents agreed (89%) that it was important to ensure their work addresses societal needs, and 62% confirmed that they take practical steps towards this aim.
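The value-action gap mentioned above can be illustrated with the ‘Open and transparent’ figures for this region. This is a minimal sketch that assumes the gap is the difference, in percentage points, between attitudinal agreement and reported practical action; the source does not state the exact formula, and the measure labels and function name are illustrative only.

```python
# (agreement %, practical action %) for the 'Open and transparent' measures
# quoted above for European and North American respondents.
OPEN_AND_TRANSPARENT = {
    "methods open and transparent": (94, 65),
    "results accessible to the public": (94, 68),
    "data freely available": (83, 45),
}


def value_action_gaps(measures):
    """Assumed definition: agreement minus action, in percentage points."""
    return {name: agree - act for name, (agree, act) in measures.items()}


gaps = value_action_gaps(OPEN_AND_TRANSPARENT)
smallest = min(gaps, key=gaps.get)  # measure with the narrowest gap
print(gaps)
print(smallest)
```

Under this assumed definition, accessibility of results has the narrowest gap of the three measures shown.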
Latin American and Caribbean States
‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of gender equality in R&I (86%), followed by ensuring ethical principles are applied (85%), and diverse perspectives incorporated (83%). Including ethnic minorities was the area which garnered the least attitudinal support (77%). Respondents took the most practical steps towards ensuring ethical principles guide their work (50%), and the least towards including ethnic minorities (25%), but the smallest value-action gap was found for gender equality.
‘Anticipative and reflective’: Respondents agreed (79%) that it is important to ensure R&I work does not cause concerns for society, but only 29% confirmed they had taken practical steps to ensure this.
‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (89%), with 45% indicating they had taken practical action. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (88%), and 44% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was slightly lower (81%), as was the practical action aspect for this measure (35%).
‘Responsive and adaptive to change’: Most respondents agreed (84%) that it was important to ensure their work addresses societal needs, and 49% confirmed that they take practical steps towards this aim.
Asian and Pacific States
‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (90%), followed by diverse perspectives (89%), and gender equality (86%). Including ethnic minorities was the area which garnered the least attitudinal support (76%). Respondents took the most practical steps towards engaging with diverse perspectives (65%), and the least towards including ethnic minorities (30%).
‘Anticipative and reflective’: Respondents widely agreed (78%) with the importance of ensuring R&I work does not cause concerns for society, and 42% confirmed they had taken practical steps to ensure this.
‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (91%), with 58% indicating they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (89%), and 64% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was lower (79%), as was the practical action aspect for this measure (40%).
‘Responsive and adaptive to change’: Most respondents agreed (92%) that it was important to ensure their work addresses societal needs, and 69% confirmed that they take practical steps towards this aim. This was the RRI measure with the smallest value-action gap for respondents from the Asian and Pacific region.
Arab States
‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (93%), followed by diverse perspectives (81%), and gender equality (85%). Including ethnic minorities was the area which garnered the least attitudinal support (74%). Respondents took the most practical steps towards engaging with diverse perspectives (66%), which equated to one of two equally small value-action gaps for respondents from Arab states, and the least practical steps towards inclusion of ethnic minorities (22%).
‘Anticipative and reflective’: A high proportion of respondents (85%) agreed that it is important to ensure R&I work does not cause concerns for society. However, only 38% confirmed they had taken practical steps to ensure this.
‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (89%), with 59% also confirming they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (90%), and 66% backed this up with practical action. Ensuring public accessibility of research results was the second of two measures with equally small value-action gaps. Attitudinal agreement on the importance of making data freely available to the public was much lower (78%), which also reflected the practical action aspect for this measure (49%).
‘Responsive and adaptive to change’: Most respondents agreed (96%) that it was important to ensure their work addresses societal needs, and 68% confirmed that they take practical steps to achieve this.
African States
‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring engagement with diverse perspectives and expertise in R&I (91%), followed by ensuring ethical principles are applied (90%), and gender equality (89%). Including ethnic minorities was the area which garnered the least attitudinal support (74%). Respondents took the most practical steps towards ensuring ethical principles guide their work (57%), and the least towards including ethnic minorities (32%).
‘Anticipative and reflective’: The majority of respondents (85%) agreed that it is important to ensure R&I work does not cause concerns for society, with 59% confirming that they take practical steps to ensure this.
‘Open and transparent’: A high proportion of respondents agreed on the importance of keeping R&I methods open and transparent (90%), with 54% also confirming they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (86%), and 56% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was significantly lower (73%), as was the practical action aspect for this measure (38%).
‘Responsive and adaptive to change’: Respondents mostly agreed (92%) that it was important to ensure their work addresses societal needs, and 64% confirmed that they take practical steps towards this aim. This was the RRI measure with the smallest value-action gap for respondents from African states.
Note: Please refer to the "RRING WP3 - Survey Data Documentation" document for detailed instructions on how to use this dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The State of Alabama contains the most diverse fish fauna of North America. The University of Alabama Ichthyological Collection (UAIC) documents this diversity and is one of the largest educational and research collections of fishes in the southeastern United States. This nationally and internationally recognized biological resource includes over one million preserved, skeletal, and frozen specimens, some dating back to the mid-1900s, and is the best single resource documenting past and present distributions and abundances of fishes in the State.
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.0 https://doi.org/10.5066/P955KPLE. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See the Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. As this dataset is a compilation of many data sets, data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete and depends on data management capacity in the state. For completeness estimates by state, see http://www.protectedlands.net/partners. As the federal and state data are reasonably complete, focus is shifting to completing the inventory of local government and voluntarily provided, private protected areas. 
The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA Feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov ) tracking the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us ) The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales. 
More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/. Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships and to find tips on appropriate uses of the data and on how to parse out the data that you are looking for. For more information regarding the PAD-US dataset please visit http://gapanalysis.usgs.gov/padus/. To find more data resources and to view example analyses performed using PAD-US data, visit http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/ . For more information about data standards and how the data are aggregated please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/ .
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
There are several works based on Natural Language Processing of newspaper reports. Rameshbhai et al. mined opinions from headlines [1] using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, little NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One of the challenges while collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
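The first three steps above can be sketched with the Python standard library alone. This is a minimal sketch and not the authors' script: the stop-word list is a tiny hypothetical sample, "non-English alphanumeric characters" is assumed to mean anything other than ASCII letters, digits, and whitespace (so punctuation is dropped as well), and lemmatization is left out because it needs a lexical resource.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "to", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep only ASCII letters, digits, and whitespace.
NON_ASCII_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Remove hyperlinks, non-ASCII-alphanumeric characters, and stop words."""
    text = URL_RE.sub(" ", text)
    text = NON_ASCII_ALNUM_RE.sub(" ", text)
    tokens = text.split()
    return " ".join(t for t in tokens if t.lower() not in STOP_WORDS)


cleaned = preprocess(
    "The virus is spreading in Minnesota, see https://example.com/news"
)
print(cleaned)
```

Original casing is kept so that sentence content changes as little as possible; only the stop-word comparison is case-insensitive.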
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the items listed above.
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository, under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
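Regenerating JSON from the CSV files, as described above, might look like the following. This is a sketch using only the Python standard library; the column names `headline` and `body` are hypothetical, and the repository's actual script and schema may differ.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical two-column sample; real files would be read from disk.
sample_csv = "headline,body\nVirus spreads,Cases rose in March\n"
sample_json = csv_to_json(sample_csv)
print(sample_json)
```

Every value comes out as a string, because `csv.DictReader` does not infer types; numeric columns would need an explicit conversion step.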
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT [14], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are generally used as pre-trained models, since training them carries a huge computation cost. Information extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction from text could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
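The graph-based weighting behind TextRank can be sketched as follows. This is a simplified illustration, not the implementation used for this dataset: it links words that co-occur within a small window, runs a PageRank-style iteration on the resulting unweighted graph, and skips the part-of-speech filtering and phrase merging of the full method. The window size, damping factor, and iteration count are assumed values.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    # Link each word to the words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []
    # Synchronous power iteration: each pass reads only the previous scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = "virus spread economy virus masks virus economy jobs".split()
top_two = textrank_keywords(tokens, top_k=2)
print(top_two)
```

In this toy input, "virus" and "economy" each co-occur with four distinct words, so they receive the highest scores.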
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:
In February, both newspapers talked about China and the source of the outbreak.
StarTribune emphasized Minnesota as the most concerned state. In April, it seemed to be even more concerned.
Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.
We used a script to extract all numbers related to certain keywords such as ‘Deaths’, ‘Infected’, ‘Died’, ‘Infections’, ‘Quarantined’, ‘Lock-down’, ‘Diagnosed’, etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ‘China’, ‘Government’, ‘Masks’, ‘Economy’, ‘Crisis’, ‘Theft’, ‘Stock market’, ‘Jobs’, ‘Election’, ‘Missteps’, ‘Health’, ‘Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
This visual speech dataset contains 1000 videos in US English, each paired with a corresponding high-fidelity audio track. In each video, a participant answers a specific question in an unscripted, spontaneous manner.
While recording each video, extensive guidelines are followed to maintain quality and diversity.
The dataset provides comprehensive metadata for each video recording and participant:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Elevation Dataset (NED) is a primary elevation data product that has been produced and distributed by the U.S. Geological Survey (USGS). Since its inception, the USGS has compiled and published topographic information in many forms, and the NED is a significant development in this long line of products that describe the land surface. The NED provides seamless raster elevation data of the conterminous United States (CONUS), Alaska, Hawaii, U.S. island territories, Mexico, and Canada. The NED is derived from diverse source datasets that are processed to a specification with consistent resolutions, coordinate system, elevation units, and horizontal and vertical datums. The NED was developed as the logical result of the maturation of the long-standing USGS elevation program, which for many years concentrated on production of quadrangle-based digital elevation models (DEM). The NED contributes to the elevation layer of The National Map, and it provides basic elevation information for earth science studies and mapping applications in the U.S. and most of North America. For over 15 years (1999–2014), the NED served as the flagship elevation product of the USGS. In 2015, the 3D Elevation Program (3DEP) was initiated. When the 3DEP initiative became operational, the name “National Elevation Dataset” (and the abbreviation “NED”) were retired as the USGS elevation activities and data were rebranded under the 3DEP banner. However, elevation data produced and distributed as part of the NED are still widely used (and distributed by other entities), so there is a continuing need for detailed documentation, including how it was produced, its accuracy, and how it is used.
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See the Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. Because this dataset is a compilation of many data sets, data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete and depends on data management capacity in the state. For completeness estimates by state, see http://www.protectedlands.net/partners. Because the federal and state data are reasonably complete, focus is shifting to completing the inventory of local government and voluntarily provided, private protected areas.
The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov), which tracks the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us). The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales.
More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/. Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships and to find tips on appropriate uses of the data and on how to extract the data you are looking for. For more information regarding the PAD-US dataset, please visit http://gapanalysis.usgs.gov/padus/. To find more data resources and to view example analyses performed using PAD-US data, visit http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/. For more information about data standards and how the data are aggregated, please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: https://catalog.data.gov/dataset/smart-location-database7 If you have questions about the underlying data stored here, please contact Thomas John (thomas.john@epa.gov). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu. "The Smart Location Database is a nationwide geographic data resource for measuring location efficiency. It includes more than 90 attributes summarizing characteristics, such as housing density, diversity of land use, neighborhood design, destination accessibility, transit service, employment and demographics. Most attributes are available for every census block group in the United States. A large body of research has demonstrated that land use and urban form can have a significant effect on transportation outcomes. People who live and/or work in compact neighborhoods with a walkable street grid and easy access to public transit, jobs, stores, and services are more likely to have several transportation options to meet their everyday needs. As a result, they can choose to drive less, which reduces their emissions of greenhouse gases and other pollutants compared to people who live and work in places that are not location efficient. Walking, biking, and taking public transit can also save people money and improve their health by encouraging physical activity. The Smart Location Database summarizes several demographic, employment, and built environment variables for every census block group (CBG) in the United States. The database includes indicators of the commonly cited “D” variables shown in the transportation research literature to be related to travel behavior. 
The Ds include residential and employment density, land use diversity, design of the built environment, access to destinations, and distance to transit. SLD variables can be used as inputs to travel demand models, baseline data for scenario planning studies, and combined into composite indicators characterizing the relative location efficiency of CBG within U.S. metropolitan regions. EPA first released a beta version of the Smart Location Database in 2011. The initial full version was released in 2013, and the database was updated to its current version in 2021." Quote from https://www.epa.gov/smartgrowth/smart-location-mapping and https://catalog.data.gov/dataset/smart-location-database7
https://spdx.org/licenses/CC0-1.0.html
SNAPSHOT USA is an annual, multi-contributor camera trap survey of mammals across the United States. The growing SNAPSHOT USA dataset is intended for tracking the spatial and temporal responses of mammal populations to changes in land use, land cover, and climate. These data will be useful for exploring the drivers of spatial and temporal changes in relative abundance and distribution, as well as the impacts of species interactions on daily activity patterns. SNAPSHOT USA 2019–2023 contains 987,979 records of camera trap image sequence data and 9,694 records of camera trap deployment metadata. Data were collected across the United States of America in all 50 states, 12 ecoregions, and many ecosystems. Data were collected between August 1st and December 29th each year from 2019 to 2023. The dataset includes a wide range of taxa but is primarily focused on medium to large mammals. SNAPSHOT USA 2019–2023 comprises two .csv files. The original data can be found within the SNAPSHOT USA Initiative in the Wildlife Insights platform.
Methods
The first three annual SNAPSHOT USA surveys were coordinated by Roland Kays, Michael Cove, and William McShea. The 2019, 2020, and 2021 datasets are accessible for public use through the Supporting Information of their respective publications. Although the 2019 and 2020 surveys were originally processed and stored in eMammal (https://www.emammal.si.edu), all data are now housed in Wildlife Insights (WI) within the SNAPSHOT USA Initiative. The two most recent surveys, 2022 and 2023, were coordinated by the SNAPSHOT USA Survey Coordinator Brigit Rooney. This dataset represents the first publication of 2022 and 2023 SNAPSHOT USA data. The SNAPSHOT USA project developed a standard protocol in 2019 to survey mammals >100 g and large identifiable birds. Cameras are unbaited and set at approximately 50 cm height across an array of at least 7 cameras with a minimum distance of 200 m and a maximum of 5 km between them.
The collection period for SNAPSHOT USA data is between September and October and the target minimum of camera trap-nights per array is 400. Some contributors to SNAPSHOT USA 2019–2023 started collecting data earlier or deployed cameras later based on locations or logistics, and we chose to include data from August 1st through December 29th each year in this dataset. The first two years of SNAPSHOT USA data incorporated an Expert Review Tool to verify the accuracy of every identification, as that was built in to the eMammal repository. This tool required SNAPSHOT USA project managers (Cove and Kays in 2019, with more taxon-specific reviewers in 2020) to review and confirm all species identifications, in an effort to minimize identification errors. As eMammal automatically grouped all uploaded images into “sequences” of images taken within 60 seconds of each other, by using the image timestamps, species identifications were made for individual sequences rather than images. These data have since been transferred to WI, where they underwent opportunistic review and correction by the SNAPSHOT USA Survey Coordinator. In contrast, SNAPSHOT USA 2021, 2022, and 2023 were managed and identified entirely in WI. All SNAPSHOT USA projects in this repository were created as “Sequence” projects, to enable the identification of sequences in the same manner as eMammal. Each 60-second sequence of images was classified to the narrowest taxonomic level possible by three iterations of validation. First, WI’s Artificial Intelligence algorithm suggested a taxonomic identification. This algorithm consists of a multiclass classification deep convolutional neural network model that uses pre-trained image embedding from Inception, a model used to identify objects. Second, each array’s Principal Investigator was responsible for validating the data, fixing Artificial Intelligence identification mistakes, and approving the data they contributed to the survey. 
Lastly, the SNAPSHOT USA Survey Coordinator quality-checked the deployment data and as many identified sequences as possible. This was a multistep process that began with checking the sequence metadata for obvious timestamp errors by organizing them chronologically in Microsoft Excel, and the deployment metadata for location errors by mapping their coordinates and looking for outliers. Next, the coordinator checked the sequence metadata for unlikely identifications, including species detections in places outside their known range, and verified their accuracy by viewing the images in WI. Finally, identifications for the most common species were verified by using the “Species” filter on WI to look for mistakes, one species at a time. When combining the five years of SNAPSHOT USA data to create SNAPSHOT USA 2019–2023, several aspects of the data were standardized to ensure consistency across all years. These were camera array names, camera location names, and taxonomy classifications. To match protocol requirements, all camera locations less than 5 km apart were classified as one array. This resulted in combining several arrays that were originally recorded under different names and ensuring that arrays in the same place maintained the same name each year. The camera location names were standardized by ensuring that all locations with geographic coordinates that were the same to four decimal places, in Decimal Degrees notation, had the same name. However, the original coordinates were retained in the dataset. Finally, all species taxonomy classifications for the 2019 and 2020 datasets (identified in eMammal) were standardized to match those used by WI. As part of this process, all subspecies of mammals in the dataset were changed to species level (e.g., Florida black bear (Ursus americanus floridanus) became American black bear (Ursus americanus)). 
For mammal taxonomy classifications, WI uses a combination of the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (2023; https://iucnredlist.org) and the American Society of Mammalogists Mammal Diversity Database (2024; https://www.mammaldiversity.org). For bird species, WI uses BirdLife International’s taxonomy classifications (2024; https://datazone.birdlife.org/species/search). The WI taxonomy is continually updated in response to public user suggestions, and the taxonomy used in the SNAPSHOT USA 2019–2023 dataset reflects the WI taxonomy used in June 2024.
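The two mechanical steps described in the Methods above (grouping images into 60-second sequences and giving one name to camera locations whose coordinates agree to four decimal places) can be illustrated with a minimal Python sketch. It is not the eMammal or WI implementation. It assumes that a new sequence starts whenever the gap to the previous image exceeds 60 seconds, that "the same to four decimal places" means equal after rounding, and that the first name seen for a location is the one kept; the function names and the `name`, `lat`, `lon` keys are hypothetical.

```python
from datetime import datetime, timedelta


def group_into_sequences(timestamps, max_gap_seconds=60):
    """Group image timestamps into sequences.

    Assumption: a new sequence starts when the gap to the previous
    image is longer than max_gap_seconds.
    """
    max_gap = timedelta(seconds=max_gap_seconds)
    sequences = []
    for ts in sorted(timestamps):
        if sequences and ts - sequences[-1][-1] <= max_gap:
            sequences[-1].append(ts)
        else:
            sequences.append([ts])
    return sequences


def location_key(lat, lon, places=4):
    """Coordinates rounded to four decimal places (rounding is an assumption)."""
    return (round(lat, places), round(lon, places))


def standardize_location_names(deployments):
    """Give deployments with matching rounded coordinates the same name.

    Each deployment is a dict with hypothetical keys 'name', 'lat', 'lon'.
    The first name seen for a location is reused; the original
    coordinates are kept unchanged.
    """
    canonical = {}
    standardized = []
    for dep in deployments:
        key = location_key(dep["lat"], dep["lon"])
        name = canonical.setdefault(key, dep["name"])
        standardized.append({**dep, "name": name})
    return standardized


if __name__ == "__main__":
    start = datetime(2021, 9, 1, 12, 0, 0)
    shots = [start, start + timedelta(seconds=30), start + timedelta(seconds=200)]
    print(len(group_into_sequences(shots)), "sequences")
```

Keying on rounded coordinates instead of editing them keeps the original values available, which matches the note that the original coordinates were retained in the dataset.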
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five feature classes representing the U.S. protected areas network in separate feature classes: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses. 
This PAD-US Version 2.1 dataset includes a variety of updates and new data from the previous Version 2.0 dataset (USGS, 2018 https://doi.org/10.5066/P955KPLE ), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme. Completing the PAD-US Inventory: 1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands). 2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH) representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey) addressed another major PAD-US data gap. 3) Aggregation of nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres. 
Maintaining regular Federal updates: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/). 2) Complete National Marine Protected Areas (MPA) update from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA. Other changes: 1) PAD-US field name change - The "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'. 2) Additional field - The "Feature Class" (FeatClass) field was added to all layers within PAD-US 2.1 (in PAD-US 2.0 it was included only in the "Combined" layers, to describe which feature class the data originated from). 3) Categorical GAP Status Code default changes - National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3), in the absence of other information, to better represent biodiversity protection restrictions associated with the designation. The Bureau of Land Management Areas of Critical Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) because the areas are administratively protected, not permanently protected. More information is available upon request. 4) Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service).
5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data steward (Census Bureau). Tribal designated statistical areas are purely geographic areas for providing Census statistics and have no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records. For more information regarding the PAD-US dataset, please visit https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation, please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual.
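Scripts written against PAD-US 2.0 may need small adjustments for two of the changes listed above: the 'Access' to 'Pub_Access' rename and the new categorical GAP Status Code defaults. The sketch below is one way to handle both on plain Python dictionaries. It is not USGS code; the helper names and the designation keys are hypothetical, and only the field names 'Access' and 'Pub_Access' and the default values 2 and 3 come from the description above.

```python
# Categorical defaults described above, used only in the absence of
# other information. The dictionary keys are illustrative labels, not
# official PAD-US domain codes.
DEFAULT_GAP_STATUS = {
    "National Monument": 2,                       # previously GAP 3
    "Area of Critical Environmental Concern": 3,  # previously GAP 2
}


def normalize_access_field(record):
    """Return a copy of a record that uses the 2.1 field name 'Pub_Access'.

    Records that still carry the 2.0 name 'Access' are renamed;
    records that already use 'Pub_Access' are returned unchanged.
    """
    out = dict(record)
    if "Access" in out and "Pub_Access" not in out:
        out["Pub_Access"] = out.pop("Access")
    return out


def default_gap_status(designation, known_status=None):
    """Apply the categorical default only when no status is already known.

    Returns None for designations without a default in this sketch.
    """
    if known_status is not None:
        return known_status
    return DEFAULT_GAP_STATUS.get(designation)


if __name__ == "__main__":
    old_style = {"Unit_Nm": "Example Park", "Access": "OA"}  # hypothetical record
    print(normalize_access_field(old_style))
    print(default_gap_status("National Monument"))
```

Keeping an explicitly supplied status ahead of the categorical default mirrors the "in the absence of other information" wording.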