96 datasets found

Protected Areas Database of the United States (PAD-US) 2.1
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 2.1 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five feature classes representing the U.S. protected areas network in separate feature classes: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses. 
This PAD-US Version 2.1 dataset includes a variety of updates and new data from the previous Version 2.0 dataset (USGS, 2018 https://doi.org/10.5066/P955KPLE ), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme. Completing the PAD-US Inventory: 1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands). 2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH) representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey) addressed another major PAD-US data gap. 3) Aggregation of nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres. 
Maintaining regular Federal updates:
1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/).
2) Complete National Marine Protected Areas (MPA) update from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA.
Other changes:
1) PAD-US field name change - The "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'.
2) Additional field - The "Feature Class" (FeatClass) field was added to all layers within PAD-US 2.1 (in PAD-US 2.0 it was included only in the "Combined" layers, to describe which feature class the data originated from).
3) Categorical GAP Status Code default changes - National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3), in the absence of other information, to better represent biodiversity protection restrictions associated with the designation. The Bureau of Land Management Areas of Critical Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) as the areas are administratively protected, not permanent. More information is available upon request.
4) Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service).
5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data-steward (Census Bureau). Tribal designated statistical areas are purely a geographic area for providing Census statistics with no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records. For more information regarding the PAD-US dataset please visit, https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual .
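The field rename described above ('Access' in PAD-US 2.0, 'Pub_Access' in PAD-US 2.1) can break scripts that read both versions. The following is a minimal sketch of one way to tolerate either name. It assumes each feature's attributes are available as a plain dictionary; the `record` structure and the `public_access` helper are hypothetical and not part of PAD-US or any GIS library.

```python
from typing import Any, Mapping, Optional

# Field names taken from the description: 'Access' in PAD-US 2.0,
# renamed to 'Pub_Access' in PAD-US 2.1. The newer name is tried first.
ACCESS_FIELD_NAMES = ("Pub_Access", "Access")


def public_access(record: Mapping[str, Any]) -> Optional[Any]:
    """Return the public-access value from a record of either version.

    `record` is a hypothetical attribute dictionary for one feature;
    how it is obtained (GIS library, table export) is left open.
    Returns None when neither field name is present.
    """
    for field in ACCESS_FIELD_NAMES:
        if field in record:
            return record[field]
    return None
```

Trying the newer field name first means a record that somehow carries both fields resolves to the 2.1 value.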
An experienced racial-ethnic diversity dataset in the United States using...
knowledge.uchicago.edu
Updated Jul 26, 2023
Cite
Xu, Wenfei; Wang, Zhuojun; Attia, Nada; Attia, Youssef; Zhang, Yucheng; Zong, Haotian (2023). An experienced racial-ethnic diversity dataset in the United States using human mobility data [Dataset]. http://doi.org/10.17605/OSF.IO/X94GJ
Unique identifier
https://doi.org/10.17605/OSF.IO/X94GJ
Dataset updated
Jul 26, 2023
Dataset provided by
OSF
Authors
Xu, Wenfei; Wang, Zhuojun; Attia, Nada; Attia, Youssef; Zhang, Yucheng; Zong, Haotian
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
This national, tract-level experienced racial segregation dataset uses data for over 66 million anonymized and opted-in devices in Cuebiq’s Spectus Clean Room data to estimate 15 minute time overlaps of device stays in 38.2m x 19.1m grids across the United States in 2022. We infer a probability distribution of racial backgrounds for each device given their home Census block groups at the time of data collection, and calculate the probability of a diverse social contact during that space and time. These measures are then aggregated to the Census tract and across the whole time period in order to preserve privacy and develop a generalizable measure of the diversity of a place. We propose that this dataset is a better measurement of the segregation and diversity as it is experienced, which we show diverges from standard measurements of segregation. The data can be used by researchers to better understand the determinants of experienced segregation; beyond research, we suggest this data can be used by policy makers to understand the impacts of policies designed to encourage social mixing and access to opportunities such as affordable housing and mixed-income housing, and more.

For enhanced privacy, home Census block groups were pre-calculated by the data provider, and all calculations are done at the Census tract level, using tracts that have more than 20 unique devices over the period of analysis.
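To make the idea of "the probability of a diverse social contact" concrete, here is a minimal sketch under a simplifying assumption: the two devices' group memberships are treated as independent, so the chance that they belong to different groups is one minus the chance that they share a group. This is an illustration only, not necessarily the measure the authors computed; the function name and the group labels are hypothetical.

```python
from typing import Mapping


def diverse_contact_probability(
    p: Mapping[str, float], q: Mapping[str, float]
) -> float:
    """Probability that two co-located devices belong to different groups.

    `p` and `q` map group labels to probabilities for each device.
    Assumes independence, so P(same group) = sum over g of p[g] * q[g].
    """
    same_group = sum(prob * q.get(group, 0.0) for group, prob in p.items())
    return 1.0 - same_group
```

Two devices that are each equally likely to belong to one of two groups yield 0.5, while two devices certain to be in the same group yield 0.0.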
Median Household Income by Racial Categories in State Line City, IN (, in...
neilsberg.com
csv, json
Updated Mar 1, 2025
Cite
Neilsberg Research (2025). Median Household Income by Racial Categories in State Line City, IN (, in 2023 inflation-adjusted dollars) [Dataset]. https://www.neilsberg.com/insights/state-line-city-in-median-household-income-by-race/
Available download formats: json, csv
Dataset updated
Mar 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
State Line City
Variables measured
Median Household Income for Asian Population, Median Household Income for Black Population, Median Household Income for White Population, Median Household Income for Some other race Population, Median Household Income for Two or more races Population, Median Household Income for American Indian and Alaska Native Population, Median Household Income for Native Hawaiian and Other Pacific Islander Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the median household income within each racial category identified by the U.S. Census Bureau, we conducted an initial analysis and categorization of the data. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). It is important to note that the median household income estimates exclusively represent the identified racial categories and do not incorporate any ethnicity classifications. Households are categorized, and median incomes are reported, based on the self-identified race of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the median household income across different racial categories in State Line City. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median household income for diverse racial categories.

Key observations

Based on our analysis of the distribution of State Line City population by race & ethnicity, the population is predominantly White, accounting for 89.80% of the total residents in State Line City. White households are both the largest group and the one with the highest median household income, at $64,167.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race of the head of household: This column presents the self-identified race of the household head, encompassing all relevant racial categories (excluding ethnicity) applicable in State Line City.

Median household income: Median household income, adjusted for inflation and presented in 2023 inflation-adjusted dollars.
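Adjusting a dollar amount to a target year's dollars with a price index is usually a simple ratio. The sketch below shows that arithmetic in general form. The function name and its parameters are hypothetical, and no actual R-CPI-U-RS index values are included; callers would supply the index values for the source year and the target year.

```python
def adjust_for_inflation(
    amount: float, index_source_year: float, index_target_year: float
) -> float:
    """Convert `amount` from source-year dollars to target-year dollars.

    Uses the standard ratio: amount * (target index / source index).
    Index values are supplied by the caller; none are assumed here.
    """
    if index_source_year <= 0 or index_target_year <= 0:
        raise ValueError("price index values must be positive")
    return amount * (index_target_year / index_source_year)
```

For example, with a purely illustrative source index of 100 and target index of 110, an income of 50,000 becomes roughly 55,000 in target-year dollars.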

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for State Line City median household income by race.
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
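The questions above can be explored with a few lines of aggregation. The sketch below assumes each row is a `(name, gender, number)` tuple, where `number` is the count of applicants with that name. That row layout and both helper functions are assumptions for illustration; no real figures from the dataset are used.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed row layout: (name, gender, number of occurrences).
Row = Tuple[str, str, int]


def most_common_names(
    rows: Iterable[Row], gender: Optional[str] = None, top: int = 3
) -> List[Tuple[str, int]]:
    """Total occurrences per name, optionally filtered by gender."""
    totals: Counter = Counter()
    for name, row_gender, number in rows:
        if gender is None or row_gender == gender:
            totals[name] += number
    return totals.most_common(top)


def distinct_names_by_gender(rows: Iterable[Row]) -> Dict[str, int]:
    """Count how many distinct names appear for each gender."""
    names: Dict[str, set] = {}
    for name, row_gender, _ in rows:
        names.setdefault(row_gender, set()).add(name)
    return {g: len(s) for g, s in names.items()}
```

Comparing the values returned by `distinct_names_by_gender` addresses whether there are more female or male names, while `most_common_names(rows, gender="F")` answers the question about common female names.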
2020 - 2021 Diversity Report
catalog.data.gov
data.cityofnewyork.us
Updated Nov 29, 2024
Cite
data.cityofnewyork.us (2024). 2020 - 2021 Diversity Report [Dataset]. https://catalog.data.gov/dataset/2020-2021-diversity-report
Dataset updated
Nov 29, 2024
Dataset provided by
data.cityofnewyork.us
Description
Report on Demographic Data in New York City Public Schools, 2020-21
Enrollment counts are based on the November 13 Audited Register for 2020. Categories with total enrollment values of zero were omitted. Pre-K data includes students in 3-K. Data on students with disabilities, English language learners, and student poverty status are as of March 19, 2021. Due to missing demographic information in rare cases and suppression rules, demographic categories do not always add up to total enrollment and/or citywide totals. NYC DOE “Eligible for free or reduced-price lunch” counts are based on the number of students with families who have qualified for free or reduced-price lunch or are eligible for Human Resources Administration (HRA) benefits. English Language Arts and Math state assessment results for students in grade 9 are not available for inclusion in this report, as the spring 2020 exams did not take place. Spring 2021 ELA and Math test results are not included in this report for K-8 students in 2020-21. Due to the COVID-19 pandemic’s complete transformation of New York City’s school system during the 2020-21 school year, and in accordance with New York State guidance, the 2021 ELA and Math assessments were optional for students to take. As a result, 21.6% of students in grades 3-8 took the English assessment in 2021 and 20.5% of students in grades 3-8 took the Math assessment. These participation rates are not representative of New York City students and schools and are not comparable to prior years, so results are not included in this report. Dual Language enrollment includes English Language Learners and non-English Language Learners. Dual Language data are based on data from STARS; as a result, school participation and student enrollment in Dual Language programs may differ from the data in this report.
STARS course scheduling and grade management software applications provide a dynamic internal data system for school use; while standard course codes exist, data are not always consistent from school to school. This report does not include enrollment at District 75 & 79 programs. Students enrolled at Young Adult Borough Centers are represented in the 9-12 District data but not the 9-12 School data. “Prior Year” data included in Comparison tabs refers to data from 2019-20. “Year-to-Year Change” data included in Comparison tabs indicates whether the demographics of a school or special program have grown more or less similar to its district or attendance zone (or school, for special programs) since 2019-20. Year-to-year changes must have been at least 1 percentage point to qualify as “More Similar” or “Less Similar”; changes less than 1 percentage point are categorized as “No Change”. The admissions method tab contains information on the admissions methods used for elementary, middle, and high school programs during the Fall 2020 admissions process. Fall 2020 selection criteria are included for all programs with academic screens, including middle and high school programs. Selection criteria data is based on school-reported information. Fall 2020 Diversity in Admissions priorities are included for applicable middle and high school programs. Note that the data on each school’s demographics and performance includes all students of the given subgroup who were enrolled in the school on November 13, 2020. Some of these students may not have been admitted under the admissions method(s) shown, as some students may have enrolled in the school outside the centralized admissions process (via waitlist, over-the-counter, or transfer), and schools may have changed admissions methods over the past few years. Admissions methods are only reported for grades K-12. 3K and Pre-Kindergarten data are reported at the site level.
See below for definitions of site types included in this report. Additionally, please note that this report excludes all students at District 75 sites, reflecting slightly lower enrollment than our total of 60,265 students
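The “Year-to-Year Change” rule described above (a change of at least 1 percentage point qualifies as “More Similar” or “Less Similar”; anything smaller is “No Change”) can be sketched as a small classifier. The report does not say how similarity itself is measured, so this sketch assumes a hypothetical "gap" value: the absolute difference, in percentage points, between a school's share of a demographic group and its district's share. The function name and that gap definition are assumptions.

```python
def classify_year_to_year(
    prior_gap: float, current_gap: float, threshold: float = 1.0
) -> str:
    """Label the change in a school-versus-district demographic gap.

    `prior_gap` and `current_gap` are hypothetical gap measures in
    percentage points. A shrinking gap means the school became more
    similar to its district; a growing gap means less similar.
    """
    change = prior_gap - current_gap  # positive when the gap shrank
    if abs(change) < threshold:
        return "No Change"
    return "More Similar" if change > 0 else "Less Similar"
```

A change of exactly 1 percentage point qualifies, matching the "at least 1 percentage point" wording.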
Protected Areas Database of the United States (PAD-US) 4.0
catalog.data.gov
data.usgs.gov
Updated Jul 20, 2024
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 4.0 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-4-0
Dataset updated
Jul 20, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://ngda-cadastre-geoplatform.hub.arcgis.com/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g., 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. PAD-US provides a full inventory geodatabase, spatial analysis, statistics, data downloads, web services, poster maps, and data submissions included in efforts to track global progress toward biodiversity protection. PAD-US integrates spatial data to ensure public lands and other protected areas from all jurisdictions are represented. PAD-US version 4.0 includes new and updated data from the following data providers. All other data were transferred from previous versions of PAD-US. 
Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in regular PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. Revisions associated with the federal estate in this version include updates to the Federal estate (fee ownership parcels, easement interest, management designations, and proclamation boundaries), with authoritative data from 7 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), and the U.S. Forest Service (USFS). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/federal-lands-workgroup/ ). This includes improved representation of boundaries and attributes for the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. Additionally, National Cemetery boundaries were added using geospatial boundary data provided by the U.S. Department of Veterans Affairs, and NASA boundaries were added using data contained in the USGS National Boundary Dataset (NBD).
State updates - USGS is committed to building capacity in the state data steward network and the PAD-US Team to increase the frequency of state land and NGO partner updates, as resources allow. The State Lands Workgroup ( https://ngda-gov-units-geoplatform.hub.arcgis.com/pages/state-lands-workgroup ) is focused on improving protected land inventories in PAD-US, increasing update efficiency, and facilitating local review.
PAD-US 4.0 included updates and additions from the following seventeen states and territories: California (state, local, and nonprofit fee); Colorado (state, local, and nonprofit fee and easement); Georgia (state and local fee); Kentucky (state, local, and nonprofit fee and easement); Maine (state, local, and nonprofit fee and easement); Montana (state, local, and nonprofit fee); Nebraska (state fee); New Jersey (state, local, and nonprofit fee and easement); New York (state, local, and nonprofit fee and easement); North Carolina (state, local, and nonprofit fee); Pennsylvania (state, local, and nonprofit fee and easement); Puerto Rico (territory fee); Tennessee (land trust fee); Texas (state, local, and nonprofit fee); Virginia (state, local, and nonprofit fee); West Virginia (state, local, and nonprofit fee); and Wisconsin (state fee data). Additionally, the following datasets were incorporated from NGO data partners: Trust for Public Land (TPL) Parkserve (new fee and easement data); The Nature Conservancy (TNC) Lands (fee owned by TNC); TNC Northeast Secured Areas; Ducks Unlimited (land trust fee); and the National Conservation Easement Database (NCED). All state and NGO easement submissions are provided to NCED. For more information regarding the PAD-US dataset, please visit https://www.usgs.gov/programs/gap-analysis-project/science/protected-areas . For more information about data aggregation, please review the PAD-US Data Manual available at https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-manual . A version history of PAD-US updates is summarized below (see https://www.usgs.gov/programs/gap-analysis-project/pad-us-data-history/ for more information):
1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov).
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov).
3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov).
4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD
5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ
6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE
7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT
8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B
9) Revised - April 2024 (Version 4.0) https://doi.org/10.5066/P96WBCHS
Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS, as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
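For scripts that need to cite or fetch a specific release, the version history above can be captured as a small lookup table. The release dates and DOIs below are copied from the list; the dictionary layout and the two helper functions are hypothetical conveniences, not a USGS API. Versions 1.0 through 1.2 have no DOI in the list and are recorded as `None`.

```python
from typing import Dict, Optional, Tuple

# version -> (release date, DOI or None), copied from the version history.
PAD_US_VERSIONS: Dict[str, Tuple[str, Optional[str]]] = {
    "1.0": ("April 2009", None),
    "1.1": ("May 2010", None),
    "1.2": ("April 2011", None),
    "1.3": ("November 2012", "https://doi.org/10.5066/F79Z92XD"),
    "1.4": ("May 2016", "https://doi.org/10.5066/F7G73BSZ"),
    "2.0": ("September 2018", "https://doi.org/10.5066/P955KPLE"),
    "2.1": ("September 2020", "https://doi.org/10.5066/P92QM3NT"),
    "3.0": ("January 2022", "https://doi.org/10.5066/P9Q9LQ4B"),
    "4.0": ("April 2024", "https://doi.org/10.5066/P96WBCHS"),
}


def doi_for(version: str) -> Optional[str]:
    """Return the DOI for a listed version; raises KeyError if unlisted."""
    return PAD_US_VERSIONS[version][1]


def latest_version() -> str:
    """Return the highest listed version, compared numerically."""
    return max(PAD_US_VERSIONS, key=lambda v: tuple(int(p) for p in v.split(".")))
```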
United States annual income distribution by work experience and gender...
neilsberg.com
csv, json
Updated Feb 27, 2025
Neilsberg Research (2025). United States annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/bacb49c0-f4ce-11ef-8577-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals of both genders (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorizing them by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering valuable insights into the diverse income landscapes within the United States. The dataset can be utilized to gain insights into gender-based income distribution within the United States population, aiding in data analysis and decision-making.

Key observations

Employment patterns: Within the United States, among individuals aged 15 years and older with income, there were 119.64 million men and 117.56 million women in the workforce. Among them, 66.07 million men were engaged in full-time, year-round employment, while 50.33 million women were in full-time, year-round roles.

Annual income under $24,999: Of the male population working full-time, 7.45% fell within the income range of under $24,999, while 10.76% of the female population working full-time was represented in the same income bracket.

Annual income above $100,000: 29.72% of men in full-time roles earned incomes exceeding $100,000, while 18.56% of women in full-time positions earned within this income bracket.

Refer to the research insights for key observations on additional income brackets (annual income under $24,999, between $25,000 and $49,999, between $50,000 and $74,999, between $75,000 and $99,999, and above $100,000) and employment types (full-time year-round and part-time).

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss

$2,500 to $4,999

$5,000 to $7,499

$7,500 to $9,999

$10,000 to $12,499

$12,500 to $14,999

$15,000 to $17,499

$17,500 to $19,999

$20,000 to $22,499

$22,500 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $54,999

$55,000 to $64,999

$65,000 to $74,999

$75,000 to $99,999

$100,000 or more
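Assigning an income to one of the 20 brackets listed above is a sorted-boundary lookup. The sketch below uses Python's standard `bisect` module for that. The bracket labels are copied from the list; the `bracket_for` helper is hypothetical, and it assumes incomes are given in dollars, with anything below $2,500 (including losses) falling in the first bracket.

```python
from bisect import bisect_right

# Lower bounds of brackets 2 through 20, in dollars.
LOWER_BOUNDS = [
    2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000, 22500, 25000,
    30000, 35000, 40000, 45000, 50000, 55000, 65000, 75000, 100000,
]

LABELS = [
    "$1 to $2,499 or loss", "$2,500 to $4,999", "$5,000 to $7,499",
    "$7,500 to $9,999", "$10,000 to $12,499", "$12,500 to $14,999",
    "$15,000 to $17,499", "$17,500 to $19,999", "$20,000 to $22,499",
    "$22,500 to $24,999", "$25,000 to $29,999", "$30,000 to $34,999",
    "$35,000 to $39,999", "$40,000 to $44,999", "$45,000 to $49,999",
    "$50,000 to $54,999", "$55,000 to $64,999", "$65,000 to $74,999",
    "$75,000 to $99,999", "$100,000 or more",
]


def bracket_for(income: float) -> str:
    """Return the bracket label for an annual income in dollars."""
    # bisect_right counts how many lower bounds are <= income,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(LOWER_BOUNDS, income)]
```

Because the brackets are unevenly sized (for example, $55,000 to $64,999 spans $10,000), a boundary list is safer than dividing by a fixed width.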

Variables / Data Columns

Income Bracket: This column showcases 20 income brackets ranging from $1 to $100,000+.

Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket

Part-Time Males: The count of males employed part-time and earning within a specified income bracket

Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket

Part-Time Females: The count of females employed part-time and earning within a specified income bracket

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.
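The 20 detailed brackets listed above can be rolled up into the five summary brackets used in the key observations. The sketch below does this by parsing the lower bound of each bracket label. The summary labels, the function names, and the idea of passing counts as a plain dictionary are assumptions made for illustration; the actual column layout of the downloadable file may differ.

```python
import re

# Upper limits (exclusive) of the summary brackets, paired with a label.
SUMMARY_GROUPS = [
    (25_000, "Under $24,999"),
    (50_000, "$25,000 to $49,999"),
    (75_000, "$50,000 to $74,999"),
    (100_000, "$75,000 to $99,999"),
]
TOP_GROUP = "$100,000 or more"


def summary_group(bracket: str) -> str:
    """Map a detailed bracket label to a summary bracket via its lower bound."""
    match = re.search(r"\$([\d,]+)", bracket)
    if match is None:
        raise ValueError(f"no dollar amount found in {bracket!r}")
    lower_bound = int(match.group(1).replace(",", ""))
    for upper_limit, label in SUMMARY_GROUPS:
        if lower_bound < upper_limit:
            return label
    return TOP_GROUP


def aggregate(counts: dict) -> dict:
    """Sum counts keyed by detailed bracket into summary brackets."""
    totals = {}
    for bracket, count in counts.items():
        group = summary_group(bracket)
        totals[group] = totals.get(group, 0) + count
    return totals
```

Calling `aggregate` on one column (for example the Full-Time Males counts keyed by Income Bracket) yields totals from which bracket percentages can be computed.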

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
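When a margin of error (MOE) is available for an ACS estimate, it can be converted into a confidence interval or a standard error. The sketch below assumes the MOE is the standard ACS published value at the 90% confidence level (z = 1.645) and uses the Census Bureau's root-sum-of-squares approximation for the MOE of a sum. Whether this dataset ships MOE values alongside the counts is not stated here, so the inputs are hypothetical.

```python
import math

ACS_Z_90 = 1.645  # z-score for the 90% confidence level used by ACS MOEs


def standard_error(moe: float) -> float:
    """Convert a 90% margin of error into a standard error."""
    return moe / ACS_Z_90


def confidence_interval(estimate: float, moe: float) -> tuple:
    """Lower and upper bounds of the 90% confidence interval."""
    return (estimate - moe, estimate + moe)


def moe_of_sum(moes: list) -> float:
    """Approximate MOE of a sum of estimates (root sum of squares)."""
    return math.sqrt(sum(m * m for m in moes))
```

The last helper is relevant when several detailed brackets are added together, since the MOE of the total is not the simple sum of the individual MOEs.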

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for United States median household income by race. You can refer to the same here.
Protected Areas Database of the United States (PAD-US) 3.0 (ver. 2.0, March...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 (ver. 2.0, March 2023) [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-ver-2-0-march-2023
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. 
A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT ), achieving goals to: 1) Annually update and improve spatial data representing the federal estate for PAD-US applications; 2) Update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) Automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1). Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/ ). 
2) Improved the representation (boundaries and attributes) of the National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders. 3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describes the data type to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed. 4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resources Conservation Service source data into the PAD-US format to increase update efficiency. 5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4 as spatial data currently represent the complete parcel (about 10.54 million acres primarily in North Dakota and South Dakota). Only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field. 
State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in the PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. The PAD-US 3.0 included major updates to the following three states: 1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD. 2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward. 3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University, GIS Technical Center. For more information regarding the PAD-US dataset please visit, https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual . 
A version history of PAD-US updates is summarized below (see https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information):
1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov).
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov).
3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov).
4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD
5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ
6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE
7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT
8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B
Comparing protected area trends between PAD-US versions is not recommended without consultation with USGS, as many changes reflect improvements to agency and organization GIS systems, or conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.
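The description above notes that the Feature Class (FeatClass) field in the Combined layer lets users extract data types as needed. A minimal sketch of that idea follows, using plain Python dictionaries in place of geodatabase rows. The sample rows, the `name` key, and the exact coded values stored in FeatClass are hypothetical; consult the PAD-US Data Manual for the real domain values.

```python
from collections import Counter

# Hypothetical stand-ins for rows read from the Combined feature class.
# Only the FeatClass field name comes from the description above.
combined_rows = [
    {"name": "Example unit A", "FeatClass": "Fee"},
    {"name": "Example unit B", "FeatClass": "Designation"},
    {"name": "Example unit C", "FeatClass": "Fee"},
    {"name": "Example unit D", "FeatClass": "Easement"},
]


def select_by_feature_class(rows: list, feature_class: str) -> list:
    """Keep only the rows whose FeatClass matches the requested type."""
    return [row for row in rows if row["FeatClass"] == feature_class]


def count_by_feature_class(rows: list) -> Counter:
    """Tally rows per feature class, e.g. to spot overlapping data types."""
    return Counter(row["FeatClass"] for row in rows)


fee_rows = select_by_feature_class(combined_rows, "Fee")
print(len(fee_rows), dict(count_by_feature_class(combined_rows)))
```

Filtering first matters because Designation features often overlap Fee features, so summing areas across the full Combined inventory would count the same ground more than once.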
Median Household Income by Racial Categories in State College, PA (, in 2023...
neilsberg.com
csv, json
Updated Mar 1, 2025
Cite
Neilsberg Research (2025). Median Household Income by Racial Categories in State College, PA (in 2023 inflation-adjusted dollars) [Dataset]. https://www.neilsberg.com/research/datasets/e0c37ad2-f665-11ef-a994-3860777c1fe6/
Explore at:
json, csv (available download formats)
Dataset updated
Mar 1, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Pennsylvania, State College
Variables measured
Median Household Income for Asian Population, Median Household Income for Black Population, Median Household Income for White Population, Median Household Income for Some other race Population, Median Household Income for Two or more races Population, Median Household Income for American Indian and Alaska Native Population, Median Household Income for Native Hawaiian and Other Pacific Islander Population
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the median household income within each racial category identified by the US Census Bureau, we conducted an initial analysis and categorization of the data. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). It is important to note that the median household income estimates exclusively represent the identified racial categories and do not incorporate any ethnicity classifications. Households are categorized, and median incomes are reported, based on the self-identified race of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the median household income across different racial categories in State College. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be used to gain insights into economic disparities and trends and to explore the variations in median household income for diverse racial categories.

Key observations

Based on our analysis of the distribution of the State College population by race and ethnicity, the population is predominantly White: this racial category accounts for 80.12% of the total residents in State College. The median household income for White households is $50,296. Although White residents are the most numerous, Some Other Race households report the highest median household income, at $60,333.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race of the head of household: This column presents the self-identified race of the household head, encompassing all relevant racial categories (excluding ethnicity) applicable in State College.

Median household income: Median household income, adjusting for inflation, presented in 2023-inflation-adjusted dollars
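Expressing income in 2023 inflation-adjusted dollars amounts to rescaling a nominal amount by the ratio of two price-index values. The sketch below shows that ratio method only; the function and parameter names are illustrative, and the index values must be looked up in the published R-CPI-U-RS series. The exact procedure Neilsberg Research applies is not described in this listing.

```python
def adjust_for_inflation(nominal: float, index_source_year: float,
                         index_target_year: float) -> float:
    """Re-express a nominal dollar amount in target-year dollars.

    The two index arguments are price-index values (for example from the
    R-CPI-U-RS series) for the year the income was measured and for the
    year whose dollars are wanted.
    """
    if index_source_year <= 0:
        raise ValueError("index values must be positive")
    return nominal * index_target_year / index_source_year
```

For instance, with placeholder index values of 100.0 and 110.0, `adjust_for_inflation(50_000, 100.0, 110.0)` rescales the amount by a factor of 1.1.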

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for State College median household income by race. You can refer to the same here.
US Cities: Demographics
public.opendatasoft.com
data.smartidf.services
+3more
csv, excel, json
Updated Jul 27, 2017
Cite
(2017). US Cities: Demographics [Dataset]. https://public.opendatasoft.com/explore/dataset/us-cities-demographics/
Explore at:
excel, csv, json (available download formats)
Dataset updated
Jul 27, 2017
License
Public domain (https://en.wikipedia.org/wiki/Public_domain)
Area covered
United States
Description
This dataset contains information about the demographics of all US cities and census-designated places with a population greater than or equal to 65,000. This data comes from the US Census Bureau's 2015 American Community Survey. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.
Gut microbiota diversity across ethnicities in the United States
plos.figshare.com
tiff
Updated Jun 3, 2023
Cite
Andrew W. Brooks; Sambhawa Priya; Ran Blekhman; Seth R. Bordenstein (2023). Gut microbiota diversity across ethnicities in the United States [Dataset]. http://doi.org/10.1371/journal.pbio.2006842
Explore at:
tiff (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pbio.2006842
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Andrew W. Brooks; Sambhawa Priya; Ran Blekhman; Seth R. Bordenstein
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Composed of hundreds of microbial species, the composition of the human gut microbiota can vary with chronic diseases underlying health disparities that disproportionally affect ethnic minorities. However, the influence of ethnicity on the gut microbiota remains largely unexplored and lacks reproducible generalizations across studies. By distilling associations between ethnicity and differences in two US-based 16S gut microbiota data sets including 1,673 individuals, we report 12 microbial genera and families that reproducibly vary by ethnicity. Interestingly, a majority of these microbial taxa, including the most heritable bacterial family, Christensenellaceae, overlap with genetically associated taxa and form co-occurring clusters linked by similar fermentative and methanogenic metabolic processes. These results demonstrate recurrent associations between specific taxa in the gut microbiota and ethnicity, providing hypotheses for examining specific members of the gut microbiota as mediators of health disparities.
RRING Global Survey Research Dataset (WP3)
data.niaid.nih.gov
explore.openaire.eu
+1more
Updated Jun 25, 2021
Cite
Lorenz, Lars (2021). RRING Global Survey Research Dataset (WP3) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4719937
Explore at:
Dataset updated
Jun 25, 2021
Dataset provided by
Lorenz, Lars
Jensen, Eric
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The RRING Work Package 3 (WP3) objective was to clarify how Research Funding Organisations (RFOs) and Research Performing Organisations (RPOs) operated within region-specific research and innovation environments. It explored how they navigated the governance and regulatory frameworks for Responsible Research and Innovation (RRI), as well as offering their perspectives on the entities responsible for RRI-related policy and action in their locales.

This data set covers the global survey research part, which was designed to contextualise how RPOs and RFOs interacted within the research environment and with non-academic stakeholders. Countries were grouped according to the UNESCO regions of the world and key results per region are listed below. For a detailed analysis and further findings of the work completed under WP3 of the RRING project, please refer to the full deliverable document "State of the Art of RRI in the Five UNESCO World Regions" [link to be inserted].

European and North American States

‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (92%), followed by diverse perspectives (88%), and gender equality (79%). Including ethnic minorities was the area which garnered the least attitudinal support (71%). Respondents took the most practical steps towards engaging with diverse perspectives (63%), and the least towards inclusion of ethnic minorities (24%).

‘Anticipative and reflective’: Respondents widely agreed (82%) with the importance of ensuring R&I work does not cause concerns for society, but only 37% confirmed they had taken practical steps to ensure this.

‘Open and transparent’: Vast majorities of respondents agreed on the importance of keeping R&I methods open and transparent (94%), with 65% also confirming they take practical steps to do this. An equally high number agreed on the importance of making the results of R&I work accessible to as wide a public as possible (94%), and 68% confirmed this through their reported actions. This indicated the smallest value-action gap of all RRI measures for respondents from European and North American countries. Attitudinal agreement on the importance of making data freely available to the public was lower (83%), as was the practical action aspect for this measure (45%).

‘Responsive and adaptive to change’: Most respondents agreed (89%) that it was important to ensure their work addresses societal needs, and 62% confirmed that they take practical steps towards this aim.

Latin American and Caribbean States

‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of gender equality in R&I (86%), followed by ensuring ethical principles are applied (85%), and diverse perspectives incorporated (83%). Including ethnic minorities was the area which garnered the least attitudinal support (77%). Respondents took the most practical steps towards ensuring ethical principles guide their work (50%), and the least towards including ethnic minorities (25%), but the smallest value-action gap was found for gender equality.

‘Anticipative and reflective’: Respondents agreed (79%) that it is important to ensure R&I work does not cause concerns for society, but only 29% confirmed they had taken practical steps to ensure this.

‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (89%), with 45% indicating they had taken practical action. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (88%), and 44% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was slightly lower (81%), as was the practical action aspect for this measure (35%).

‘Responsive and adaptive to change’: Most respondents agreed (84%) that it was important to ensure their work addresses societal needs, and 49% confirmed that they take practical steps towards this aim.

Asian and Pacific States

‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (90%), followed by diverse perspectives (89%), and gender equality (86%). Including ethnic minorities was the area which garnered the least attitudinal support (76%). Respondents took the most practical steps towards engaging with diverse perspectives (65%), and the least towards including ethnic minorities (30%).

‘Anticipative and reflective’: Respondents widely agreed (78%) with the importance of ensuring R&I work does not cause concerns for society, and 42% confirmed they had taken practical steps to ensure this.

‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (91%), with 58% indicating they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (89%), and 64% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was lower (79%), as was the practical action aspect for this measure (40%).

‘Responsive and adaptive to change’: Most respondents agreed (92%) that it was important to ensure their work addresses societal needs, and 69% confirmed that they take practical steps towards this aim. This was the RRI measure with the smallest value-action gap for respondents from the Asian and Pacific region.
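The value-action gap referred to in these summaries can be reproduced from the reported percentages. The sketch below assumes the gap is simply attitudinal agreement minus reported practical action, in percentage points (the listing does not define it formally), and uses the Asian and Pacific figures for the measures where both numbers are given. The short measure names are paraphrases chosen for illustration.

```python
# (attitudinal agreement %, practical action %) for Asian and Pacific States,
# taken from the regional summary; only measures with both values are listed.
asia_pacific = {
    "engaging diverse perspectives": (89, 65),
    "including ethnic minorities": (76, 30),
    "avoiding societal concerns": (78, 42),
    "open and transparent methods": (91, 58),
    "accessible results": (89, 64),
    "freely available data": (79, 40),
    "addressing societal needs": (92, 69),
}


def value_action_gaps(measures: dict) -> dict:
    """Gap in percentage points between agreement and practical action."""
    return {name: agree - act for name, (agree, act) in measures.items()}


gaps = value_action_gaps(asia_pacific)
smallest = min(gaps, key=gaps.get)
print(smallest, gaps[smallest])
```

Because the published percentages are rounded, gaps that differ by a single point should be treated as effectively tied; the underlying survey data give the precise ranking.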

Arab States

‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring ethical principles were applied in R&I (93%), followed by gender equality (85%), and diverse perspectives (81%). Including ethnic minorities was the area which garnered the least attitudinal support (74%). Respondents took the most practical steps towards engaging with diverse perspectives (66%), which equated to one of two equally small value-action gaps for respondents from Arab states, and the least practical steps towards inclusion of ethnic minorities (22%).

‘Anticipative and reflective’: A high proportion of respondents (85%) agreed that it is important to ensure R&I work does not cause concerns for society. However, only 38% confirmed they had taken practical steps to ensure this.

‘Open and transparent’: The majority of respondents agreed on the importance of keeping R&I methods open and transparent (89%), with 59% also confirming they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (90%), and 66% backed this up with practical action. Ensuring public accessibility of research results was the second of two measures with equally small value-action gaps. Attitudinal agreement on the importance of making data freely available to the public was much lower (78%), which also reflected the practical action aspect for this measure (49%).

‘Responsive and adaptive to change’: Most respondents agreed (96%) that it was important to ensure their work addresses societal needs, and 68% confirmed that they take practical steps to achieve this.

African States

‘Diverse and inclusive’: Respondents were most attitudinally supportive of the importance of ensuring engagement with diverse perspectives and expertise in R&I (91%), followed by ensuring ethical principles are applied (90%), and gender equality (89%). Including ethnic minorities was the area which garnered the least attitudinal support (74%). Respondents took the most practical steps towards ensuring ethical principles guide their work (57%), and the least towards including ethnic minorities (32%).

‘Anticipative and reflective’: The majority of respondents (85%) agreed that it is important to ensure R&I work does not cause concerns for society, with 59% confirming that they take practical steps to ensure this.

‘Open and transparent’: A high proportion of respondents agreed on the importance of keeping R&I methods open and transparent (90%), with 54% also confirming they take practical steps to do this. A majority also agreed on the importance of making the results of R&I work accessible to as wide a public as possible (86%), and 56% backed this up with practical action. Attitudinal agreement on the importance of making data freely available to the public was significantly lower (73%), as was the practical action aspect for this measure (38%).

‘Responsive and adaptive to change’: Respondents mostly agreed (92%) that it was important to ensure their work addresses societal needs, and 64% confirmed that they take practical steps towards this aim. This was the RRI measure with the smallest value-action gap for respondents from African states.

Note: Please refer to the "RRING WP3 - Survey Data Documentation" document for detailed instructions on how to use this dataset.
Data from: UAIC Ichthyological Collection
gbif.org
Updated Oct 25, 2021
Cite
Worth Pugh (2021). UAIC Ichthyological Collection [Dataset]. http://doi.org/10.15468/a2laag
Explore at:
Unique identifier
https://doi.org/10.15468/a2laag
Dataset updated
Oct 25, 2021
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
University of Alabama Biodiversity and Systematics
Authors
Worth Pugh
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered

Description
The State of Alabama contains the most diverse fish fauna of North America. The University of Alabama Ichthyological Collection (UAIC) documents this diversity and is one of the largest educational and research collections of fishes in the southeastern United States. This nationally and internationally recognized biological resource includes over one million preserved, skeletal, and frozen specimens, some dating back to the mid-1900s, and is the best single resource documenting past and present distributions and abundances of fishes in the State.
Protected Areas Database of the United States (PAD-US) 1.4
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 1.4 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-1-4
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 2.0 https://doi.org/10.5066/P955KPLE. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See the Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. Because this dataset is a compilation of many data sets, data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete and depends on data management capacity in the state. For completeness estimates by state, see http://www.protectedlands.net/partners. As the federal and state data are reasonably complete, the focus is shifting to completing the inventory of local government and voluntarily provided, private protected areas. 
The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov ) tracking the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us ). The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales. 
More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/. Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships and to see tips on appropriate uses of the data and how to extract the data you are looking for. For more information regarding the PAD-US dataset, please visit http://gapanalysis.usgs.gov/padus/. To find more data resources and view example analyses performed using PAD-US data, visit http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/. For more information about data standards and how the data are aggregated, please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/.
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
Updated Jul 19, 2024
Cite
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Nishat Anjum
Kishor Datta Gupta
Nafiz Sadman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

Several works apply Natural Language Processing (NLP) to newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the scarce data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, little NLP research has been invested in studying COVID-19. Most applications involve classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NNK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named the dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One of the challenges while collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.
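
The headline, body, and duplicate criteria above could also be checked mechanically before a human editor reviews an entry. The following is a minimal sketch, not the collectors' actual procedure: the keyword list, the function names, and the choice to count keyword occurrences (rather than distinct keywords) are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list; the source does not publish the terms the collectors used.
COVID_KEYWORDS = {"covid", "covid-19", "coronavirus", "pandemic", "quarantine",
                  "lockdown", "outbreak", "virus", "vaccine", "infection"}

def keyword_hits(text):
    """Count tokens in `text` that appear in COVID_KEYWORDS (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return sum(1 for t in tokens if t in COVID_KEYWORDS)

def accept_report(headline, body, seen_headlines):
    """Apply the headline, body, and duplicate criteria to one report."""
    key = " ".join(headline.lower().split())
    if key in seen_headlines:          # avoid duplicate reports
        return False
    if keyword_hits(headline) < 1:     # headline needs at least one keyword
        return False
    if keyword_hits(body) < 5:         # body needs five or more keywords
        return False
    seen_headlines.add(key)
    return True
```

A check like this only screens entries; the relevance judgment described above would still rest with the human collectors and editors.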

To collect these data, we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
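
The four pre-processing steps listed above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the authors' script: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins for whatever resources were actually used, and "non-English alphanumeric characters" is read here as "any character that is not an English letter, a digit, or whitespace".

```python
import re

# Tiny illustrative stop-word list and lemma table (assumptions, not the authors' resources).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "was", "were", "to", "for"}
LEMMAS = {"cases": "case", "deaths": "death", "masks": "mask", "infections": "infection"}

def preprocess(text):
    """Apply the four listed steps in order and return a list of tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # 1. remove hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # 2. keep only English letters and digits
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # 3. remove stop words
    return [LEMMAS.get(t, t) for t in tokens]            # 4. lemmatize via lookup
```

A real pipeline would likely replace the lookup table with a proper lemmatizer; the table is used here only to keep the sketch self-contained.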

The primary data statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No. of words per headline: 7 to 20

No. of words per body content: 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No. of words per headline: 10 to 20

No. of words per body content: 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
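
Regenerating JSON from an updated CSV file, as described above, needs only the Python standard library. The sketch below is one plausible way to do it and is not the repository's script; the function name and the output layout (a JSON array with one object per CSV row) are assumptions.

```python
import csv
import json

def csv_to_json(csv_path, json_path):
    """Read every CSV row as a dict and write the rows out as a JSON array."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
    return len(rows)  # number of records written
```

Using `csv.DictReader` keeps the CSV header as the JSON keys, so the two formats stay aligned whenever a column is added to the CSV.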

3 Literature Review

Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for NLP tasks. The two most prominent are BERT [14], a bidirectional transformer encoder that performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost. Information extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction from text could be identifying named entities and locations or other essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation (LDA) [17].

Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
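
To make the graph-based weighting concrete, here is a simplified TextRank-style sketch: words become nodes, words that co-occur within a small window are linked, and a PageRank-style update scores each node. It is an illustration only, not the implementation used for this dataset; the function name, the window size, the damping factor of 0.85, and the fixed iteration count are assumptions, and the input is expected to be already tokenized.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Build the graph: words that appear within `window` positions of each other are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    # Repeated PageRank-style update on the unweighted graph.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Words that co-occur with many different neighbors accumulate the most weight, which is why a term that recurs across varied contexts tends to surface as a keyword.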

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can note a few points:

In February, both newspapers talked about China and the source of the outbreak.

StarTribune emphasized Minnesota as the state of most concern. In April, it appeared to be even more concerned.

Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

The Washington Post discussed global issues more than StarTribune.

In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States was highlighted more heavily from March through May, displaying the critical impact caused by the virus.

We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and built a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the numbers rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
Audio Visual Speech Dataset: American English
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Audio Visual Speech Dataset: American English [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/american-english-visual-speech-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English Language Visual Speech Dataset! This dataset is a collection of diverse, single-person unscripted spoken videos supporting research in visual speech recognition, emotion detection, and multimodal communication.
Dataset Content
This visual speech dataset contains 1000 videos in US English language each paired with a corresponding high-fidelity audio track. Each participant is answering a specific question in a video in an unscripted and spontaneous nature.
• Participant Diversity:
  • Speakers: The dataset includes visual speech data from more than 200 participants from different states of the United States of America.
  • Regions: Ensures a balanced representation of accents, dialects, and demographics.
  • Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

Video Data
While recording each video, extensive guidelines are followed to maintain quality and diversity.
• Recording Details:
  • File Duration: Average duration of 30 seconds to 3 minutes per video.
  • Formats: Videos are available in MP4 or MOV format.
  • Resolution: Videos are recorded in ultra-high-definition resolution at 30 fps or above.
  • Device: Both the latest Android and iOS devices are used in this collection.
  • Recording Conditions: Videos were recorded under various conditions to ensure diversity and reduce bias:
    • Indoor and Outdoor Settings: Includes both indoor and outdoor recordings.
    • Lighting Variations: Captures videos in daytime, nighttime, and varying lighting conditions.
    • Camera Positions: Includes handheld and fixed camera positions, as well as portrait and landscape orientations.
    • Face Orientation: Contains straight face and tilted face angles.
    • Participant Positions: Records participants in both standing and seated positions.
    • Motion Variations: Features both stationary and moving videos, where participants pass through different lighting conditions.
    • Occlusions: Includes videos where the participant's face is partially occluded by hand movements, microphones, hair, glasses, and facial hair.
    • Focus: In each video, the participant's face remains in focus throughout the video duration, ensuring the face stays within the video frame.
  • Video Content: In each video, the participant answers a specific question in an unscripted manner. These questions are designed to capture various emotions of participants. The dataset contains videos expressing the following human emotions:
    • Happy
    • Sad
    • Excited
    • Angry
    • Annoyed
    • Normal
  • Question Diversity: For each human emotion, the participant answered a specific question expressing that particular emotion.

Metadata
The dataset provides comprehensive metadata for each video recording and participant:
Iowa - USGS National Elevation Dataset
data-iowageomapserver.hub.arcgis.com
Updated Oct 27, 2023
Cite
michbeck@iastate.edu_iowageomapserver (2023). Iowa - USGS National Elevation Dataset [Dataset]. https://data-iowageomapserver.hub.arcgis.com/datasets/95a5c88a7fe64332adc65c7b0fb47adf
Explore at:
Dataset updated
Oct 27, 2023
Dataset authored and provided by
michbeck@iastate.edu_iowageomapserver
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
The National Elevation Dataset (NED) is a primary elevation data product that has been produced and distributed by the U.S. Geological Survey (USGS). Since its inception, the USGS has compiled and published topographic information in many forms, and the NED is a significant development in this long line of products that describe the land surface. The NED provides seamless raster elevation data of the conterminous United States (CONUS), Alaska, Hawaii, U.S. island territories, Mexico, and Canada. The NED is derived from diverse source datasets that are processed to a specification with consistent resolutions, coordinate system, elevation units, and horizontal and vertical datums. The NED was developed as the logical result of the maturation of the long-standing USGS elevation program, which for many years concentrated on production of quadrangle-based digital elevation models (DEM). The NED contributes to the elevation layer of The National Map, and it provides basic elevation information for earth science studies and mapping applications in the U.S. and most of North America.
For over 15 years (1999–2014), the NED served as the flagship elevation product of the USGS. In 2015, the 3D Elevation Program (3DEP) was initiated. When the 3DEP initiative became operational, the name “National Elevation Dataset” (and the abbreviation “NED”) were retired as the USGS elevation activities and data were rebranded under the 3DEP banner. However, elevation data produced and distributed as part of the NED are still widely used (and distributed by other entities), so there is a continuing need for detailed documentation, including how it was produced, its accuracy, and how it is used.
Protected Areas Database of the United States (PAD-US)
search.dataone.org
datadiscoverystudio.org
Updated Oct 26, 2017
Cite
US Geological Survey (USGS) Gap Analysis Program (GAP) (2017). Protected Areas Database of the United States (PAD-US) [Dataset]. https://search.dataone.org/view/0459986b-9a0e-41d9-9997-cad0fbea9c4e
Explore at:
Dataset updated
Oct 26, 2017
Dataset provided by
United States Geological Survey, http://www.usgs.gov/
Authors
US Geological Survey (USGS) Gap Analysis Program (GAP)
Time period covered
Jan 1, 2005 - Jan 1, 2016
Area covered
United States
Variables measured
Shape, Access, Des_Nm, Des_Tp, Loc_Ds, Loc_Nm, Agg_Src, GAPCdDt, GAP_Sts, GIS_Src, and 20 more
Description
The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public open space and voluntarily provided, private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastral Theme (http://www.fgdc.gov/ngda-reports/NGDA_Datasets.html). PAD-US is an ongoing project with several published versions of a spatial database of areas dedicated to the preservation of biological diversity, and other natural, recreational or cultural uses, managed for these purposes through legal or other effective means. The geodatabase maps and describes public open space and other protected areas. Most areas are public lands owned in fee; however, long-term easements, leases, and agreements or administrative designations documented in agency management plans may be included. The PAD-US database strives to be a complete “best available” inventory of protected areas (lands and waters) including data provided by managing agencies and organizations. The dataset is built in collaboration with several partners and data providers (http://gapanalysis.usgs.gov/padus/stewards/). See Supplemental Information Section of this metadata record for more information on partnerships and links to major partner organizations. As this dataset is a compilation of many data sets; data completeness, accuracy, and scale may vary. Federal and state data are generally complete, while local government and private protected area coverage is about 50% complete, and depends on data management capacity in the state. For completeness estimates by state: http://www.protectedlands.net/partners. As the federal and state data are reasonably complete; focus is shifting to completing the inventory of local gov and voluntarily provided, private protected areas. 
The PAD-US geodatabase contains over twenty-five attributes and four feature classes to support data management, queries, web mapping services and analyses: Marine Protected Areas (MPA), Fee, Easements and Combined. The data contained in the MPA Feature class are provided directly by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas Center (MPA, http://marineprotectedareas.noaa.gov ) tracking the National Marine Protected Areas System. The Easements feature class contains data provided directly from the National Conservation Easement Database (NCED, http://conservationeasement.us ) The MPA and Easement feature classes contain some attributes unique to the sole source databases tracking them (e.g. Easement Holder Name from NCED, Protection Level from NOAA MPA Inventory). The "Combined" feature class integrates all fee, easement and MPA features as the best available national inventory of protected areas in the standard PAD-US framework. In addition to geographic boundaries, PAD-US describes the protection mechanism category (e.g. fee, easement, designation, other), owner and managing agency, designation type, unit name, area, public access and state name in a suite of standardized fields. An informative set of references (i.e. Aggregator Source, GIS Source, GIS Source Date) and "local" or source data fields provide a transparent link between standardized PAD-US fields and information from authoritative data sources. The areas in PAD-US are also assigned conservation measures that assess management intent to permanently protect biological diversity: the nationally relevant "GAP Status Code" and global "IUCN Category" standard. A wealth of attributes facilitates a wide variety of data analyses and creates a context for data to be used at local, regional, state, national and international scales. 
More information about specific updates and changes to this PAD-US version can be found in the Data Quality Information section of this metadata record as well as on the PAD-US website, http://gapanalysis.usgs.gov/padus/data/history/. Due to the completeness and complexity of these data, it is highly recommended to review the Supplemental Information Section of the metadata record as well as the Data Use Constraints, to better understand data partnerships and to see tips on appropriate uses of the data and how to extract the data you are looking for. For more information regarding the PAD-US dataset, please visit http://gapanalysis.usgs.gov/padus/. To find more data resources and view example analyses performed using PAD-US data, visit http://gapanalysis.usgs.gov/padus/resources/. The PAD-US dataset and data standard are compiled and maintained by the USGS Gap Analysis Program, http://gapanalysis.usgs.gov/. For more information about data standards and how the data are aggregated, please review the “Standards and Methods Manual for PAD-US,” http://gapanalysis.usgs.gov/padus/data/standards/.
Extracted Data From: Smart Location Database
dataverse.harvard.edu
Updated Feb 19, 2025
Cite
Harvard Dataverse (2025). Extracted Data From: Smart Location Database [Dataset]. http://doi.org/10.7910/DVN/WY9T73
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/WY9T73
Dataset updated
Feb 19, 2025
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2010
Area covered
United States
Description
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: https://catalog.data.gov/dataset/smart-location-database7 If you have questions about the underlying data stored here, please contact Thomas John (thomas.john@epa.gov). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu. "The Smart Location Database is a nationwide geographic data resource for measuring location efficiency. It includes more than 90 attributes summarizing characteristics, such as housing density, diversity of land use, neighborhood design, destination accessibility, transit service, employment and demographics. Most attributes are available for every census block group in the United States. A large body of research has demonstrated that land use and urban form can have a significant effect on transportation outcomes. People who live and/or work in compact neighborhoods with a walkable street grid and easy access to public transit, jobs, stores, and services are more likely to have several transportation options to meet their everyday needs. As a result, they can choose to drive less, which reduces their emissions of greenhouse gases and other pollutants compared to people who live and work in places that are not location efficient. Walking, biking, and taking public transit can also save people money and improve their health by encouraging physical activity. The Smart Location Database summarizes several demographic, employment, and built environment variables for every census block group (CBG) in the United States. The database includes indicators of the commonly cited “D” variables shown in the transportation research literature to be related to travel behavior. 
The Ds include residential and employment density, land use diversity, design of the built environment, access to destinations, and distance to transit. SLD variables can be used as inputs to travel demand models, baseline data for scenario planning studies, and combined into composite indicators characterizing the relative location efficiency of CBG within U.S. metropolitan regions. EPA first released a beta version of the Smart Location Database in 2011. The initial full version was released in 2013, and the database was updated to its current version in 2021." Quote from https://www.epa.gov/smartgrowth/smart-location-mapping and https://catalog.data.gov/dataset/smart-location-database7
Data from: SNAPSHOT USA 2019-2023: The first five years of data from a...
data.niaid.nih.gov
dataone.org
zip
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brigit Rooney; William McShea; Roland Kays; Michael Cove (2025). SNAPSHOT USA 2019-2023: The first five years of data from a coordinated camera trap survey of the United States [Dataset]. http://doi.org/10.5061/dryad.k0p2ngfhn
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.k0p2ngfhn
Dataset updated
Apr 10, 2025
Dataset provided by
North Carolina Museum of Natural Sciences
North Carolina State University
Smithsonian Conservation Biology Institute
Authors
Brigit Rooney; William McShea; Roland Kays; Michael Cove
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
United States
Description
SNAPSHOT USA is an annual, multi-contributor camera trap survey of mammals across the United States. The growing SNAPSHOT USA dataset is intended for tracking the spatial and temporal responses of mammal populations to changes in land use, land cover, and climate. These data will be useful for exploring the drivers of spatial and temporal changes in relative abundance and distribution, as well as the impacts of species interactions on daily activity patterns. SNAPSHOT USA 2019–2023 contains 987,979 records of camera trap image sequence data and 9,694 records of camera trap deployment metadata. Data were collected across the United States of America in all 50 states, 12 ecoregions, and many ecosystems. Data were collected between August 1st and December 29th each year from 2019 to 2023. The dataset includes a wide range of taxa but is primarily focused on medium to large mammals. SNAPSHOT USA 2019–2023 comprises two .csv files. The original data can be found within the SNAPSHOT USA Initiative in the Wildlife Insights platform.
Methods
The first three annual SNAPSHOT USA surveys were coordinated by Roland Kays, Michael Cove, and William McShea. The 2019, 2020, and 2021 datasets are accessible for public use through the Supporting Information of their respective publications. Although the 2019 and 2020 surveys were originally processed and stored in eMammal (https://www.emammal.si.edu), all data are now housed in Wildlife Insights (WI) within the SNAPSHOT USA Initiative. The two most recent surveys, 2022 and 2023, were coordinated by the SNAPSHOT USA Survey Coordinator Brigit Rooney. This dataset represents the first publication of 2022 and 2023 SNAPSHOT USA data. The SNAPSHOT USA project developed a standard protocol in 2019 to survey mammals >100 g and large identifiable birds. Cameras are unbaited and set at approximately 50 cm height across an array of at least 7 cameras with a minimum distance of 200 m and a maximum of 5 km between them. 
The collection period for SNAPSHOT USA data is between September and October and the target minimum of camera trap-nights per array is 400. Some contributors to SNAPSHOT USA 2019–2023 started collecting data earlier or deployed cameras later based on locations or logistics, and we chose to include data from August 1st through December 29th each year in this dataset. The first two years of SNAPSHOT USA data incorporated an Expert Review Tool to verify the accuracy of every identification, as that was built in to the eMammal repository. This tool required SNAPSHOT USA project managers (Cove and Kays in 2019, with more taxon-specific reviewers in 2020) to review and confirm all species identifications, in an effort to minimize identification errors. As eMammal automatically grouped all uploaded images into “sequences” of images taken within 60 seconds of each other, by using the image timestamps, species identifications were made for individual sequences rather than images. These data have since been transferred to WI, where they underwent opportunistic review and correction by the SNAPSHOT USA Survey Coordinator. In contrast, SNAPSHOT USA 2021, 2022, and 2023 were managed and identified entirely in WI. All SNAPSHOT USA projects in this repository were created as “Sequence” projects, to enable the identification of sequences in the same manner as eMammal. Each 60-second sequence of images was classified to the narrowest taxonomic level possible by three iterations of validation. First, WI’s Artificial Intelligence algorithm suggested a taxonomic identification. This algorithm consists of a multiclass classification deep convolutional neural network model that uses pre-trained image embedding from Inception, a model used to identify objects. Second, each array’s Principal Investigator was responsible for validating the data, fixing Artificial Intelligence identification mistakes, and approving the data they contributed to the survey. 
Lastly, the SNAPSHOT USA Survey Coordinator quality-checked the deployment data and as many identified sequences as possible. This was a multistep process that began with checking the sequence metadata for obvious timestamp errors by organizing them chronologically in Microsoft Excel, and the deployment metadata for location errors by mapping their coordinates and looking for outliers. Next, the coordinator checked the sequence metadata for unlikely identifications, including species detections in places outside their known range, and verified their accuracy by viewing the images in WI. Finally, identifications for the most common species were verified by using the “Species” filter on WI to look for mistakes, one species at a time. When combining the five years of SNAPSHOT USA data to create SNAPSHOT USA 2019–2023, several aspects of the data were standardized to ensure consistency across all years. These were camera array names, camera location names, and taxonomy classifications. To match protocol requirements, all camera locations less than 5 km apart were classified as one array. This resulted in combining several arrays that were originally recorded under different names and ensuring that arrays in the same place maintained the same name each year. The camera location names were standardized by ensuring that all locations with geographic coordinates that were the same to four decimal places, in Decimal Degrees notation, had the same name. However, the original coordinates were retained in the dataset. Finally, all species taxonomy classifications for the 2019 and 2020 datasets (identified in eMammal) were standardized to match those used by WI. As part of this process, all subspecies of mammals in the dataset were changed to species level (e.g., Florida black bear (Ursus americanus floridanus) became American black bear (Ursus americanus)). 
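
The location-name standardization described above (the same name for all locations whose coordinates agree to four decimal places, with the original coordinates retained) can be sketched as follows. This is an illustration, not the coordinator's actual workflow: the record keys `name`, `lat`, and `lon` are hypothetical, "the same to four decimal places" is interpreted as rounding, and keeping the first name encountered is an arbitrary choice.

```python
def standardize_location_names(deployments):
    """Give every deployment whose coordinates match to four decimal places the same name.

    `deployments` is a list of dicts with hypothetical keys "name", "lat", and "lon".
    The first name seen for each rounded coordinate pair wins; original coordinates are kept.
    """
    canonical = {}
    result = []
    for dep in deployments:
        key = (round(dep["lat"], 4), round(dep["lon"], 4))
        name = canonical.setdefault(key, dep["name"])
        result.append({**dep, "name": name})  # copy the record, overriding only the name
    return result
```

Because only the name is overwritten, the full-precision coordinates stay available for mapping and outlier checks.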
For mammal taxonomy classifications, WI uses a combination of the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (2023; https://iucnredlist.org) and the American Society of Mammalogists Mammal Diversity Database (2024; https://www.mammaldiversity.org). For bird species, WI uses BirdLife International’s taxonomy classifications (2024; https://datazone.birdlife.org/species/search). The WI taxonomy is continually updated in response to public user suggestions, and the taxonomy used in the SNAPSHOT USA 2019–2023 dataset reflects the WI taxonomy used in June 2024.
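The grouping of images into 60-second sequences mentioned in the description above can be sketched as follows. The description does not say whether "within 60 seconds of each other" is measured from the previous image or from the first image of a sequence; this sketch assumes the former (a new sequence starts when the gap to the previous image exceeds 60 seconds). The function name and parameter are hypothetical and do not reflect the eMammal or WI implementation.

```python
from datetime import datetime, timedelta


def group_into_sequences(timestamps, max_gap_seconds=60):
    """Group image timestamps into sequences.

    Assumption: an image joins the current sequence when it was taken no more
    than `max_gap_seconds` after the previous image; otherwise it starts a new one.
    """
    max_gap = timedelta(seconds=max_gap_seconds)
    sequences = []
    for ts in sorted(timestamps):
        if sequences and ts - sequences[-1][-1] <= max_gap:
            sequences[-1].append(ts)
        else:
            sequences.append([ts])
    return sequences


start = datetime(2021, 9, 1, 6, 0, 0)
stamps = [
    start,
    start + timedelta(seconds=30),
    start + timedelta(seconds=80),
    start + timedelta(minutes=5),
]
sequences = group_into_sequences(stamps)
```

With these made-up timestamps, the first three images fall into one sequence (gaps of 30 and 50 seconds) and the fourth starts a second sequence.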

Cite

U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 2.1 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1

Protected Areas Database of the United States (PAD-US) 2.1

108 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Jul 6, 2024

Dataset provided by

United States Geological Survey (http://www.usgs.gov/)

Area covered

United States

Description

NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g., 'Wilderness Area'), Executive (e.g., 'National Monument'), and administrative designations (e.g., 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five feature classes that represent the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, and Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses.
This PAD-US Version 2.1 dataset includes a variety of updates and new data relative to the previous Version 2.0 dataset (USGS, 2018; https://doi.org/10.5066/P955KPLE), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme.

Completing the PAD-US Inventory:
1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands).
2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH), representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey), addressed another major PAD-US data gap.
3) Integration of nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres.
Maintaining regular Federal updates:
1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/).
2) Complete National Marine Protected Areas (MPA) update from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA.

Other changes:
1) PAD-US field name change - The "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'.
2) Additional field - The "Feature Class" (FeatClass) field was added to all layers within PAD-US 2.1 (only included in the "Combined" layers of PAD-US 2.0 to describe which feature class data originated from).
3) Categorical GAP Status Code default changes - National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3), in the absence of other information, to better represent biodiversity protection restrictions associated with the designation. The Bureau of Land Management Areas of Critical Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) because the areas are administratively, not permanently, protected. More information is available upon request.
4) Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service).
5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data steward (Census Bureau). Tribal designated statistical areas are purely geographic areas for providing Census statistics, with no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records.

For more information regarding the PAD-US dataset, please visit https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation, please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual.
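For readers adapting scripts written against PAD-US 2.0, two of the changes listed above (the 'Access' to 'Pub_Access' field rename and the new categorical GAP Status Code defaults) can be sketched in Python on plain dictionaries. This is only an illustration under stated assumptions: apart from 'Access' and 'Pub_Access', the keys used here (`designation`, `gap_status`) and the function name are hypothetical stand-ins, not the geodatabase's actual field names, and the defaults are applied only when no GAP status is present, following the "in the absence of other information" wording.

```python
# Categorical defaults described for PAD-US 2.1 (keys are illustrative labels).
DEFAULT_GAP_BY_DESIGNATION = {
    "National Monument": 2,                        # previously GAP 3
    "Area of Critical Environmental Concern": 3,   # previously GAP 2
}


def to_v21_record(record):
    """Return a copy of an attribute record following the 2.1 conventions.

    Assumptions: 'designation' and 'gap_status' are hypothetical keys; an
    existing GAP status is treated as "other information" and never overwritten.
    """
    updated = dict(record)
    # Field rename: 'Access' became 'Pub_Access'.
    if "Access" in updated:
        updated["Pub_Access"] = updated.pop("Access")
    # Apply the categorical default only when no GAP status is recorded.
    if updated.get("gap_status") is None:
        default = DEFAULT_GAP_BY_DESIGNATION.get(updated.get("designation"))
        if default is not None:
            updated["gap_status"] = default
    return updated


old = {"Access": "OA", "designation": "National Monument", "gap_status": None}
new = to_v21_record(old)
```

The input record is left unmodified, so the same function can be run over a 2.0-style export without altering the source data.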
