Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 50 states in the United States by Non-Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each states over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
https://en.wikipedia.org/wiki/Public_domainhttps://en.wikipedia.org/wiki/Public_domain
This dataset is part of the Geographical repository maintained by Opendatasoft. This dataset contains data for counties and equivalent entities in United States of America. The primary legal divisions of most states are termed counties. In Louisiana, these divisions are known as parishes. In Alaska, which has no counties, the equivalent entities are the organized boroughs, city and boroughs, municipalities, and for the unorganized area, census areas. The latter are delineated cooperatively for statistical purposes by the State of Alaska and the Census Bureau. In four states (Maryland, Missouri, Nevada, and Virginia), there are one or more incorporated places that are independent of any county organization and thus constitute primary divisions of their states. These incorporated places are known as independent cities and are treated as equivalent entities for purposes of data presentation. The District of Columbia and Guam have no primary divisions, and each area is considered an equivalent entity for purposes of data presentation. The Census Bureau treats the following entities as equivalents of counties for purposes of data presentation: Municipios in Puerto Rico, Districts and Islands in American Samoa, Municipalities in the Commonwealth of the Northern Mariana Islands, and Islands in the U.S. Virgin Islands. The entire area of the United States, Puerto Rico, and the Island Areas is covered by counties or equivalent entities.Processors and tools are using this data. Enhancements Add ISO 3166-3 codes. Simplify geometries to provide better performance across the services. Add administrative hierarchy.
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of United States by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for United States. The dataset can be utilized to understand the population distribution of United States by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in United States. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for United States.
Key observations
Largest age group (population): Male # 25-29 years (11.59 million) | Female # 25-29 years (11.26 million). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Scope of gender :
Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States Population by Gender. You can refer the same here
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
The USGS National Hydrography Dataset (NHD) downloadable data collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes information about naturally occurring and constructed bodies of surface water (lakes, ponds, and reservoirs), paths through which water flows (canals, ditches, streams, and rivers), and related entities such as point features (springs, wells, stream gages, and dams). The information encoded about these features includes classification and other characteristics, delineation, geographic name, position and related measures, a "reach code" through which other information can be related to the NHD, and the direction of water flow. The network of reach codes delineating water and transported material flow allows users to trace movement in upstream and downstream directions. In addition to this geographic information, the dataset contains metadata that supports the exchange of future updates and improvements to the data. The NHD supports many applications, such as making maps, geocoding observations, flow modeling, data maintenance, and stewardship. For additional information on NHD, go to https://www.usgs.gov/core-science-systems/ngp/national-hydrography.
DWR was the steward for NHD and Watershed Boundary Dataset (WBD) in California. We worked with other organizations to edit and improve NHD and WBD, using the business rules for California. California's NHD improvements were sent to USGS for incorporation into the national database. The most up-to-date products are accessible from the USGS website. Please note that the California portion of the National Hydrography Dataset is appropriate for use at the 1:24,000 scale.
For additional derivative products and resources, including the major features in geopackage format, please go to this page: https://data.cnra.ca.gov/dataset/nhd-major-features Archives of previous statewide extracts of the NHD going back to 2018 may be found at https://data.cnra.ca.gov/dataset/nhd-archive.
In September 2022, USGS officially notified DWR that the NHD would become static as USGS resources will be devoted to the transition to the new 3D Hydrography Program (3DHP). 3DHP will consist of LiDAR-derived hydrography at a higher resolution than NHD. Upon completion, 3DHP data will be easier to maintain, based on a modern data model and architecture, and better meet the requirements of users that were documented in the Hydrography Requirements and Benefits Study (2016). The initial releases of 3DHP include NHD data cross-walked into the 3DHP data model. It will take several years for the 3DHP to be built out for California. Please refer to the resources on this page for more information.
The FINAL,STATIC version of the National Hydrography Dataset for California was published for download by USGS on December 27, 2023. This dataset can no longer be edited by the state stewards. The next generation of national hydrography data is the USGS 3D Hydrography Program (3DHP).
Questions about the California stewardship of these datasets may be directed to nhd_stewardship@water.ca.gov.
This project was undertaken to establish a computerized skeletal database composed of recent forensic cases to represent the present ethnic diversity and demographic structure of the United States population. The intent was to accumulate a forensic skeletal sample large and diverse enough to reflect different socioeconomic groups of the general population from different geographical regions of the country in order to enable researchers to revise the standards being used for forensic skeletal identification. The database is composed of eight data files, comprising four categories. The primary "biographical" or "identification" files (Part 1, Demographic Data, and Part 2, Geographic and Death Data) comprise the first category of information and pertain to the positive identification of each of the 1,514 data records in the database. Information in Part 1 includes sex, ethnic group affiliation, birth date, age at death, height (living and cadaver), and weight (living and cadaver). Variables in Part 2 pertain to the nature of the remains, means and sources of identification, city and state/country born, occupation, date missing/last seen, date of discovery, date of death, time since death, cause of death, manner of death, deposit/exposure of body, area found, city, county, and state/country found, handedness, and blood type. The Medical History File (Part 3) represents the second category of information and contains data on the documented medical history of the individual. Variables in Part 3 include general comments on medical history as well as comments on congenital malformations, dental notes, bone lesions, perimortem trauma, and other comments. The third category consists of an inventory file (Part 4, Skeletal Inventory Data) in which data pertaining to the specific contents of the database are maintained. This includes the inventory of skeletal material by element and side (left and right), indicating the condition of the bone as either partial or complete. The variables in Part 4 provide a skeletal inventory of the cranium, mandible, dentition, and postcranium elements and identify the element as complete, fragmentary, or absent. If absent, four categories record why it is missing. The last part of the database is composed of three skeletal data files, covering quantitative observations of age-related changes in the skeleton (Part 5), cranial measurements (Part 6), and postcranial measurements (Part 7). Variables in Part 5 provide assessments of epiphyseal closure and cranial suture closure (left and right), rib end changes (left and right), Todd Pubic Symphysis, Suchey-Brooks Pubic Symphysis, McKern & Steward--Phases I, II, and III, Gilbert & McKern--Phases I, II, and III, auricular surface, and dorsal pubic pitting (all for left and right). Variables in Part 6 include cranial measurements (length, breadth, height) and mandibular measurements (height, thickness, diameter, breadth, length, and angle) of various skeletal elements. Part 7 provides postcranial measurements (length, diameter, breadth, circumference, and left and right, where appropriate) of the clavicle, scapula, humerus, radius, ulna, scarum, innominate, femur, tibia, fibula, and calcaneus. A small file of noted problems for a few cases is also included (Part 8).
Description:
The “Fifty States Car License Plates” dataset was initially gathered from a job post on Upwork.com. The client provided this dataset as part of the project requirements. It serves as a foundational resource for developing machine learning models aimed at recognizing and differentiating license plate numbers from other text strings present in car images.
Content This dataset includes a diverse collection of car images, each featuring visible license plate numbers. Alongside the license plate numbers, the images contain various other text strings, which present a challenge for machine learning models to accurately identify and isolate the license plate numbers from the other textual content.
Download Dataset
Dataset Composition
Number of Images: The dataset contains a modest number of images showcasing cars from different states, each with a clear view of the license plate.
Image Details: Each image is captured in a variety of settings and angles to reflect real-world scenarios where cars and their license plates are encountered.
Text Annotations: Besides license plate numbers, the images may include other text elements such as bumper stickers, dealership names, and other signage.
Usage This dataset is particularly useful for training and testing machine learning models designed for:
License Plate Recognition (LPR): Teaching models to accurately detect and read license plate numbers.
Text Differentiation: Enabling models to distinguish between license plate numbers and other extraneous text present in the image.
Computer Vision: Enhancing object detection and recognition capabilities in varied real-world conditions.
Potential Applications
Traffic Management Systems: Improving automated license plate recognition systems used in toll collection, traffic monitoring, and law enforcement.
Parking Management: Streamlining entry and exit processes in parking facilities through automated license plate recognition.
Security Systems: Enhancing vehicle tracking and monitoring systems in secure areas such as corporate campuses, airports, and residential communities.
Challenges and Considerations
Diverse Text Elements: The presence of multiple text elements requires advanced algorithms to accurately identify and extract license plate numbers.
Varied Lighting and Angles: Images captured in different lighting conditions and from various angles add complexity to the model training process, requiring robust image processing techniques.
Conclusion The “Fifty States Car License Plates” dataset provides a valuable resource for developers and researchers working on license plate recognition and related computer vision applications. By addressing the challenges posed by varied text elements and real-world imaging conditions, this dataset contributes to the advancement of accurate and reliable LPR systems.
This dataset is sourced from Kaggle.
The purpose of this data collection was to record information about the cases, litigants, amicus participants, and the opinions decided by the Supreme Court under the tenure of Chief Justices Earl Warren (1953-1969) and Warren Burger (1969-1986) and others through 1993. The approach of this study was to proceed deductively, rather than seek to infer values of a particular group of justices. This method allows the investigation of value conflicts that are not litigated, as well as the value conflicts represented in Supreme Court opinions. Opinions are coded on the basis of their literal content, and the data are organized around the opinions. There are eight types of opinions. Within each type, up to six topics are coded, and within each topic, up to two values are coded. There are three integrated parts to this study, each of which can be linked to the other files by specific variables. Part 1, Supreme Court Database, contains basic case attributes from UNITED STATES SUPREME COURT JUDICIAL DATABASE, 1953-1993 TERMS (ICPSR 9422) and the opinions given in the cases. Part 2, Briefs, gives information on the filers and co-filers for cases in which amicus curie briefs were filed. Part 3, Groups, lists the litigants' names. The distinct aspects of the Court's decisions are covered by six types of variables in Part 1: (1) identification variables including case citation, docket number, unit of analysis, and number of records per unit of analysis, (2) background variables offering information on origin of case, source of case, reason for granting cert, parties to the case, direction of the lower court's decision, and manner in which the Court takes jurisdiction, (3) chronological variables covering date of term of court, chief justice, and natural court, (4) substantive variables including multiple legal provisions, authority for decision, issue, issue areas, and direction of decision, (5) outcome variables supplying information on form of decision, disposition of case, winning party, declaration of unconstitutionality, and multiple memorandum decisions, and (6) voting and opinion variables pertaining to the vote in the case and to the direction of the individual justices' votes.
Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.OverviewThe Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:There may be multiple owner names.Jurisdiction may be held jointly by agencies at different levels of government (ie State and Local), especially on private lands, Some owner names may be blocked for security reasons.Some jurisdictions may not allow the distribution of owner names. Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null,JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).These data are used to automatically populate fields on the WFDSS Incident Information page.This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.Relevant NWCG Definitions and StandardsUnit2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional.Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc) can be derived from a unit based on organization hierarchy.Unit, JurisdictionalThe governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law.Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander.See also: Unit, Protecting; LandownerUnit IdentifierThis data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.Landowner Kind & CategoryThis data standard provides a two-tier classification (kind and category) of landownership. Attribute Fields JurisdictionalAgencyKind Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.JurisdictionalAgencyCategoryDescribes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.JurisdictionalUnitNameThe name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.JurisdictionalUnitIDWhere it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.LandownerKindThe landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.LandownerCategoryThe landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.DataSourceThe database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.SecondaryDataSourceIf the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."SourceUniqueIDIdentifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.MapMethod:Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; DigitizedTopo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; OtherDateCurrentThe last edit, update, of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.CommentsAdditional information describing the feature. GeometryIDPrimary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.JurisdictionalUnitID_sansUSNWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.JoinMethodAdditional information on how the polygon was matched information in the NWCG Unit ID database.LocalNameLocalName for the polygon provided from PADUS or other source.LegendJurisdictionalAgencyJurisdictional Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.LegendLandownerAgencyLandowner Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.DataSourceYearYear that the source data for the polygon were acquired.Data InputThis dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas of with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.PAD-US 2.1:This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.`. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.How these data were aggregated:Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).BIA and Tribal Data:BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
Population based cancer incidence rates were abstracted from National Cancer Institute, State Cancer Profiles for all available counties in the United States for which data were available. This is a national county-level database of cancer data that are collected by state public health surveillance systems. All-site cancer is defined as any type of cancer that is captured in the state registry data, though non-melanoma skin cancer is not included. All-site age-adjusted cancer incidence rates were abstracted separately for males and females. County-level annual age-adjusted all-site cancer incidence rates for years 2006–2010 were available for 2687 of 3142 (85.5%) counties in the U.S. Counties for which there are fewer than 16 reported cases in a specific area-sex-race category are suppressed to ensure confidentiality and stability of rate estimates; this accounted for 14 counties in our study. Two states, Kansas and Virginia, do not provide data because of state legislation and regulations which prohibit the release of county level data to outside entities. Data from Michigan does not include cases diagnosed in other states because data exchange agreements prohibit the release of data to third parties. Finally, state data is not available for three states, Minnesota, Ohio, and Washington. The age-adjusted average annual incidence rate for all counties was 453.7 per 100,000 persons. We selected 2006–2010 as it is subsequent in time to the EQI exposure data which was constructed to represent the years 2000–2005. We also gathered data for the three leading causes of cancer for males (lung, prostate, and colorectal) and females (lung, breast, and colorectal). The EQI was used as an exposure metric as an indicator of cumulative environmental exposures at the county-level representing the period 2000 to 2005. A complete description of the datasets used in the EQI are provided in Lobdell et al. and methods used for index construction are described by Messer et al. The EQI was developed for the period 2000– 2005 because it was the time period for which the most recent data were available when index construction was initiated. The EQI includes variables representing each of the environmental domains. The air domain includes 87 variables representing criteria and hazardous air pollutants. The water domain includes 80 variables representing overall water quality, general water contamination, recreational water quality, drinking water quality, atmospheric deposition, drought, and chemical contamination. The land domain includes 26 variables representing agriculture, pesticides, contaminants, facilities, and radon. The built domain includes 14 variables representing roads, highway/road safety, public transit behavior, business environment, and subsidized housing environment. The sociodemographic environment includes 12 variables representing socioeconomics and crime. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., L. Messer, K. Rappazzo , C. Gray, S. Grabich , and D. Lobdell. County-level environmental quality and associations with cancer incidence#. Cancer. John Wiley & Sons Incorporated, New York, NY, USA, 123(15): 2901-2908, (2017).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the United States population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of United States across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2022, the population of United States was 333,287,557, a 0.38% increase year-by-year from 2021. Previously, in 2021, United States population was 332,031,554, an increase of 0.16% compared to a population of 331,511,512 in 2020. Over the last 20 plus years, between 2000 and 2022, population of United States increased by 51,125,146. In this period, the peak population was 333,287,557 in the year 2022. The numbers suggest that the population has not reached its peak yet and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for United States Population by Year. You can refer the same here
This data provides locations and technical specifications of legacy versions (ver. 1.0 - ver. X.X) of the United States Wind Turbines database. Each release, typically done quarterly, updates the database with newly installed wind turbines, removes wind turbines that have been identified as dismantled, and applies other verifications based on updated imagery and ongoing quality-control. Turbine data were gathered from the Federal Aviation Administration's (FAA) Digital Obstacle File (DOF) and Obstruction Evaluation Airport Airspace Analysis (OE-AAA), the American Wind Energy Association (AWEA), Lawrence Berkeley National Laboratory (LBNL), and the United States Geological Survey (USGS), and were merged and collapsed into a single data set. Verification of the turbine positions was done by visual interpretation using high-resolution aerial imagery in ESRI ArcGIS Desktop. A locational error of plus or minus 10 meters for turbine locations was tolerated. Technical specifications for turbines were assigned based on the wind turbine make and models as provided by manufacturers and project developers directly, and via FAA datasets, information on the wind project developer or turbine manufacturer websites, or other online sources. Some facility and turbine information on make and model did not exist or was difficult to obtain. Thus, uncertainty may exist for certain turbine specifications. Similarly, some turbines were not yet built, not built at all, or for other reasons cannot be verified visually. Location and turbine specifications data quality are rated and a confidence is recorded for both. None of the data are field verified. The current version is available for download at https://doi.org/10.5066/F7TX3DN0. The USWTDB Viewer, created by the USGS Energy Resources Program, lets you visualize, inspect, interact, and download the most current USWTDB version only, through a dynamic web application. https://eerscmap.usgs.gov/uswtdb/viewer/
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes an aggregated and event-correlated analysis of power outages in the United States, synthesized by integrating three data sources: the Environment for the Analysis of Geo-Located Energy Information (EAGLE-I), the Electric Emergency Incident Disturbance Report (DOE-417), and Annual Estimates of the Resident Population for Counties 2024 (CO-EST2024-POP). The EAGLE-I dataset, spanning from 2014 to 2023, encompasses over 146 million customers and offers county-level outage information at 15-minute intervals. The data has been processed, filtered, and aggregated to deliver an enhanced perspective on power outages, which are then correlated with DOE-417 data based on geographic location as well as the start and end times of events. For each major disturbance documented in DOE-417, essential metrics are defined to quantify the outages associated with the event. This dataset supports researchers in examining outages triggered by major disturbances like extreme weather and physical disruptions, thereby aiding studies on power system resilience.
Links to the raw data for generating the correlated dataset are included below as "DOE-417", "EAGLE-I", and "CO-EST2024-POP" resources.
Acknowledgement: This work is funded by the Laboratory Directed Research and Development (LDRD) at the Pacific Northwest National Laboratory (PNNL) as part of the Resilience Through Data-Driven, Intelligently Designed Control (RD2C) Initiative.
The Geographic Information Retrieval and Analysis System (GIRAS) was developed in the mid 70s to put into digital form a number of data layers which were of interest to the USGS. One of these data layers was the Hydrologic Units. The map is based on the Hydrologic Unit Maps published by the U.S. Geological Survey Office of Water Data Coordination, together with the list descriptions and name of region, subregion, accounting units, and cataloging unit. The hydrologic units are encoded with an eight- digit number that indicates the hydrologic region (first two digits), hydrologic subregion (second two digits), accounting unit (third two digits), and cataloging unit (fourth two digits). The data produced by GIRAS was originally collected at a scale of 1:250K. Some areas, notably major cities in the west, were recompiled at a scale of 1:100K. In order to join the data together and use the data in a geographic information system (GIS) the data were processed in the ARC/INFO GUS software package. Within the GIS, the data were edgematched and the neatline boundaries between maps were removed to create a single data set for the conterminous United States. NOTE: A version of this data theme that is more throughly checked (though based on smaller-scale maps) is available here: https://water.usgs.gov/lookup/getspatial?huc2m HUC, GIRAS, Hydrologic Units, 1:250 For the most current data and information relating to hydrologic unit codes (HUCs) please see http://water.usgs.gov/GIS/huc.html. The Watershed Boundary Dataset (WBD) is the most current data available for watershed delineation. See http://www.nrcs.usda.gov/wps/portal/nrcs/main/national/water/watersheds/dataset
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. Out of a commitment to promoting public knowledge of Michigan local governance, the Center for Local, State, and Urban Policy is releasing public use datasets. In order to protect respondent confidentiality, CLOSUP has divided the data collected in each wave of the survey into separate datasets focused on different topics that were covered in the survey. Each dataset contains only variables relevant to that subject, and the datasets cannot be linked together. Variables have also been omitted or recoded to further protect respondent confidentiality. For researchers looking for a more extensive release of the MPPS data, restricted datasets are available through openICPSR's Virtual Data Enclave. Please note: additional waves of MPPS public use datasets are being prepared, and will be available as part of this project as soon as they are completed. For information on accessing MPPS public use and restricted datasets, please visit the MPPS data access page: http://closup.umich.edu/mpps-download-datasets
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de450955https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de450955
Abstract (en): The American College Catalog Study Database (CCS) contains academic data on 286 four-year colleges and universities in the United States. CCS is one of two databases produced by the Colleges and Universities 2000 project based at the University of California-Riverside. The CCS database comprises a sampled subset of institutions from the related Institutional Data Archive (IDA) on American Higher Education (ICPSR 34874). Coding for CCS was based on college catalogs obtained from College Source, Inc. The data are organized in a panel design, with measurements taken at five-year intervals: academic years 1975-76, 1980-81, 1985-86, 1990-91, 1995-96, 2000-01, 2005-06, and 2010-11. The database is based on information reported in each institution's college catalog, and includes data regarding changes in major academic units (schools and colleges), departments, interdisciplinary programs, and general education requirements. For schools and departments, changes in structure were coded, including new units, name changes, splits in units, units moved to new schools, reconstituted units, consolidated units, departments reduced to program status, and eliminated units. The American College Catalog Study Database (CCS) is intended to allow researchers to examine changes in the structure of institutionalized knowledge in four-year colleges and universities within the United States. For information on the study design, including detailed coding conventions, please see the Original P.I. Documentation section of the ICPSR Codebook. The data are not weighted. Dataset 1, Characteristics Variables, contains three weight variables (IDAWT, CCSWT, and CASEWEIGHT) which users may wish to apply during analysis. For additional information on weights, please see the Original P.I. Documentation section of the ICPSR Codebook. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Checked for undocumented or out-of-range codes.. Response Rates: Approximately 75 percent of IDA institutions are included in CCS. For additional information on response rates, please see the Original P.I. Documentation section of the ICPSR Codebook. Four-year not-for-profit colleges and universities in the United States. Smallest Geographic Unit: state CCS includes 286 institutions drawn from the IDA sample of 384 United States four-year colleges and universities. CCS contains every IDA institution for which a full set of catalogs could be located at the initiation of the project in 2000. CCS contains seven datasets that can be linked through an institutional identification number variable (PROJ_ID). Since the data are organized in a panel format, it is also necessary to use a second variable (YEAR) to link datasets. For a brief description of each CCS dataset, please see Appendix B within the Original P.I. Documentation section of the ICPSR Codebook.There are date discrepancies between the data and the Original P.I. Documentation. Study Time Periods and Collection Dates reflect dates that are present in the data. No additional information was provided.Please note that the related data collection featuring the Institutional Data Archive on American Higher Education, 1970-2011, will be available as ICPSR 34874. Additional information on the American College Catalog Study Database (CCS) and the Institutional Data Archive (IDA) database can be found on the Colleges and Universities 2000 Web site.
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides micromagnetic simulation data collected from a series of computational experiments on the effects of polygonal system shape on the energy of different magnetic states in FeGe. The data here form the results of the study ‘Skyrmion states in thin confined polygonal nanostructures.’
The dataset is split into several directories:
Data
square-samples and triangle-samples
These directories contain final state ‘relaxed’ magnetization fields for square and triangle samples respectively. The files within are organised into directories such that a sample of side length d = 40nm and which was subjected to an applied field of 500mT is labelled d40b500. Within each directory are twelve VTK unstructured grid format files (with file extension “.vtu”). These can be viewed in a variety of programmes; as of the time of writing we recommend either ParaView or MayaVi. The twelve files correspond to twelve simulations for each sample simulated, corresponding to twelve states from which each sample was relaxed - these are described in the paper which this dataset accompanies, but we note the labels are:
‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘h’, ‘u’, ‘r1’, ‘r2’, ‘r3’, ‘h2’, ‘h3’
where:
0 - 4 are incomplete to overcomplete skyrmions,
h, h2 and h3 are helical states with different periodicities
r1-r3 are different random states
u is the uniform magnetisation
The vtu files are labelled according to parameters used in the simulation. For example, a file labelled ‘160_10_3_0_u_wd000000.vtu’ encodes that:
The simulation was of a sample with side length 160nm.
The simulation was of a sample of thickness 10nm.
The maximum length of an edge in the finite element mesh of the sample was 3nm.
The system was relaxed from the ‘u’.
‘wd’ encodes that the simulation was performed with a full demagnetizing calculation.
square-npys and triangle-npys
These directories contain computed information about each of the final states stored in square-samples and triangle-samples. This information is stored in NumPy npz files, and can be read in Python straightforwardly using the function numpy.load. Within each npz file, there are 8 arrays, each with 12 elements. These arrays are:
‘E’ - corresponds to the total energy of the relaxed state.
‘E_exchange’ - corresponds to the Exchange energy of the relaxed state.
‘E_demag’ - corresponds to the Demagnetizing energy of the relaxed state.
‘E_dmi’ - corresponds to the Dzyaloshinskii-Moriya energy of the relaxed state.
‘E_zeeman’ - corresponds to the Zeeman energy of the relaxed state.
‘S’ - Calculated Skyrmion number of the relaxed state.
‘S_abs’ - Calculated absolute Skyrmion number - see paper for calculation details.
‘m_av’ - Computed normalised average magnetisation in x, y, and z directions for relaxed state
The twelve elements here correspond to the aforementioned twelve states relaxed from, and the ordering of the array is that of the order given above.
square-classified and triangle-classified
These directories contain a labelled dataset which gives details about what the final state in each simulation is. The files are stored as plain text, and are labelled with the following structure (the meanings of which are defined in the paper which this dataset accompanies):
iSk - Incomplete Skyrmion
Sk, or a number n followed by Sk - n Skyrmions in the state.
He - A helical state
Target - A target state.
The files contain the names of png files which are generated from the vtu files in the format ‘d_165b_350_2.png’. This example, if found in the ‘Sk.txt’ file, means that the sample which was 165nm in side length and which was relaxed under a field of 350mT from initial state 2 was found at equilibrium in a Skyrmion state.
Figures
square-pngs and triangle-pngs
These directories contain generated pngs from the vtu files. These are included for convenience as they take several hours to generate. Each directory contains three subdirectories:
all-states
This directory contains the simulation results from all samples, in the format ‘d_165b_350_2.png’, which means that the image contained here is that of the 165nm side length sample relaxed under a 350mT field from initial state 2.
ground-state
This directory contains the images which correspond to the lowest energy state found from all of the initial states. These are labelled as ‘d_180b_50.png’, such that the image contained in this file is the the lowest energy state found from all twelve simulations of the 180nm sidelength under a 50mT field.
uniform-state
This directory contains the images which correspond to the states relaxed only from the uniform state. These are labelled such that an image labelled ‘d_55b_100.png’ is the state found from relaxing a 180nm sample under a 100mT applied field.
phase-diagrams
These are the generated phase diagrams which are found in the paper.
scripts
This folder contains Python scripts which generate the png files mentioned above, and also the phase diagram figures for the paper this dataset accompanies. The scripts are labelled descriptively with what they do - for e.g. ’triangle-generate-png-all-states.py’ contains the script which loads vtu files and generates the png files. The exception here is ’render.py’ which provides functions used across multiple scripts. These scripts can be modified - for example; the function 'export_vector_field' has many options which can be adjusted to, for example, plot different components of the magnetization.
In order to run the scripts reproducibly, in the root directory we have provided a Makefile which builds each component. In order to reproduce the figures yourself, on a Linux system, ParaView must be installed. The Makefile has been tested on Ubuntu 16.04 with ParaView 5.0.1. In addition, a number of Python dependencies must also be installed. These are:
scipy >=0.19.1
numpy >= 1.11.0
matplotlib == 1.5.2
pillow>=3.1.2
We have included a requirements.txt file which specifies these dependencies; they can be installed by running 'pip install -r requirements.txt' from the directory.
Once all dependencies are installed, simply run the command ‘make’ from the shell to build the Docker image and generate the figures. Note the scripts will take a long time to run - at the time of writing the runtime will be on the order of several hours on a high-specification desktop machine. For convenience, we have therefore included the generated figures within the repository (as noted above). It should be noted that for the versions used in the paper, adjustments have been made after the generation of the figures, (for e.g. to add images of states within the metastability figure, and overlaying boundaries in the phase diagrams).
If you want to reproduce only the phase diagrams, and not the pngs, the command ‘make phase-diagrams’ will do so. This is the smallest part of the figure reproduction, and takes around 5 minutes on a high-specification desktop.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050.
The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. Specifically, the data set includes midyear population figures broken down by age and gender assignment at birth. Additionally, they provide time-series data for attributes including fertility rates, birth rates, death rates, and migration rates.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_international
https://cloud.google.com/bigquery/public-data/international-census
Dataset Source: www.census.gov
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source -http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What countries have the longest life expectancy?
Which countries have the largest proportion of their population under 25?
Which countries are seeing the largest net migration?
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of State Line by race. It includes the distribution of the Non-Hispanic population of State Line across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of State Line across relevant racial categories.
Key observations
With a zero Hispanic population, State Line is 100% Non-Hispanic. Among the Non-Hispanic population, the largest racial group is White alone with a population of 3 (100% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for State Line Population by Race & Ethnicity. You can refer the same here
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 50 states in the United States by Non-Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each states over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.