In 2023, Washington, D.C. had the highest population density in the United States, with 11,130.69 people per square mile. As a whole, there were about 94.83 residents per square mile in the U.S., and Alaska was the state with the lowest population density, with 1.29 residents per square mile.

The problem of population density
Simply put, population density is the population of a country divided by the area of the country. While this can be an interesting measure of how many people live in a country and how large the country is, it does not account for the degree of urbanization, or the share of people who live in urban centers. For example, Russia is the largest country in the world and has a comparatively low population, so its population density is very low. However, much of the country is uninhabited, so cities in Russia are much more densely populated than the rest of the country.

Urbanization in the United States
While the United States is not very densely populated compared to other countries, its population density has increased significantly over the past few decades. The degree of urbanization has also increased, and well over half of the population lives in urban centers.
This layer presents the Census 2010 Urbanized Areas (UA) and Urban Clusters (UC). A UA consists of contiguous, densely settled census block groups (BGs) and census blocks that meet minimum population density requirements (1,000 people per square mile (ppsm) / 500 ppsm), along with adjacent densely settled census blocks that together encompass a population of at least 50,000 people. A UC consists of contiguous, densely settled census BGs and census blocks that meet minimum population density requirements, along with adjacent densely settled census blocks that together encompass a population of at least 2,500 people, but fewer than 50,000 people. The dataset covers the 50 States plus the District of Columbia within the United States.
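The population thresholds that separate a UA from a UC can be sketched as a small function. This is only an illustration of the two population cut-offs stated above: it assumes the area has already met the density requirements, and the function name and return labels are hypothetical rather than part of any Census product.

```python
def classify_urban_area(population: int) -> str:
    """Label a densely settled area by the population thresholds in the text.

    Assumes the density requirements (1,000 / 500 ppsm) are already satisfied.
    """
    if population >= 50_000:
        return "UA"          # Urbanized Area: at least 50,000 people
    if population >= 2_500:
        return "UC"          # Urban Cluster: 2,500 to 49,999 people
    return "not urban"       # below both thresholds


for pop in (120_000, 7_400, 900):
    print(pop, classify_urban_area(pop))
```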
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
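Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive days within each state or county. The sketch below shows one way to do that with plain tuples; the sample rows are made-up illustrative values, not figures from the repository, and the function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def daily_new_cases(rows):
    """Turn (date, place, cumulative_cases) rows into (date, place, new_cases).

    Dates are assumed to be ISO strings (YYYY-MM-DD) so they sort correctly.
    """
    result = []
    ordered = sorted(rows, key=itemgetter(1, 0))  # by place, then by date
    for place, group in groupby(ordered, key=itemgetter(1)):
        previous = 0
        for date, _, cumulative in group:
            result.append((date, place, cumulative - previous))
            previous = cumulative
    return result


# Illustrative values only (not real counts).
sample = [
    ("2020-01-23", "Washington", 3),
    ("2020-01-21", "Washington", 1),
    ("2020-01-22", "Washington", 1),
]
for row in daily_new_cases(sample):
    print(row)
```

A difference can come out negative if a cumulative count is revised downward, so downstream code may want to handle that case explicitly.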
Important Note: This item is in mature support as of June 2023 and will retire in December 2025. A new version of this item is available for your use.

The layers going from 1:1 to 1:1.5M present the 2010 Census Urbanized Areas (UA) and Urban Clusters (UC). A UA consists of contiguous, densely settled census block groups (BGs) and census blocks that meet minimum population density requirements (1,000 people per square mile (ppsm) / 500 ppsm), along with adjacent densely settled census blocks that together encompass a population of at least 50,000 people. A UC consists of contiguous, densely settled census BGs and census blocks that meet minimum population density requirements, along with adjacent densely settled census blocks that together encompass a population of at least 2,500 people, but fewer than 50,000 people. The dataset covers the 50 States plus the District of Columbia within the United States.

The layer going over 1:1.5M presents the urban areas in the United States derived from the urban areas layer of the Digital Chart of the World (DCW). It provides information about the locations, names, and populations of urbanized areas for conducting geographic analysis on national and large regional scales. To download the data for this layer as a layer package for use in ArcGIS desktop applications, refer to USA Census Urban Areas.
This map shows population density of the United States. Areas in darker magenta have much higher population per square mile than areas in orange or yellow. Data is from the U.S. Census Bureau’s 2020 Census Demographic and Housing Characteristics. The map's layers contain total population counts by sex, age, and race groups for Nation, State, County, Census Tract, and Block Group in the United States and Puerto Rico.

From the Census: "Population density allows for broad comparison of settlement intensity across geographic areas. In the U.S., population density is typically expressed as the number of people per square mile of land area. The U.S. value is calculated by dividing the total U.S. population (316 million in 2013) by the total U.S. land area (3.5 million square miles). When comparing population density values for different geographic areas, then, it is helpful to keep in mind that the values are most useful for small areas, such as neighborhoods. For larger areas (especially at the state or country scale), overall population density values are less likely to provide a meaningful measure of the density levels at which people actually live, but can be useful for comparing settlement intensity across geographies of similar scale."

About the data
You can use this map as is and you can also modify it to use other attributes included in its layers. This map's layers contain total population counts by sex, age, and race groups data from the 2020 Census Demographic and Housing Characteristics. This is shown by Nation, State, County, Census Tract, Block Group boundaries. Each geography layer contains a common set of Census counts based on available attributes from the U.S. Census Bureau.
There are also additional calculated attributes related to this topic, which can be mapped or used within analysis.

Vintage of boundaries and attributes: 2020 Demographic and Housing Characteristics
Table(s): P1, H1, H3, P2, P3, P5, P12, P13, P17, PCT12 (Not all lines of these DHC tables are available in this feature layer.)
Data downloaded from: U.S. Census Bureau’s data.census.gov site
Date the Data was Downloaded: May 25, 2023
Geography Levels included: Nation, State, County, Census Tract, Block Group
National Figures: included in Nation layer

The United States Census Bureau
Demographic and Housing Characteristics: 2020 Census Results
2020 Census Data Quality
Geography & 2020 Census Technical Documentation
Data Table Guide: includes the final list of tables, lowest level of geography by table and table shells for the Demographic Profile and Demographic and Housing Characteristics.
News & Updates

This map is ready to be used in ArcGIS Pro, ArcGIS Online and its configurable apps, Story Maps, dashboards, Notebooks, Python, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the U.S. Census Bureau when using this data.

Data Processing Notes: These 2020 Census boundaries come from the US Census TIGER geodatabases. These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For Census tracts and block groups, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract and block group boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes.
The original AWATER and ALAND fields are unchanged and available as attributes within the data table (units are square meters). The layer contains all US states, Washington D.C., and Puerto Rico. Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99). Block groups that fall within the same criteria (Block Group denoted as 0 with no land area) have also been removed.

Percentages and derived counts are calculated values (identifiable by the "_calc_" stub in the field name). Field alias names were created based on the Table Shells file available from the Data Table Guide for the Demographic Profile and Demographic and Housing Characteristics. Not all lines of all tables listed above are included in this layer. Duplicative counts were dropped. For example, P0030001 was dropped, as it is duplicative of P0010001.

To protect the privacy and confidentiality of respondents, their data has been protected using differential privacy techniques by the U.S. Census Bureau.
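As a rough illustration of the density calculation quoted from the Census above, the sketch below divides a population count by land area, converting an ALAND-style value from square meters to square miles. The function name is hypothetical and not part of the layer; the 316 million and 3.5 million square mile inputs are the figures quoted in the text.

```python
# One international mile is 1,609.344 m, so one square mile is this many square meters.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2


def density_per_sq_mile(population, land_area_sq_m):
    """People per square mile of land; returns None when there is no land area."""
    if land_area_sq_m <= 0:
        return None
    land_area_sq_mi = land_area_sq_m / SQ_METERS_PER_SQ_MILE
    return population / land_area_sq_mi


# National figures quoted by the Census (2013): 316 million people, 3.5 million sq mi.
us_land_sq_m = 3_500_000 * SQ_METERS_PER_SQ_MILE
print(round(density_per_sq_mile(316_000_000, us_land_sq_m), 1))  # roughly 90.3
```

Returning None for zero land area is a design choice for this sketch, so that water-only features do not raise a division error.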
The number of Reddit users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 10.3 million users (+5.21 percent). After the ninth consecutive year of growth, the Reddit user base is estimated to reach 208.12 million users, a new peak, in 2028. Notably, the number of Reddit users has increased continuously over the past years.

User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. Reddit users encompass both users that are logged in and those that are not.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of Reddit users in countries like Mexico and Canada.
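The stated growth percentage can be checked from the two figures given: the 2028 end value and the total increase. The sketch below backs out the implied 2024 baseline and recomputes the percentage; the function name is hypothetical.

```python
def growth_percent(increase, end_value):
    """Percent growth implied by a total increase and the resulting end value."""
    start_value = end_value - increase  # implied baseline (about 197.82 million here)
    return 100.0 * increase / start_value


# Figures from the text, in millions of users.
print(round(growth_percent(10.3, 208.12), 2))  # 5.21
```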
VITAL SIGNS INDICATOR Poverty (EQ5)
FULL MEASURE NAME The share of the population living in households that earn less than 200 percent of the federal poverty limit
LAST UPDATED December 2018
DESCRIPTION Poverty refers to the share of the population living in households that earn less than 200 percent of the federal poverty limit, which varies based on the number of individuals in a given household. It reflects the number of individuals who are economically struggling due to low household income levels.
DATA SOURCE U.S. Census Bureau: Decennial Census http://www.nhgis.org (1980-1990) http://factfinder2.census.gov (2000)
U.S. Census Bureau: American Community Survey Form C17002 (2006-2017) http://api.census.gov
METHODOLOGY NOTES (across all datasets for this indicator) The U.S. Census Bureau defines a national poverty level (or household income) that varies by household size, number of children in a household, and age of householder. The national poverty level does not vary geographically even though cost of living is different across the United States. For the Bay Area, where cost of living is high and incomes are correspondingly high, an appropriate poverty level is 200% of poverty or twice the national poverty level, consistent with what was used for past equity work at MTC and ABAG. For comparison, however, both the national and 200% poverty levels are presented.
For Vital Signs, the poverty rate is defined as the number of people (including children) living below twice the poverty level divided by the number of people for whom poverty status is determined. Poverty rates do not include unrelated individuals below 15 years old or people who live in the following: institutionalized group quarters, college dormitories, military barracks, and situations without conventional housing. The household income definitions for poverty change each year to reflect inflation. The official poverty definition uses money income before taxes and does not include capital gains or noncash benefits (such as public housing, Medicaid, and food stamps). For the national poverty level definitions by year, see: https://www.census.gov/hhes/www/poverty/data/threshld/index.html For an explanation on how the Census Bureau measures poverty, see: https://www.census.gov/hhes/www/poverty/about/overview/measure.html
For the American Community Survey datasets, 1-year data was used for region, county, and metro areas whereas 5-year rolling average data was used for city and census tract.
To be consistent across metropolitan areas, the poverty definition for non-Bay Area metros is twice the national poverty level. Data were not adjusted for varying income and cost of living levels across the metropolitan areas.
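The Vital Signs poverty rate defined above is a simple ratio, which can be sketched as follows. The function and argument names are hypothetical, and the counts in the example are invented for illustration; they are not drawn from the Census or ACS tables.

```python
def poverty_rate_200(people_below_200_pct, people_status_determined):
    """Share of people living below twice the national poverty level.

    The denominator is the number of people for whom poverty status is
    determined, not the total population.
    """
    if people_status_determined <= 0:
        raise ValueError("denominator must be positive")
    return people_below_200_pct / people_status_determined


# Illustrative counts only.
rate = poverty_rate_200(people_below_200_pct=1_250, people_status_determined=5_000)
print(f"{rate:.1%}")  # 25.0%
```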
This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2020 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features.
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
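The reserved code ranges described above lend themselves to a simple lookup. The sketch below takes the four-digit basic tract code as an integer; how that code is extracted from a particular shapefile attribute is not covered here, and the function name and category labels are hypothetical.

```python
from typing import Optional


def special_tract_category(basic_code: int) -> Optional[str]:
    """Map a four-digit basic census tract code to its reserved category, if any."""
    if 9400 <= basic_code <= 9499:
        return "American Indian area"   # majority AI population or reservation/trust land
    if 9800 <= basic_code <= 9899:
        return "special land use"       # little or no population, e.g. a National Park
    if 9900 <= basic_code <= 9998:
        return "water only"             # no land area
    return None                         # ordinary tract


for code in (9401, 9805, 9900, 4012):
    print(code, special_tract_category(code))
```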
The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years.

User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of Twitter users in countries like Canada and Mexico.
DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

The COVID-19 state summary includes the following metrics, along with the change from the data reported the previous day:
COVID-19 Cases (confirmed and probable)
COVID-19 Tests Reported (molecular and antigen)
Daily Test Positivity
Patients Currently Hospitalized with COVID-19
COVID-19-Associated Deaths

Additional notes:
The cumulative count of tests reported for 1/17/2021 includes 286,103 older tests from previous dates, which had been missing from previous reports due to a data processing error. The older tests were added to the cumulative count of tests reported, but they were not included in the calculation of change from the previous reporting day or daily percent test positivity.

Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.

As of April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive test results from both molecular (PCR/NAAT) and antigen tests.
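The town-level confidentiality rule described above (counts below 5 over the past 7 days are suppressed) can be sketched as a small filter. This is an assumption-laden illustration, not DPH's actual processing: the function name is hypothetical, and treating a count of zero the same as other counts below 5 is an assumption, since the text does not say how zeros are handled.

```python
def suppress_small_count(seven_day_count, threshold=5):
    """Return the count, or None when it falls below the suppression threshold.

    Assumption: zero is suppressed like any other value below the threshold.
    """
    return None if seven_day_count < threshold else seven_day_count


# Hypothetical town names and counts, for illustration only.
town_counts = {"Town A": 12, "Town B": 4, "Town C": 5}
published = {town: suppress_small_count(n) for town, n in town_counts.items()}
print(published)
```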
In 2023, the number of data compromises in the United States stood at 3,205 cases. Meanwhile, over 353 million individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches
Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of the personal information that organizations in these sectors store. In 2022, healthcare, financial services, and manufacturing were the three industry sectors that recorded the most data breaches. The number of healthcare data breaches in the United States has gradually increased within the past few years. In the financial sector, data compromises almost doubled between 2020 and 2022, while manufacturing saw data compromise incidents more than triple.

Largest data exposures worldwide
In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cyber security researchers found the vulnerability before the cyber criminals. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, came up with an updated number of leaked records, which was three billion. In March 2018, the third biggest data breach happened, involving India’s national identification database Aadhaar. As a result of this incident, over 1.1 billion records were exposed.
Brazil and the United States are the two most populous countries in the Americas today. In 1500, the year that Pedro Álvares Cabral made landfall in present-day Brazil and claimed it for the Portuguese crown, it is estimated that there were roughly one million people living in the region. Some estimates for the present-day United States give a population of two million in the year 1500, although estimates vary greatly. By 1820, the population of the U.S. was still roughly double that of Brazil, but rapid growth in the 19th century would see it grow 4.5 times larger by 1890, before the difference shrank during the 20th century. In 2024, the U.S. has a population over 340 million people, making it the third most populous country in the world, while Brazil has a population of almost 218 million and is the sixth most populous. Looking to the future, population growth is expected to be lower in Brazil than in the U.S. in the coming decades, as Brazil's fertility rates are already lower, and migration rates into the United States will be much higher.

Historical development
The indigenous peoples of present-day Brazil and the U.S. were highly susceptible to diseases brought from the Old World; combined with mass displacement and violence, this kept their population growth rates generally low, so migration from Europe and the import of enslaved Africans drove population growth in both regions. In absolute numbers, more Europeans migrated to North America than Brazil, whereas more slaves were transported to Brazil than the U.S., but European migration to Brazil increased significantly in the early 1900s. The U.S. also underwent its demographic transition much earlier than Brazil, so its peak period of population growth came almost a century earlier than Brazil's.

Impact of ethnicity
The demographics of these countries are often compared, not only because of their size, location, and historical development, but also due to the role played by ethnicity.
In the mid-1800s, these countries had the largest slave societies in the world, but a major difference between the two was the attitude towards interracial procreation. In Brazil, relationships between people of different ethnic groups were more common and less stigmatized than in the U.S., where anti-miscegenation laws prohibited interracial relationships in many states until the 1960s. Racial classification was also more rigid in the U.S., and those of mixed ethnicity were usually classified by their non-white background. In contrast, as Brazil has a higher degree of mixing between those of ethnic African, American, and European heritage, classification is less obvious, and factors such as physical appearance or societal background were often used to determine racial standing. For most of the 20th century, Brazil's government promoted the idea that race was a non-issue and that Brazil was racially harmonious, but most now acknowledge that this actually ignored inequality and hindered progress. Racial inequality has been a prevalent problem in both countries since their founding, and today, whites generally fare better in terms of education, income, political representation, and even life expectancy. Despite this adversity, significant progress has been made in recent decades, as public awareness of inequality has increased, and authorities in both countries have taken steps to tackle disparities in areas such as education, housing, and employment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Household Saving Rate in the United States increased to 4.60 percent in January from 3.50 percent in December of 2024. This dataset provides - United States Personal Savings Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
NOTE: As of 2/16/2023, this table is not being updated. For data on COVID-19 updated (bivalent) booster coverage by town, please go to https://data.ct.gov/Health-and-Human-Services/COVID-19-Updated-Bivalent-Booster-Coverage-By-Town/bqd5-4jgh.
This table shows the number and percent of residents of each CT town who have initiated COVID-19 vaccination, are fully vaccinated, and have received additional dose 1.
All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected.
In the data shown here, a person who has received at least one dose of COVID-19 vaccine is considered to have initiated vaccination. A person is considered fully vaccinated if he/she has completed a primary vaccination series by receiving 2 doses of the Pfizer, Novavax or Moderna vaccines or 1 dose of the Johnson & Johnson vaccine. The fully vaccinated are a subset of the people who have received at least one dose.
A person who completed a Pfizer, Moderna, Novavax or Johnson & Johnson primary series (as defined above) and then had an additional monovalent dose of COVID-19 vaccine is considered to have had additional dose 1. The additional dose may be Pfizer, Moderna, Novavax or Johnson & Johnson and may be a different type from the primary series. For people who had a primary Pfizer or Moderna series, additional dose 1 was counted starting August 18th, 2021. For people with a Johnson & Johnson primary series, additional dose 1 was counted starting October 22nd, 2021. For most people, additional dose 1 is a booster. However, additional dose 1 may represent a supplement to the primary series for a person who is moderately or severely immunosuppressed. Bivalent booster administrations are not included in the additional dose 1 calculations.
The percent with at least one dose may be over-estimated, and the percent fully vaccinated and with additional dose 1 may be under-estimated, because vaccine administration records for an individual sometimes cannot be linked due to differences in how names or dates of birth are reported.
Percentages are calculated using 2019 census data (https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Annual-Town-and-County-Population-for-Connecticut).
Town of residence is verified by geocoding the reported address and then mapping it to a town using municipal boundaries. If an address cannot be geocoded, the reported town is used, if available. People for whom an address is not currently available are shown in this table as “Address pending validation”. Out-of-state residents vaccinated by CT providers are excluded from the table.
Town-level coverage estimates have been capped at 100%. Observed coverage may be greater than 100% for multiple reasons, including census denominator data not including all individuals that currently reside in the town (e.g., part-time residents, change in population size since the census), errors in address data or other reporting errors. Also, the percent with at least one dose may be over-estimated, and the percent fully vaccinated and with additional dose 1 may be under-estimated when records for an individual cannot be linked because of differences in how names or dates of birth are reported.
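The capping rule can be illustrated with a short calculation: observed coverage is the vaccinated count divided by the census denominator, and anything above 100% is reported as 100%. The function name and the example counts are hypothetical and chosen only to show the cap taking effect.

```python
def capped_coverage_percent(people_vaccinated, census_population):
    """Coverage as a percentage of the census denominator, capped at 100."""
    if census_population <= 0:
        raise ValueError("census population must be positive")
    observed = 100.0 * people_vaccinated / census_population
    return min(observed, 100.0)


# Illustrative counts: the second town has more recipients than census residents.
print(capped_coverage_percent(6_000, 8_000))   # 75.0
print(capped_coverage_percent(2_100, 2_000))   # 100.0 (observed would be 105.0)
```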
Caution should be used when interpreting coverage estimates for towns with large college/university populations since coverage may be underestimated. In the census, college/university students who live on or just off campus would be counted in the college/university town. However, if a student was vaccinated while studying remotely in his/her hometown, the student may be counted as a vaccine recipient in that town.
SVI refers to the CDC's Social Vulnerability Index - a measure that combines 15 demographic variables to identify communities most vulnerable to negative health impacts from disasters and public health crises. Measures of social vulnerability include socioeconomic status, household composition, disability, race, ethnicity, language, and transportation limitations - among others. Towns with a "yes" in the "Has SVI tract >0.75" field are those that have at least one census tract that is in the top quartile of vulnerability (e.g., a high-need area). 34 towns in Connecticut have at least one census tract in the top quartile for vulnerability.
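As a rough sketch of how the "Has SVI tract >0.75" field could be derived, the snippet below flags a town when any of its census tracts has an SVI score above 0.75. The town names, tract scores, and function names are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Iterable, List

SVI_THRESHOLD = 0.75  # cutoff named in the "Has SVI tract >0.75" field


def has_high_svi_tract(tract_scores: Iterable[float],
                       threshold: float = SVI_THRESHOLD) -> bool:
    """True if at least one tract score is strictly above the threshold."""
    return any(score > threshold for score in tract_scores)


def flag_towns(town_tracts: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each town to "yes" or "no" depending on its tract scores."""
    return {
        town: "yes" if has_high_svi_tract(scores) else "no"
        for town, scores in town_tracts.items()
    }


# Hypothetical tract scores, for illustration only.
example = {"Town A": [0.12, 0.81], "Town B": [0.40, 0.75]}
print(flag_towns(example))
```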
Connecticut COVID-19 Vaccine Program providers are required to report information on all COVID-19 vaccine doses administered to CT WiZ, the Connecticut Immunization Information System. Data on doses administered to CT residents out-of-state are being added to CT WiZ jurisdiction-by-jurisdiction. Doses administered by some Federal entities (including Department of Defense, Department of Correction, Department of Veterans Affairs, Indian Health Service) are not yet reported to CT WiZ. Data reported here reflect the vaccination records currently reported to CT WiZ.
Note: This dataset takes the place of the original "COVID-19 Vaccinations by Town" dataset (https://data.ct.gov/Health-and-Human-Services/COVID-19-Vaccinations-by-Town/pdqi-ds7f), which will not be updated after 4/15/2021. A breakdown of vaccinations by town and by age group is also available here: https://data.ct.gov/Health-and-Human-Services/COVID-19-Vaccinations-by-Town-and-Age-Group/gngw-ukpw
As part of continuous data quality improvement efforts, duplicate records were removed from the COVID-19 vaccination data during the weeks of 4/19/2021 and 4/26/2021.
IPUMS-International is an effort to inventory, preserve, harmonize, and disseminate census microdata from around the world. The project has collected the world's largest archive of publicly available census samples. The data are coded and documented consistently across countries and over time to facilitate comparative research. IPUMS-International makes these data available to qualified researchers free of charge through a web dissemination system.
The IPUMS project is a collaboration of the Minnesota Population Center, National Statistical Offices, and international data archives. Major funding is provided by the U.S. National Science Foundation and the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development. Additional support is provided by the University of Minnesota Office of the Vice President for Research, the Minnesota Population Center, and Sun Microsystems.
National coverage
Households and Group Quarters
UNITS IDENTIFIED: - Dwellings: No - Vacant units: Yes - Households: Yes - Individuals: Yes - Group quarters: Yes
UNIT DESCRIPTIONS: - Households: Dwelling places excluding institutions and transient quarters. - Group quarters: No threshold was applied; in order for a household to be considered group quarters in 2000, it had to be on the list of group quarters that is continuously maintained by the Census Bureau.
Residents of the 50 states (not the outlying areas).
Census/enumeration data [cen]
MICRODATA SOURCE: U.S. Census Bureau
SAMPLE UNIT: Household
SAMPLE FRACTION: 5%
SAMPLE SIZE (person records): 14,081,466
Face-to-face [f2f]
The 2000 census used a long form questionnaire. Long Form Sampling Entities (LFSEs) were used to determine sampling rates. If the smallest LFSE that included all or any part of a block had an estimated housing unit count of less than 800, the housing units in the block were sampled at a 1-in-2 rate. If it had an estimated housing unit count of 800 or more but less than 1,200, units were sampled at a 1-in-4 rate. If a block was not in either of the two previous categories, and was part of an interim census tract with 2,000 or more estimated housing units, units were sampled at a 1-in-8 rate. Housing units in all remaining blocks were sampled at a 1-in-6 rate. When all sampling rates were taken into account across the nation, approximately 1 out of every 6 housing units was included in the Census 2000 sample.
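The sampling-rate rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the inputs are taken to be the estimated housing unit count of the smallest LFSE covering the block and the estimated housing unit count of the block's interim census tract.

```python
from fractions import Fraction


def long_form_sampling_rate(lfse_housing_units: int,
                            tract_housing_units: int) -> Fraction:
    """Return the long form sampling rate for housing units in a block."""
    if lfse_housing_units < 800:
        return Fraction(1, 2)  # smallest LFSE has fewer than 800 units
    if lfse_housing_units < 1200:
        return Fraction(1, 4)  # 800 or more but fewer than 1,200 units
    if tract_housing_units >= 2000:
        return Fraction(1, 8)  # interim tract with 2,000 or more units
    return Fraction(1, 6)      # all remaining blocks


# Hypothetical housing unit estimates, for illustration only.
print(long_form_sampling_rate(500, 3000))
print(long_form_sampling_rate(1500, 1800))
```

Note that the order of the checks matters: the tract-level rule applies only to blocks that did not fall into either of the two LFSE-based categories.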
UNDERCOUNT: No official estimates
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Introduction
There are several works based on Natural Language Processing on newspaper reports. Mining opinions from headlines [1] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, little NLP research has been devoted to studying COVID-19. Most applications involve classifying chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were highly likely to collect inaccurate news reports. One challenge in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
To collect these data, we used a Google Form for both USA and BD. Two human editors went through each entry to check for spam or troll entries.
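Although the collection was manual, the first, second and fourth criteria above lend themselves to an automated sanity check on submitted entries. The sketch below is illustrative only: the keyword set, the function names, and the choice to count keyword occurrences (rather than distinct keywords) are assumptions, not part of the original guidelines.

```python
import re
from typing import Set

# Hypothetical keyword list; the original guidelines do not enumerate one.
COVID_KEYWORDS = {"covid", "coronavirus", "pandemic", "quarantine",
                  "lockdown", "outbreak", "virus"}


def keyword_hits(text: str, keywords: Set[str] = COVID_KEYWORDS) -> int:
    """Count tokens in `text` that appear in the keyword set."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in keywords)


def accept_report(headline: str, body: str, seen_headlines: Set[str]) -> bool:
    """Apply the headline, body and duplicate criteria to one entry."""
    key = headline.strip().lower()
    if key in seen_headlines:
        return False  # duplicate report
    if keyword_hits(headline) < 1 or keyword_hits(body) < 5:
        return False  # too few COVID-19 related keywords
    seen_headlines.add(key)
    return True


seen: Set[str] = set()
print(accept_report("Coronavirus outbreak slows markets",
                    "covid covid virus pandemic lockdown news", seen))
```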
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could lead to the loss of valuable information. While the pre-processing was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the items listed above.
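The pre-processing steps listed above can be sketched as follows. This is not the script used for the dataset: the stop-word list is a small illustrative subset, "non-English alphanumeric characters" is interpreted here as any character other than English letters, digits and whitespace, and lemmatization is left as a pluggable callable (identity by default) so that a real lemmatizer can be supplied.

```python
import re
from typing import Callable, List

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in",
              "is", "are", "was", "for", "on", "at"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text: str,
               lemmatize: Callable[[str], str] = lambda word: word) -> List[str]:
    """Apply the four listed steps and return the remaining tokens."""
    text = URL_RE.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)  # 2. keep only English letters and digits
    tokens = [t for t in text.split()
              if t.lower() not in STOP_WORDS]  # 3. remove stop words
    return [lemmatize(t) for t in tokens]      # 4. lemmatize (identity by default)


print(preprocess("The masks are sold at https://example.com/shop, in 3 states!"))
```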
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository, under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome outside collaboration to enrich the dataset.
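Regenerating JSON from an updated CSV file can be done with the Python standard library alone. The sketch below is an assumption about what such a script might look like, not the repository's actual script; the function name `csv_to_json` and any file paths passed to it are hypothetical.

```python
import csv
import json


def csv_to_json(csv_path: str, json_path: str) -> int:
    """Convert a CSV file with a header row into a JSON array of objects.

    Returns the number of records written.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))  # one dict per CSV row
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, ensure_ascii=False, indent=2)
    return len(rows)
```

Calling `csv_to_json("reports.csv", "reports.json")` with a hypothetical input file would write one JSON object per news report, keyed by the CSV column names.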
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.
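As a minimal illustration of one of the methods mentioned above, one-hot encoding maps each distinct token to a vector with a single 1 at that token's index. The helper names below are hypothetical, and the tokens are made up for the example.

```python
from typing import Dict, Iterable, List


def build_vocab(tokens: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct token an index, in sorted order."""
    return {token: index for index, token in enumerate(sorted(set(tokens)))}


def one_hot(token: str, vocab: Dict[str, int]) -> List[int]:
    """Return a vector of zeros with a 1 at the token's vocabulary index."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


vocab = build_vocab(["virus", "mask", "virus"])
print(vocab)
print(one_hot("virus", vocab))
```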
Well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several notable developments have been made in algorithms designed specifically for NLP. The two most prominent are BERT [14], a bidirectional Transformer encoder that performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. Both are distributed as pre-trained models because training them carries a huge computational cost.
Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data.
Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
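The graph-based idea behind TextRank can be sketched in plain Python. This is a simplified, unweighted version written for illustration: it builds an undirected co-occurrence graph over already-tokenized text and runs a fixed number of PageRank-style updates. The window size, damping factor, iteration count and function names are assumptions, and this is not the implementation used in the experiments.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_graph(tokens: List[str], window: int = 2) -> Dict[str, Set[str]]:
    """Link each word to the words that follow it within `window` positions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window + 1]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(tokens: List[str], window: int = 2,
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Score words with the unweighted TextRank update rule."""
    graph = build_graph(tokens, window)
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        # Each word receives a share of every neighbour's current score.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def top_keywords(tokens: List[str], k: int = 3, **kwargs) -> List[str]:
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = textrank(tokens, **kwargs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


tokens = ["virus", "mask", "virus", "economy", "virus", "health"]
print(top_keywords(tokens, k=1, window=1))
```

In this toy input every other word co-occurs only with "virus", so "virus" sits at the centre of the graph and receives the highest score.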
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2 and 3, we can point out the following:
In February, both newspapers talked about China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, this concern appeared to increase.
Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations and markets.
Washington Post discussed global issues more than StarTribune.
In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.
We used a script to extract all numbers associated with keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’ and ’Diagnosed’ from the news reports, and compiled a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction, we observe that April was the peak month for COVID-19 cases, which rose gradually from February. Both newspapers show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.
We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiment scores ranged from -0.5 to -0.9. The Vader sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.
We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
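The number-extraction step described in this section could be approximated with a regular expression that pairs a number with a nearby keyword. This is a hedged sketch only: the pattern, the keyword tuple, the three-word gap allowed between number and keyword, and the function name are all assumptions, and the original script is not reproduced here.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Subset of the keywords named in the text, lower-cased for matching.
KEYWORDS = ("deaths", "died", "infected", "infections",
            "quarantined", "diagnosed")

# A number, then up to three intervening words, then one of the keywords.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,3}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text: str) -> Dict[str, List[int]]:
    """Collect the numbers that appear shortly before each keyword."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for number, keyword in PATTERN.findall(text):
        counts[keyword.lower()].append(int(number.replace(",", "")))
    return dict(counts)


print(extract_counts("Officials reported 1,200 new infections and 35 deaths on Monday."))
```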
United States US: People Using Basic Sanitation Services: % of Population data was reported at 99.970 % in 2015. This records an increase from the previous number of 99.969 % for 2014. United States US: People Using Basic Sanitation Services: % of Population data is updated yearly, averaging 99.968 % (median) from Dec 2000 to 2015, with 16 observations. The data reached an all-time high of 99.970 % in 2015 and a record low of 99.967 % in 2000. United States US: People Using Basic Sanitation Services: % of Population data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. The indicator is the percentage of people using at least basic sanitation services, that is, improved sanitation facilities that are not shared with other households. This indicator encompasses both people using basic sanitation services and those using safely managed sanitation services. Improved sanitation facilities include flush/pour flush to piped sewer systems, septic tanks or pit latrines; ventilated improved pit latrines; composting toilets; or pit latrines with slabs. Source: WHO/UNICEF Joint Monitoring Programme (JMP) for Water Supply, Sanitation and Hygiene (washdata.org). Aggregation method: weighted average.
This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2020 Census Participant Statistical Areas Program. The Census Bureau delineated the census tracts in situations where no local participant existed or where all the potential participants declined to participate. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. When first delineated, census tracts were designed to be homogeneous with respect to population characteristics, economic status, and living conditions. The spatial size of census tracts varies widely depending on the density of settlement. Physical changes in street patterns caused by highway construction, new development, and so forth, may require boundary revisions. In addition, census tracts occasionally are split due to population growth, or combined as a result of substantial population decline. Census tract boundaries generally follow visible and identifiable features.
They may follow legal boundaries such as minor civil division (MCD) or incorporated place boundaries in some States and situations to allow for census tract-to-governmental unit relationships where the governmental boundaries tend to remain unchanged between censuses. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E. Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"