23 datasets found

Poverty Rate
data.ccrpc.org
data.cuuats.cloud.ccrpc.org
csv
Updated Oct 17, 2024
Cite
Champaign County Regional Planning Commission (2024). Poverty Rate [Dataset]. https://data.ccrpc.org/dataset/poverty-rate
Explore at:
Available download formats: csv (393)
Dataset updated
Oct 17, 2024
Dataset provided by
Champaign County Regional Planning Commission
Description
This poverty rate data shows what percentage of the measured population* falls below the poverty line. Poverty is closely related to income: different “poverty thresholds” are in place for different sizes and types of household. A family or individual is considered to be below the poverty line if that family or individual’s income falls below their relevant poverty threshold. For more information on how poverty is measured by the U.S. Census Bureau (the source for this indicator’s data), visit the U.S. Census Bureau’s poverty webpage.

The poverty rate is an important piece of information when evaluating an area’s economic health and well-being. The poverty rate can also be illustrative when considered in the contexts of other indicators and categories. As a piece of data, it is too important and too useful to omit from any indicator set.

The poverty rate for all individuals in the measured population in Champaign County has hovered around 20% since 2005. However, it reached its lowest rate in 2021 at 14.9%, and its second lowest rate in 2023 at 16.3%. Although the American Community Survey (ACS) data shows fluctuations between years, given their margins of error, none of the differences between consecutive years’ estimates is statistically significant, making it impossible to identify a trend.

Poverty rate data was sourced from the U.S. Census Bureau’s American Community Survey 1-Year Estimates, which are released annually.

As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
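The margin-of-error comparison described above can be sketched in a few lines. This is a minimal illustration, not CCRPC's method: it assumes the Census Bureau's published convention that ACS margins of error are reported at the 90 percent confidence level (so standard error = MOE / 1.645), and the function names and the figures in the usage note are hypothetical placeholders, not values from this dataset.

```python
import math

# Assumption: ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def moe_to_se(moe: float) -> float:
    """Convert a 90% margin of error into a standard error."""
    return moe / Z_90


def is_significant_change(est_a: float, moe_a: float,
                          est_b: float, moe_b: float) -> bool:
    """Return True if two estimates differ significantly at the 90% level."""
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(moe_to_se(moe_a) ** 2 + moe_to_se(moe_b) ** 2)
    z = (est_b - est_a) / se_diff
    return abs(z) > Z_90
```

For example, with hypothetical estimates of 16.0% (± 2.5) and 18.0% (± 2.6), `is_significant_change(16.0, 2.5, 18.0, 2.6)` returns `False`: the two-point gap is smaller than the combined uncertainty of the two estimates.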

Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Poverty Status in the Past 12 Months by Age.

*According to the U.S. Census Bureau document “How Poverty is Calculated in the ACS,” poverty status is calculated for everyone but those in the following groups: “people living in institutional group quarters (such as prisons or nursing homes), people in military barracks, people in college dormitories, living situations without conventional housing, and unrelated individuals under 15 years old.”

Sources:
U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (17 October 2024).
U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (25 September 2023).
U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (16 September 2022).
U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (8 June 2021).
U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (8 June 2021).
U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (13 September 2018).
U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (14 September 2017).
U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (19 September 2016).
U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
United States Age Group Population Dataset: A Complete Breakdown of United...
neilsberg.com
csv, json
Updated Jul 24, 2024
+ more versions
Cite
Neilsberg Research (2024). United States Age Group Population Dataset: A Complete Breakdown of United States Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aabf26b9-4983-11ef-ae5d-3860777c1fe6/
Explore at:
Available download formats: csv, json
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age groups. Ages between 0 and 85 were divided into buckets of roughly 5 years each. Ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the United States population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of the United States. The dataset can be used to understand the population distribution of the United States by age. For example, using this dataset, we can identify the largest age group in the United States.

Key observations

The largest age group in the United States was 30 to 34 years, with a population of 22.71 million (6.86%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in the United States was 80 to 84 years, with a population of 6.25 million (1.89%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the United States is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of United States total population. Please note that the sum of all percentages may not equal one due to rounding of values.
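The share calculation and the rounding caveat above can be illustrated with a short sketch. The function name and the three equal-sized groups are hypothetical, chosen only to make the rounding effect visible. The dataset's actual rounding precision is not stated, so two decimal places is an assumption.

```python
def population_shares(populations: dict) -> dict:
    """Return each group's share of the total population, in percent."""
    total = sum(populations.values())
    # Assumption: shares are rounded to two decimal places.
    return {group: round(100 * count / total, 2)
            for group, count in populations.items()}


# Hypothetical counts, not taken from the dataset.
example = {"Under 5 years": 1, "5 to 9 years": 1, "10 to 14 years": 1}
shares = population_shares(example)
```

Here each group rounds to 33.33, so the shares add up to 99.99 rather than 100, which is the kind of discrepancy the note above warns about.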

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for United States Population by Age. You can refer to the same here
Poverty Status by Town - Datasets - CTData.org
data.ctdata.org
Updated Mar 16, 2016
+ more versions
Cite
(2016). Poverty Status by Town - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/poverty-status-by-town
Dataset updated
Mar 16, 2016
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Census Bureau determines that a person is living in poverty when his or her total household income, compared with the size and composition of the household, is below the poverty threshold. The Census Bureau uses the federal government's official definition of poverty to determine the poverty threshold.

Beginning in 2000, individuals were presented with the option to select one or more races. In addition, the Census asked individuals to identify their race separately from identifying their Hispanic origin. The Census has published individual tables for the races and ethnicities provided as supplemental information to the main table, which does not disaggregate by race or ethnicity. Race categories include the following: White, Black or African American, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, Some other race, and Two or more races. We are not including specific combinations of two or more races, as the counts of these combinations are small. Ethnic categories include Hispanic or Latino and White Non-Hispanic.

This data comes from the American Community Survey (ACS) 5-Year estimates, table B17001. The ACS collects these data from a sample of households on a rolling monthly basis. ACS aggregates samples into one-, three-, or five-year periods. CTdata.org generally carries the five-year datasets, as they are considered to be the most accurate, especially for geographic areas that are the size of a county or smaller.

Poverty status determined is the denominator for the poverty rate. It is the population for which poverty status was determined, so when poverty is calculated, institutionalized people, people in military group quarters, people in college dormitories, and unrelated individuals under 15 years of age are excluded.

Below poverty level are households as determined by the thresholds based on the criteria of household size, number of children, and age of householder.
Stock Portfolio Data with Prices and Indices
kaggle.com
Updated Mar 23, 2025
Cite
Nikita Manaenkov (2025). Stock Portfolio Data with Prices and Indices [Dataset]. http://doi.org/10.34740/kaggle/dsv/11140976
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11140976
Dataset updated
Mar 23, 2025
Dataset provided by
Kaggle
Authors
Nikita Manaenkov
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
This dataset consists of five CSV files that provide detailed data on a stock portfolio and related market performance over the last 5 years. It includes portfolio positions, stock prices, and major U.S. market indices (NASDAQ, S&P 500, and Dow Jones). The data is essential for conducting portfolio analysis, financial modeling, and performance tracking.

1. Portfolio

This file contains the portfolio composition with details about individual stock positions, including the quantity of shares, sector, and their respective weights in the portfolio. The data also includes the stock's closing price.

Columns:

Ticker: The stock symbol (e.g., AAPL, TSLA)

Quantity: The number of shares in the portfolio

Sector: The sector the stock belongs to (e.g., Technology, Healthcare)

Close: The closing price of the stock

Weight: The weight of the stock in the portfolio (as a percentage of total portfolio)
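As a rough illustration of how the Weight column relates to the other Portfolio columns, the sketch below derives weights from Quantity and Close. It assumes each weight is the position's market value divided by the total portfolio value. The dataset description does not confirm how its weights were computed, and the tickers and figures here are hypothetical.

```python
def portfolio_weights(positions: list) -> dict:
    """Compute each position's weight as a percentage of total market value.

    Each position is a dict with the keys "Ticker", "Quantity", and "Close",
    mirroring the column names listed above.
    """
    # Market value per position (assumed definition: shares held * closing price).
    values = {p["Ticker"]: p["Quantity"] * p["Close"] for p in positions}
    total = sum(values.values())
    return {ticker: 100 * value / total for ticker, value in values.items()}


# Hypothetical holdings, not rows from the dataset.
sample = [
    {"Ticker": "AAA", "Quantity": 10, "Close": 10.0},
    {"Ticker": "BBB", "Quantity": 30, "Close": 10.0},
]
weights = portfolio_weights(sample)
```

With these placeholder holdings, AAA accounts for a quarter of the portfolio value and BBB for the rest.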

2. Portfolio Prices

This file contains historical pricing data for the stocks in the portfolio. It includes daily open, high, low, close prices, adjusted close prices, returns, and volume of traded stocks.

Columns:

Date: The date of the data point

Ticker: The stock symbol

Open: The opening price of the stock on that day

High: The highest price reached on that day

Low: The lowest price reached on that day

Close: The closing price of the stock

Adjusted: The adjusted closing price after stock splits and dividends

Returns: Daily percentage return based on close prices

Volume: The volume of shares traded that day
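The Returns column is described as a daily percentage return based on close prices. A minimal sketch of that calculation follows. It assumes simple (not logarithmic) returns expressed in percent, which the description does not confirm, and the prices are hypothetical.

```python
from typing import List, Optional


def daily_returns(closes: List[float]) -> List[Optional[float]]:
    """Simple day-over-day percentage returns from a series of closing prices."""
    # The first day has no previous close, so its return is undefined.
    returns: List[Optional[float]] = [None]
    for prev, curr in zip(closes, closes[1:]):
        returns.append((curr - prev) / prev * 100)
    return returns


# Hypothetical closing prices for three consecutive trading days.
example_returns = daily_returns([100.0, 110.0, 99.0])
```

If the file stores returns as fractions rather than percentages, drop the multiplication by 100 before comparing against the published column.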

3. NASDAQ

This file contains historical pricing data for the NASDAQ Composite index, providing similar data as in the Portfolio Prices file, but for the NASDAQ market index.

Columns:

Date: The date of the data point

Ticker: The stock symbol (for NASDAQ index, this will be "IXIC")

Open: The opening price of the index

High: The highest value reached on that day

Low: The lowest value reached on that day

Close: The closing value of the index

Adjusted: The adjusted closing value after any corporate actions

Returns: Daily percentage return based on close values

Volume: The volume of shares traded

4. S&P 500

This file contains similar historical pricing data, but for the S&P 500 index, providing insights into the performance of the top 500 U.S. companies.

Columns:

Date: The date of the data point

Ticker: The stock symbol (for S&P 500 index, this will be "SPX")

Open: The opening price of the index

High: The highest value reached on that day

Low: The lowest value reached on that day

Close: The closing value of the index

Adjusted: The adjusted closing value after any corporate actions

Returns: Daily percentage return based on close values

Volume: The volume of shares traded

5. Dow Jones

This file contains similar historical pricing data for the Dow Jones Industrial Average, providing insights into one of the most widely followed stock market indices in the world.

Columns:

Date: The date of the data point

Ticker: The stock symbol (for Dow Jones index, this will be "DJI")

Open: The opening price of the index

High: The highest value reached on that day

Low: The lowest value reached on that day

Close: The closing value of the index

Adjusted: The adjusted closing value after any corporate actions

Returns: Daily percentage return based on close values

Volume: The volume of shares traded

Personal Portfolio Data

This data is received using a custom framework that fetches real-time and historical stock data from Yahoo Finance. It provides the portfolio’s data based on user-specific stock holdings and performance, allowing for personalized analysis. The personal framework ensures the portfolio data is automatically retrieved and updated with the latest stock prices, returns, and performance metrics.

This part of the dataset would typically involve data specific to a particular user’s stock positions, weights, and performance, which can be integrated with the other files for portfolio performance analysis.
🇺🇸 Fiscally US Cities
kaggle.com
Updated Jul 31, 2024
Cite
mexwell (2024). 🇺🇸 Fiscally US Cities [Dataset]. https://www.kaggle.com/datasets/mexwell/fiscally-us-cities
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 31, 2024
Dataset provided by
Kaggle
Authors
mexwell
Area covered
United States
Description
Motivation

In the United States, city governments provide many services: they run public school districts, administer certain welfare and health programs, build roads and manage airports, provide police and fire protection, inspect buildings, and often run water and utility systems. Cities also get revenues through certain local taxes, various fees and permit costs, sale of property, and through the fees they charge for the utilities they run.

It would be interesting to compare all these expenses and revenues across cities and over time, but also quite difficult. Cities share many of these service responsibilities with other government agencies: in one particular city, some roads may be maintained by the state government, some law enforcement provided by the county sheriff, some schools run by independent school districts with their own tax revenue, and some utilities run by special independent utility districts. These governmental structures vary greatly by state and by individual city. It would be hard to make a fair comparison without taking into account all these differences.

This dataset takes into account all those differences. The Lincoln Institute of Land Policy produces what they call “Fiscally Standardized Cities” (FiSCs), aggregating all services provided to city residents regardless of how they may be divided up by different government agencies and jurisdictions. Using this, we can study city expenses and revenues, and how the proportions of different costs vary over time.

Data

The dataset tracks over 200 American cities between 1977 and 2020. Each row represents one city for one year. Revenue and expenditures are broken down into more than 120 categories.

Values are available for FiSCs and also for the entities that make it up: the city, the county, independent school districts, and any special districts, such as utility districts. There are hence five versions of each variable, with suffixes indicating the entity. For example, taxes gives the FiSC’s tax revenue, while taxes_city, taxes_cnty, taxes_schl, and taxes_spec break it down for the city, county, school districts, and special districts.

The values are organized hierarchically. For example, taxes is the sum of tax_property (property taxes), tax_sales_general (sales taxes), tax_income (income tax), and tax_other (other taxes). And tax_income is itself the sum of tax_income_indiv (individual income tax) and tax_income_corp (corporate income tax) subcategories.
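The hierarchy described above can be checked row by row. The sketch below uses only the variable names quoted in the text; the helper name, the tolerance, and the sample row are hypothetical, and real rows may need a tolerance suited to the data's rounding.

```python
# Parent variables and the components that should add up to them,
# taken from the hierarchy described in the text.
HIERARCHY = {
    "taxes": ["tax_property", "tax_sales_general", "tax_income", "tax_other"],
    "tax_income": ["tax_income_indiv", "tax_income_corp"],
}


def hierarchy_gaps(row: dict, tol: float = 0.01) -> dict:
    """Return parent -> (parent value minus sum of components) where they disagree."""
    gaps = {}
    for parent, children in HIERARCHY.items():
        diff = row[parent] - sum(row[child] for child in children)
        if abs(diff) > tol:
            gaps[parent] = diff
    return gaps


# Hypothetical row in 2020 dollars per capita, not taken from the dataset.
row = {
    "taxes": 1000.0, "tax_property": 600.0, "tax_sales_general": 200.0,
    "tax_income": 150.0, "tax_other": 50.0,
    "tax_income_indiv": 100.0, "tax_income_corp": 50.0,
}
```

An empty result from `hierarchy_gaps(row)` means the row is internally consistent; any entry it returns points at a parent whose components do not add up.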

Variable Description

year Year for these values

city_name Name of the city, such as “AK: Anchorage”, where “AK” is the standard two-letter abbreviation for Alaska

city_population Estimated city population, based on Census data

county_name Name of the county the city is in

county_population Estimated county population, based on Census data

cpi Consumer Price Index for this year, scaled so that 2020 is 1.

relationship_city_school Type of school district. 1: City-wide independent school district that serves the entire city. 2: County-wide independent school district that serves the entire county. 3: One or more independent school districts whose boundaries extend beyond the city. 4: School district run by or dependent on the city. 5: School district run by or dependent on the county.

enrollment Estimated number of public school students living in the city.

districts_in_city Estimated number of school districts in the city.

consolidated_govt Whether the city has a consolidated city-county government (1 = yes, 0 = no). For example, Philadelphia’s city and county government are the same entity; they are not separate governments.

id2_city 12-digit city identifier, from the Annual Survey of State and Local Government Finances

id2_county 12-digit county identifier

city_types Two types: core and legacy. There are 150 core cities, “including the two largest cities in each state, plus all cities with populations of 150,000+ in 1980 and 200,000+ in 2010”. Legacy cities include “95 cities with population declines of at least 20 percent from their peak, poverty rates exceeding the national average, and a peak population of at least 50,000”. Some cities are both (denoted “core

The revenue and expenses variables are described in this detailed table. Further documentation is available on the FiSC Database website, linked in References below.

All monetary data is already adjusted for inflation, and is given in terms of 2020 US dollars per capita. The Consumer Price Index, scaled so that 2020 is 1, is provided for each year if you prefer to use numbers not adjusted for inflation; simply divide each value by the CPI to get the value in that year’s nominal dollars. The total population is also provided if you want total values instead of per-capita values.
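The two conversions mentioned above (inflation-adjusted to nominal, and per-capita to total) can be sketched as follows. The direction of the CPI adjustment follows the description literally (divide by the CPI); it is worth confirming against the cpi column that this matches how the column is scaled. The function name and the numbers are hypothetical.

```python
def convert_value(value_per_capita_2020: float, cpi: float, population: int) -> dict:
    """Derive nominal and total figures from an inflation-adjusted per-capita value."""
    # Per the dataset description: divide by the CPI to get nominal dollars.
    nominal_per_capita = value_per_capita_2020 / cpi
    return {
        "nominal_per_capita": nominal_per_capita,
        "real_total": value_per_capita_2020 * population,
        "nominal_total": nominal_per_capita * population,
    }


# Hypothetical inputs: 1,000 (2020 dollars per capita), CPI of 2.0, 100 residents.
converted = convert_value(1000.0, 2.0, 100)
```

For totals, `city_population` is the natural multiplier for city-level values. Which population column applies to county or district variables is not stated in the description.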

Questions

Do some exploratory data analysis. Are there any outlying cities? Any interesting trends and rela...
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
Updated Jul 19, 2024
Cite
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Dataset updated
Jul 19, 2024
Dataset provided by
Kishor Datta Gupta
Nishat Anjum
Nafiz Sadman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [ 1 ], mining opinions from headlines using Stanford NLP and SVM, compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

However, there is not much NLP research invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [ 9 ]. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We have named the collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement. Each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors for collecting the news reports were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.

To collect these data we used a Google Form for USA and BD. We had two human editors go through each entry to check for any spam or troll entries.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
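The four pre-processing steps listed above might look roughly like the sketch below. This is not the authors' script, which is not shown here. The stop-word list is a tiny illustrative sample, `naive_lemma` is a crude stand-in for a real lemmatizer, lowercasing is an added assumption, and "non-English alphanumeric characters" is interpreted as anything other than ASCII letters, digits, and whitespace.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "on", "for"}


def naive_lemma(token: str) -> str:
    """Crude suffix-stripping stand-in for a real lemmatizer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text: str) -> list:
    """Apply the four steps in order and return the cleaned tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # 1. remove hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # 2. drop other characters
    tokens = text.lower().split()                        # lowercasing is an assumption
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [naive_lemma(t) for t in tokens]              # 4. lemmatize (approximate)


cleaned = preprocess("The cases are rising in cities: https://example.com/news")
```

On the sample sentence this yields `["case", "rising", "city"]`. The suffix rules will mishandle many words, which is why a dictionary-based lemmatizer is the usual choice in practice.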

The primary data statistics of the two datasets are shown in Tables 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No. of words per headline: 7 to 20

No. of words per body content: 150 to 2100

Table 2: Covid-News-BD-NNK data statistics

No. of words per headline: 10 to 20

No. of words per body content: 100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository in account name NKK^1. Here, we created two repositories USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We are regularly updating the CSV files and regenerating JSON using a py script. We provided a python script file for essential operation. We welcome all outside collaboration to enrich the dataset.

3 Literature Review

Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

Some well-known applications of NLP include fraud detection on online media sites[ 10 ], authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ], and the machine translation used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms built solely for Natural Language Processing tasks. The two most prominent are BERT[ 14 ], which uses a bidirectional Transformer encoder architecture and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI[ 15 ], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost.

Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc.[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA[ 17 ].

Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.
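The graph-based idea behind TextRank can be sketched in plain Python: words become nodes, words that occur close together are linked, and a PageRank-style iteration scores each node. This is a simplified illustration under stated assumptions (an unweighted, undirected co-occurrence graph, a fixed number of iterations, and no part-of-speech filtering), not the exact procedure of [ 18 ] or of the authors. The function name, parameters, and token list are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iteratively redistribute each node's score among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical, already pre-processed tokens.
sample_tokens = "virus economy virus mask virus election virus jobs".split()
top = textrank_keywords(sample_tokens, window=1, top_k=1)
```

In the sample, "virus" co-occurs with every other word, so it ends up with the highest score and is returned as the top keyword.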

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can note the following:

In February, both newspapers talked about China and the source of the outbreak.

StarTribune emphasized Minnesota as the state of most concern. In April, this concern seemed to grow.

Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

Washington Post discussed global issues more than StarTribune.

StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.

We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and compiled a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us information about how a state or country is reacting to the pandemic.

We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
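The excerpt does not describe how its extraction script works, so the following is only one plausible sketch of keyword-based number extraction: a regular expression that captures a number followed, within at most three words, by one of the keywords named in the text. The pattern, the three-word window, and the sample sentence are assumptions made for illustration.

```python
import re

# Keywords named in the text; matching is case-insensitive.
KEYWORDS = ["deaths", "infected", "died", "infections", "quarantined",
            "lock-down", "diagnosed"]

# A number (optionally with thousands separators) followed, within at most
# three intervening words, by one of the keywords.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,3}?(" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return (number, keyword) pairs for numbers that closely precede a keyword."""
    return [(int(num.replace(",", "")), kw.lower()) for num, kw in PATTERN.findall(text)]

# Made-up sentence, not taken from the news reports.
report = "Officials said 1,200 people were infected and 35 died on Monday."
print(extract_counts(report))
```

Summing the extracted numbers per month and per newspaper would give a series comparable in spirit to the one the excerpt plots, although the actual script may have used different matching rules.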
U.S. poverty rate 1990-2023
statista.com
Updated Sep 16, 2024
Cite
Statista (2024). U.S. poverty rate 1990-2023 [Dataset]. https://www.statista.com/statistics/200463/us-poverty-rate-since-1990/
Dataset updated
Sep 16, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2023, around 11.1 percent of the population was living below the national poverty line in the United States.

Poverty in the United States

As shown in the statistic above, the poverty rate among all people living in the United States has shifted within the last 15 years. The United Nations Educational, Scientific and Cultural Organization (UNESCO) defines poverty as follows: “Absolute poverty measures poverty in relation to the amount of money necessary to meet basic needs such as food, clothing, and shelter. The concept of absolute poverty is not concerned with broader quality of life issues or with the overall level of inequality in society.” The poverty rate in the United States varies widely across different ethnic groups. American Indians and Alaska Natives were the ethnic group with the highest share of people living in poverty in 2022, with about 25 percent of the population earning an income below the poverty line. In comparison, only 8.6 percent of the White (non-Hispanic) population and the Asian population were living below the poverty line in 2022. Children were one of the population groups most at risk of poverty in the U.S. between 1990 and 2022. Child poverty peaked in 1993, with 22.7 percent of children living in poverty that year in the United States. Between 2000 and 2010, the child poverty rate in the United States increased every year; however, this rate was down to 15 percent in 2022. The number of people living in poverty in the U.S. varies from state to state. Compared to California, where about 4.44 million people were living in poverty in 2022, the state of Minnesota had about 429,000 people living in poverty.

Instagram accounts with the most followers worldwide 2024

statista.com

Updated Jun 17, 2025

Cite

Stacy Jo Dixon (2025). Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

The Portuguese footballer is the most-followed person on the photo-sharing platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.

How popular is Instagram?

Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.

Who uses Instagram?

Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

Celebrity influencers on Instagram

Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.

American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
+ more versions
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/adecfea6-fcd7-4c41-8165-165c4490a9da/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: kml(5), csv(5), xls(5), json(5), geojson(5), zip(5), gml(5), shp(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2018
Area covered
New Mexico (West Bounding Coordinate -109.050173, East Bounding Coordinate -103.001964, North Bounding Coordinate 37.000293, South Bounding Coordinate 31.332172)
Description
A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey population data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico Census tracts). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.
The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. While the ACS contains margin of error (MOE) information, this dataset does not.
Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by Census tract boundaries in New Mexico. Census tracts are small, relatively permanent statistical subdivisions of a county or equivalent entity, and were defined by local participants as part of the 2010 Census Participant Statistical Areas Program. The primary purpose of census tracts is to provide a stable set of geographic units for the presentation of census data and comparison back to previous decennial censuses. Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people. State and county boundaries always are census tract boundaries in the standard census geographic hierarchy. In a few rare instances, a census tract may consist of noncontiguous areas. These noncontiguous areas may occur where the census tracts are coextensive with all or parts of legal entities that are themselves noncontiguous. For the 2010 Census, the census tract code range of 9400 through 9499 was enforced for census tracts that include a majority American Indian population according to Census 2000 data and/or their area was primarily covered by federally recognized American Indian reservations and/or off-reservation trust lands; the code range 9800 through 9899 was enforced for those census tracts that contained little or no population and represented a relatively large special land use area such as a National Park, military installation, or a business/industrial park; and the code range 9900 through 9998 was enforced for those census tracts that contained only water area, no land area.
Current Population Survey (CPS)
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/AK4FDD
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description
analyze the current population survey (cps) annual social and economic supplement (asec) with r
the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
CDC COVID-19 Vaccine Tracker
kaggle.com
Updated Dec 4, 2023
Cite
The Devastator (2023). CDC COVID-19 Vaccine Tracker [Dataset]. https://www.kaggle.com/datasets/thedevastator/cdc-covid-19-vaccine-tracker
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 4, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
CDC COVID-19 Vaccine Tracker

Cumulative and Daily Counts of COVID-19 Vaccine Doses in the United States

By Nicky Forster [source]

About this dataset

The dataset contains data points such as the cumulative count of people who have received at least one dose of the vaccine, new doses administered on a specific date, cumulative count of doses distributed in the country, percentage of population that has completed the full vaccine series, cumulative count of Pfizer and Moderna vaccine doses administered in each state, seven-day rolling averages for new doses administered and distributed, among others.

It also provides insights into the vaccination status at both national and state levels. The dataset includes information on the percentage of population that has received at least one dose of the vaccine, percentage of population that has completed the full vaccine series, cumulative counts per 100k population for both distributed and administered doses.

Additionally, it presents data specific to each state, including their abbreviation and name. It outlines details such as cumulative counts per 100k population for both distributed and administered doses in each state. Furthermore, it indicates if there were instances where corrections resulted in single-day negative counts.

The dataset is compiled from daily snapshots obtained from CDC's COVID Data Tracker. Please note that there may be reporting delays by healthcare providers up to 72 hours after administering a dose.

This comprehensive dataset serves various purposes, including tracking vaccination progress over time across different locations within the United States. It can be used by researchers, policymakers, or anyone interested in analyzing trends related to COVID-19 vaccination efforts at both national and state levels.

How to use the dataset

Familiarize Yourself with the Columns: Take a look at the available columns in this dataset to understand what information is included. These columns provide details such as state abbreviations, state names, dates of data snapshots, cumulative counts of doses distributed and administered, people who have received at least one dose or completed the vaccine series, percentages of population coverage, manufacturer-specific data, and seven-day rolling averages.

Explore Cumulative Counts: The dataset includes cumulative counts that show the total number of doses distributed or administered over time. You can analyze these numbers to track trends in vaccination progress in different states or regions.

Analyze Daily Counts: The dataset also provides daily counts of new vaccine doses distributed and administered on specific dates. By examining these numbers, you can gain insights into vaccination rates on a day-to-day basis.

Study Population Coverage Metrics: Metrics such as pct_population_received_at_least_one_dose and pct_population_series_complete give you an understanding of how much of each state's population has received at least one dose or completed their vaccine series respectively.

Utilize Manufacturer Data: The columns related to Pfizer and Moderna provide information about the number of doses administered for each manufacturer separately. By analyzing this data, you can compare vaccination rates between different vaccines.

Consider Rolling Averages: The seven-day rolling average columns allow you to smooth out fluctuations in daily counts by calculating an average over a week's time window. This can help identify long-term trends more accurately.

Compare States: You can compare vaccination progress between different states by filtering the dataset based on state names or abbreviations. This way, you can observe variations in distribution and administration rates among different regions.

Visualize the Data: Creating charts and graphs will help you visualize the data more effectively. Plotting trends over time or comparing different metrics for various states can provide powerful visual representations of vaccination progress.

Stay Informed: Keep in mind that this dataset is continuously updated as new data becomes available. Make sure to check for any updates or refreshed datasets to obtain the most recent information on COVID-19 vaccine distributions and administrations.
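To make the rolling-average step in the list above concrete, here is a minimal sketch of a trailing seven-day mean over daily counts. The trailing-window alignment, the function name, and the sample counts are assumptions; the dataset's own rolling-average columns may be computed differently.

```python
from collections import deque

def rolling_average(values, window=7):
    """Return trailing-window means; positions with fewer than `window` values yield None."""
    buffer = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(buffer) == window:
            total -= buffer[0]  # the oldest value is about to be evicted
        buffer.append(value)
        total += value
        averages.append(total / window if len(buffer) == window else None)
    return averages

# Hypothetical daily dose counts with a weekend dip (not real CDC figures).
daily_doses = [100, 120, 90, 110, 130, 40, 30, 105, 125]
smoothed = rolling_average(daily_doses)
print(smoothed)
```

The first six positions are `None` because a full week of data is not yet available. From the seventh day onward, the weekend dip is spread across the window instead of appearing as a sharp drop.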

Research Ideas

Vaccination Analysis: This dataset can be used to analyze the progress of COVID-19 vaccinations in the United States. By examining the cumulative counts of doses distributed and administered, as well as the number of people who have received at least one dose or completed the vaccine series, researchers and policymakers can assess how effectively vaccines are being rolled out and monitor...
PREDIK Data-Driven: Geospatial Data | USA | Tailor-made datasets: Foot...
datarade.ai
Updated Oct 13, 2021
Cite
Predik Data-driven (2021). PREDIK Data-Driven: Geospatial Data | USA | Tailor-made datasets: Foot traffic & Places Data [Dataset]. https://datarade.ai/data-products/predik-data-driven-geospatial-data-usa-tailor-made-datas-predik-data-driven
Explore at:
Available download formats: .json, .csv, .xls, .sql
Dataset updated
Oct 13, 2021
Dataset authored and provided by
Predik Data-driven
Area covered
United States
Description
This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:

- How often do people visit a location? (daily, monthly, absolute, and averages)
- What type of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people have in a certain POI? Breakdown by type: residents, workers, visitors.
- What's their mobility like during night hours & day hours?
- What's the frequency of the visits, partitioned by day of the week and hour of the day?

Extra insights
- Visitors' relative income level.
- Visitors' preferences as derived by their visits to shopping, parks, sports facilities, churches, among others.

Overview & Key Concepts

Each record corresponds to a ping from a mobile device, at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process is compliant with applicable privacy laws.

We clean and process these massive datasets with a number of complex, computer-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.

Featured attributes of the data

Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationary observations.

Night base of the device: we calculate the approximated location of where the device spends the night, which is usually their home neighborhood.

Day base of the device: we calculate the most common daylight location during weekdays, which is usually their work location.

Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.

POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.

Category of visited POI: for each observation that can be attributable to a POI, we also include a standardized location category (park, hospital, among others). Coverage: Worldwide.
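The device-speed attribute above is described as distance between consecutive observations divided by elapsed time. The sketch below shows one way to compute it, assuming each ping is a `(timestamp_seconds, latitude, longitude)` tuple and using the haversine great-circle formula. The tuple layout, function names, and sample pings are hypothetical, and the provider's actual method is not documented here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speeds_mps(pings):
    """Speed in m/s between consecutive (timestamp_s, lat, lon) pings sorted by time."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(pings, pings[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            speeds.append(None)  # duplicate or out-of-order timestamp
            continue
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / elapsed)
    return speeds

# Made-up pings: the device moves about 111 m in one minute, then stays put.
pings = [(0, 0.0, 0.0), (60, 0.0, 0.001), (120, 0.0, 0.001)]
speeds = speeds_mps(pings)
print(speeds)
```

Thresholds on the resulting speeds (for example, near zero for stationary observations) could then separate vehicles, pedestrians, and stationary devices; any such cut-offs would need to be chosen and validated against real data.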

Delivery schemas

We can deliver the data in three different formats:

Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.

Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements in the dataset. This helps understand who visited a specific POI, characterize and understand the consumer's behavior.

Audience profiles: one record per mobile device in a given period of time (usually monthly). All the visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
Single-earner and dual-earner census families by number of children
www150.statcan.gc.ca
ouvert.canada.ca
+2 more
Updated Jun 27, 2024
Cite
Government of Canada, Statistics Canada (2024). Single-earner and dual-earner census families by number of children [Dataset]. http://doi.org/10.25318/1110002801-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1110002801-eng
Dataset updated
Jun 27, 2024
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Families of tax filers; Single-earner and dual-earner census families by number of children (final T1 Family File; T1FF).
Data from: What We Eat In America (WWEIA) Database
catalog.data.gov
agdatacommons.nal.usda.gov
+1 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). What We Eat In America (WWEIA) Database [Dataset]. https://catalog.data.gov/dataset/what-we-eat-in-america-wweia-database-f7f35
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Area covered
United States
Description
What We Eat in America (WWEIA) is the dietary intake interview component of the National Health and Nutrition Examination Survey (NHANES). WWEIA is conducted as a partnership between the U.S. Department of Agriculture (USDA) and the U.S. Department of Health and Human Services (DHHS). Two days of 24-hour dietary recall data are collected through an initial in-person interview, and a second interview conducted over the telephone within three to 10 days. Participants are given three-dimensional models (measuring cups and spoons, a ruler, and two household spoons) and/or USDA's Food Model Booklet (containing drawings of various sizes of glasses, mugs, bowls, mounds, circles, and other measures) to estimate food amounts. WWEIA data are collected using USDA's dietary data collection instrument, the Automated Multiple-Pass Method (AMPM). The AMPM is a fully computerized method for collecting 24-hour dietary recalls either in-person or by telephone. For each 2-year data release cycle, the following dietary intake data files are available: Individual Foods File - Contains one record per food for each survey participant. Foods are identified by USDA food codes. Each record contains information about when and where the food was consumed, whether the food was eaten in combination with other foods, amount eaten, and amounts of nutrients provided by the food. Total Nutrient Intakes File - Contains one record per day for each survey participant. Each record contains daily totals of food energy and nutrient intakes, daily intake of water, intake day of week, total number foods reported, and whether intake was usual, much more than usual or much less than usual. The Day 1 file also includes salt use in cooking and at the table; whether on a diet to lose weight or for other health-related reason and type of diet; and frequency of fish and shellfish consumption (examinees one year or older, Day 1 file only). 
DHHS is responsible for the sample design and data collection, and USDA is responsible for the survey’s dietary data collection methodology, maintenance of the databases used to code and process the data, and data review and processing. USDA also funds the collection and processing of Day 2 dietary intake data, which are used to develop variance estimates and calculate usual nutrient intakes.

Resources in this dataset:

Resource Title: What We Eat In America (WWEIA) main web page. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/wweianhanes-overview/ Contains data tables, research articles, documentation data sets and more information about the WWEIA program. (Link updated 05/13/2020)
Atlanta, GA Population Breakdown By Race (Excluding Ethnicity) Dataset:...
neilsberg.com
csv, json
Updated Feb 21, 2025
Cite
Neilsberg Research (2025). Atlanta, GA Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/atlanta-ga-population-by-race/
Explore at:
Available download formats: csv, json
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Georgia, Atlanta
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories identified by the US Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Atlanta by race. It includes the population of Atlanta across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Atlanta across relevant racial categories.

Key observations

The percent distribution of Atlanta population by race (across all racial categories recognized by the U.S. Census Bureau): 39.89% are white, 46.92% are Black or African American, 0.28% are American Indian and Alaska Native, 4.98% are Asian, 0.06% are Native Hawaiian and other Pacific Islander, 2.08% are some other race and 5.80% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column displays the racial categories (excluding ethnicity) for Atlanta.

Population: The population of the racial category (excluding ethnicity) in Atlanta is shown in this column.

% of Total Population: This column displays the percentage distribution of each race as a proportion of Atlanta's total population. Please note that the sum of all percentages may not equal exactly 100% due to rounding of values.
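The rounding caveat on the percentage column can be checked directly against the figures listed under "Key observations". The sketch below sums those published percentages and also shows how such a column could be derived from raw counts. The helper name `percent_shares` and the three-category toy counts are hypothetical and only illustrate the rounding effect.

```python
# Percentages as published in the "Key observations" paragraph.
published_pct = {
    "White": 39.89,
    "Black or African American": 46.92,
    "American Indian and Alaska Native": 0.28,
    "Asian": 4.98,
    "Native Hawaiian and Other Pacific Islander": 0.06,
    "Some other race": 2.08,
    "Two or more races": 5.80,
}

def percent_shares(counts, digits=2):
    """Convert raw counts into percentages of their total, rounded to `digits` places."""
    total = sum(counts.values())
    return {name: round(100 * n / total, digits) for name, n in counts.items()}

# The published shares add up to slightly more than 100 because each was rounded.
total_pct = round(sum(published_pct.values()), 2)
print(total_pct)  # 100.01

# Toy counts (hypothetical): three equal groups round to 33.33 each, summing to 99.99.
toy_shares = percent_shares({"A": 1, "B": 1, "C": 1})
print(toy_shares, round(sum(toy_shares.values()), 2))
```

When exact totals matter, it is safer to recompute shares from the Population column than to add up the rounded percentages.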

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for Atlanta Population by Race & Ethnicity.
Data from: Survey of Gun Owners in the United States, 1996
catalog.data.gov
icpsr.umich.edu
Updated Mar 12, 2025
+ more versions
Cite
National Institute of Justice (2025). Survey of Gun Owners in the United States, 1996 [Dataset]. https://catalog.data.gov/dataset/survey-of-gun-owners-in-the-united-states-1996-6028b
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice
Area covered
United States
Description
This study was undertaken to obtain information on the characteristics of gun ownership, gun-carrying practices, and weapons-related incidents in the United States -- specifically, gun use and other weapons used in self-defense against humans and animals. Data were gathered using a national random-digit-dial telephone survey. The respondents were comprised of 1,905 randomly-selected adults aged 18 and older living in the 50 United States. All interviews were completed between May 28 and July 2, 1996. The sample was designed to be a representative sample of households, not of individuals, so researchers did not interview more than one adult from each household. To start the interview, six qualifying questions were asked, dealing with (1) gun ownership, (2) gun-carrying practices, (3) gun display against the respondent, (4) gun use in self-defense against animals, (5) gun use in self-defense against people, and (6) other weapons used in self-defense. A "yes" response to a qualifying question led to a series of additional questions on the same topic as the qualifying question. Part 1, Survey Data, contains the coded data obtained during the interviews, and Part 2, Open-Ended-Verbatim Responses, consists of the answers to open-ended questions provided by the respondents. Information collected for Part 1 covers how many firearms were owned by household members, types of firearms owned (handguns, revolvers, pistols, fully automatic weapons, and assault weapons), whether the respondent personally owned a gun, reasons for owning a gun, type of gun carried, whether the gun was ever kept loaded, kept concealed, used for personal protection, or used for work, and whether the respondent had a permit to carry the gun. 
Additional questions focused on incidents in which a gun was displayed in a hostile manner against the respondent, including the number of times such an incident took place, the location of the event in which the gun was displayed against the respondent, whether the police were contacted, whether the individual displaying the gun was known to the respondent, whether the incident was a burglary, robbery, or other planned assault, and the number of shots fired during the incident. Variables concerning gun use by the respondent in self-defense against an animal include the number of times the respondent used a gun in this manner and whether the respondent was hunting at the time of the incident. Other variables in Part 1 deal with gun use in self-defense against people, such as the location of the event, if the other individual knew the respondent had a gun, the type of gun used, any injuries to the respondent or to the individual that required medical attention or hospitalization, whether the incident was reported to the police, whether there were any arrests, whether other weapons were used in self-defense, the type of other weapon used, location of the incident in which the other weapon was used, and whether the respondent was working as a police officer or security guard or was in the military at the time of the event. Demographic variables in Part 1 include the gender, race, age, household income, and type of community (city, suburb, or rural) in which the respondent lived. Open-ended questions asked during the interview comprise the variables in Part 2. Responses include descriptions of where the respondent was when he or she displayed a gun (in self-defense or otherwise), specific reasons why the respondent displayed a gun, how the other individual reacted when the respondent displayed the gun, how the individual knew the respondent had a gun, whether the police were contacted for specific self-defense events, and if not, why not.
2017 Census of Agriculture - Census Data Query Tool (CDQT)
agdatacommons.nal.usda.gov
bin
Updated Feb 13, 2024
USDA National Agricultural Statistics Service (2024). 2017 Census of Agriculture - Census Data Query Tool (CDQT) [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/2017_Census_of_Agriculture_-_Census_Data_Query_Tool_CDQT_/24663345
Explore at:
Available download formats: bin
Dataset updated
Feb 13, 2024
Dataset provided by
United States Department of Agriculture (http://usda.gov/)
National Agricultural Statistics Service (http://www.nass.usda.gov/)
Authors
USDA National Agricultural Statistics Service
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The Census of Agriculture is a complete count of U.S. farms and ranches and the people who operate them. Even small plots of land, whether rural or urban, growing fruit, vegetables, or some food animals count if $1,000 or more of such products were raised and sold, or normally would have been sold, during the Census year. The Census of Agriculture, taken only once every five years, looks at land use and ownership, operator characteristics, production practices, income, and expenditures. For America's farmers and ranchers, the Census of Agriculture is their voice, their future, and their opportunity.

The Census Data Query Tool (CDQT) is a web-based tool that is available to access and download table level data from the Census of Agriculture Volume 1 publication. The data found via the CDQT may also be accessed in the NASS Quick Stats database. The CDQT is unique in that it automatically displays data from the past five Census of Agriculture publications. The CDQT is presented as a "2017 centric" view of the Census of Agriculture data. All data series that are present in the 2017 dataset are available within the CDQT, and any matching data series from prior Census years will also display (back to 1997). If a data series is not included in the 2017 dataset, then data cells will remain blank in the tool. For example, one of the data series had a label change from "Operator" to "Producer." This means that data from prior Census years labelled "Operator" will not show up where the label has changed to "Producer" for 2017. The new Census Data Query Tool application can be used to query Census data from 1997 through 2017. Data are searchable by Census table and are downloadable as CSV or PDF files. 2017 Census Ag Atlas Maps are also available for download.

Resources in this dataset:
Resource Title: 2017 Census of Agriculture - Census Data Query Tool (CDQT).
File Name: Web Page, url: https://www.nass.usda.gov/Quick_Stats/CDQT/chapter/1/table/1

Using CDQT:

Upon entering the CDQT, a data table is present. Changing the parameters at the top of the data table will retrieve different combinations of Census Chapter, Table, State, or County (when selecting Chapter 2). For the U.S., Volume 1, US/State Chapter 1 will include only U.S. data; Chapter 2 will include U.S. and State level data. For a State, Volume 1 US/State Level Data Chapter 1 will include only the State level data; Chapter 2 will include the State and county level data. Once a selection is made, press the “Update Grid” button to retrieve the new data table. Comma-separated values (CSV) download, compatible with most spreadsheet and database applications: to download a CSV file of the data as it is currently presented in the data grid, press the "CSV" button in the "Export Data" section of the toolbar. When CSV is chosen, data will be downloaded as numeric. To view the source PDF file for the data table, press the "View PDF" button in the toolbar.
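For readers who want to work with an exported file outside a spreadsheet, the following is a minimal sketch of loading CSV text with Python's standard library. The column names, the sample rows, and the `(D)` placeholder are made up for illustration; the actual layout of a CDQT export is not described here and may differ.

```python
import csv
import io


def parse_exported_csv(text):
    """Parse CSV text into a list of dicts, turning numeric-looking cells into floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for column, value in record.items():
            if not isinstance(value, str):
                # Short or overlong rows yield None or a list; keep them untouched.
                parsed[column] = value
                continue
            cleaned = value.replace(",", "").strip()  # drop thousands separators
            try:
                parsed[column] = float(cleaned)
            except ValueError:
                parsed[column] = value  # labels and placeholders stay as text
        rows.append(parsed)
    return rows


# Hypothetical sample; not real Census of Agriculture figures.
sample = 'item,value\nFarms,1200\nLand in farms (acres),"250,000"\nSuppressed cell,(D)\n'
rows = parse_exported_csv(sample)
for row in rows:
    print(row)
```

Cells that cannot be read as numbers are left as text, so labels and any non-numeric placeholders survive the conversion unchanged.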
Number of data compromises and impacted individuals in U.S. 2005-2024
statista.com
ai-chatbox.pro
Updated May 23, 2025
Statista (2025). Number of data compromises and impacted individuals in U.S. 2005-2024 [Dataset]. https://www.statista.com/statistics/273550/data-breaches-recorded-in-the-united-states-by-number-of-breaches-and-records-exposed/
Explore at:
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
In 2024, the number of data compromises in the United States stood at 3,158 cases. Meanwhile, over 1.35 billion individuals were affected in the same year by data compromises, including data breaches, leakage, and exposure. While these are three different events, they have one thing in common: in all three, sensitive data is accessed by an unauthorized threat actor.

Industries most vulnerable to data breaches

Some industry sectors usually see more significant cases of private data violations than others. This is determined by the type and volume of personal information that organizations in these sectors store. In 2024, financial services, healthcare, and professional services were the three industry sectors that recorded the most data breaches. Overall, the number of data breaches in some industry sectors in the United States has gradually increased within the past few years. However, some sectors saw a decrease.

Largest data exposures worldwide

In 2020, an adult streaming website, CAM4, experienced a leakage of nearly 11 billion records. This is by far the most extensive reported data leakage. This case, though, is unique because cybersecurity researchers found the vulnerability before cybercriminals did. The second-largest data breach is the Yahoo data breach, dating back to 2013. The company first reported about one billion exposed records, then later, in 2017, updated the number of leaked records to three billion. In March 2018, the third-biggest data breach happened, involving India’s national identification database, Aadhaar. As a result of this incident, over 1.1 billion records were exposed.

Instagram: distribution of global audiences 2024, by age group

statista.com

Updated Jun 17, 2025

Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset updated

Jun 17, 2025

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

Instagram users

With roughly one billion monthly active users, Instagram is one of the most popular social networks worldwide. The social photo-sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

Instagram features

One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream, and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo-sharing app that initially became famous due to its “vanishing photos” feature. As of the second quarter of 2021, Snapchat had 293 million daily active users.

HNWI worldwide 2024, by country
statista.com
Updated Mar 10, 2025
Statista (2025). HNWI worldwide 2024, by country [Dataset]. https://www.statista.com/forecasts/1171539/hnwi-by-country
Explore at:
Dataset updated
Mar 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 1, 2024 - Dec 31, 2024
Area covered
Albania
Description
The United States leads the ranking by number of high-net-worth individuals, recording 26.9 million individuals. China follows with 13.9 million individuals, while Lesotho trails the ranking with 0 thousand individuals, resulting in a difference of 26.9 million individuals to the ranking leader, the United States. High-net-worth individuals are here defined as persons with investible assets of at least one million U.S. dollars in current exchange rate terms.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in more than 150 countries and regions worldwide. All input data are sourced from international institutions, national statistical offices, and trade associations. All data are processed to generate comparable datasets (see supplementary notes under details for more information).

Champaign County Regional Planning Commission (2024). Poverty Rate [Dataset]. https://data.ccrpc.org/dataset/poverty-rate

Poverty Rate

Explore at:

Available download formats: csv(393)

Dataset updated

Oct 17, 2024

Dataset provided by

Champaign County Regional Planning Commission

Description

This poverty rate data shows what percentage of the measured population* falls below the poverty line. Poverty is closely related to income: different “poverty thresholds” are in place for different sizes and types of household. A family or individual is considered to be below the poverty line if that family or individual’s income falls below their relevant poverty threshold. For more information on how poverty is measured by the U.S. Census Bureau (the source for this indicator’s data), visit the U.S. Census Bureau’s poverty webpage.

The poverty rate is an important piece of information when evaluating an area’s economic health and well-being. The poverty rate can also be illustrative when considered in the contexts of other indicators and categories. As a piece of data, it is too important and too useful to omit from any indicator set.

The poverty rate for all individuals in the measured population in Champaign County has hovered around 20% since 2005. However, it reached its lowest rate in 2021 at 14.9%, and its second lowest rate in 2023 at 16.3%. Although the American Community Survey (ACS) data shows fluctuations between years, given their margins of error, none of the differences between consecutive years’ estimates are statistically significant, making it impossible to identify a trend.

Poverty rate data was sourced from the U.S. Census Bureau’s American Community Survey 1-Year Estimates, which are released annually.

As with any datasets that are estimates rather than exact counts, it is important to take into account the margins of error (listed in the column beside each figure) when drawing conclusions from the data.
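As an illustration of how margins of error enter such a comparison, here is a minimal sketch of the usual two-estimate significance test for ACS figures. ACS margins of error are published at the 90% confidence level, so each standard error is the margin divided by 1.645. The two estimates below come from the text above, but the margins of error are hypothetical placeholders; the real values are in the dataset's margin-of-error column.

```python
import math

Z_90 = 1.645  # critical value for the 90% confidence level used by ACS margins of error


def compare_estimates(est_a, moe_a, est_b, moe_b):
    """Return (z_score, is_significant) for the difference between two ACS estimates."""
    se_a = moe_a / Z_90  # standard error recovered from the published margin of error
    se_b = moe_b / Z_90
    z_score = (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)
    return z_score, abs(z_score) > Z_90


# Estimates from the text (2021: 14.9%, 2023: 16.3%).
# The margins of error (2.0 and 2.1 percentage points) are HYPOTHETICAL.
z_score, significant = compare_estimates(14.9, 2.0, 16.3, 2.1)
print(f"z = {z_score:.3f}, significant at 90%: {significant}")
```

With these placeholder margins, the difference of 1.4 percentage points is smaller than the combined uncertainty, so the test does not flag it as significant. The result with the published margins may differ.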

Due to the impact of the COVID-19 pandemic, instead of providing the standard 1-year data products, the Census Bureau released experimental estimates from the 1-year data in 2020. This includes a limited number of data tables for the nation, states, and the District of Columbia. The Census Bureau states that the 2020 ACS 1-year experimental tables use an experimental estimation methodology and should not be compared with other ACS data. For these reasons, and because data is not available for Champaign County, no data for 2020 is included in this Indicator.

For interested data users, the 2020 ACS 1-Year Experimental data release includes a dataset on Poverty Status in the Past 12 Months by Age.

*According to the U.S. Census Bureau document “How Poverty is Calculated in the ACS,” poverty status is calculated for everyone but those in the following groups: “people living in institutional group quarters (such as prisons or nursing homes), people in military barracks, people in college dormitories, living situations without conventional housing, and unrelated individuals under 15 years old.”

Sources:
U.S. Census Bureau; American Community Survey, 2023 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (17 October 2024).
U.S. Census Bureau; American Community Survey, 2022 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (25 September 2023).
U.S. Census Bureau; American Community Survey, 2021 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (16 September 2022).
U.S. Census Bureau; American Community Survey, 2019 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (8 June 2021).
U.S. Census Bureau; American Community Survey, 2018 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using data.census.gov; (8 June 2021).
U.S. Census Bureau; American Community Survey, 2017 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (13 September 2018).
U.S. Census Bureau; American Community Survey, 2016 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (14 September 2017).
U.S. Census Bureau; American Community Survey, 2015 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (19 September 2016).
U.S. Census Bureau; American Community Survey, 2014 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2013 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2012 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2011 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2010 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2009 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2008 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2007 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2006 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
U.S. Census Bureau; American Community Survey, 2005 American Community Survey 1-Year Estimates, Table S1701; generated by CCRPC staff; using American FactFinder; (16 March 2016).
