6 datasets found
  1. Census Data

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3more
    Updated Mar 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
    Explore at:
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    United States Census Bureauhttp://census.gov/
    Description

    The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level. The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents . The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table is total population and total housing units. POP1 and POP2 contain selected population variables and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.

  2. C

    Pittsburgh American Community Survey Data 2015 - Household Types

    • data.wprdc.org
    • catalog.data.gov
    • +1more
    csv
    Updated May 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Pittsburgh (2023). Pittsburgh American Community Survey Data 2015 - Household Types [Dataset]. https://data.wprdc.org/dataset/pittsburgh-american-community-survey-data-household-types
    Explore at:
    csvAvailable download formats
    Dataset updated
    May 21, 2023
    Dataset provided by
    City of Pittsburgh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pittsburgh
    Description

    The data on relationship to householder were derived from answers to Question 2 in the 2015 American Community Survey (ACS), which was asked of all people in housing units. The question on relationship is essential for classifying the population information on families and other groups. Information about changes in the composition of the American family, from the number of people living alone to the number of children living with only one parent, is essential for planning and carrying out a number of federal programs.

    The responses to this question were used to determine the relationships of all persons to the householder, as well as household type (married couple family, nonfamily, etc.). From responses to this question, we were able to determine numbers of related children, own children, unmarried partner households, and multi-generational households. We calculated average household and family size. When relationship was not reported, it was imputed using the age difference between the householder and the person, sex, and marital status.

    Household – A household includes all the people who occupy a housing unit. (People not living in households are classified as living in group quarters.) A housing unit is a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied (or if vacant, is intended for occupancy) as separate living quarters. Separate living quarters are those in which the occupants live separately from any other people in the building and which have direct access from the outside of the building or through a common hall. The occupants may be a single family, one person living alone, two or more families living together, or any other group of related or unrelated people who share living arrangements.

    Average Household Size – A measure obtained by dividing the number of people in households by the number of households. In cases where people in households are cross-classified by race or Hispanic origin, people in the household are classified by the race or Hispanic origin of the householder rather than the race or Hispanic origin of each individual.

    Average household size is rounded to the nearest hundredth.

    Comparability – The relationship categories for the most part can be compared to previous ACS years and to similar data collected in the decennial census, CPS, and SIPP. With the change in 2008 from “In-law” to the two categories of “Parent-in-law” and “Son-in-law or daughter-in-law,” caution should be exercised when comparing data on in-laws from previous years. “In-law” encompassed any type of in-law such as sister-in-law. Combining “Parent-in-law” and “son-in-law or daughter-in-law” does not represent all “in-laws” in 2008.

    The same can be said of comparing the three categories of “biological” “step,” and “adopted” child in 2008 to “Child” in previous years. Before 2008, respondents may have considered anyone under 18 as “child” and chosen that category. The ACS includes “foster child” as a category. However, the 2010 Census did not contain this category, and “foster children” were included in the “Other nonrelative” category. Therefore, comparison of “foster child” cannot be made to the 2010 Census. Beginning in 2013, the “spouse” category includes same-sex spouses.

  3. d

    Data from: What We Eat In America (WWEIA) Database

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). What We Eat In America (WWEIA) Database [Dataset]. https://catalog.data.gov/dataset/what-we-eat-in-america-wweia-database-f7f35
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Area covered
    United States
    Description

    What We Eat in America (WWEIA) is the dietary intake interview component of the National Health and Nutrition Examination Survey (NHANES). WWEIA is conducted as a partnership between the U.S. Department of Agriculture (USDA) and the U.S. Department of Health and Human Services (DHHS). Two days of 24-hour dietary recall data are collected through an initial in-person interview, and a second interview conducted over the telephone within three to 10 days. Participants are given three-dimensional models (measuring cups and spoons, a ruler, and two household spoons) and/or USDA's Food Model Booklet (containing drawings of various sizes of glasses, mugs, bowls, mounds, circles, and other measures) to estimate food amounts. WWEIA data are collected using USDA's dietary data collection instrument, the Automated Multiple-Pass Method (AMPM). The AMPM is a fully computerized method for collecting 24-hour dietary recalls either in-person or by telephone. For each 2-year data release cycle, the following dietary intake data files are available: Individual Foods File - Contains one record per food for each survey participant. Foods are identified by USDA food codes. Each record contains information about when and where the food was consumed, whether the food was eaten in combination with other foods, amount eaten, and amounts of nutrients provided by the food. Total Nutrient Intakes File - Contains one record per day for each survey participant. Each record contains daily totals of food energy and nutrient intakes, daily intake of water, intake day of week, total number foods reported, and whether intake was usual, much more than usual or much less than usual. The Day 1 file also includes salt use in cooking and at the table; whether on a diet to lose weight or for other health-related reason and type of diet; and frequency of fish and shellfish consumption (examinees one year or older, Day 1 file only). DHHS is responsible for the sample design and data collection, and USDA is responsible for the survey’s dietary data collection methodology, maintenance of the databases used to code and process the data, and data review and processing. USDA also funds the collection and processing of Day 2 dietary intake data, which are used to develop variance estimates and calculate usual nutrient intakes. Resources in this dataset:Resource Title: What We Eat In America (WWEIA) main web page. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/wweianhanes-overview/ Contains data tables, research articles, documentation data sets and more information about the WWEIA program. (Link updated 05/13/2020)

  4. Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED

    • data.cdc.gov
    • data.virginia.gov
    • +1more
    application/rdfxml +5
    Updated Oct 6, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CDC COVID-19 Response (2022). Weekly United States COVID-19 Cases and Deaths by State - ARCHIVED [Dataset]. https://data.cdc.gov/Case-Surveillance/Weekly-United-States-COVID-19-Cases-and-Deaths-by-/pwn4-m3yp
    Explore at:
    csv, application/rdfxml, xml, tsv, json, application/rssxmlAvailable download formats
    Dataset updated
    Oct 6, 2022
    Dataset provided by
    Centers for Disease Control and Preventionhttp://www.cdc.gov/
    Authors
    CDC COVID-19 Response
    License

    https://www.usa.gov/government-workshttps://www.usa.gov/government-works

    Area covered
    United States
    Description

    Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.

    Aggregate Data Collection Process Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:

    • A CDC data team reviews and validates the information obtained from jurisdictions’ state and local websites via an overnight data review process.
    • If more than one official county data source exists, CDC uses a comprehensive data selection process comparing each official county data source, and takes the highest case and death counts respectively, unless otherwise specified by the state.
    • CDC compiles these data and posts the finalized information on COVID Data Tracker.
    • County level data is aggregated to obtain state and territory specific totals.
    This process is collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provide the most up-to-date numbers on cases and deaths by report date. CDC may retrospectively update counts to correct data quality issues.

    Methodology Changes Several differences exist between the current, weekly-updated dataset and the archived version:

    • Source: The current Weekly-Updated Version is based on county-level aggregate count data, while the Archived Version is based on State-level aggregate count data.
    • Confirmed/Probable Cases/Death breakdown:  While the probable cases and deaths are included in the total case and total death counts in both versions (if applicable), they were reported separately from the confirmed cases and deaths by jurisdiction in the Archived Version.  In the current Weekly-Updated Version, the counts by jurisdiction are not reported by confirmed or probable status (See Confirmed and Probable Counts section for more detail).
    • Time Series Frequency: The current Weekly-Updated Version contains weekly time series data (i.e., one record per week per jurisdiction), while the Archived Version contains daily time series data (i.e., one record per day per jurisdiction).
    • Update Frequency: The current Weekly-Updated Version is updated weekly, while the Archived Version was updated twice daily up to October 20, 2022.
    Important note: The counts reflected during a given time period in this dataset may not match the counts reflected for the same time period in the archived dataset noted above. Discrepancies may exist due to differences between county and state COVID-19 case surveillance and reconciliation efforts.

    Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:

    Council of State and Territorial Epidemiologists (ymaws.com).

    Deaths CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.

    Number of Jurisdictions Reporting There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S Virgin Islands as well as three independent countries in compacts of free association with the United States, Federated States of Micronesia, Republic of the Marshall Islands, and Republic of Palau. New York State’s reported case and death counts do not include New York City’s counts as they separately report nationally notifiable conditions to CDC.

    CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:

    https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html

    https://www.cdc.gov/covid-data-tracker/index.html

    https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

    https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html

    Additional COVID-19 public use datasets, include line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.

    Archived Data Notes:

    November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.

    November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.

    November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths. 

    November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.

    December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.

    January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.

    January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.

    January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.

    January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.

    January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.

    January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.

    February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.

    February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.

    February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.

    February 16, 2023: Due to a reporting cadence change, Maine’s

  5. Z

    INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nafiz Sadman
    Nishat Anjum
    Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. Mining opinions from headlines [ 1 ] using Standford NLP and SVM by Rameshbhaiet. Al.compared several algorithms on a small and large dataset. Rubinet. al., in their paper [ 2 ], created a mechanism to differentiate fake news from real ones by building a set of characteristics of news according to their types. The purpose was to contribute to the low resource data available for training machine learning algorithms. Doumitet. al.in [ 3 ] have implemented LDA, a topic modeling approach to study bias present in online news media.

    However, there are not many NLP research invested in studying COVID-19. Most applications include classification of chest X-rays and CT-scans to detect presence of pneumonia in lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus[ 5 ][ 6 ][ 7 ] and replicating its structure to fight and find a vaccine. This research is crucial in battling the pandemic. The few NLP based research publications are sentiment classification of online tweets by Samuel et el [ 8 ] to understand fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodaret. al.[ 9 ]. NKK dataset is the first study on a comparatively larger dataset of a newspaper report on COVID-19, which contributed to the virus’s awareness to the best of our knowledge.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper report from United States of America (USA) on COVID-19. The newspaper includes The Washington Post (USA) and StarTribune (USA). We have named it as “Covid-News-USA-NNK”. We also accumulated 50 online newspaper report from Bangladesh on the issue and named it “Covid-News-BD-NNK”. The newspaper includes The Daily Star (BD) and Prothom Alo (BD). All these newspapers are from the top provider and top read in the respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was suitable compared to automation to ensure the news were highly relevant to the subject. The newspaper online sites had dynamic content with advertisements in no particular order. Therefore there were high chances of online scrappers to collect inaccurate news reports. One of the challenges while collecting the data is the requirement of subscription. Each newspaper required $1 per subscriptions. Some criteria in collecting the news reports provided as guideline to the human data-collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, economical genres are to be more prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a google form for USA and BD. We have two human editor to go through each entry to check any spam or troll entry.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as much unchanged as possible since changing sentence structures could result us in valuable information loss. While this was done with help of a script, we also assigned same human collectors to cross check for any presence of the above mentioned criteria.

    The primary data statistics of the two dataset are shown in Table 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No of words per headline

    7 to 20

    No of words per body content

    150 to 2100

    Table 2: Covid-News-BD-NNK data statistics No of words per headline

    10 to 20

    No of words per body content

    100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository in account name NKK^1. Here, we created two repositories USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We are regularly updating the CSV files and regenerating JSON using a py script. We provided a python script file for essential operation. We welcome all outside collaboration to enrich the dataset.

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP includes fraud detection on online media sites[ 10 ], using authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ] and machine translations used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in the algorithm solely for Natural Language Processing tasks. The two most trending ones are BERT[ 14 ], which uses bidirectional encoder-decoder architecture to create the transformer model, that can do near-perfect classification tasks and next-word predictions for next generations, and GPT-3 models released by OpenAI[ 15 ] that can generate texts almost human-like. However, these are all pre-trained models since they carry huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of text. One commonly used topic modeling is Latent Dirichlet Allocation or LDA[17].

    Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figure 1 and 3 presents the word cloud of Covid-News-USA- NNK dataset by month from February to May. From the figures 1,2,3, we can point few information:

    In February, both the news paper have talked about China and source of the outbreak.

    StarTribune emphasized on Minnesota as the most concerned state. In April, it seemed to have been concerned more.

    Both the newspaper talked about the virus impacting the economy, i.e, bank, elections, administrations, markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measurement: wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both the newspaper mentioned the outbreak in China in February, the weight of the spread in the United States are more highlighted through out March till May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’ , ’Infections’, ’Quarantined’, Lock-down’, ’Diagnosed’ etc from the news reports and created a number of cases for both the newspaper. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for the covid cases as it gradually rose from February. Both the newspaper clearly shows us that the rise in covid cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. Vader Sentiment scale ranges from -1(highly negative to 1(highly positive). There were some cases

    where the sentiment scores of the headline and body contradicted each other,i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can assist us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us information about how a state or country is reacting to the pandemic. We used PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both the datasets are: ’China’, Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’ , ’Stock market’ , ’Jobs’ , ’Election’, ’Missteps’, ’Health’, ’Response’. Keywords extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,

  6. N

    Newberry, FL Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic...

    • neilsberg.com
    csv, json
    Updated Feb 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Newberry, FL Non-Hispanic Population Breakdown By Race Dataset: Non-Hispanic Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/newberry-fl-population-by-race/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Newberry, Florida
    Variables measured
    Non-Hispanic Asian Population, Non-Hispanic Black Population, Non-Hispanic White Population, Non-Hispanic Some other race Population, Non-Hispanic Two or more races Population, Non-Hispanic American Indian and Alaska Native Population, Non-Hispanic Native Hawaiian and Other Pacific Islander Population, Non-Hispanic Asian Population as Percent of Total Non-Hispanic Population, Non-Hispanic Black Population as Percent of Total Non-Hispanic Population, Non-Hispanic White Population as Percent of Total Non-Hispanic Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Non-Hispanic population and (b) population as a percentage of the total Non-Hispanic population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and are part of Non-Hispanic classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Non-Hispanic population of Newberry by race. It includes the distribution of the Non-Hispanic population of Newberry across various race categories as identified by the Census Bureau. The dataset can be utilized to understand the Non-Hispanic population distribution of Newberry across relevant racial categories.

    Key observations

    Of the Non-Hispanic population in Newberry, the largest racial group is White alone with a population of 4,330 (70.48% of the total Non-Hispanic population).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (for Non-Hispanic) for the Newberry
    • Population: The population of the racial category (for Non-Hispanic) in the Newberry is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each race as a proportion of Newberry total Non-Hispanic population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Newberry Population by Race & Ethnicity. You can refer the same here

  7. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
Organization logo

Census Data

Explore at:
Dataset updated
Mar 1, 2024
Dataset provided by
United States Census Bureauhttp://census.gov/
Description

The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level. The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents . The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table is total population and total housing units. POP1 and POP2 contain selected population variables and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.

Search
Clear search
Close search
Google apps
Main menu