The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
DESCRIPTION
Johns Hopkins' county-level COVID-19 case and death data, paired with population and rates per 100,000
SUMMARY
Updates
April 9, 2020: The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
April 20, 2020: Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
April 29, 2020: The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
Overview
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
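As a rough illustration of the per-100,000 rates described above, the following minimal sketch scales a raw count by a population estimate. The county names, counts and populations are invented for the example, and the handling of a missing or zero population (returning `None`) is an assumption, not the AP's documented behavior.

```python
def rate_per_100k(count, population):
    """Return `count` per 100,000 residents, or None when population is missing or zero."""
    if not population:
        return None
    return count * 100_000 / population


# Hypothetical rows: (county name, cumulative cases, population estimate)
counties = [
    ("County A", 250, 50_000),
    ("County B", 1_200, 1_500_000),
    ("Unassigned", 40, 0),  # cases not assigned to a county have no population
]

rates = {name: rate_per_100k(cases, pop) for name, cases, pop in counties}
for name, rate in rates.items():
    print(name, "n/a" if rate is None else round(rate, 1))
```

Note how the smaller county shows the higher rate despite having far fewer cases, which is why raw counts and rates are best read together.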
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Queries
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic.
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
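The hotspot queries above rely on a 7-day rolling average of new cases per capita. The sketch below shows one way such a figure could be computed from cumulative counts; it is not the AP's query. The county names and numbers are invented, and clamping downward revisions to zero is an assumption made for the example.

```python
def daily_new_cases(cumulative):
    """Turn cumulative totals into day-over-day increases.

    Assumption: downward revisions are clamped to 0 instead of producing negative days.
    """
    return [max(today - yesterday, 0) for yesterday, today in zip(cumulative, cumulative[1:])]


def rolling_rate_per_100k(cumulative, population, window=7):
    """Mean of the last `window` daily increases, scaled per 100,000 residents."""
    daily = daily_new_cases(cumulative)
    if len(daily) < window or not population:
        return None
    return sum(daily[-window:]) / window * 100_000 / population


# Hypothetical counties: name -> (cumulative case counts for 8 days, population)
data = {
    "Large County": ([100, 110, 125, 125, 140, 160, 185, 205], 50_000),
    "Small County": ([10, 12, 15, 19, 24, 30, 37, 45], 2_000),
}

rates = {name: rolling_rate_per_100k(series, pop) for name, (series, pop) in data.items()}
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked, rates)
```

In this toy data the small county ranks first even though it reports fewer cases per day, which mirrors the caution about very small counties rising to the top.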
Caveats
This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".
This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories --...
The following report outlines the workflow used to optimize your Find Hot Spots result:
Initial Data Assessment
There were 2933 valid input features.
There were 3108 valid input aggregation areas.
There were 66 outlier locations; these will not be used to compute the optimal fixed distance band.
Incident Aggregation
Analysis was based on the number of points in each polygon cell.
Analysis was performed on all aggregation areas.
The aggregation process resulted in 3108 weighted areas.
Incident Count Properties:
Min: 0.0000
Max: 0.0015
Mean: 0.0001
Std. Dev.: 0.0001
Scale of Analysis
The optimal fixed distance band was based on the average distance to 30 nearest neighbors: 150682.0000 Meters.
Hot Spot Analysis
There are 865 output features statistically significant based on a FDR correction for multiple testing and spatial dependence.
Output
Red output features represent hot spots where high incident counts cluster.
Blue output features represent cold spots where low incident counts cluster.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geostatistics analyzes and predicts the values associated with spatial or spatial-temporal phenomena. It incorporates the spatial (and in some cases temporal) coordinates of the data within the analyses. It is a practical means of describing spatial patterns and interpolating values for locations where samples were not taken (and measures the uncertainty of those values, which is critical to informed decision making). This archive contains results of geostatistical analysis of COVID-19 case counts for all available US counties. Test results were obtained with ArcGIS Pro (ESRI). Sources are state health departments, which are scraped and aggregated by the Johns Hopkins Coronavirus Resource Center and then pre-processed by MappingSupport.com.
This update of the Zenodo dataset (version 6) consists of three compressed archives containing geostatistical analyses of SARS-CoV-2 testing data. This dataset utilizes many of the geostatistical techniques used in previous versions of this Zenodo archive, but has been significantly expanded to include analyses of up-to-date U.S. COVID-19 case data (from March 24th to September 8th, 2020):
Archive #1: “1.Geostat. Space-Time analysis of SARS-CoV-2 in the US (Mar24-Sept6).zip” – results of a geostatistical analysis of COVID-19 cases incorporating spatially-weighted hotspots that are conserved over one-week timespans. Results are reported starting from when U.S. COVID-19 case data first became available (March 24th, 2020) for 25 consecutive 1-week intervals (March 24th through to September 6th, 2020). Hotspots, where found, are reported in each individual state, rather than the entire continental United States.
Archive #2: "2.Geostat. Spatial analysis of SARS-CoV-2 in the US (Mar24-Sept8).zip" – the results from geostatistical spatial analyses only of corrected COVID-19 case data for the continental United States, spanning the period from March 24th through September 8th, 2020. The geostatistical techniques utilized in this archive include ‘Hot Spot’ analysis and ‘Cluster and Outlier’ analysis.
Archive #3: "3.Kriging and Densification of SARS-CoV-2 in LA and MA.zip" – this dataset provides preliminary kriging and densification analysis of COVID-19 case data for certain dates within the U.S. states of Louisiana and Massachusetts.
These archives consist of map files (as both static images and as animations) and data files (including text files which contain the underlying data of said map files [where applicable]) which were generated when performing the following Geostatistical analyses: Hot Spot analysis (Getis-Ord Gi*) [‘Archive #1’: consecutive weeklong Space-Time Hot Spot analysis; ‘Archive #2’: daily Hot Spot Analysis], Cluster and Outlier analysis (Anselin Local Moran's I) [‘Archive #2’], Spatial Autocorrelation (Global Moran's I) [‘Archive #2’], and point-to-point comparisons with Kriging and Densification analysis [‘Archive #3’].
The Word document provided ("Description-of-Archive.Updated-Geostatistical-Analysis-of-SARS-CoV-2 (version 6).docx") details the contents of each file and folder within these three archives and gives general interpretations of these results.
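For readers unfamiliar with the Spatial Autocorrelation (Global Moran's I) statistic named above, here is a minimal sketch of the textbook formula, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. It is not the ArcGIS Pro implementation used to produce the archives. The four-area layout, the binary adjacency weights and the case values are all invented for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a square spatial weights matrix.

    weights[i][j] is the spatial weight between areas i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# Hypothetical layout: four areas in a row, each adjacent to its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 2, 3, 4], adjacency)    # similar values sit next to each other
alternating = morans_i([1, 4, 1, 4], adjacency)  # neighbors are dissimilar
print(clustered, alternating)
```

Positive values indicate that similar counts tend to be neighbors (clustering), while negative values indicate a checkerboard-like pattern.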
As of March 10, 2023, the state with the highest number of COVID-19 cases was California. Almost 104 million cases have been reported across the United States, with the states of California, Texas, and Florida reporting the highest numbers.
From an epidemic to a pandemic
The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. The term pandemic refers to multiple outbreaks of an infectious illness threatening multiple parts of the world at the same time. When the transmission is this widespread, it can no longer be traced back to the country where it originated. The number of COVID-19 cases worldwide has now reached over 669 million.
The symptoms and those who are most at risk
Most people who contract the virus will suffer only mild symptoms, such as a cough, a cold, or a high temperature. However, in more severe cases, the infection can cause breathing difficulties and even pneumonia. Those at higher risk include older persons and people with pre-existing medical conditions, including diabetes, heart disease, and lung disease. People aged 85 years and older have accounted for around 27 percent of all COVID-19 deaths in the United States, although this age group makes up just two percent of the U.S. population.
In summer 2020, SARS-CoV-2 was detected on mink farms in Utah. An interagency One Health response was initiated to assess the extent of the outbreak and included sampling animals from or near affected mink farms and testing them for SARS-CoV-2 and non-SARS coronaviruses. Among the 365 animals sampled, including domestic cats, mink, rodents, raccoons, and skunks, 261 (72%) of the animals harbored at least one coronavirus at the time. Among the samples which could be further characterized, 126 alphacoronaviruses and 88 betacoronaviruses (including 74 detections of SARS-CoV-2) were identified. Moreover, at least 10% (n=27) of the coronavirus-positive animals were found to be co-infected with more than one coronavirus. Our findings indicate an unexpectedly high prevalence of coronavirus among the domestic and wild animals tested on mink farms and raise the possibility that commercial animal husbandry operations could be potential hot spots for future trans-species viral spillover and the emergence of new pandemic coronaviruses.
Figure 1. Phylogenetic relationships of the identified coronaviruses from mink and other animals from mink farms in Utah. The four genera of coronaviruses are highlighted in different colors. AlphaCoV, alphacoronavirus; BetaCoV, betacoronavirus; DeltaCoV, deltacoronavirus; and GammaCoV, gammacoronavirus. Type species for the currently recognized subgenera are annotated according to the nomenclature scheme used in this manuscript with the addition of the ICTV subgenus. Additional viruses, including the closest GenBank entry as identified by the BLAST tool, were included to help delineate relationship. Red circles are viruses identified in this study. Panel A. Full phylogenetic tree (A full-size image is included in Supplementary Figure 1). Red arrows designate the group of nearly identical Utah mink coronavirus strains collapsed into the colored triangle in Panel B.
Table 1. Coronavirus distribution among species tested. The species are listed by their common names; Total, the total number of animals of each species tested; Negative, number of each species with no coronavirus detected among the tissues tested; Positive, number of animals positive for coronavirus in at least one tissue; % Pos, percentage of coronavirus positives in each species.
Table 2. Detailed tissue panel tested for SARS-CoV-2. The distribution of SARS-CoV-2 RNA detection in the first 96 animals is listed. Tissue, tissue or tissue pools received; Total, total number tested in each category; Negative, number of N1 RT-PCR negatives; Positives, number of N1 RT-PCR positives; % Pos, percentage of tissues positive for coronavirus.
Table 3. Summary of coronaviruses identified. The distribution of coronaviruses detected and characterized according to their host is listed. Species, common name of animal species tested; AlphaCoV, number of alphacoronaviruses identified; BetaCoV, number of betacoronaviruses identified; Sequenced, number of viruses identified by sequencing; Unchar, number of coronavirus-positive samples not further characterized.
Table 4. SARS-CoV-2 coinfections identified in Utah mammals. The individual animals that are both SARS-CoV-2 positive and infected with a second coronavirus are listed. Animal ID, unique animal identification number; Common name, common name of animal; Scientific name, scientific name of animal; Sex: F, female; M, male; Unk, unknown; Age: A, adult; J, juvenile; Unk, unknown; SARS-CoV-2: Neg, N1 RT-PCR negative; Pos, N1 RT-PCR positive; Second strain, genus and common name of the coronavirus; Pan-CoV RT-PCR Equivocal, sample is PCR positive but not further characterized.
Supplementary Figure 1. Phylogenetic relationships of the identified coronaviruses from mink farms in Utah. The four genera of coronaviruses are highlighted in different colors. AlphaCoV, alphacoronavirus; BetaCoV, betacoronavirus; DeltaCoV, deltacoronavirus; and GammaCoV, gammacoronavirus. Type species for the currently recognized subgenera are annotated according to the nomenclature scheme used in this manuscript with the addition of the ICTV subgenus. Additional viruses, including the closest GenBank entry as identified by the BLAST tool, were included to help delineate relationship. Red circles are viruses identified in this study.
Supplementary Table 1. List of animals and tissues sampled and RT-PCR test results. Animal ID, unique identifier for each animal; Specimen ID, unique identifier for each tissue; Common name, common name of the animal species; Scientific name, scientific name of the animal species; Sex: F, female; M, male; UNK, unknown; Age: J, juvenile; A, adult; UNK, unknown; Tissue, organ or organ pools tested; Tissue study, X denotes the animals and tissues used in the tissue distribution sub-study; N1 PCR, Ct values from the CDC N1 assay; Pan-CoV PCR: Neg, negative; Pos, positive; Equiv, equivocal; * wild mink.
Supplementary Table 2. Summary of coronavirus test results. Animal ID, unique identifier for each animal; Common name, common name of the animal species; Scientific name, scientific name of the animal species; Sex: F, female; M, male; UNK, unknown; Age: J, juvenile; A, adult; UNK, unknown; CoV: Neg, negative; Pos, positive on either one or both RT-PCR tests; SARS-CoV-2, animals positive in the CDC N1 test; AlphaCoV, the tissues positive for alphacoronavirus for each animal are listed; BetaCoV, the tissues positive for betacoronavirus for each animal are listed; C, colon; C/R, colon/rectum pool; H, heart; L, lung; L/S, liver/spleen pool; S int, small intestine; Co-infections, Y, yes; PCR only, Y, yes; Virus identified by sequencing, brief name of virus identified.
As of March 10, 2023, the state with the highest rate of COVID-19 cases was Rhode Island, followed by Alaska. Around 103.9 million cases have been reported across the United States, with the states of California, Texas, and Florida reporting the highest numbers of infections.
From an epidemic to a pandemic
The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. The term pandemic refers to multiple outbreaks of an infectious illness threatening multiple parts of the world at the same time; when the transmission is this widespread, it can no longer be traced back to the country where it originated. The number of COVID-19 cases worldwide is roughly 683 million, and it has affected almost every country in the world.
The symptoms and those who are most at risk
Most people who contract the virus will suffer only mild symptoms, such as a cough, a cold, or a high temperature. However, in more severe cases, the infection can cause breathing difficulties and even pneumonia. Those at higher risk include older persons and people with pre-existing medical conditions, including diabetes, heart disease, and lung disease. Those aged 85 years and older have accounted for around 27 percent of all COVID deaths in the United States, although this age group makes up just two percent of the total population.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attributes of prospective space-time clusters (hotspots) for COVID-19 from 1/23-5/20/2020 at the county level.
https://spdx.org/licenses/CC0-1.0.html
Background: In early 2020, the Coronavirus Disease 2019 (COVID-19) rapidly spread across the United States (US), exhibiting significant geographic variability. While several studies have examined the predictive relationships of differing factors on COVID-19, few have looked at spatiotemporal variation of COVID-19 deaths at refined geographic scales.
Methods: The objective of this analysis is to examine the spatiotemporal variation in COVID-19 deaths with respect to socioeconomic, health, demographic, and political factors. We use multivariate regression applied to Health and Human Services (HHS) regions as well as nationwide county-level geographically weighted random forest (GWRF) models. Analyses were performed on data from three separate time frames which correspond to the spread of distinct viral variants in the US: pandemic onset until May 2021, May 2021 through November 2021, and December 2021 until April 2022. Spatial autocorrelation was additionally examined using a local and global Moran’s I test statistic.
Results: Multivariate regression results for all regions across three time windows suggest that existing measures of social vulnerability for disaster preparedness (SVI) are predictive of a higher degree of mortality from COVID-19. In comparison, GWRF models provide a more robust evaluation of feature importance and prediction, exposing the value of local features for prediction, such as obesity, which is obscured by coarse-grained analysis. Spatial autocorrelation indicates positive spatial clustering, with a progression from positively clustered low deaths for liberal counties (cold spots) to positively clustered high deaths for conservative counties (hot spots).
Conclusion: GWRF results indicate that a more nuanced modeling strategy is useful for determining spatial variation versus regional modeling approaches which may not capture feature clustering along border areas. Spatially explicit modeling approaches, such as GWRF, provide a more robust feature importance assessment of sociodemographic risk factors in predicting COVID-19 mortality.
Methods
The attached zip file contains the full GitHub repository, which includes data, the supplemental code, and an output HTML. The GitHub repository can be additionally viewed at: http://github.com/erichseamon/COVIDriskpaper. A README is provided as part of the repository, which describes each dataset, including all variable names and their unit of measure. All data used to generate the supplemental materials is located in the /data folder.
By Kristen Honey, Chief Data Scientist and COVID-19 Diagnostics Informatics Lead, COVID-19 Testing and Diagnostics Working Group (TDWG); Joshua Prasad, Director of Health Equity Innovation, Office of the Chief Data Officer (OCDO); Jack Bastian, Data Engineer, HHS Protect, Office of the Chief Data Officer (OCDO)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
There are several works based on Natural Language Processing of newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, little NLP research has been devoted to studying COVID-19. Most applications involve classifying chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based publications include sentiment classification of tweets by Samuel et al. [8] to understand the fear the virus has caused among people. Similar work by Jelodar et al. [9] used an LSTM network to classify sentiment in online discussion forums. To the best of our knowledge, the NKK dataset is the first study of a comparatively larger dataset of newspaper reports on COVID-19, and it contributes to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers are The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the issue from Bangladesh and named this dataset “Covid-News-BD-NNK”. The newspapers are The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. We preferred this approach over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news report must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above-mentioned newspapers.
To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
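The pre-processing steps listed above could be sketched roughly as follows. This is not the authors' script: the stop-word list is a tiny invented sample, "non-English alphanumeric characters" is interpreted here as anything other than ASCII letters, digits and whitespace, lowercasing is added for matching stop words, and lemmatization is left as a pluggable function (identity by default) because it normally needs an external NLP library.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: keep only ASCII letters, digits and whitespace.
NON_ASCII_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")
# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "are", "at"}


def preprocess(text, lemmatize=lambda token: token):
    """Apply the four steps in order and return the cleaned tokens."""
    text = URL_RE.sub(" ", text)               # 1. remove hyperlinks
    text = NON_ASCII_ALNUM_RE.sub(" ", text)   # 2. remove other characters
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # 3. stop words
    return [lemmatize(t) for t in tokens]      # 4. lemmatize (identity placeholder)


sample = "The virus is spreading in Minnesota: see https://example.com/report"
tokens = preprocess(sample)
print(tokens)
```

A real lemmatizer could be passed in through the `lemmatize` argument without changing the rest of the function.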
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome outside collaboration to enrich the dataset.
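Regenerating JSON from an updated CSV, as described above, can be done with the Python standard library alone. The sketch below is only an illustration of that idea, not the script shipped in the repositories; the column names and the sample row are invented.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical columns and content, for illustration only.
sample_csv = "headline,newspaper,date\nVirus cases rise,StarTribune,2020-04-01\n"
records = json.loads(csv_to_json(sample_csv))
print(records)
```

For files on disk, the same function can be fed the result of reading the CSV file as text, with the returned string written to a `.json` file.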
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text into numeric representations, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and the machine translation used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent ones are BERT [14], which uses a bidirectional Transformer encoder and performs very well on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are generally used as pre-trained models, since training them carries a huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction from text could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and picks the words with the highest weights.
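To make the graph-based weighting idea concrete, here is a simplified sketch in the spirit of TextRank: words that co-occur within a small window are linked, and a PageRank-style iteration scores each word. It is not the implementation used for the experiments in this paper. The window size, damping factor, iteration count and token list are assumptions chosen for the example, and refinements such as part-of-speech filtering and phrase merging are omitted.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score over an undirected co-occurrence graph."""
    # Link each token to the distinct tokens that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the score update: each word shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical pre-processed tokens from a news report.
tokens = ["virus", "economy", "virus", "masks", "virus", "jobs", "virus", "election"]
keywords = textrank_keywords(tokens)
print(keywords)
```

Because "virus" co-occurs with every other word in this toy input, it ends up with the most connections and the highest score.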
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2 and 3, we can make a few observations:
In February, both newspapers talked about China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, this concern seemed to increase.
Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.
We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the numbers rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us with information about how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Coronavirus disease 2019 (COVID-19) emerged in 2019 and has since caused a global pandemic. Since its emergence, COVID-19 has hugely impacted healthcare, including pediatrics. This study aimed to explore the current status and hotspots of pediatric COVID-19 research using bibliometric analysis.
Methods: The Institute for Scientific Information Web of Science core collection database was searched for articles on pediatric COVID-19 to identify original articles that met the criteria. The retrieval period ranged from the creation of the database to September 20, 2021. A total of 3,561 original articles written in English were selected to obtain data, such as author names, titles, source publications, number of citations, author affiliations, and countries where the studies were conducted. Microsoft Excel (Microsoft, Redmond, WA) was used to create charts related to countries, authors, and institutions. VOSviewer (Center for Science and Technology Studies, Leiden, The Netherlands) was used to create visual network diagrams of keyword, author, and country co-occurrence.
Results: We screened 3,561 publications with a total citation frequency of 30,528. The United States had the most published articles (1,188 articles) and contributed the most with author co-occurrences. The author with the most published articles was Villani from the University of Padua, Italy. He also contributed the most co-authored articles. The most productive institution was Huazhong University of Science and Technology in China. The institution with the most frequently cited published articles was Shanghai Jiao Tong University in China. The United States cooperated most with other countries. Research hotspots were divided into two clusters: social research and clinical research. Besides COVID-19 and children, the most frequent keywords were pandemic (251 times), mental health (187 times), health (172 times), impact (148 times), and multisystem inflammatory syndrome in children (MIS-C) (144 times).
Conclusion: Pediatric COVID-19 has attracted considerable attention worldwide, leading to a considerable number of articles published over the past 2 years. The United States, China, and Italy have leading roles in pediatric COVID-19 research. The new research hotspot is gradually shifting from COVID-19 and its related clinical studies to studies of its psychological and social impacts on children.
https://www.immport.org/agreement
Background: Households are hot spots for severe acute respiratory syndrome coronavirus 2 transmission. Methods: This prospective study enrolled 100 coronavirus disease 2019 (COVID-19) cases and 208 of their household members in North Carolina through October 2020, including 44% who identified as Hispanic or non-White. Households were enrolled a median of 6 days from symptom onset in the index case. Incident secondary cases within the household were detected using quantitative polymerase chain reaction of weekly nasal swabs (days 7, 14, 21) or by seroconversion at day 28. Results: Excluding 73 household contacts who were PCR-positive at baseline, the secondary attack rate (SAR) among household contacts was 32% (33 of 103; 95% confidence interval [CI], 22%-44%). The majority of cases occurred by day 7, with later cases confirmed as household-acquired by viral sequencing. Infected persons in the same household had similar nasopharyngeal viral loads (intraclass correlation coefficient = 0.45; 95% CI, .23-.62). Households with secondary transmission had index cases with a median viral load that was 1.4 log10 higher than those without transmission (P = .03), as well as higher living density (more than 3 persons occupying fewer than 6 rooms; odds ratio, 3.3; 95% CI, 1.02-10.9). Minority households were more likely to experience high living density and had a higher risk of incident infection than did White households (SAR, 51% vs 19%; P = .01). Conclusions: Household crowding in the context of high-inoculum infections may amplify the spread of COVID-19, potentially contributing to disproportionate impact on communities of color.
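The secondary attack rate quoted above is a simple proportion (33 incident cases among 103 contacts at risk). The sketch below recomputes it and attaches a Wilson score interval purely for illustration. The Wilson method is an assumption here: the abstract does not state how its interval was derived, and an interval that accounts for clustering within households can be wider than this one, so the two need not agree.

```python
import math


def secondary_attack_rate(secondary_cases, contacts_at_risk):
    """Proportion of at-risk household contacts who became incident cases."""
    return secondary_cases / contacts_at_risk


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 33 secondary cases among 103 contacts.
sar = secondary_attack_rate(33, 103)
low, high = wilson_interval(33, 103)
print(f"SAR = {sar:.1%}, illustrative Wilson interval = {low:.1%} to {high:.1%}")
```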
Stamp Out COVID-19
An apple a day keeps the doctor away.
Linda Angulo Lopez
December 3, 2020
https://theconversation.com/coronavirus-where-do-new-viruses-come-from-136105

SNAP participation rates were explored and analysed in ArcGIS Pro, and the results can help decision makers set up further D-SNAP initiatives.
In the USA, foods are stored in every State and U.S. territory and may be used by state agencies or local disaster relief organizations to provide food to shelters or people who are in need.

US Food Stamp Program has been Extended
The Supplemental Nutrition Assistance Program, SNAP, is a State Organized Food Stamp Program in the USA and was put in place to help individuals and families during this exceptional time. State agencies may request to operate a Disaster Supplemental Nutrition Assistance Program (D-SNAP).
D-SNAP Interactive Dashboard
Almost all States have set up Food Relief Programs in response to COVID-19.

SNAP Participation Analysis
Initial results relating yearly participation rates to geography show statistically significant trends. To get acquainted with the results, explore the following 3D Time Cube Map:
Visualize A Space Time Cube in 3D: https://arcg.is/1q8LLP

netCDF Results
WORKFLOW: A space-time cube was generated as a netCDF structure with the ArcGIS Pro Space-Time Mining Tool: Create a Space Time Cube from Defined Locations. Other tools were then used to incorporate the spatial and temporal aspects of the SNAP County Participation Rate Feature to reveal and render statistically significant trends about Nutrition Assistance in the USA.

Hot Spot Analysis
Explore the results in 2D or 3D.
2D Hot Spots: https://arcg.is/1Pu5WH0

2D Hot Spot Results
WORKFLOW: Hot Spot Analysis with the Hot Spot Analysis Tool shows that there are various trends across the USA; for instance, the Southeastern States have a mixture of consecutive, intensifying, and oscillating hot spots.
3D Hot Spots: https://arcg.is/1b41T4

3D Hot Spot Results
These trends over time are expanded in the above 3D Map. By inspecting the stacked columns you can see the trends over time that give rise to the overall Hot Spot Results.
Not all counties have significant trends; these are symbolized as Never Significant in the Space Time Cubes.

Space-Time Pattern Mining Analysis
The North-central areas of the USA have mostly diminishing cold spots.
2D Space-Time Mining: https://arcg.is/1PKPj0

2D Space Time Mining Results
WORKFLOW: Analysis with the Emerging Hot Spot Analysis Tool shows that there are various trends across the USA; for instance, the South-Eastern States have a mixture of consecutive, intensifying, and oscillating hot spots.

Results Show
The USA has counties with persistently malnourished populations that depend on food aid.
3D Space-Time Mining: https://arcg.is/01fTWf

3D Space Time Mining Results
In addition to obvious planning for consistent Hot-Hot Spot Areas, areas oscillating between Hot-Cold and/or Cold-Hot Spots can be identified for further analysis to mitigate the upward trend in food insecurity in the USA since 2009, which has become even worse since the outbreak of the COVID-19 pandemic.

After Notes:
(i) The Johns Hopkins University has an Interactive Dashboard of the Evolution of the COVID-19 Pandemic: Coronavirus COVID-19 (2019-nCoV)
(ii) Since March 2020, in response to COVID-19, SNAP has had to extend its benefits to help people in need. The Food Relief is coordinated within States and by local and voluntary organizations to provide nutrition assistance to those most affected by a disaster or emergency. Visit SNAP's Interactive Dashboard. Food Relief has been extended; reach out to your state SNAP office if you are in need.
(iii) Follow these Steps to build an ArcGIS Pro StoryMap:
Step 1: [Get Data][Open An ArcGIS Pro Project][Run a Hot Spot Analysis][Review analysis parameters][Interpret the results][Run an Outlier Analysis][Interpret the results]
Step 2: [Open the Space-Time Pattern Mining 2 Map][Create a space-time cube][Visualize a space-time cube in 2D][Visualize a space-time cube in 3D][Run a Local Outlier Analysis][Visualize a Local Outlier Analysis in 3D]
Step 3: [Communicate Analysis][Identify your Audience & Takeaways][Create an Outline][Find Images][Prepare Maps & Scenes][Create a New Story][Add Story Elements][Add Maps & Scenes][Review the Story][Publish & Share]

A submission for the Esri MOOC Spatial Data Science: The New Frontier in Analytics
Linda Angulo Lopez
Lauren Bennett . Shannon Kalisky . Flora Vale . Alberto Nieto . Atma Mani . Kevin Johnston . Orhun Aydin . Ankita Bakshi . Vinay Viswambharan . Jennifer Bell & Nick Giner
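The trend categories mentioned in the workflow (intensifying, diminishing, oscillating) rest on testing whether a location's values rise or fall over time. Esri's documentation describes the Mann-Kendall statistic for that purpose. The sketch below is not the ArcGIS implementation: it is a plain Mann-Kendall test without tie correction, and the participation-rate series is made up for illustration.

```python
import math


def mann_kendall(series):
    """Return the Mann-Kendall S statistic and its z-score (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            # Add +1 for a later increase, -1 for a later decrease, 0 for a tie.
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical yearly participation rates (percent) for a single county.
rates = [9.8, 10.4, 11.1, 11.0, 12.3, 12.9, 13.5, 13.4, 14.2, 15.0]
s, z = mann_kendall(rates)
print(f"S = {s}, z = {z:.2f}")  # |z| > 1.96 suggests a trend at the ~95% level
```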
As coronavirus cases have exploded across the country, states have struggled to obtain sufficient personal protective equipment such as masks, face shields, gloves and ventilators to meet the needs of healthcare workers. FEMA began distributing PPE from the national stockpile as well as PPE obtained from private manufacturers to states in March.
Initially, FEMA distributed materials based primarily on population. By late March, its methods changed to send more PPE to hotspot locations, and FEMA claimed these decisions were data-driven and need-based. By late spring, the agency was considering requests from states as well.
Although all U.S. states and territories have received some amount of PPE from FEMA, the amounts of PPE states have per capita and per positive COVID-19 case vary widely.
The AP used this data in a story that ran July 7.
These numbers include material distributed by FEMA and also those sold by private distributors under direction from FEMA. They include materials both delivered to and en route to states.
States have purchased PPE directly in addition to receiving PPE from FEMA or directed there by the agency, and this data only includes the latter categories.
FEMA also distributed and directed the distribution of gear to U.S. territories in addition to states. The territories are included in FEMA’s release linked below, but are not included in this data.
FEMA has publicly distributed its breakdown of PPE delivery by state for May and June. FEMA did not provide comprehensive numbers for each state before May.
These numbers are cumulative, meaning that the numbers for May include items of PPE distributed prior to May 14, dating to when the agency began allocations on March 1. The June numbers include the May numbers and any new PPE distributions since then.
The population column, which was used to calculate the numbers of PPE items per capita for each state, came from U.S. Census Bureau data. Since the Census releases annual population data, population data from 2019 was used for each state.
The numbers of coronavirus cases were pulled from the data released daily by Johns Hopkins University as of the dates that FEMA released its distribution numbers — May 14 and June 10.
The data includes amounts of gear that had been delivered to the states or were en route as of the reporting dates.
All PPE item numbers above 1 million were rounded to the nearest hundred thousand by FEMA, but numbers lower than that were not rounded.
In some cases, gear headed to a state was rerouted because it was needed more somewhere else or a state decided it did not need it. In some instances, that resulted in states having higher numbers for certain supplies in May than in June.
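The per-capita and per-case figures, the rounding rule, and the cumulative May and June totals described above can be sketched as follows. This is only an illustration: the `StateSnapshot` type, the function names, and every number in the example are hypothetical and are not the AP's column names or data. The half-up rounding used here is also an assumption about how ties would be handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StateSnapshot:
    """One state's cumulative figures as of a FEMA reporting date (hypothetical schema)."""
    state: str
    ppe_items: int    # cumulative PPE items delivered or en route
    population: int   # 2019 Census population estimate
    cases: int        # cumulative positive COVID-19 cases on the same date


def ppe_rates(snap: StateSnapshot) -> Tuple[float, Optional[float]]:
    """Return (items per resident, items per positive case)."""
    per_capita = snap.ppe_items / snap.population
    per_case = snap.ppe_items / snap.cases if snap.cases > 0 else None
    return per_capita, per_case


def round_above_million(items: int) -> int:
    """Round counts above 1 million to the nearest hundred thousand (half up)."""
    if items > 1_000_000:
        return (items + 50_000) // 100_000 * 100_000
    return items


def change_since_may(may_total: int, june_total: int) -> int:
    """Difference between cumulative totals; negative when gear was rerouted."""
    return june_total - may_total


example = StateSnapshot("Example State", ppe_items=2_000_000,
                        population=4_000_000, cases=10_000)
print(ppe_rates(example))
print(round_above_million(1_234_567))
print(change_since_may(may_total=500_000, june_total=450_000))
```

Because the totals are cumulative, a negative difference between June and May does not mean gear was taken back from a state's shelves; it reflects shipments that were counted as en route in May and later rerouted.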
Description: The COVID-19 dataset used for this EDA project encompasses comprehensive data on COVID-19 cases, deaths, and recoveries worldwide. It includes information gathered from authoritative sources such as the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and national health agencies. The dataset covers global, regional, and national levels, providing a holistic view of the pandemic's impact.
Purpose: This dataset is instrumental in understanding the multifaceted impact of the COVID-19 pandemic through data exploration. It aligns perfectly with the objectives of the EDA project, aiming to unveil insights, patterns, and trends related to COVID-19. Here are the key objectives:
1. Data Collection and Cleaning:
• Gather reliable COVID-19 datasets from authoritative sources (such as WHO, CDC, or national health agencies).
• Clean and preprocess the data to ensure accuracy and consistency.
2. Descriptive Statistics:
• Summarize key statistics: total cases, recoveries, deaths, and testing rates.
• Visualize temporal trends using line charts, bar plots, and heat maps.
3. Geospatial Analysis:
• Map COVID-19 cases across countries, regions, or cities.
• Identify hotspots and variations in infection rates.
4. Demographic Insights:
• Explore how age, gender, and pre-existing conditions impact vulnerability.
• Investigate disparities in infection rates among different populations.
5. Healthcare System Impact:
• Analyze hospitalization rates, ICU occupancy, and healthcare resource allocation.
• Assess the strain on medical facilities.
6. Economic and Social Effects:
• Investigate the relationship between lockdown measures, economic indicators, and infection rates.
• Explore behavioral changes (e.g., mobility patterns, remote work) during the pandemic.
7. Predictive Modeling (Optional):
• If data permits, build simple predictive models (e.g., time series forecasting) to estimate future cases.
Data Sources: The primary sources of the COVID-19 dataset include the Johns Hopkins CSSE COVID-19 Data Repository, Google Health’s COVID-19 Open Data, and the U.S. Economic Development Administration (EDA). These sources provide reliable and up-to-date information on COVID-19 cases, deaths, testing rates, and other relevant variables. Additionally, GitHub repositories and platforms like Medium host supplementary datasets and analyses, enriching the available data resources.
Data Format: The dataset is available in various formats, such as CSV and JSON, facilitating easy access and analysis. Before conducting the EDA, the data underwent preprocessing steps to ensure accuracy and consistency. Data cleaning procedures were performed to address missing values, inconsistencies, and outliers, enhancing the quality and reliability of the dataset.
License: The COVID-19 dataset may be subject to specific usage licenses or restrictions imposed by the original data sources. Proper attribution is essential to acknowledge the contributions of the WHO, CDC, national health agencies, and other entities providing the data. Users should adhere to any licensing terms and usage guidelines associated with the dataset.
Attribution: We acknowledge the invaluable contributions of the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), national health agencies, and other authoritative sources in compiling and disseminating the COVID-19 data used for this EDA project. Their efforts in collecting, curating, and sharing data have been instrumental in advancing our understanding of the pandemic and guiding public health responses globally.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Last updated April 6th, 2020
Guatemala is a small, beautiful country in Central America. Although it is far away from the hotspot in Wuhan, China, its first coronavirus patient was confirmed on March 13th, 2020. The government response was immediate and strong measures were taken from the beginning, but the health infrastructure is not as developed as in Spain, France, Italy or the US, putting citizens at greater risk. Being aware of the havoc and struggle coronavirus has created around the world, we want to:
At the moment we have collected confirmed patient information including: age, sex, nationality, infection cause, infection date and others. Find a full English description of the data in the file README_en.md, and a full Spanish description in the file README_es.md.
We hope to add more data as it becomes available from official sources.
We want to thank all the members and volunteers who are taking hours from their busy schedules to put this dataset together.
Banner photo is Semuc Champey, an astonishing natural spot in the northern region of Guatemala. Photo by Christopher Crouzet on Unsplash.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study examines aggregate crime rates and the spatial distribution of violence against women (VAW) both before and after the COVID-19 pandemic. It also explores the influence of socioeconomic and situational factors on these trends. The analysis assesses potential variations across different pandemic phases by addressing the following research questions: (1) Did VAW incident rates change during the different pandemic phases? (2) Did the spatial distribution of VAW incidents per municipality in Colombia change before, during, and after the pandemic? and (3) What key determinants significantly impacted VAW rates during the study period? Given the documented global rise in domestic and intimate partner violence against women during the pandemic, we hypothesized an increase in VAW incidents in Colombia from 2020 to 2022. Furthermore, based on existing literature, we predicted that urban municipalities, poverty, lack of education, coca cultivation, and the presence of non-state armed actors would predict higher VAW at the municipality level. Finally, we expected statistically significant VAW hot spots to remain consistent at the municipality level throughout the pandemic stages, due to persistent underlying risk factors in these areas. The findings revealed a significant post-quarantine decrease in VAW incidents, followed by a significant increase after the economy's gradual reopening in September 2020. Notably, the geographical distribution of VAW remained consistent, with persistent 'hot spot' concentrations in the same areas across all study periods. Furthermore, urbanization and higher general violent crime rates consistently predicted higher VAW rates. Conversely, the presence of armed groups and coca production were significant negative predictors, while education's impact on VAW rates was mixed.
https://www.dataflix.com/data360/license/
The Dataflix COVID dataset is a centralized repository of up-to-date and curated data focused on key tracking metrics and U.S. census data. The dataset is publicly readable and accessible on Google BigQuery, ready for analysis, analytics and machine learning initiatives. The dataset is built on data from trusted sources such as CSSE at Johns Hopkins University and government agencies, covering a wide range of metrics including confirmed cases, new cases, % population, mortality rate and deaths, aggregated at various geographic levels including city, county, state and country. New data is published on a daily basis. Our objective is to make structured COVID data available for organizations and individuals to help in the fight against COVID-19. For example, health authorities will be able to build reports and dashboards to efficiently deploy vital resources like hospital beds and ventilators as they track the spread of the disease. Epidemiologists can likewise use the dataset to complement their existing models and datasets, and generate better forecasts of hotspots and trends.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: As the first bibliometric analysis of COVID-19 and immune responses, this study will provide a comprehensive overview of the latest research advances. We attempt to summarize the scientific productivity and cooperation across countries and institutions using the bibliometric methodology. Meanwhile, using clustering analysis of keywords, we revealed the evolution of research hotspots and predicted future research focuses, thereby providing valuable information for follow-up studies.
Methods: We selected publications on COVID-19 and immune response using our pre-designed search strategy. Web of Science was used to screen the eligible publications for subsequent bibliometric analyses. GraphPad Prism 8.0, VOSviewer, and CiteSpace were used to analyze the research trends and compare the contributions of countries, authors, institutions, and journals to the global publications in this field.
Results: We identified 2,200 publications on COVID-19 and immune response published between December 1, 2019, and April 25, 2022, with a total of 3,154 citations. The United States (611), China (353), and Germany (209) ranked as the top three in terms of the number of publications, accounting for 53.3% of the total articles. Among the top 15 institutions publishing articles in this area, four were from France, four were from the United States, and three were from China. The journal Frontiers in Immunology published the most articles (178) related to COVID-19 and immune response. Alessandro Sette (31 publications) from the United States was the most productive and influential scholar in this field, with the highest citation frequency (3,633). Furthermore, the development and evaluation of vaccines might become a hotspot in the relevant scope.
Conclusions: The United States makes the most indispensable contribution in this field in terms of publication numbers, total citations, and H-index. Although publications from China also take the lead regarding quality and quantity, their international cooperation and preclinical research need to be further strengthened. Regarding the citation frequency and the total number of published articles, the latest research progress might be tracked in the top-ranking journals in this field. By analyzing the chronological order of the appearance of retrieved keywords, we speculated that vaccine-related research might be the novel focus in this field.
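The 53.3% share reported for the top three countries follows directly from the publication counts in the abstract. A quick check, using only the figures quoted above:

```python
# Publication counts and total as stated in the abstract.
top_three = {"United States": 611, "China": 353, "Germany": 209}
total_publications = 2200

share = sum(top_three.values()) / total_publications
print(f"Top-three share: {share:.1%}")
```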