Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bangladesh recorded 29223 Coronavirus Deaths since the epidemic began, according to the World Health Organization (WHO). In addition, Bangladesh reported 2038539 Coronavirus Cases. This dataset includes a chart with historical data for Bangladesh Coronavirus Deaths.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
WHO declared COVID-19 a global pandemic. Data science and research communities all over the world came together to fight it in this tough time. This dataset contains date-wise updates of the number of confirmed, deceased, recovered, quarantined and released-from-quarantine cases for Bangladesh. Hopefully it will help the local community find meaningful insights and the pattern of the pandemic, which may save millions of lives.
All data are taken from the government site, WHO, DGHS and Worldometer open-source data. The dataset contains all data from March 1, 2020 to April 3, 2020.
Date - Specific date
Confirmed - The number of confirmed cases
Recovered - The number of recovered cases
Deaths - The number of death cases
Quarantine - The number of quarantined cases
Released From Quarantine - The number of cases released from quarantine
As the dataset contains date-wise updates of the coronavirus cases in Bangladesh, feel free to prepare meaningful insights from the data. Share and collaborate to find the factors of the pandemic for Bangladesh, make time-series calculations and so on. Don't forget to suggest useful datasets to merge with this one. Thanks.
https://creativecommons.org/publicdomain/zero/1.0/
This CSV file contains confirmed COVID cases, COVID case rate, COVID deaths, and COVID death rate for Barisal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, Sylhet and, finally, all of Bangladesh. The dataset is up to date as of 02 July 2021. I will try to provide a monthly update. You can use this dataset in projects and analyses related to COVID cases in Bangladesh.
https://github.com/disease-sh/API/blob/master/LICENSE
In the past 24 hours, Bangladesh, Asia had N/A new cases, N/A deaths and N/A recoveries.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
COVID-19 is a disease caused by a novel coronavirus that emerged in China in 2019. Coronaviruses are zoonotic viruses that circulate amongst animals and spill over to humans from time to time, causing illness ranging from mild symptoms to severe disease. On 7 January 2020, Chinese authorities confirmed COVID-19, and on 30 January 2020 the Director-General of WHO declared the COVID-19 outbreak a Public Health Emergency of International Concern. On 8 March, Bangladesh confirmed 3 laboratory-tested coronavirus cases for the very first time. This dataset file contains data for analysing different cases of the COVID-19 outbreak in Bangladesh. The available fields are: date in a specific format, daily new confirmed cases, total confirmed cases, daily new deaths, total deaths, daily new recovered, total recovered, daily new tests, total tests, and active cases.
This dataset contains data for every single day of the COVID-19 outbreak in Bangladesh. Starting from the first confirmed case of COVID-19, on 8 March 2020, it contains each confirmed, recovered, and death case to date. This is a time-series dataset and it will be updated on a daily basis.
I would like to acknowledge the following organizations for their great efforts to make these data available to the greater community. Institute of Epidemiology, Disease Control and Research (IEDCR): https://www.iedcr.gov.bd/ DGHS: https://dghs.gov.bd/index.php/en/ Official Website of BD Government: http://www.corona.gov.bd/ WHO: https://www.who.int/countries/bgd/en/
As an academician and data science researcher, I feel there is an ample need for the greater data science community all over the world to understand and develop meaningful insights on the outbreak of COVID-19 in Bangladesh. Constructive suggestions and comments are highly appreciated.
COVID-19 pandemic dataset for Bangladesh.
On 8 March 2020, Bangladesh reported its first 3 (three) confirmed COVID-19 positive cases. This dataset contains the daily number of tests, confirmed cases and deaths in Bangladesh from March 3, 2020 to May 26, 2021.
Our World in Data: Bangladesh: Coronavirus Pandemic Country Profile
Banner Image: Shutterstock
To log and track daily cases of COVID-19 in Bangladesh.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
New Covid cases per million people in Bangladesh, March 2023. The most recent value is 1 new Covid case per million people as of March 2023, a decline compared to the previous value of 2 new Covid cases per million people. Historically, the average for Bangladesh from February 2020 to March 2023 is 313 new Covid cases per million people. The minimum of 0 new Covid cases per million people was recorded in February 2020, while the maximum of 1964 new Covid cases per million people was reached in July 2021. | TheGlobalEconomy.com
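The per-million figures above normalize raw counts by population. A minimal sketch of that calculation follows; the case count and population used here are hypothetical inputs chosen for illustration and are not taken from the source.

```python
def cases_per_million(new_cases: int, population: int) -> float:
    """Return new cases per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return new_cases * 1_000_000 / population


# Hypothetical inputs, for illustration only.
example_rate = cases_per_million(new_cases=170, population=170_000_000)
print(example_rate)
```

Normalizing this way makes months with different reporting volumes, or countries of different sizes, directly comparable.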
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Developing effective countermeasures and determining our susceptibilities to the outbreak of COVID-19 is challenging for a densely populated developing country like Bangladesh, and a systematic review of the disease on a continuous basis is necessary.
Methods: Publicly available and globally acclaimed datasets (4 March 2020–30 September 2020) from the IEDCR, Bangladesh, JHU, and ECDC databases are used for this study. Visual exploratory data analysis is used and we fitted a polynomial model for the number of deaths. The Bangladesh scenario is compared across different time points as well as with global perspectives.
Results: In Bangladesh, the number of active cases decreased after reaching a peak, with a constant pattern of death rate from July to the end of September 2020. Seventy-one percent of the cases and 77% of the deceased were males. People aged between 21 and 40 years were most vulnerable to the coronavirus and most of the fatalities (51.49%) were in the 60+ population. A strong positive correlation (0.93) between the number of tests and confirmed cases and a constant incidence rate (around 21%) from June 1 to August 31, 2020 were observed. The case fatality ratio was between 1 and 2. The number of cases and the number of deaths in Bangladesh were much lower compared to other countries.
Conclusions: This study will help to understand the patterns of spread and transition in Bangladesh, possible measures, effectiveness of the preparedness, implementation gaps, and their consequences to gather vital information and prevent future pandemics.
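Two of the quantities reported in the results, the correlation between tests and confirmed cases and the case fatality ratio, are simple to compute. The sketch below is not the study's code: it assumes the correlation is a Pearson coefficient and that the case fatality ratio is expressed as a percentage, and every number in it is made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def case_fatality_ratio(deaths, confirmed):
    """Deaths as a percentage of confirmed cases (assumed definition)."""
    return 100.0 * deaths / confirmed if confirmed else 0.0


# Made-up daily figures, for illustration only.
tests = [1000, 1500, 2000, 2500, 3000]
confirmed = [200, 310, 405, 520, 610]

r = pearson(tests, confirmed)
cfr = case_fatality_ratio(deaths=15, confirmed=1000)
print(round(r, 3), cfr)
```

A high coefficient here only says that days with more tests also had more confirmed cases; it does not by itself separate testing effort from true incidence.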
Time series data on COVID-19 diagnosis were collected from the daily reported cases published on the websites of the Institute of Epidemiology, Disease Control and Research (IEDCR) and Corona info, Directorate General of Health Services (DGHS), Ministry of Health and Family Welfare (MoHFW), Bangladesh. All cases were laboratory-confirmed following the suspected case definition of the World Health Organization (WHO) and the National guidelines on clinical management of coronavirus disease 2019 (COVID-19).
On 31 December 2019, WHO was informed of a cluster of cases of pneumonia of unknown cause detected in Wuhan City, Hubei Province of China. WHO is closely monitoring this event and is in active communication with counterparts in China. The COVID-19 pandemic in Bangladesh is part of the worldwide pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was confirmed to have spread to Bangladesh in March 2020. The first three known cases were reported on 8 March 2020 by the country's epidemiology institute, IEDCR. Since then, the pandemic has spread day by day over the whole nation and the number of affected people has been increasing.
The dataset contains all data from the date of June 1, 2020 to June 30, 2020.
Date - Specific date
Confirmed - The number of confirmed cases
Deaths - The number of death cases
As the dataset contains datewise updates of the coronavirus cases in Bangladesh, feel free to prepare meaningful insights from the data. Share and collaborate to find the factors of pandemic for Bangladesh, make time series calculation and so on.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective of the study was to assess the prevalence of, and factors associated with, household (HH) handwashing practice in Bangladesh, to draw the trend of COVID-19 spread, and to compare that trend with countrywide HH handwashing practice. The study is based on two nationally representative, publicly available datasets (MICS 2019, and confirmed cases of COVID-19). Of 61,209 (weighted) HH, the overall prevalence of HH handwashing was found to be 56.3%, and the prevalence varied significantly across the socio-economic status of the HH. Map comparison suggested a gradually increasing trend of COVID-19 cases in areas where HH handwashing practice is low. The northern part of Bangladesh had the highest handwashing practice and was less affected by COVID-19 cases. However, central Bangladesh was hardest hit by COVID-19 cases, and it had around 50% handwashing practice coverage. A large-scale observational study is necessary to establish causality.
https://creativecommons.org/publicdomain/zero/1.0/
The COVID-19 pandemic has taken over the whole world. In this dataset, I want to dive into the daily cases of my country, Bangladesh, to get more insights about this pandemic.
This dataset consists of 646 rows and 6 columns. The columns contain dates, lab tests on the specific dates, confirmed cases (positive cases), positivity rate, total number of deaths and death rate. The data run from 3 April 2020 to 8 January 2022.
I want to know which upcoming months are risky compared with the previous data, and which months should be under total lockdown according to this dataset.
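One way to compare months with daily data of this shape is to aggregate tests and confirmed cases per month and recompute the positivity rate from the monthly totals. The sketch below assumes hypothetical column names (`date`, `lab_tests`, `confirmed`) and ISO-formatted dates; the real file may use different headers, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real column names and values may differ.
SAMPLE_CSV = """date,lab_tests,confirmed
2020-04-03,500,5
2020-04-04,500,15
2020-05-01,1000,100
2020-05-02,1000,150
"""


def monthly_positivity(csv_text):
    """Return {YYYY-MM: positivity rate in percent}, aggregated per month."""
    tests = defaultdict(int)
    cases = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["date"][:7]  # assumes ISO dates (YYYY-MM-DD)
        tests[month] += int(row["lab_tests"])
        cases[month] += int(row["confirmed"])
    return {m: 100.0 * cases[m] / tests[m] for m in tests if tests[m]}


rates = monthly_positivity(SAMPLE_CSV)
riskiest_month = max(rates, key=rates.get)
print(rates, riskiest_month)
```

Summing before dividing weights each day by its test volume, which is usually preferable to averaging the daily positivity rates directly.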
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This upload contains all my research data on the COVID-19 outbreak in Bangladesh: district-wise cumulative confirmed cases from 8th March to 30th July, together with district-wise population density.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This Project Tycho dataset includes a CSV file with COVID-19 data reported in BANGLADESH: 2020-01-03 - 2021-07-31. It contains counts of cases and deaths. Data for this Project Tycho dataset comes from: "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University", "European Centre for Disease Prevention and Control Website", "World Health Organization COVID-19 Dashboard". The data have been pre-processed into the standard Project Tycho data format v1.1.
According to WHO, coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illnesses.
This dataset has daily-level information on the number of confirmed cases, deaths and recoveries from the 2019 novel coronavirus in Bangladesh. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.
Github repository of this dataset is here
Filename is COVID-19_in_bd.csv (updated)
Date - Daily Cases
Confirmed - Cumulative number of confirmed cases till that date
Recovered - Cumulative number of recovered till that date
Deaths - Cumulative number of deaths till that date
Some insights could be
1. Mortality rate over time
2. Exponential growth
3. Changes in the number of affected cases over time
4. The latest number of affected cases
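Because the Confirmed, Recovered and Deaths columns are cumulative, daily figures have to be derived by differencing, and a mortality rate over time can be read off the cumulative pairs. A minimal sketch under those assumptions, with made-up counts rather than values from COVID-19_in_bd.csv:

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series into per-day increments."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


def mortality_rate_over_time(confirmed, deaths):
    """Cumulative deaths as a percentage of cumulative confirmed cases, per day."""
    return [100.0 * d / c if c else 0.0 for c, d in zip(confirmed, deaths)]


# Made-up cumulative counts, for illustration only.
confirmed = [3, 3, 5, 10, 20]
deaths = [0, 0, 0, 1, 1]

new_cases = daily_from_cumulative(confirmed)
rates = mortality_rate_over_time(confirmed, deaths)
print(new_cases, rates)
```

A negative increment in the differenced series would indicate a correction in the source data and is worth checking before any growth analysis.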
Introduction: Every year around 150,000 pilgrims from Bangladesh perform Umrah and Hajj. Emergence and continuous reporting of MERS-CoV infection in Saudi Arabia emphasize the need for surveillance of MERS-CoV in returning pilgrims or travelers from the Middle East and capacity building of health care providers for disease containment. The Institute of Epidemiology, Disease Control & Research (IEDCR) under the Bangladesh Ministry of Health and Family Welfare (MoHFW), is responsible for MERS-CoV screening of pilgrims/travelers returning from the Middle East with respiratory illness as part of its outbreak investigation and surveillance activities.
Methods: Bangladeshi travelers/pilgrims who returned from the Middle East and presented with fever and respiratory symptoms were studied over the period from October 2013 to June 2016. Patients with respiratory symptoms that fulfilled the WHO MERS-CoV case algorithm were tested for MERS-CoV and other respiratory tract viruses. Besides surveillance, case recognition training was conducted at multiple levels of health care facilities across the country in support of early detection and containment of the disease.
Results: Eighty-one suspected cases tested by real-time PCR resulted in zero detection of MERS-CoV infection. Viral etiology detected in 29.6% of the cases was predominantly influenza A (H1N1 and H3N2), and influenza B infection (22%). Peak testing occurred mostly following the annual Hajj season.
Conclusions: Respiratory tract infections in travelers/pilgrims returning to Bangladesh from the Middle East are mainly due to influenza A and influenza B. Though MERS-CoV was not detected in the 81 patients tested, continuous screening and surveillance are essential for early detection of MERS-CoV infection and other respiratory pathogens to prevent transmissions in hospital settings and within communities. Awareness building among healthcare providers will help identify suspected cases.
On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data. For updated cases, deaths, and vaccine data please visit the following sources: World Health Organization (WHO). For more information, visit the Johns Hopkins Coronavirus Resource Center.
-- Esri COVID-19 Trend Report for 3-9-2023 --
0 Countries have Emergent trend with more than 10 days of cases: (name : # of active cases)
41 Countries have Spreading trend with over 21 days in new cases curve tail: (name : # of active cases) Monaco : 13, Andorra : 25, Marshall Islands : 52, Kyrgyzstan : 79, Cuba : 82, Saint Lucia : 127, Cote d'Ivoire : 148, Albania : 155, Bosnia and Herzegovina : 172, Iceland : 196, Mali : 198, Suriname : 246, Botswana : 247, Barbados : 274, Dominican Republic : 304, Malta : 306, Venezuela : 334, Micronesia : 346, Uzbekistan : 356, Afghanistan : 371, Jamaica : 390, Latvia : 402, Mozambique : 406, Kosovo : 412, Azerbaijan : 427, Tunisia : 528, Armenia : 594, Kuwait : 716, Thailand : 746, Norway : 768, Croatia : 847, Honduras : 1002, Zimbabwe : 1067, Saudi Arabia : 1098, Bulgaria : 1148, Zambia : 1166, Panama : 1300, Uruguay : 1483, Kazakhstan : 1671, Paraguay : 2080, Ecuador : 5332
0 Countries may have Spreading trend with under 21 days in new cases curve tail: (name : # of active cases)
61 Countries have Epidemic trend with over 21 days in new cases curve tail: (name : # of active cases) Liechtenstein : 48, San Marino : 111, Mauritius : 742, Estonia : 761, Trinidad and Tobago : 1296, Montenegro : 1486, Luxembourg : 1540, Qatar : 1541, Philippines : 1915, Ireland : 1946, Brunei : 2010, United Arab Emirates : 2013, Denmark : 2111, Sweden : 2149, Finland : 2154, Hungary : 2169, Lebanon : 2208, Bolivia : 2838, Colombia : 3250, Switzerland : 3321, Peru : 3328, Slovakia : 3556, Malaysia : 3608, Indonesia : 3793, Portugal : 4049, Cyprus : 4279, Argentina : 5050, Iran : 5135, Lithuania : 5323, Guatemala : 5516, Slovenia : 5689, South Africa : 6604, Georgia : 7938, Moldova : 8082, Israel : 8746, Bahrain : 8932, Netherlands : 9710, Romania : 12375, Costa Rica : 12625, Singapore : 13816, Serbia : 14093, Czechia : 14897, Spain : 17399, Ukraine : 19568, Canada : 24913, New Zealand : 25136, Belgium : 30599, Poland : 38894, Chile : 41055, Australia : 50192, Mexico : 65453, United Kingdom : 65697, France : 68318, Italy : 70391, Austria : 90483, Brazil : 134279, Korea - South : 209145, Russia : 214935, Germany : 257248, Japan : 361884, US : 644050
0 Countries may have Epidemic trend with under 21 days in new cases curve tail: (name : # of active cases)
54 Countries have Controlled trend: (name : # of active cases) Palau : 3, Saint Kitts and Nevis : 4, Guinea-Bissau : 7, Cabo Verde : 8, Mongolia : 8, Benin : 9, Maldives : 10, Comoros : 10, Gambia : 12, Bhutan : 14, Cambodia : 14, Syria : 14, Seychelles : 15, Senegal : 16, Libya : 16, Laos : 17, Sri Lanka : 19, Congo (Brazzaville) : 19, Tonga : 21, Liberia : 24, Chad : 25, Fiji : 26, Nepal : 27, Togo : 30, Nicaragua : 32, Madagascar : 37, Sudan : 38, Papua New Guinea : 38, Belize : 59, Egypt : 60, Algeria : 64, Burma : 65, Ghana : 72, Haiti : 74, Eswatini : 75, Guyana : 79, Rwanda : 83, Uganda : 88, Kenya : 92, Burundi : 94, Angola : 98, Congo (Kinshasa) : 125, Morocco : 125, Bangladesh : 127, Tanzania : 128, Nigeria : 135, Malawi : 148, Ethiopia : 248, Vietnam : 269, Namibia : 422, Cameroon : 462, Pakistan : 660, India : 4290
41 Countries have End Stage trend: (name : # of active cases) Sao Tome and Principe : 1, Saint Vincent and the Grenadines : 2, Somalia : 2, Timor-Leste : 2, Kiribati : 8, Mauritania : 12, Oman : 14, Equatorial Guinea : 20, Guinea : 28, Burkina Faso : 32, North Macedonia : 351, Nauru : 479, Samoa : 554, China : 2897, Taiwan* : 249634
-- SPIKING OF NEW CASE COUNTS --
20 countries are currently experiencing spikes in new confirmed cases: Armenia, Barbados, Belgium, Brunei, Chile, Costa Rica, Georgia, India, Indonesia, Ireland, Israel, Kuwait, Luxembourg, Malaysia, Mauritius, Portugal, Sweden, Ukraine, United Kingdom, Uzbekistan
20 countries experienced a spike in new confirmed cases 3 to 5 days ago: Argentina, Bulgaria, Croatia, Czechia, Denmark, Estonia, France, Korea - South, Lithuania, Mozambique, New Zealand, Panama, Poland, Qatar, Romania, Slovakia, Slovenia, Switzerland, Trinidad and Tobago, United Arab Emirates
47 countries experienced a spike in new confirmed cases 5 to 14 days ago: Australia, Austria, Bahrain, Bolivia, Brazil, Canada, Colombia, Congo (Kinshasa), Cyprus, Dominican Republic, Ecuador, Finland, Germany, Guatemala, Honduras, Hungary, Iran, Italy, Jamaica, Japan, Kazakhstan, Lebanon, Malta, Mexico, Micronesia, Moldova, Montenegro, Netherlands, Nigeria, Pakistan, Paraguay, Peru, Philippines, Russia, Saint Lucia, Saudi Arabia, Serbia, Singapore, South Africa, Spain, Suriname, Thailand, Tunisia, US, Uruguay, Zambia, Zimbabwe
194 countries experienced a spike in new confirmed cases over 14 days ago: Afghanistan, Albania, Algeria, Andorra, Angola, Antigua and Barbuda, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan, Bolivia, Bosnia and Herzegovina, Botswana, Brazil, Brunei, Bulgaria, Burkina Faso, Burma, Burundi, Cabo Verde, Cambodia, Cameroon, Canada, Central African Republic, Chad, Chile, China, Colombia, Comoros, Congo (Brazzaville), Congo (Kinshasa), Costa Rica, Cote d'Ivoire, Croatia, Cuba, Cyprus, Czechia, Denmark, Djibouti, Dominica, Dominican Republic, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Eswatini, Ethiopia, Fiji, Finland, France, Gabon, Gambia, Georgia, Germany, Ghana, Greece, Grenada, Guatemala, Guinea, Guinea-Bissau, Guyana, Haiti, Honduras, Hungary, Iceland, India, Indonesia, Iran, Iraq, Ireland, Israel, Italy, Jamaica, Japan, Jordan, Kazakhstan, Kenya, Kiribati, Korea - South, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Moldova, Monaco, Mongolia, Montenegro, Morocco, Mozambique, Namibia, Nauru, Nepal, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, North Macedonia, Norway, Oman, Pakistan, Palau, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Poland, Portugal, Qatar, Romania, Russia, Rwanda, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Samoa, San Marino, Sao Tome and Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, South Sudan, Spain, Sri Lanka, Sudan, Suriname, Sweden, Switzerland, Syria, Taiwan*, Tajikistan, Tanzania, Thailand, Timor-Leste, Togo, Tonga, Trinidad and Tobago, Tunisia, Turkey, Tuvalu, US, Uganda, Ukraine, United Arab Emirates, United Kingdom, Uruguay, Uzbekistan, Vanuatu, Venezuela, Vietnam, West Bank and Gaza, Yemen, Zambia, Zimbabwe
Strongest spike in past two days was in US at 64,861 new cases.
Strongest spike in past five days was in US at 64,861 new cases.
Strongest spike in outbreak was 424 days ago in US at 1,354,505 new cases.
Global Total Confirmed COVID-19 Case Rate of 8620.91 per 100,000
Global Active Confirmed COVID-19 Case Rate of 37.24 per 100,000
Global COVID-19 Mortality Rate of 87.69 per 100,000
21 countries with over 200 per 100,000 active cases.
5 countries with over 500 per 100,000 active cases.
3 countries with over 1,000 per 100,000 active cases.
1 country with over 2,000 per 100,000 active cases.
Nauru is worst at 4,354.54 per 100,000.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Introduction
There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We have named the dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named that dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance of online scrapers collecting inaccurate news reports. One of the challenges while collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as a guideline to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
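The collection itself was manual, but the headline, body and duplicate criteria above can also be expressed as an automated sanity check on submitted entries. The sketch below is only an illustration: the keyword list is hypothetical (the source does not publish the one given to collectors), it counts keyword occurrences rather than distinct keywords, and it ignores the genre and time-frame criteria.

```python
import re

# Hypothetical keyword list, for illustration only.
COVID_KEYWORDS = {
    "covid", "coronavirus", "pandemic", "quarantine", "lockdown",
    "outbreak", "virus", "infection", "mask", "vaccine",
}


def count_keywords(text):
    """Count tokens in the text that match a COVID-related keyword."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in COVID_KEYWORDS)


def meets_criteria(headline, body, seen_headlines):
    """Apply the headline, body and duplicate checks from the guideline."""
    if headline in seen_headlines:
        return False  # avoid duplicate reports
    return count_keywords(headline) >= 1 and count_keywords(body) >= 5


headline = "Covid-19 outbreak slows markets"
body = ("The virus spread quickly. Officials urged mask use, "
        "quarantine and lockdown as the pandemic grew.")
print(meets_criteria(headline, body, seen_headlines=set()))
```

A check like this could flag borderline entries for the human editors rather than replace their judgment.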
To collect these data we used a Google Form for both USA and BD. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned items.
The primary data statistics of the two datasets are shown in Tables 1 and 2.
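The pre-processing steps listed above can be sketched with the standard library alone. This is not the authors' script: the stop-word list is a tiny illustrative sample, lowercasing is an added assumption, and lemmatization is left out because it needs a lexical resource such as the one shipped with an NLP library.

```python
import re

# Tiny illustrative stop-word list; a real run would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "for"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Return cleaned tokens: no links, no non-English characters, no stop words."""
    text = URL_PATTERN.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ALNUM_PATTERN.sub(" ", text)  # 2. remove non-English alphanumeric characters
    tokens = text.lower().split()            # lowercasing is an assumption
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    # 4. Lemmatization is intentionally omitted in this sketch.
    return tokens


sample = "The virus is spreading in Dhaka: see https://example.com/report for details!"
print(preprocess(sample))
```

Returning tokens instead of a rejoined string keeps the output ready for the keyword and word-cloud steps that follow.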
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
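Regenerating JSON from an updated CSV file can be done with the standard library. The following is a minimal sketch, not the repository's script; the column names `headline` and `body` are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical columns and content, for illustration only.
sample = "headline,body\nVirus spreads,Cases rise in March\n"
json_text = csv_to_json(sample)
print(json_text)
```

Passing `ensure_ascii=False` keeps any non-ASCII characters in the reports readable in the generated file.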
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text into numerical representations, which can be fed to multiple machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12] and the machine translation used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms built solely for Natural Language Processing tasks. The two most prominent ones are BERT [14], a transformer model built on a bidirectional encoder that performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are all pre-trained models since they carry a huge computation cost.
Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data.
Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of text. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
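As a small illustration of how one-hot encoding turns tokens into vectors a learning algorithm can consume, consider the sketch below. The token list is made up, and the helper names are hypothetical.

```python
def build_vocabulary(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a vector with a single 1 at the token's vocabulary index."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


tokens = ["virus", "mask", "economy", "virus"]
vocab = build_vocabulary(tokens)
encoded = [one_hot(t, vocab) for t in tokens]
print(encoded)
```

Each vector is as long as the vocabulary, which is why dense word embeddings are usually preferred once the vocabulary grows large.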
Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.
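The graph-based idea behind TextRank can be sketched in a few lines: tokens that co-occur within a window become linked nodes, and a PageRank-style update assigns each node a weight. The version below is a simplified illustration, not the reference implementation; it uses a fixed number of iterations instead of a convergence threshold and skips the part-of-speech filtering usually applied first. The parameter names and defaults are assumptions.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens by a PageRank-style score over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, token in enumerate(tokens):
        # Link each token to the next `window` tokens.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != token:
                neighbors[token].add(other)
                neighbors[other].add(token)
    scores = {token: 1.0 for token in neighbors}
    for _ in range(iterations):
        scores = {
            token: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[token])
            for token in neighbors
        }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


tokens = ["virus", "economy", "virus", "masks", "virus", "jobs"]
print(textrank_keywords(tokens, window=1, top_k=1))
```

In this toy input every other word is adjacent only to "virus", so it ends up as the highest-weighted node.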
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2 and 3, we can note the following:
In February, both newspapers discussed China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to be even more concerned.
Both newspapers discussed the virus's impact on the economy, i.e., banks, elections, administrations and markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.
We used a script to extract all numbers related to certain keywords like 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', 'Diagnosed' etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.
We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.
We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: 'China', 'Government', 'Masks', 'Economy', 'Crisis', 'Theft', 'Stock market', 'Jobs', 'Election', 'Missteps', 'Health', 'Response'. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
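The number-extraction step described earlier in this section can be approximated with a regular expression that pairs a number with a keyword appearing within a couple of words after it. This is a hedged sketch, not the authors' script: the pattern, the two-word gap and the helper name are all assumptions.

```python
import re

# Keywords taken from the list quoted in the text, lowercased.
KEYWORDS = ("deaths", "infected", "died", "infections",
            "quarantined", "lock-down", "diagnosed")

# A number, then up to two intervening words, then one of the keywords.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,2}?("
    + "|".join(re.escape(k) for k in KEYWORDS)
    + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return (keyword, number) pairs for numbers followed closely by a keyword."""
    results = []
    for number, keyword in PATTERN.findall(text):
        results.append((keyword.lower(), int(number.replace(",", ""))))
    return results


sample = "Officials reported 1,200 new infections and 35 deaths on Monday."
print(extract_counts(sample))
```

Summing such pairs per month and per newspaper would yield a series comparable in spirit to the one the text describes, although phrasing like "died after 3 days" would need extra rules.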
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data consist of COVID-19 cases and related parameters for several countries, including Pakistan, Bangladesh, India and Afghanistan, on a daily basis from December 31, 2019 to August 19, 2020, acquired from https://ourworldindata.org/coronavirus-source-data.