Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product per capita in China was last recorded at 13121.68 US dollars in 2024. The GDP per Capita in China is equivalent to 104 percent of the world's average. This dataset provides - China GDP per capita - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Gross Domestic Product (GDP) in China expanded 5.20 percent in the second quarter of 2025 over the same quarter of the previous year. This dataset provides - China GDP Annual Growth Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
As of June 2023, Brazilian and Indian share prices were the highest performing among the major developed and emerging economies, with index values of 235.25 and 230.91 respectively in that month. Conversely, the lowest performing were China and Germany, with index values of 86.98 and 113.04 respectively. The index value is calculated with 2015 values as the baseline (i.e. 2015 = 100).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Introduction
There are several works that apply Natural Language Processing to newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study the bias present in online news media.
However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. Among the few NLP-based research publications is the sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done by Jelodar et al. [9], using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19 that contributes to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the issue from Bangladesh and named this dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance of online scrapers collecting inaccurate news reports. One of the challenges while collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors for collecting the news reports were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above mentioned newspapers.
To collect these data we used a Google Form for USA and BD. We had two human editors go through each entry to check for any spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
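The four pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the script used for the dataset: the stop-word list, the regular expressions, and the `naive_lemma` helper are all assumptions made for the example, and a real pipeline would likely use a full stop-word list and a proper lemmatizer.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and", "for", "at"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_PATTERN = re.compile(r"[^A-Za-z0-9\s]")


def naive_lemma(word):
    """Crude stand-in for a lemmatizer: strips a trailing plural 's'.

    Hypothetical helper; it mishandles words such as 'virus'.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(text):
    """Apply the four steps in order and return a list of tokens."""
    text = URL_PATTERN.sub(" ", text)            # 1. remove hyperlinks
    text = NON_ALNUM_PATTERN.sub(" ", text)      # 2. remove non-English alphanumeric characters
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [naive_lemma(t) for t in tokens]      # 4. lemmatize (very roughly)


print(preprocess("The masks are sold at https://example.com/shop in stores!"))
```

Keeping each step as a single line makes it easy to switch individual steps off, which matters when the goal is to leave the text as unchanged as possible.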
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
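Regenerating JSON from a CSV file can be done with the Python standard library alone. The sketch below is only an illustration of that conversion, not the repository's actual script; the column names `headline` and `newspaper` in the sample are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical sample; the real files may use different columns.
sample = "headline,newspaper\nVirus spreads,StarTribune\n"
print(csv_to_json(sample))
```

Each CSV row becomes one JSON object keyed by the header names, so the two formats stay in sync as long as the JSON is regenerated after every CSV update.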
3 Literature Review
Natural Language Processing (NLP) is the area of computer science that deals with text (also known as categorical) data. It uses diverse methods such as one-hot encoding and word embedding to transform text into a machine-readable numerical form, which can then be fed to machine learning and deep learning algorithms.
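As a small illustration of one-hot encoding, the sketch below maps each distinct word to a vector with a single 1. The function names and the sample tokens are assumptions made for this example only.

```python
def build_vocabulary(tokens):
    """Map each distinct token to an index, in sorted order."""
    return {token: i for i, token in enumerate(sorted(set(tokens)))}


def one_hot(token, vocabulary):
    """Return a vector of zeros with a 1 at the token's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[token]] = 1
    return vector


tokens = ["virus", "mask", "economy", "mask"]
vocab = build_vocabulary(tokens)
print(vocab)
print(one_hot("mask", vocab))
```

The vector length equals the vocabulary size, which is why denser representations such as word embeddings are usually preferred for large vocabularies.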
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent ones are BERT [14], which uses a bidirectional Transformer encoder and can perform near-perfect classification and word prediction, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are all used as pre-trained models, since training them carries a huge computation cost.
Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of the image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction from text could be identifying named entities, locations, or other essential data.
Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weight.
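The graph-based idea behind TextRank can be sketched as follows: words that occur near each other are linked in an undirected graph, and a PageRank-style iteration assigns each word a weight. This is a simplified, unweighted sketch under assumed parameter values (`window`, `damping`, `iterations`), not the implementation used in the experiments.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Iteratively redistribute each word's score among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        new_scores = {}
        for word in neighbors:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = ["virus", "mask", "virus", "economy", "virus", "job"]
print(textrank_keywords(tokens, window=1, top_k=1))
```

In practice the tokens would first be pre-processed and filtered (for example to nouns and adjectives), so that function words do not dominate the graph.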
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out the following:
In February, both newspapers talked about China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to be even more concerned.
Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.
We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID-19 cases, which had risen gradually since February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows a positive response against the attack.
We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.
We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
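The keyword-based number extraction described above could look roughly like the sketch below. The regular expression, the partial keyword list, and the rule that a number must appear at most three words before a keyword are all assumptions made for illustration; the original script is not shown in the source.

```python
import re

# Partial keyword list taken from the description above (lower-cased).
KEYWORDS = ("deaths", "died", "infected", "infections", "quarantined", "diagnosed")

# A number (optionally with thousands separators) followed, within at most
# three intervening words, by one of the keywords.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,3}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return (keyword, number) pairs found in the text."""
    results = []
    for number, keyword in PATTERN.findall(text):
        results.append((keyword.lower(), int(number.replace(",", ""))))
    return results


print(extract_counts("Officials reported 1,200 new infections and 35 deaths on Monday."))
```

Summing the extracted numbers per month and per newspaper would then give a series comparable in spirit to the one plotted in Figure 4, although such pattern matching can also pick up numbers that are not case counts.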
Attitudes to current national and international problems. Topics: effort for world peace; help for the FRG in the case of conflict; greatest economic power; greatest military power; trust in the USA, China and the USSR in the treatment of world problems; agreement of fundamental interests of the FRG and China, the USA and the USSR; threat to the FRG from China; trip of US President Nixon to China (knowledge, opinion on it, expected results, effect on the FRG); trip of US President Nixon to the USSR (knowledge, opinion on it, expected results, effect on the FRG); interest of the USA in European questions. Demography: age; marital status; religious denomination; frequency of church attendance; education; occupation; income; sex; city size; state. Also encoded was: length of interview; number of visits; presence of others; willingness to cooperate; degree of difficulty; date of interview. Quota sample (age; profession; education; size of place of residence; region).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China recorded a trade surplus of 98.24 USD Billion in July of 2025. This dataset provides - China Balance of Trade - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Money Supply M2 in China decreased to 329940 CNY Billion in July from 330332.50 CNY Billion in June of 2025. This dataset provides - China Money Supply M2 - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The USD/CNY exchange rate rose to 7.1389 on September 2, 2025, up 0.08% from the previous session. Over the past month, the Chinese Yuan has strengthened 0.58%, but it's down by 0.29% over the last 12 months. Chinese Yuan - values, historical data, forecasts and news - updated on September of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The United States recorded a trade deficit of 60.18 USD Billion in June of 2025. This dataset provides the latest reported value for - United States Balance of Trade - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Exports to United States was US$525.65 Billion during 2024, according to the United Nations COMTRADE database on international trade. China Exports to United States - data, historical chart and statistics - was last updated on August of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in China decreased to 0 percent in July from 0.10 percent in June of 2025. This dataset provides - China Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Foreign Exchange Reserves in China decreased to 3292000 USD Million in July from 3317000 USD Million in June of 2025. This dataset provides - China Foreign Exchange Reserves - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States Imports from China was US$462.62 Billion during 2024, according to the United Nations COMTRADE database on international trade. United States Imports from China - data, historical chart and statistics - was last updated on August of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Exports in China increased to 325.18 USD Billion in June from 316.10 USD Billion in May of 2025. This dataset provides - China Exports - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Imports from United States was US$164.59 Billion during 2024, according to the United Nations COMTRADE database on international trade. China Imports from United States - data, historical chart and statistics - was last updated on September of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Exports to United States in China decreased to 30786817 USD Thousand in February from 42633059 USD Thousand in January of 2024. This dataset includes a chart with historical data for China Exports To Us.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States Exports to China was US$143.55 Billion during 2024, according to the United Nations COMTRADE database on international trade. United States Exports to China - data, historical chart and statistics - was last updated on August of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Imports from China in the United States decreased to 31894.80 USD Million in February from 35793.58 USD Million in January of 2024. This dataset includes a chart with historical data for the United States Imports from China.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for MANUFACTURING PMI reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aluminum fell to 2,617.40 USD/T on September 1, 2025, down 0.11% from the previous day. Over the past month, Aluminum's price has risen 2.06%, and is up 7.98% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks the benchmark market for this commodity. Aluminum - values, historical data, forecasts and news - updated on September of 2025.