https://creativecommons.org/publicdomain/zero/1.0/
Dataset Overview
This dataset contains historical macroeconomic data, featuring key economic indicators in the United States. It includes important metrics such as the Consumer Price Index (CPI), Retail Sales, Unemployment Rate, Industrial Production, Money Supply (M2), and more. The dataset spans from 1993 to the present and includes monthly data on various economic indicators, processed to show their rate of change (either percentage or absolute difference, depending on the indicator).
Provenance
The data in this dataset is sourced from the Federal Reserve Economic Data (FRED) database, hosted by the Federal Reserve Bank of St. Louis. FRED provides access to a wide range of economic data, including key macroeconomic indicators for the United States. My work involved calculating the rate of change (ROC) for each indicator and reorganizing the data into a more usable format for analysis. For more information and access to the full database, visit FRED's website.
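As a rough illustration of the rate-of-change step described above, the sketch below computes a month-over-month percentage change in plain Python. The function name `pct_change` and the sample values are hypothetical. The source does not state the exact formula or tooling used, so this assumes the conventional (current - previous) / previous * 100 definition.

```python
def pct_change(values):
    """Month-over-month percentage change; the first entry is None (no prior month)."""
    changes = [None]
    for prev, curr in zip(values, values[1:]):
        # Guard against division by zero when the prior level is 0.
        changes.append(None if prev == 0 else (curr - prev) / prev * 100.0)
    return changes


# Hypothetical index levels for three consecutive months (not real FRED data).
levels = [100.0, 102.0, 99.96]
print(pct_change(levels))
```

The first month has no predecessor, so it is left as `None` rather than being filled with zero, which would otherwise read as "no change".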
Purpose and Use for the Kaggle Community:
This dataset is a valuable resource for data scientists, economists, and analysts interested in understanding macroeconomic trends, performing time series analysis, or building predictive models. With the rate of change included, users can quickly assess the growth or contraction in these indicators month-over-month. This dataset can be used for:
Column Descriptions
Year: The year of the observation.
Month: The month of the observation (1-12).
Industrial Production: Monthly data on the total output of US factories, mines, and utilities.
Manufacturers' New Orders: Durable Goods: Measures the value of new orders placed with manufacturers for durable goods, indicating future production activity.
Consumer Price Index (CPIAUCSL): A measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services.
Unemployment Rate: The percentage of the total labor force that is unemployed but actively seeking employment.
Retail Sales: The total receipts of retail stores, indicating consumer spending and economic activity.
Producer Price Index: Measures the average change over time in the selling prices received by domestic producers for their output.
Personal Consumption Expenditures (PCE): A measure of consumer spending on goods and services; its associated price index is used in measuring inflation.
National Home Price Index: A measure of changes in residential real estate prices across the country.
All Employees, Total Nonfarm: The number of nonfarm payroll employees, an important indicator of the labor market.
Labor Force Participation Rate: The percentage of the working-age population that is either employed or actively looking for work.
Federal Funds Effective Rate: The interest rate at which depository institutions lend reserve balances to other depository institutions overnight.
Building Permits: The number of building permits issued for residential and non-residential buildings, a leading indicator of construction activity.
Money Supply (M2): The total money supply, including cash, checking deposits, and easily convertible near money.
Personal Income: The total income received by individuals from all sources, including wages, investments, and government transfers.
Trade Balance: The difference between a country's exports and imports, indicating the net trade flow.
Consumer Sentiment: The index reflecting consumer sentiment and expectations for the future economic outlook.
Consumer Confidence: A measure of how optimistic or pessimistic consumers are regarding their expected financial situation and the economy.
Notes on Interest Rates
Please note that for the Federal Funds Effective Rate (FEDFUNDS), the dataset includes the absolute change in basis points (bps), not the rate of change. This means that the dataset reflects the direct change in the interest rate rather than the percentage change month-over-month. The change is represented in basis points, where 1 basis point equals 0.01%.
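The basis-point convention can be sketched as follows. This is a minimal illustration, assuming the rate is stored in percent; the function name `change_in_bps` and the sample rates are hypothetical and are not taken from the dataset.

```python
def change_in_bps(rates_percent):
    """Absolute month-over-month change in basis points (1 bp = 0.01 percentage point)."""
    changes = [None]  # no prior month for the first observation
    for prev, curr in zip(rates_percent, rates_percent[1:]):
        # A move from 5.25% to 5.50% is 0.25 percentage points, i.e. 25 bps.
        changes.append(round((curr - prev) * 100.0, 6))
    return changes


rates = [5.25, 5.50, 5.00]  # hypothetical rates, in percent
print(change_in_bps(rates))  # [None, 25.0, -50.0]
```

Using the absolute difference avoids the distortion a percentage change produces when rates are near zero, where a small move in the rate would appear as a very large relative change.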
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data file consists of all tweets made by @realDonaldTrump, those determined to be economy-related, and their assigned sentiment (positive, negative, neutral). The data file also contains the data in a day-to-day format, with the total tweets made per day, the total economy-related tweets per day, the percentage of tweets about the economy for each day, and the percent change in the S&P 500 and VIX for each day.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Economic Optimism Index in the United States increased to 49.20 points in June from 47.90 points in May of 2025. This dataset provides the latest reported value for - United States IBD/TIPP Economic Optimism Index - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Consumer Confidence in the United States increased to 60.70 points in June from 52.20 points in May of 2025. This dataset provides the latest reported value for - United States Consumer Sentiment - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's News Events Data for North Macedonia: A Comprehensive Overview
Techsalerator's News Events Data for North Macedonia offers a robust resource for businesses, researchers, and media organizations. This dataset aggregates information on significant news events across North Macedonia, drawing from a diverse range of media sources, including news outlets, online publications, and social platforms. It provides valuable insights for those interested in tracking trends, analyzing public sentiment, or monitoring industry-specific developments.
Key Data Fields
- Event Date: Records the exact date of the news event, essential for analyzing trends over time or responding to market changes.
- Event Title: A concise headline summarizing the event, allowing users to quickly categorize and evaluate news content based on relevance.
- Source: Indicates the news outlet or platform where the event was reported, helping users track credible sources and measure the event’s reach and impact.
- Location: Provides geographic information on where the event occurred within North Macedonia, useful for regional analysis or targeted marketing.
- Event Description: A detailed summary of the event, including key developments, participants, and potential impacts, aiding in understanding the context and consequences.
Top 5 News Categories in North Macedonia
- Politics: Coverage of government decisions, political movements, elections, and policy changes affecting the national landscape.
- Economy: Focuses on economic indicators, inflation rates, international trade, and corporate activities influencing the business and finance sectors.
- Social Issues: News related to protests, public health, education, and other societal concerns driving public discourse.
- Sports: Highlights events in popular sports, often drawing significant attention and engagement across the country.
- Technology and Innovation: Reports on tech developments, startups, and advancements in North Macedonia’s growing tech ecosystem.
Top 5 News Sources in North Macedonia
- Macedonia.net: A leading news outlet providing comprehensive coverage of national politics, economy, and social issues.
- Meta.mk: A prominent online news platform known for timely updates on breaking news, politics, and current affairs.
- Dnevnik: A widely-read newspaper offering insights into local politics, economic developments, and societal trends.
- Sitel: A significant news source covering a broad range of topics, including politics, economy, and social issues.
- 24 Vesti: The national news agency delivering updates on major events, public health, and sports across North Macedonia.
Accessing Techsalerator’s News Events Data for North Macedonia
To access Techsalerator’s News Events Data for North Macedonia, please contact info@techsalerator.com with your specific needs. We will provide a customized quote based on the data fields and records you require, with delivery available within 24 hours. Ongoing access options can also be discussed.
Included Data Fields
- Event Date
- Event Title
- Source
- Location
- Event Description
- Event Category (Politics, Economy, Sports, etc.)
- Participants (if applicable)
- Event Impact (Social, Economic, etc.)
Techsalerator’s dataset is a valuable tool for tracking significant events in North Macedonia. It supports informed decision-making for business strategy, market analysis, and academic research, offering a clear view of the country's news landscape.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's News Events Data for Central African Republic: A Comprehensive Overview
Techsalerator's News Events Data for Central African Republic provides a robust resource for businesses, researchers, and media organizations. This dataset aggregates information on significant news events across the Central African Republic, drawing from diverse media sources, including news outlets, online publications, and social platforms. It offers valuable insights for those aiming to track trends, analyze public sentiment, or monitor industry-specific developments.
Key Data Fields
Event Date: Captures the exact date of the news event, crucial for tracking trends over time or for businesses responding to market shifts.
Event Title: A brief headline describing the event, allowing users to quickly categorize and assess news content based on relevance to their interests.
Source: Identifies the news outlet or platform where the event was reported, helping users track credible sources and assess the reach and influence of the event.
Location: Provides geographic information on where the event took place within Central African Republic, valuable for regional analysis or localized marketing efforts.
Event Description: A detailed summary of the event, outlining key developments, participants, and potential impact, helping researchers and businesses understand the context and implications of the event.
Top 5 News Categories in Central African Republic
Politics: Major news coverage on government decisions, political movements, elections, and policy changes that affect the national landscape.
Economy: Focuses on Central African Republic’s economic indicators, inflation rates, international trade, and corporate activities influencing business and finance sectors.
Social Issues: News events covering protests, public health, education, and other societal concerns that drive public discourse.
Sports: Highlights events in popular sports, often drawing widespread attention and engagement across the country.
Technology and Innovation: Reports on tech developments, startups, and innovations within the Central African Republic’s emerging tech ecosystem.
Top 5 News Sources in Central African Republic
Radio Ndeke Luka: A leading source of news and information, providing extensive coverage of political, economic, and social issues.
Centrafrique Presse: A prominent outlet offering news on national affairs, including politics, economy, and societal concerns.
RFI (Radio France Internationale): An international broadcaster providing updates on major events and developments in Central African Republic.
Le Démocrate: A local publication focusing on national and regional news, including political, economic, and social topics.
TV Centrafrique: A national television network offering coverage of current affairs, including politics, sports, and major events.
Accessing Techsalerator’s News Events Data for Central African Republic
To access Techsalerator’s News Events Data for Central African Republic, please contact info@techsalerator.com with your specific needs. We will provide a customized quote based on the data fields and records you require, with delivery available within 24 hours. Ongoing access options can also be discussed.
Included Data Fields
Event Date
Event Title
Source
Location
Event Description
Event Category (Politics, Economy, Sports, etc.)
Participants (if applicable)
Event Impact (Social, Economic, etc.)
Techsalerator’s dataset is a valuable tool for keeping track of significant events in Central African Republic. It supports informed decision-making for business strategy, market analysis, or academic research, providing a comprehensive view of the country’s news landscape.
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
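For readers working outside the linked queries, a weighted tabulation of a single variable might look like the sketch below. It is only an illustration under stated assumptions: the variable name `soc5c` comes from the text above, but the weight column name `weight`, the response labels, and the sample rows are hypothetical, and the actual data files may use different names.

```python
from collections import defaultdict


def weighted_shares(rows, variable, weight_key="weight"):
    """Return each response's share of the total survey weight, in percent."""
    totals = defaultdict(float)
    for row in rows:
        # Sum the survey weights, not the raw respondent counts.
        totals[row[variable]] += row[weight_key]
    grand_total = sum(totals.values())
    return {resp: 100.0 * w / grand_total for resp, w in totals.items()}


# Hypothetical respondents; real files would be read from the provided CSV.
rows = [
    {"soc5c": "none", "weight": 1.5},
    {"soc5c": "none", "weight": 0.5},
    {"soc5c": "1-2 days", "weight": 2.0},
]
print(weighted_shares(rows, "soc5c"))
```

In this toy input, two of three respondents answered "none", yet the weighted share is 50 percent, which shows why unweighted counts should not be reported.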
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey.
The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
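The general idea of iterative raking (iterative proportional fitting) can be sketched as below. This is not NORC's implementation: the function name `rake`, the two raking variables, and the target totals are hypothetical, and a production procedure would also handle empty cells, weight trimming, and the full set of raking variables listed above.

```python
def rake(rows, targets, max_iter=50, tol=1e-9):
    """Adjust row weights so weighted margins match the target totals.

    rows: list of dicts with a "weight" key and one key per raking variable.
    targets: {variable: {category: target_total}}.
    """
    weights = [float(r["weight"]) for r in rows]
    for _ in range(max_iter):
        max_shift = 0.0
        for var, wanted in targets.items():
            # Current weighted total for each category of this variable.
            current = {}
            for r, w in zip(rows, weights):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale every row so this variable's margins hit their targets.
            for i, r in enumerate(rows):
                factor = wanted[r[var]] / current[r[var]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:
            break  # all margins already match within tolerance
    return weights


# Hypothetical sample of four respondents and made-up population targets.
rows = [
    {"gender": "F", "age": "18-44", "weight": 1.0},
    {"gender": "F", "age": "45+", "weight": 1.0},
    {"gender": "M", "age": "18-44", "weight": 1.0},
    {"gender": "M", "age": "45+", "weight": 1.0},
]
targets = {
    "gender": {"F": 52.0, "M": 48.0},
    "age": {"18-44": 45.0, "45+": 55.0},
}
print(rake(rows, targets))
```

Each pass fixes one variable's margins at a time, which can disturb the others slightly; repeating the passes until the scaling factors approach 1 is what makes the process iterative.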
Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey.
Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Consumer Spending in the United States increased to 16291.80 USD Billion in the first quarter of 2025 from 16273.20 USD Billion in the fourth quarter of 2024. This dataset provides the latest reported value for - United States Consumer Spending - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's News Events Data for Rwanda: A Comprehensive Overview
Techsalerator's News Events Data for Rwanda provides a valuable resource for businesses, researchers, and media organizations. This dataset aggregates information on significant news events across Rwanda, sourced from a variety of media outlets, online publications, and social platforms. It offers essential insights for those interested in tracking trends, analyzing public sentiment, or monitoring sector-specific developments.
Key Data Fields
- Event Date: Records the exact date of the news event, crucial for tracking trends over time or responding to market changes.
- Event Title: Provides a brief headline describing the event, allowing users to quickly assess and categorize news content based on relevance.
- Source: Identifies the news outlet or platform where the event was reported, helping users track credible sources and gauge the event's reach and influence.
- Location: Offers geographic details indicating where the event occurred within Rwanda, useful for regional analysis or targeted marketing.
- Event Description: Includes a detailed summary of the event, outlining key developments, participants, and potential impact. This information aids in understanding the context and implications of the event.
Top 5 News Categories in Rwanda
- Politics: Covers significant political events, government decisions, elections, and policy changes affecting the nation.
- Economy: Focuses on economic indicators, inflation rates, trade activities, and corporate developments influencing Rwanda’s business and finance sectors.
- Social Issues: Includes news on public health, education, social justice, and other societal matters driving public discourse.
- Sports: Highlights major sporting events and achievements in football, athletics, and other popular sports, drawing national interest.
- Technology and Innovation: Reports on advancements in Rwanda’s tech sector, including startup activities and technological innovations.
Top 5 News Sources in Rwanda
- The New Times: A leading news outlet providing extensive coverage of politics, economy, and social issues in Rwanda.
- Rwanda Broadcasting Agency (RBA): The national broadcaster known for its comprehensive updates on news, politics, and current affairs.
- IGIHE: A prominent online news platform offering timely reports on various topics, including politics and societal issues.
- KT Press: A widely-read publication covering news related to politics, economy, and social matters.
- Daily Monitor: A key source of news providing insights into Rwandan politics, economy, and regional developments.
Accessing Techsalerator’s News Events Data for Rwanda
To access Techsalerator’s News Events Data for Rwanda, please reach out to info@techsalerator.com with your specific requirements. We will provide a customized quote based on the data fields and records needed, with delivery available within 24 hours. Ongoing access options can also be discussed.
Included Data Fields
- Event Date
- Event Title
- Source
- Location
- Event Description
- Event Category (Politics, Economy, Sports, etc.)
- Participants (if applicable)
- Event Impact (Social, Economic, etc.)
Techsalerator’s dataset is a crucial tool for tracking significant events in Rwanda, aiding in decision-making, market analysis, or academic research by providing a clear overview of the country's news landscape.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Employment Rate in the United States decreased to 59.70 percent in May from 60 percent in April of 2025. This dataset provides - United States Employment Rate- actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The key arguments for the low utilization of statistical techniques in financial sentiment analysis have been the difficulty of implementation for practical applications and the lack of high quality training data for building such models. Especially in the case of finance and economic texts, annotated collections are a scarce resource and many are reserved for proprietary use only. To resolve the missing training data problem, we present a collection of ∼ 5000 sentences to establish human-annotated standards for benchmarking alternative modeling techniques.
The objective of the phrase level annotation task was to classify each example sentence into a positive, negative or neutral category by considering only the information explicitly available in the given sentence. Since the study is focused only on financial and economic domains, the annotators were asked to consider the sentences from the view point of an investor only; i.e. whether the news may have positive, negative or neutral influence on the stock price. As a result, sentences which have a sentiment that is not relevant from an economic or financial perspective are considered neutral.
This release of the financial phrase bank covers a collection of 4840 sentences. The selected collection of phrases was annotated by 16 people with adequate background knowledge on financial markets. Three of the annotators were researchers and the remaining 13 annotators were master’s students at Aalto University School of Business with majors primarily in finance, accounting, and economics.
Given the large number of overlapping annotations (5 to 8 annotations per sentence), there are several ways to define a majority vote based gold standard. To provide an objective comparison, we have formed 4 alternative reference datasets based on the strength of majority agreement: all annotators agree, >=75% of annotators agree, >=66% of annotators agree and >=50% of annotators agree.
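The four agreement-based reference sets can be approximated with a short sketch like the one below. The threshold values follow the text above; everything else (the set names, the helper functions, the sample annotations, and the choice to skip sentences whose top two labels tie) is a hypothetical illustration rather than the authors' actual procedure.

```python
from collections import Counter

# Agreement thresholds taken from the description; the set names are made up.
THRESHOLDS = {"all_agree": 1.0, "75_agree": 0.75, "66_agree": 0.66, "50_agree": 0.50}


def majority_label(labels):
    """Return (label, agreement share); label is None when the top two labels tie."""
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, top_count / len(labels)  # assumption: ties are skipped
    return top_label, top_count / len(labels)


def reference_sets(annotations):
    """Assign each sentence to every reference set whose threshold it meets."""
    sets = {name: {} for name in THRESHOLDS}
    for sentence, labels in annotations.items():
        label, share = majority_label(labels)
        if label is None:
            continue
        for name, threshold in THRESHOLDS.items():
            if share >= threshold:
                sets[name][sentence] = label
    return sets


# Hypothetical annotations: sentence id -> labels from individual annotators.
annotations = {
    "s1": ["positive"] * 5,
    "s2": ["positive"] * 4 + ["neutral"],
    "s3": ["negative"] * 4 + ["neutral"] * 2,
    "s4": ["neutral"] * 3 + ["positive", "negative"],
}
for name, members in reference_sets(annotations).items():
    print(name, sorted(members))
```

Because the thresholds are nested, each stricter set is a subset of the looser ones, so the stricter sets trade size for label reliability.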
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 severely affected the world’s economy and increased the inflation rate in both developed and developing countries. It also affected the financial and crypto markets significantly; however, some crypto markets flourished and reached their peak during the pandemic. This study analyzes the impact of COVID-19 on public opinion and sentiment regarding the financial and crypto markets. It conducts sentiment analysis on tweets related to these markets posted during the peak days of COVID-19 and uses it to investigate people’s sentiments about investing in these markets during the pandemic. In addition, a damage analysis in terms of market value is carried out, along with an identification of the worst period for the financial and crypto markets. For the analysis, the data is extracted from Twitter using the SNSscraper library. This study proposes a hybrid model called CNN-LSTM (convolutional neural network-long short-term memory) for sentiment classification. CNN-LSTM achieves F1 scores of 0.89 and 0.92 for the crypto and financial markets, respectively. Moreover, topic extraction from the tweets is also performed, along with the sentiments related to each topic.
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
In 2020, global gross domestic product declined by 6.7 percent as a result of the coronavirus (COVID-19) pandemic outbreak. In Latin America, overall GDP loss amounted to 8.5 percent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
There are several works that apply Natural Language Processing to newspaper reports. In mining opinions from headlines [1] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [3] implemented LDA, a topic modeling approach, to study bias present in online news media.
However, there is not much NLP research devoted to studying COVID-19. Most applications involve classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.
2 Data-set Introduction
2.1 Data Collection
We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the leading and most widely read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some criteria for collecting the news reports, provided as guidelines to the human data-collectors, were as follows:
The headline must have one or more words directly or indirectly related to COVID-19.
The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.
The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
Avoid taking duplicate reports.
Maintain a time frame for the above-mentioned newspapers.
To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.
2.2 Data Pre-processing and Statistics
Some pre-processing steps performed on the newspaper report dataset are as follows:
Remove hyperlinks.
Remove non-English alphanumeric characters.
Remove stop words.
Lemmatize text.
While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could cause a loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the items listed above.
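The four steps listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' script: the stop-word set is a deliberately tiny hypothetical list, lowercasing is an added assumption so that stop-word matching works, and lemmatization is left as a pluggable function (an identity placeholder here) that a real run would replace with a lemmatizer from a library such as NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are", "to", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def identity_lemmatizer(token):
    # Placeholder only: swap in a real lemmatizer here.
    return token


def preprocess(text, lemmatize=identity_lemmatizer):
    text = URL_RE.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)  # 2. remove non-English alphanumeric characters
    tokens = text.lower().split()       # lowercasing is an assumption, not from the source
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [lemmatize(t) for t in tokens]                # 4. lemmatize


print(preprocess("The masks are required in Minnesota! https://example.com/x"))
```

Passing the lemmatizer as an argument keeps the sketch free of third-party downloads while leaving the order of the steps visible.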
The primary data statistics of the two datasets are shown in Tables 1 and 2.
Table 1: Covid-News-USA-NNK data statistics
No. of words per headline: 7 to 20
No. of words per body content: 150 to 2100
Table 2: Covid-News-BD-NNK data statistics
No. of words per headline: 10 to 20
No. of words per body content: 100 to 1500
2.3 Dataset Repository
We used GitHub as our primary data repository, under the account name NKK^1. There we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script for essential operations. We welcome outside collaboration to enrich the dataset.
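Regenerating JSON from a CSV file can be done with the Python standard library alone. The sketch below is an assumption about what such a script might look like, not the repository's actual code; the column names `headline` and `body` are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical two-column sample; the real files may use different columns.
sample = "headline,body\nVirus spreads,Cases rise in March\n"
print(csv_to_json(sample))
```

To process a file on disk, read its contents with `open(path, newline="", encoding="utf-8")` and pass the text to `csv_to_json`.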
3 Literature Review
Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, using diverse methods such as one-hot encoding and word embedding to transform text into a machine-readable representation that can be fed to machine learning and deep learning algorithms.
Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and the machine translation used by Google Translate [13]. While these are all downstream tasks, there have also been several notable developments in algorithms designed specifically for NLP. The two most prominent are BERT [14], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are typically used as pre-trained models because training them carries a huge computational cost.
Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; information extraction from speech could mean retrieving information about names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a form of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives a brief idea of what a set of texts is about. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].
Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
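The graph-based weighting idea can be illustrated with a simplified sketch: words that co-occur within a small window are linked, and a PageRank-style power iteration scores each word. This is a stripped-down illustration rather than the reference TextRank implementation; the window size, damping factor, iteration count, and the function name `textrank_keywords` are all assumptions chosen for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    # Build an undirected co-occurrence graph: words within `window` positions are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style power iteration on the unweighted graph.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = "virus economy virus masks virus election virus jobs".split()
print(textrank_keywords(tokens, top_k=1))
```

In practice the token list would come from pre-processed text, so that stop words do not dominate the graph.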
Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
4 Our experiments and Result analysis
We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can make a few observations:
In February, both newspapers discussed China and the source of the outbreak.
StarTribune emphasized Minnesota as the state of most concern. In April, this concern seemed to increase.
Both newspapers discussed the virus's impact on the economy, i.e., banks, elections, administrations, and markets.
Washington Post discussed global issues more than StarTribune.
StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
While both newspapers mentioned the outbreak in China in February, the spread within the United States is emphasized more heavily from March through May, reflecting the critical impact of the virus.
We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, and ’Diagnosed’ from the news reports and built a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.
We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.
We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's News Events Data for Slovenia: A Comprehensive Overview
Techsalerator's News Events Data for Slovenia provides a valuable resource for businesses, researchers, and media organizations. This dataset aggregates significant news events from a wide range of Slovenian media sources, including news outlets, online publications, and social platforms. It offers crucial insights for tracking trends, analyzing public sentiment, and monitoring industry-specific developments.
Key Data Fields
- Event Date: Records the exact date of the news event, essential for tracking trends over time or for businesses adapting to market changes.
- Event Title: A concise headline describing the event, allowing users to quickly categorize and assess news relevance.
- Source: Indicates the news outlet or platform where the event was reported, helping users evaluate source credibility and event reach.
- Location: Provides geographic details about where the event occurred within Slovenia, useful for regional analysis or targeted marketing.
- Event Description: A detailed summary of the event, outlining key developments, participants, and potential impacts, aiding in understanding the context and significance.
Top 5 News Categories in Slovenia
- Politics: Covers major government decisions, political movements, elections, and policy changes affecting the national landscape.
- Economy: Focuses on Slovenia’s economic indicators, inflation rates, international trade, and business activities influencing the financial sector.
- Social Issues: Includes news on protests, public health, education, and other societal concerns driving public discourse.
- Sports: Highlights events in popular sports such as football, basketball, and skiing, drawing widespread attention and engagement.
- Technology and Innovation: Reports on technological advancements, startups, and innovations within Slovenia’s tech ecosystem, featuring emerging companies and new developments.
Top 5 News Sources in Slovenia
- RTV Slovenia: A leading public broadcasting service offering extensive coverage of politics, economy, and social issues.
- Delo: A major newspaper known for its detailed reporting on national news, politics, and business.
- 24ur: A popular news portal providing timely updates on a broad range of topics, including politics, economy, and sports.
- Finance: A key source for business news, economic analysis, and financial developments in Slovenia.
- Slovenia Times: An English-language news outlet focusing on significant events, business news, and cultural stories in Slovenia.
Accessing Techsalerator’s News Events Data for Slovenia
To access Techsalerator’s News Events Data for Slovenia, please contact info@techsalerator.com with your specific requirements. We will provide a customized quote based on the data fields and records you need, with delivery available within 24 hours. Ongoing access options can also be arranged.
Included Data Fields
- Event Date
- Event Title
- Source
- Location
- Event Description
- Event Category (Politics, Economy, Sports, etc.)
- Participants (if applicable)
- Event Impact (Social, Economic, etc.)
Techsalerator’s dataset is an invaluable tool for tracking significant events in Slovenia, supporting informed decision-making, market analysis, and academic research, and offering a clear view of the country’s news landscape.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Status anxiety, the constant concern about individuals’ position on the social ladder, negatively affects social cohesion, health, and wellbeing (e.g., chronic stress). Given previous findings showing that status anxiety is associated with economic inequality, we aimed in this research to test this association experimentally. A cross-sectional study (Study 1) was run in order to discard confounding effects of the relationship between perceived economic inequality (PEI) and status anxiety, and to explore the mediating role of a competitive climate (N = 297). Then we predicted that people assigned to a condition of high inequality would perceive more status anxiety in their social context, and they would themselves report higher status anxiety. Thus, in an experimental study (Study 2) PEI was manipulated (N = 200). In Study 1, PEI uniquely predicted status anxiety, and perceived competitiveness mediated the relationship. In Study 2 PEI increased perceived contextual status anxiety, a specific form of perceived competitiveness based on socioeconomic status (SES). Moreover, preliminary evidence of an indirect effect was found from PEI to personal status anxiety, through (higher) perceived contextual status anxiety. These preliminary findings provide experimental evidence for the effects of economic inequality on status anxiety and the mechanism involved. Economic inequality makes people feel that they live in a society where they are constantly concerned and competing with each other for their SES. These results could have important implications as health and wellbeing could be promoted by reducing economic inequalities and the competitive and materialistic environments of our societies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Initial Jobless Claims in the United States decreased to 236 thousand in the week ending June 21 of 2025 from 246 thousand in the previous week. This dataset provides the latest reported value for - United States Initial Jobless Claims - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
How people feel about their neighbourhood across the UK. This dataset shows how people feel about their neighbourhood by looking at 5 measures of social capital, and shows differences observed between regions, constituent countries, and urban and rural areas by economic activity.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Overview
This dataset contains historical macroeconomic data, featuring key economic indicators in the United States. It includes important metrics such as the Consumer Price Index (CPI), Retail Sales, Unemployment Rate, Industrial Production, Money Supply (M2), and more. The dataset spans from 1993 to the present and includes monthly data on various economic indicators, processed to show their rate of change (either percentage or absolute difference, depending on the indicator).
Provenance
The data in this dataset is sourced from the Federal Reserve Economic Data (FRED) database, hosted by the Federal Reserve Bank of St. Louis. FRED provides access to a wide range of economic data, including key macroeconomic indicators for the United States. My work involved calculating the rate of change (ROC) for each indicator and reorganizing the data into a more usable format for analysis. For more information and access to the full database, visit FRED's website.
Purpose and Use for the Kaggle Community:
This dataset is a valuable resource for data scientists, economists, and analysts interested in understanding macroeconomic trends, performing time series analysis, or building predictive models. With the rate of change included, users can quickly assess the growth or contraction in these indicators month-over-month. This dataset can be used for:
Column Descriptions
Year: The year of the observation.
Month: The month of the observation (1-12).
Industrial Production: Monthly data on the total output of US factories, mines, and utilities.
Manufacturers' New Orders: Durable Goods: Measures the value of new orders placed with manufacturers for durable goods, indicating future production activity.
Consumer Price Index (CPIAUCSL): A measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services.
Unemployment Rate: The percentage of the total labor force that is unemployed but actively seeking employment.
Retail Sales: The total receipts of retail stores, indicating consumer spending and economic activity.
Producer Price Index: Measures the average change over time in the selling prices received by domestic producers for their output.
Personal Consumption Expenditures (PCE): A measure of the prices paid by consumers for goods and services, used in calculating inflation.
National Home Price Index: A measure of changes in residential real estate prices across the country.
All Employees, Total Nonfarm: The number of nonfarm payroll employees, an important indicator of the labor market.
Labor Force Participation Rate: The percentage of the working-age population that is either employed or actively looking for work.
Federal Funds Effective Rate: The interest rate at which depository institutions lend reserve balances to other depository institutions overnight.
Building Permits: The number of building permits issued for residential and non-residential buildings, a leading indicator of construction activity.
Money Supply (M2): The total money supply, including cash, checking deposits, and easily convertible near money.
Personal Income: The total income received by individuals from all sources, including wages, investments, and government transfers.
Trade Balance: The difference between a country's exports and imports, indicating the net trade flow.
Consumer Sentiment: The index reflecting consumer sentiment and expectations for the future economic outlook.
Consumer Confidence: A measure of how optimistic or pessimistic consumers are regarding their expected financial situation and the economy.
Notes on Interest Rates
Please note that for the Federal Funds Effective Rate (FEDFUNDS), the dataset includes the absolute change in basis points (bps), not the rate of change. This means that the dataset reflects the direct change in the interest rate rather than the percentage change month-over-month. The change is represented in basis points, where 1 basis point equals 0.01%.
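The difference between the two treatments can be shown with a short sketch. It assumes the percentage rate of change is computed as (current - previous) / previous * 100 and that FEDFUNDS is quoted in percent, so a one-point move equals 100 bps; the function names and sample values are made up for illustration and are not taken from the dataset.

```python
def pct_change(series):
    """Month-over-month percentage change; the first month has no predecessor."""
    if not series:
        return []
    # Note: a previous value of zero would raise ZeroDivisionError.
    return [None] + [
        round((curr - prev) / prev * 100, 4)
        for prev, curr in zip(series, series[1:])
    ]


def bps_change(rates_pct):
    """Absolute month-over-month change in basis points for rates quoted in percent."""
    if not rates_pct:
        return []
    return [None] + [
        round((curr - prev) * 100, 2)
        for prev, curr in zip(rates_pct, rates_pct[1:])
    ]


cpi = [300.0, 303.0, 301.485]   # made-up index levels
fedfunds = [5.25, 5.50, 5.25]   # made-up rates, in percent
print(pct_change(cpi))
print(bps_change(fedfunds))
```

A move in the rate from 5.25% to 5.50% is therefore recorded as 25 bps, whereas the same move expressed as a percentage rate of change would be about 4.76%.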