Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.
To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.
We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.
Examples of Annotated Headlines
| Forex Pair | Headline | Sentiment | Explanation |
| --- | --- | --- | --- |
| GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction |
| GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term |
| GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment |
| USDJPY | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply |
| USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other |
| USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY |
| AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD. |
Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
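The score calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the example probabilities are hypothetical, and picking the predicted class as the highest-probability class is an assumption, since the description only says the probabilities are "used to determine" it.

```python
def sentiment_from_probs(probs):
    """Derive a predicted class and a sentiment score from class probabilities.

    `probs` is a hypothetical dict with the keys 'positive', 'negative'
    and 'neutral', as a FinBERT-style classifier might produce.
    """
    # Assumption: the predicted class is the one with the highest probability.
    predicted_class = max(probs, key=probs.get)
    # As described in the text: positive probability minus negative probability.
    score = probs["positive"] - probs["negative"]
    return predicted_class, score


# Made-up probabilities for illustration only.
example = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = sentiment_from_probs(example)
print(label, round(score, 2))
```

With this definition the score lies between -1 and 1, and a headline the model considers mostly neutral ends up with a score close to 0.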
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in the United States was last recorded at 4.50 percent. This dataset provides the latest reported value for - United States Fed Funds Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
(see https://tblock.github.io/10kGNAD/ for the original dataset page)
This page introduces the 10k German News Articles Dataset (10kGNAD), a German topic classification dataset. The 10kGNAD is based on the One Million Posts Corpus and is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can download the dataset here.
English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups and the large-scale DBpedia ontology datasets for topic classification, and the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis. However, to my knowledge, no German topic classification dataset is available to the public.
Due to grammatical differences between English and German, a classifier might be effective on an English dataset but not as effective on a German dataset. Compared to English, German is more highly inflected, and long compound words are quite common. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.
The 10kGNAD dataset is intended to solve part of this problem as the first German topic classification dataset. It consists of 10273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus.
In the One Million Posts Corpus, each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. As a result, the dataset can be used for multi-class classification.
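As a small sketch of the labeling rule described above, the second segment of a topic path can be extracted like this. The function name is hypothetical and this is not the project's own extraction script.

```python
def class_label(topic_path):
    """Return the second segment of a '/'-separated topic path."""
    parts = topic_path.split("/")
    if len(parts) < 2:
        raise ValueError(f"topic path has no second segment: {topic_path!r}")
    return parts[1]


path = "Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise"
print(class_label(path))
```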
I created and used this dataset in my thesis to train and evaluate four text classifiers on the German language. By publishing the dataset I hope to support the advancement of tools and models for the German language. Additionally, this dataset can be used as a benchmark dataset for German topic classification.
As in most real-world datasets, the class distribution of the 10kGNAD is not balanced. The biggest class, Web, consists of 1678 articles, while the smallest class, Kultur, contains only 539 articles. However, articles from the Web class have on average the fewest words, while articles from the Kultur class have the second most words.
I propose a stratified split of 10% for testing and the remaining articles for training.
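To illustrate what a stratified 10% split means, here is a minimal standard-library sketch. It is not the project's split script; the function name, the seed, and the toy label list are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=42):
    """Split item indices so each class keeps roughly the same test share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration; the real dataset has nine classes.
labels = ["Web"] * 30 + ["Kultur"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class rather than over the whole dataset keeps small classes such as Kultur represented in the test set in proportion to their size.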
To use the dataset as a benchmark dataset, please use the train.csv and test.csv files located in the project root.
Python scripts to extract the articles and split them into a train set and a test set are available in the code directory of this project.
Make sure to install the requirements.
The original corpus.sqlite3 is required to extract the articles (download here (compressed) or here (uncompressed)).
This dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please consider citing the authors of the One Million Posts Corpus if you use the dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Newport News by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Newport News. The dataset can be used to understand the population distribution of Newport News by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in Newport News. Additionally, it can be used to see how the gender ratio changes from birth to the most senior age group, and to compare the male-to-female ratio across age groups for Newport News.
Key observations
Largest age group (population): Male # 20-24 years (8,018) | Female # 30-34 years (7,684). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Newport News Population by Gender. You can refer to it here.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the Newport News, VA population pyramid, which represents the Newport News population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
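The dependency ratios mentioned above can be sketched as follows. The age bands used here (0-14, 15-64, 65 and over) are the conventional ones and are an assumption, since the description does not state which bands it uses; the function name and the input figures are hypothetical and are not taken from the dataset.

```python
def dependency_ratios(youth, working_age, elderly):
    """Compute common dependency ratios from three population counts.

    Assumed bands: youth = ages 0-14, working_age = ages 15-64,
    elderly = ages 65 and over.
    """
    if working_age <= 0 or elderly <= 0:
        raise ValueError("working_age and elderly must be positive")
    youth_ratio = 100 * youth / working_age      # dependents per 100 working-age people
    old_age_ratio = 100 * elderly / working_age
    return {
        "youth": youth_ratio,
        "old_age": old_age_ratio,
        "total": youth_ratio + old_age_ratio,
        "potential_support": working_age / elderly,  # working-age people per elderly person
    }


# Made-up counts for illustration only.
ratios = dependency_ratios(youth=20_000, working_age=100_000, elderly=25_000)
print(ratios)
```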
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Newport News Population by Age. You can refer to it here.
By Education [source]
Welcome to the U.S. News & World Report's 2017 National Universities Rankings, a comprehensive dataset of over 1,800 schools across the United States providing quality data on admissions criteria, cost of tuition and fees, enrollment numbers, and overall rankings! Here you'll find up-to-date information on institutes of higher learning from Princeton University at the top spot in Best National Universities to Williams College at No. 1 on the Best National Liberal Arts Colleges list.
This collection of data is all that's needed for potential students - parents, counselors and more - to evaluate their choices in selecting a college or university that perfectly meets their needs. For instance: what is the total tuition & fees cost? What are student enrollment numbers? How have students rated this school? Which universities have been recognized as top institutions in academics by U.S. News & World Report? What admissions criteria do these schools evaluate when considering an applicant's profile? The answers lie within this dataset!
Explore each category on its own or in combination with others, for example through a scatter plot of enrollment against yearly expenses, including room and board charges. Other important factors, such as six-year graduation rates and freshman retention rates, allow the universities included here to be compared and assessed before you choose your own path.
This dataset contains information on the quality, tuition, and enrollment data of 1,800 U.S.-based universities ranked by U.S. News & World Report from 2017. It includes rankings from the National University and Liberal Arts College lists in addition to relevant data points like tuition fees and undergraduate enrollments for each school.
Users can take advantage of this dataset to build models that predict rankings, or that predict cost-benefit results for students by combining cost-related metrics (tuition) with quality metrics (rankings). Alternatively, users can analyze the relationship between investment in higher education and outcomes (ranking), or explore how enrollment differs across schools of varying rank tiers.
For more information on how rankings are calculated, please refer to the methodology explainer on the U.S. News website.
Here is an overview of all columns included in this dataset:
Columns:
- Name: institution name
- Location: city and state where located
- Rank: ranking according to U.S. News & World Report
- Description: snippet of text overview from U.S. News
- Tuition and fees: combined tuition and fees for out-of-state students
- In-state: tuition and fees for in-state students
- Undergraduate Enrollment: number of enrolled undergraduate students
Using these column details as a guide, we can answer questions like "Which colleges give the highest ROI?" or "Which college has the highest number of undergraduates?". For statistical analysis such as correlation, a visual representation such as a scatter plot or bar graph makes it easier to analyze trends in the dataset and to explore relationships between factors such as tuition and rank.
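As a rough sketch of the kind of correlation analysis mentioned above, the Pearson coefficient between two columns can be computed with the standard library. The rank and tuition values below are invented for illustration and do not come from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: rank (1 = best) and out-of-state tuition in dollars.
ranks = [1, 2, 3, 4]
tuition = [50_000, 48_000, 45_000, 30_000]
r = pearson(ranks, tuition)
print(round(r, 3))
```

A negative coefficient here would mean that better-ranked schools (lower rank number) tend to charge more; for ordinal data such as ranks, a rank-based measure like Spearman's coefficient may be a better fit.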
- Developing a searchable database to help high school students identify colleges that match their criteria in terms of tuition, graduation rate, location, and rank.
- Identifying correlations between enrollment numbers and university rank in order to better understand how the number of enrolled students affects the overall ranking of a university.
- Comparing universities with similar rankings in order to highlight differences between programs’ tuition and fees as well as retention rates.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Household Saving Rate in the United States remained unchanged at 4.50 percent in June from 4.50 percent in May of 2025. This dataset provides - United States Personal Savings Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021 (new_deaths column)
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic.
Filter cases by state here.
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here.
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
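The 7-day rolling average of new cases per capita used by the queries above can be sketched in plain Python. This is an illustration under stated assumptions, not AP's query logic: the function name, the input layout (one cumulative count per day), and the example numbers are hypothetical, and the per-100,000 scaling follows the rate convention described earlier.

```python
def rolling_new_cases_per_100k(cumulative, population, window=7):
    """Rolling average of daily new cases per 100,000 people.

    `cumulative` is a list of cumulative case counts, one per day.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Daily new cases are differences of consecutive cumulative counts.
    # These can be negative when a source revises its history downward.
    new_cases = [b - a for a, b in zip(cumulative, cumulative[1:])]
    rates = []
    for end in range(window, len(new_cases) + 1):
        average = sum(new_cases[end - window:end]) / window
        rates.append(average * 100_000 / population)
    return rates


# Made-up cumulative counts for a county of 100,000 people.
cumulative = [0, 10, 20, 30, 40, 50, 60, 70, 90]
rates = rolling_new_cases_per_100k(cumulative, population=100_000)
print([round(rate, 2) for rate in rates])
```

Averaging over seven days smooths out day-of-week reporting effects, but as noted above, very small counties can still show large per-capita swings from a handful of cases.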
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
This data should be credited to Johns Hopkins University COVID-19 tracking project.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fixed 30-year mortgage rates in the United States averaged 6.67 percent in the week ending August 8 of 2025. This dataset provides the latest reported value for - United States MBA 30-Yr Mortgage Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Hereby I am sharing the data used in the paper "The words have power: the impact of news on exchange rates". The dataset includes:
Taylor Rule Fundamentals:
- inflation
- industrial production index (as a high-frequency proxy of GDP)
- money market rate from 2000 until 2018
Textual information:
- entropies of news items about the U.S. Dollar from the Nexis-Uni database
- economic policy uncertainty index from https://www.policyuncertainty.com/index.html
This is how we get the textual data from the Nexis-Uni database: we enter “U.S. Dollar” as a keyword in searching for the news, which gives over 15 million non-duplicate news items. Next, we clean the news data and select the relevant news items about the U.S. Dollar with the following criteria: (i) the U.S. Dollar appears in the title of the news item, (ii) U.S. Dollar is repeated several times in the news item, (iii) the first paragraph of the news item contains the word “U.S. Dollar”, (iv) U.S. Dollar is the subject of the news item, as automatically selected by the Nexis-Uni database.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Prior research suggests that United States governmental sources documenting the number of law-enforcement-related deaths (i.e., fatalities due to injuries inflicted by law enforcement officers) undercount these incidents. The National Vital Statistics System (NVSS), administered by the federal government and based on state death certificate data, identifies such deaths by assigning them diagnostic codes corresponding to “legal intervention” in accordance with the International Classification of Diseases–10th Revision (ICD-10). Newer, nongovernmental databases track law-enforcement-related deaths by compiling news media reports and provide an opportunity to assess the magnitude and determinants of suspected NVSS underreporting. Our a priori hypotheses were that underreporting by the NVSS would exceed that by the news media sources, and that underreporting rates would be higher for decedents of color versus white, decedents in lower versus higher income counties, decedents killed by non-firearm (e.g., Taser) versus firearm mechanisms, and deaths recorded by a medical examiner versus coroner.
Methods and findings
We created a new US-wide dataset by matching cases reported in a nongovernmental, news-media-based dataset produced by the newspaper The Guardian, The Counted, to identifiable NVSS mortality records for 2015. We conducted 2 main analyses for this cross-sectional study: (1) an estimate of the total number of deaths and the proportion unreported by each source using capture–recapture analysis and (2) an assessment of correlates of underreporting of law-enforcement-related deaths (demographic characteristics of the decedent, mechanism of death, death investigator type [medical examiner versus coroner], county median income, and county urbanicity) in the NVSS using multilevel logistic regression. We estimated that the total number of law-enforcement-related deaths in 2015 was 1,166 (95% CI: 1,153, 1,184). There were 599 deaths reported in The Counted only, 36 reported in the NVSS only, 487 reported in both lists, and an estimated 44 (95% CI: 31, 62) not reported in either source. The NVSS documented 44.9% (95% CI: 44.2%, 45.4%) of the total number of deaths, and The Counted documented 93.1% (95% CI: 91.7%, 94.2%). In a multivariable mixed-effects logistic model that controlled for all individual- and county-level covariates, decedents injured by non-firearm mechanisms had higher odds of underreporting in the NVSS than those injured by firearms (odds ratio [OR]: 68.2; 95% CI: 15.7, 297.5; p < 0.01), and underreporting was also more likely outside of the highest-income-quintile counties (OR for the lowest versus highest income quintile: 10.1; 95% CI: 2.4, 42.8; p < 0.01). There was no statistically significant difference in the odds of underreporting in the NVSS for deaths certified by coroners compared to medical examiners, and the odds of underreporting did not vary by race/ethnicity. One limitation of our analyses is that we were unable to examine the characteristics of cases that were unreported in The Counted.
Conclusions
The media-based source, The Counted, reported a considerably higher proportion of law-enforcement-related deaths than the NVSS, which failed to report a majority of these incidents. For the NVSS, rates of underreporting were higher in lower income counties and for decedents killed by non-firearm mechanisms. There was no evidence suggesting that underreporting varied by death investigator type (medical examiner versus coroner) or race/ethnicity.
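To make the capture-recapture idea concrete, here is a minimal sketch using the classic two-source Lincoln-Petersen estimator. The abstract does not say which estimator the study used, so this choice is an assumption, and the function name is hypothetical. The counts are the ones reported above.

```python
def lincoln_petersen(only_a, only_b, both):
    """Two-source capture-recapture estimate (Lincoln-Petersen).

    Assumes the two sources record cases independently of each other.
    Returns (estimated total, estimated number missed by both sources).
    """
    if both <= 0:
        raise ValueError("the sources must share at least one case")
    n_a = only_a + both  # all cases found by source A
    n_b = only_b + both  # all cases found by source B
    total = n_a * n_b / both
    missed_by_both = only_a * only_b / both
    return total, missed_by_both


# Counts from the abstract: The Counted only, NVSS only, and both lists.
total, missed = lincoln_petersen(only_a=599, only_b=36, both=487)
print(round(total), round(missed))
```

Under the independence assumption this simple estimator gives roughly 1,166 total deaths and roughly 44 missed by both sources, in line with the point estimates reported above; it does not reproduce the confidence intervals.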
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unemployment Rate in the United States increased to 4.20 percent in July from 4.10 percent in June of 2025. This dataset provides the latest reported value for - United States Unemployment Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Text classification problems are solved quite successfully by current machine learning techniques. Text content such as consumer reviews or email can be classified as favorable/unfavorable, spam/not-spam, etc. with a high success rate. News content is also known to affect human sentiment, leading to sharp, short-term price movements in stocks following positive or negative news. The attached sample dataset may be used to train a machine learning model to classify news text and predict its influence on stock price, and subsequently to deduce buy/sell recommendations. A predicted downward price movement may also help institutions engaged in Lombard lending (securities lending) employ proactive risk mitigation.
The dataset contains news articles and the empirical stock price movements following the news publication date. Attributing a stock price move to a specific news incident alone is difficult, as several factors influence the stock price. However, we have selected stocks and incident dates where the stock significantly outperformed or underperformed its industry peers. The effects of broader market and industry factors can therefore be assumed to be less significant, because such factors would cause all industry peers to rise or fall in tandem, if any cause-effect relationship exists at all. In other words, data points are taken into consideration only if the company's stock price showed a statistically significant upward or downward change relative to its industry peers in the reference time period.
Secondly, earnings-related news content (a fundamental factor in the attractiveness of a stock) is omitted from consideration, to keep the analysis limited in scope to incident news alone. The reference time period for evaluating under- or outperformance is kept to a maximum of 10 days, to capture only "short-term" price movements. This helps omit scenarios where the stock price was affected by the business operational realities of the company, e.g. actual (not reported) success or failure of its product or service, as such events are relatively long term.
In short, due care (feature engineering) has been employed to curate this dataset to serve its intended application. Please note that this is only a sample dataset of roughly 100 records. The full dataset can be requested for non-commercial use. Please contact me via this platform or via LinkedIn.
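The peer-relative selection described above could be sketched as follows. The description does not say which statistical test was used, so the z-score against the spread of peer returns, the threshold of 2.0, the function names, and the price series are all assumptions made purely for illustration.

```python
from statistics import mean, stdev


def window_return(prices):
    """Simple return from the first to the last price in the window."""
    return prices[-1] / prices[0] - 1


def peer_relative_move(stock_prices, peer_price_series, z_threshold=2.0):
    """Label a stock's move relative to its industry peers.

    Assumption: the stock's return is compared with the mean and standard
    deviation of peer returns over the same window (at most 10 days in
    the text), using a hypothetical z-score threshold.
    """
    stock_ret = window_return(stock_prices)
    peer_rets = [window_return(prices) for prices in peer_price_series]
    spread = stdev(peer_rets)  # needs at least two peers
    if spread == 0:
        raise ValueError("peer returns are identical; z-score undefined")
    z = (stock_ret - mean(peer_rets)) / spread
    if z >= z_threshold:
        return "outperform", z
    if z <= -z_threshold:
        return "underperform", z
    return "none", z


# Made-up prices: the stock drops 20% while its peers barely move.
label, z = peer_relative_move([100, 80], [[100, 101], [100, 99], [100, 100]])
print(label, round(z, 1))
```

Comparing against peers rather than against the stock's own history is what filters out market-wide and industry-wide moves, which is the rationale given above.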
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the Newport News, VA population pyramid, which represents the Newport News population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Newport News Population by Age. You can refer to it here.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
I am hereby sharing the dataset used in the paper titled 'Beyond Tradition: A Hybrid Model Unveiling News Impact on Exchange Rates'. The dataset comprises the following components:
Taylor Rule Fundamentals:
- Inflation
- Industrial production index (as a high-frequency proxy of GDP)
- Money market rate spanning from 2000 to 2018
Textual Information:
- Economic Policy Uncertainty Index from https://www.policyuncertainty.com/index.html (as of November 9, 2023)
- Time series of entropies calculated for U.S. Dollar-related news topics extracted from the Nexis-Uni database
Note: To acquire the textual data from the Nexis-Uni database, we conducted the following steps. We entered "U.S. Dollar" as a keyword in the search for news, resulting in over 15 million non-duplicate news items. Subsequently, we cleaned the news data and selected relevant news items using the following criteria: (i) the U.S. Dollar appears in the title of the news item, (ii) the term "U.S. Dollar" is repeated several times in the news item, (iii) the first paragraph of the news item contains the word "U.S. Dollar", (iv) the news item is automatically selected by the Nexis-Uni database with the U.S. Dollar as the subject. We then identified the topics related to the U.S. Dollar from the news using LDA and calculated the Shannon entropies over time for each topic.
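For readers unfamiliar with the measure, Shannon entropy of a discrete distribution can be computed as in the sketch below. The description does not specify which distribution the entropies are taken over for each topic and period, so the inputs here are hypothetical weights and the function is only a generic illustration of the formula H = -sum(p * log2(p)).

```python
from math import log2


def shannon_entropy(weights):
    """Shannon entropy in bits of a distribution given as non-negative weights."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize to probabilities and skip zeros, since 0 * log(0) is taken as 0.
    probs = [w / total for w in weights if w > 0]
    return -sum(p * log2(p) for p in probs)


# Hypothetical weights for one topic in one period.
print(shannon_entropy([1, 1, 1, 1]))  # uniform over four outcomes
print(shannon_entropy([8, 1, 1]))     # concentrated on one outcome
```

Higher entropy indicates that weight is spread more evenly, while lower entropy indicates concentration on a few outcomes.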
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The inflation rate in the United States remained unchanged at 2.70 percent in July. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for the United States inflation rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Description
The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

Data Collection and Generation Procedures
The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student's desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

Experimental Sessions
Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:
- News Reading: Students read projected or device-displayed news.
- Brainstorming Session: Idea generation for problem-solving.
- Lecture: Passive listening to an instructor-led session.
- Information Organization: Synthesizing information from different sources.
- Lecture Test: Assessment of lecture content via mobile devices.
- Individual Presentations: Students present their projects.
- Knowledge Test: Conducted using Kahoot.
- Robotics Experimentation: Hands-on session with robotics.
- MTINY Activity Design: Development of educational activities with computational thinking.

Technical Specifications
- RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
- Frame Rate: 9-10 FPS depending on the setup.
- Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1-100 Hz.

Data Organization and Formats
The dataset follows a structured directory format: /groupX/experimentY/subjectZ.zip
Each subject-specific folder contains:
- images/ (individual facial images)
- watch_sensors/ (sensor readings in JSON format)
- labels/ (engagement & emotion annotations)
- metadata/ (subject demographics & session details)

Annotations and Labeling
Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

Missing Data and Data Quality
- Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
- Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
- Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

Data Processing Methods
To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

File Formats and Accessibility
- Images: Stored in standard JPEG format.
- Sensor Data: Provided as structured JSON files.
- Labels: Available as CSV files with timestamps.
The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

Potential Errors and Limitations
- Due to camera angles, some student movements may be out of frame in collaborative sessions.
- Lighting conditions vary slightly across experiments.
- Sensor latency variations are minimal but exist due to embedded device constraints.

Citation
If you find this project helpful for your research, please cite our work using the following BibTeX entry:
@misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
  title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
  author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
  year={2025},
  eprint={2502.20209},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20209},
}

Usage and Reproducibility
Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
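The directory format /groupX/experimentY/subjectZ.zip described for DIPSER can be turned into structured identifiers with a small helper. This is a sketch only: it assumes that X, Y, and Z are numeric identifiers, which the description does not confirm, and the names SubjectArchive and parse_archive_path are hypothetical rather than part of the DIPSER processing scripts.

```python
import re
from typing import NamedTuple, Optional


class SubjectArchive(NamedTuple):
    group: int
    experiment: int
    subject: int


# Assumption: X, Y, and Z in /groupX/experimentY/subjectZ.zip are integers.
_ARCHIVE_PATTERN = re.compile(r"group(\d+)/experiment(\d+)/subject(\d+)\.zip$")


def parse_archive_path(path: str) -> Optional[SubjectArchive]:
    """Extract group, experiment, and subject numbers, or return None if the path does not match."""
    match = _ARCHIVE_PATTERN.search(path.replace("\\", "/"))
    if match is None:
        return None
    return SubjectArchive(*(int(part) for part in match.groups()))


# Hypothetical path used only to illustrate the layout.
example = parse_archive_path("/group1/experiment3/subject12.zip")
```

Returning None for non-matching paths lets a caller skip unrelated files when walking the dataset directory.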
The "Bangladesh Rape Cases Data" dataset contains detailed information on rape cases reported in various districts of Bangladesh. This dataset is valuable for analyzing trends, patterns, and regional distributions of reported rape cases over a decade. It can be utilized by researchers, policymakers, and social scientists to study and address the issue of rape in Bangladesh.
Total Sample Size: This dataset comprises a total of 2,813 rows, each representing an individual case reported in various districts across Bangladesh.
Data Description:
- headline: Type: String. Description: The headline of the news article reporting the rape case. It provides a brief summary of the incident.
- district-tag: Type: String. Description: The district where the incident occurred. This helps in identifying the geographical distribution of the cases.
- division-tag: Type: String. Description: The division of Bangladesh to which the district belongs. This is useful for broader regional analysis.
- subdistrict-tag: Type: String. Description: The specific subdistrict or locality within the district where the incident occurred. This column may contain missing values if the subdistrict is not specified.
- id: Type: String (UUID format). Description: A unique identifier for each news article, ensuring that each entry can be distinctly referenced.
- url: Type: String. Description: The web link to the original news article, allowing users to access the full report for more detailed information.
- last-published-at: Type: DateTime. Description: The date and time when the news article was last published, helping to understand the timeline of the reported cases.
- offset: Type: Integer. Description: An offset value for the article, potentially indicating its position in a larger dataset or the order of processing.
- content: Type: String. Description: The main content of the news article, providing detailed information about the incident.
Temporal Coverage:
- Minimum Date: February 22, 2013
- Maximum Date: April 10, 2023
The dataset spans over a decade, allowing for a comprehensive temporal analysis of the reported cases.
Potential Uses:
- Trend Analysis: Analyze how the frequency of reported cases changes over time.
- Geographical Analysis: Identify regions with higher or lower reporting rates.
- Content Analysis: Examine the language and details provided in the headlines and content to understand the nature of reporting.
- Correlation Studies: Investigate possible correlations between reported cases and other socio-economic factors.
Data Quality and Considerations:
- Missing Values: Some columns, such as subdistrict-tag, may contain missing values where specific information was not provided.
- Data Source: The data is sourced from news articles, so it may be influenced by reporting biases and the availability of news coverage.
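The trend analysis mentioned above could start from a simple count of articles per year. The sketch below assumes the rows have already been loaded as dictionaries keyed by the documented column names and that last-published-at is stored as an ISO 8601 string; the actual timestamp format is not stated in the description. The function name and the sample rows are hypothetical placeholders, not records from the dataset.

```python
from collections import Counter
from datetime import datetime


def cases_per_year(rows):
    """Count rows per publication year, skipping rows with a missing timestamp."""
    counts = Counter()
    for row in rows:
        raw = row.get("last-published-at")
        if not raw:
            continue
        # Assumption: timestamps are ISO 8601 strings such as "2013-02-22T10:00:00".
        counts[datetime.fromisoformat(raw).year] += 1
    return dict(sorted(counts.items()))


# Placeholder rows that only mimic the documented columns.
sample_rows = [
    {"last-published-at": "2013-02-22T10:00:00", "district-tag": "district-a"},
    {"last-published-at": "2013-07-01T08:30:00", "district-tag": "district-b"},
    {"last-published-at": "2023-04-10T12:15:00", "district-tag": "district-a"},
    {"last-published-at": "", "district-tag": "district-c"},
]
yearly = cases_per_year(sample_rows)
```

Because the data comes from news coverage, such counts reflect reporting volume rather than the true incidence of cases.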
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in Japan was last recorded at 0.50 percent. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for the Japan interest rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The benchmark interest rate in the Euro Area was last recorded at 2.15 percent. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for the Euro Area interest rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.
To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.
We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following table provides examples of annotated headlines along with brief explanations of the assigned sentiment.
Examples of Annotated Headlines
Forex Pair | Headline | Sentiment | Explanation
GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction
GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term
GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment
USDJPY | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY
AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.
Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
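The scoring rule described above can be illustrated with a short sketch. It assumes the three FinBERT class probabilities are already available as a dictionary, and it assumes the predicted class is the one with the highest probability, which the description implies but does not state explicitly. The function name and the example probabilities are hypothetical.

```python
def summarize_sentiment(probabilities):
    """Return (predicted_class, sentiment_score) from class probabilities.

    probabilities: dict with the keys "positive", "negative", and "neutral".
    """
    # Assumption: the predicted class is the most probable one (argmax).
    predicted_class = max(probabilities, key=probabilities.get)
    # Score as described in the text: positive probability minus negative probability.
    sentiment_score = probabilities["positive"] - probabilities["negative"]
    return predicted_class, sentiment_score


# Illustrative probabilities, not actual model output.
example_probs = {"positive": 0.70, "negative": 0.10, "neutral": 0.20}
label, score = summarize_sentiment(example_probs)
```

Under this rule the score lies between -1 and 1, and a headline classified as neutral can still carry a small positive or negative score.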