Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Daily data from January 2014 to December 2018 are used. The pre-election period runs from January 1, 2014 to November 7, 2016, and the post-election period runs from November 8, 2016 to December 31, 2018. Four U.S. stock price indices are retrieved from DataStream: the Standard and Poor's 500 index (S&P 500), which covers the 500 largest-capitalization stocks; the Dow Jones Industrial Average (DJIA), which tracks the prices of 30 top US companies; the NASDAQ 100, which measures the performance of the 100 largest non-financial stocks traded on NASDAQ; and the Russell 2000, which covers 2,000 small-capitalization stocks. A daily political risk index is calculated for each period using Google Trends and principal component analysis.
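The pre/post-election split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `label_period` and the sample observations are hypothetical, and only the boundary dates come from the description.

```python
from datetime import date

# Boundary dates taken from the dataset description.
SAMPLE_START = date(2014, 1, 1)
ELECTION_DAY = date(2016, 11, 8)
SAMPLE_END = date(2018, 12, 31)


def label_period(day: date) -> str:
    """Return 'pre-election' or 'post-election' for a date inside the sample."""
    if not (SAMPLE_START <= day <= SAMPLE_END):
        raise ValueError(f"{day} is outside the 2014-2018 sample")
    return "pre-election" if day < ELECTION_DAY else "post-election"


# Hypothetical observations (made-up index levels, for illustration only).
observations = [
    (date(2016, 11, 7), 100.0),
    (date(2016, 11, 8), 101.0),
    (date(2018, 12, 31), 102.0),
]
labelled = [(d, level, label_period(d)) for d, level in observations]
```

Labelling each row first, rather than slicing the series twice, keeps the boundary rule in one place if the per-period political risk index is later computed group by group.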
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the top 100 companies in the technology sector by market capitalization. It also contains five of the most widely used indices in the financial market, a list of all the companies in the S&P 500 index, and a list of the companies in the technology sector.
The Global Industry Classification Standard, also known as GICS, is the primary financial industry standard for defining sector classifications. It was developed by the index providers MSCI and Standard and Poor’s. Its hierarchy begins with 11 sectors, which are further divided into 24 industry groups, 69 industries, and 158 sub-industries.
You can read the definition of each sector here.
The 11 broad GICS sectors commonly used for sector breakdown reporting are: Energy, Materials, Industrials, Consumer Discretionary, Consumer Staples, Health Care, Financials, Information Technology, Communication Services (named Telecommunication Services before 2018), Utilities and Real Estate.
In this case we will focus on the Technology sector. You can see all the sectors and industry groups here.
To determine which companies belong to the technology sector, we use Yahoo Finance, where we rank the companies according to their “Market Cap”. Once we had the list of the 100 most valuable companies in the sector, we downloaded the historical data for each company from the NASDAQ website.
For the indices, we searched various sources to find out which are the most used and determined that the 5 most frequently used indices are: Dow Jones Industrial Average (DJI), S&P 500 (SPX), NASDAQ Composite (IXIC), Wilshire 5000 Total Market Index (W5000) and, to view the technology sector specifically, SPDR Select Sector Fund - Technology (XLK). Historical data for these indices was also obtained from the NASDAQ website.
In total there are 107 files in csv format. They are composed as follows:
Every company and index file has the same structure with the same columns:
Date: The date on which the prices were recorded.
High: The highest price at which a stock traded during the course of the trading day.
Low: The lowest price at which a stock traded during the course of the trading day.
Open: The price at which a stock started trading when the opening bell rang.
Close: The last price at which a stock trades during a regular trading session.
Volume: The number of shares that changed hands during a given day.
Adj Close: The adjusted closing price, which factors in corporate actions such as stock splits, dividends, and rights offerings.
The two other files have different column names:
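As a minimal sketch of how one of these files could be loaded, the snippet below parses rows with the seven columns listed above using only the Python standard library. The sample rows are made up, and the date format (YYYY-MM-DD) and the column order in the header are assumptions, since the description does not state them.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; date format and header order are assumptions.
SAMPLE = """Date,High,Low,Open,Close,Volume,Adj Close
2020-11-23,10.5,9.8,10.0,10.2,1500,10.1
2020-11-24,10.9,10.1,10.2,10.8,1800,10.7
"""

PRICE_COLUMNS = ("High", "Low", "Open", "Close", "Adj Close")


def read_prices(handle):
    """Parse a company or index file into a list of typed row dicts."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {
            "Date": datetime.strptime(raw["Date"], "%Y-%m-%d").date(),
            "Volume": int(float(raw["Volume"])),
        }
        for column in PRICE_COLUMNS:
            row[column] = float(raw[column])
        rows.append(row)
    return rows


rows = read_prices(io.StringIO(SAMPLE))
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper, and adjust the date format string if the files use a different convention.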
List of S&P 500 companies
Symbol: Ticker symbol of the company.
Name: Name of the company.
Sector: The sector to which the company belongs.
Technology Sector Companies List
Symbol: Ticker symbol of the company.
Name: Name of the company.
Price: Current price at which the stock can be purchased or sold (as of 11/24/20).
Change: The net change, that is, the difference between closing prices from one day to the next.
% Change: The difference between closing prices from one day to the next, as a percentage.
Volume: The number of shares that changed hands during a given day.
Avg Vol: The daily average of the cumulative trading volume during the last three months.
Market Cap (Billions): The total value of a company’s shares outstanding at a given moment in time. It is calculated by multiplying the number of shares outstanding by the price of a single share.
PE Ratio: The ratio of a company's share (stock) price to its earnings per share. The ratio is used to value companies and to judge whether they are overvalued or undervalued.
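The derived columns above follow simple formulas. The sketch below spells them out; the function names and the input figures are hypothetical, and treating the PE ratio as undefined for non-positive earnings is an assumption about how such cases would be handled.

```python
def net_change(prev_close, close):
    """Difference between two consecutive closing prices."""
    return close - prev_close


def pct_change(prev_close, close):
    """Net change expressed as a percentage of the previous close."""
    return (close - prev_close) / prev_close * 100.0


def market_cap_billions(shares_outstanding, price):
    """Shares outstanding times share price, scaled to billions."""
    return shares_outstanding * price / 1e9


def pe_ratio(price, earnings_per_share):
    """Price divided by earnings per share; None when earnings are not positive."""
    if earnings_per_share <= 0:
        return None
    return price / earnings_per_share


# Made-up figures for illustration only.
change = net_change(100.0, 102.0)
change_pct = pct_change(100.0, 102.0)
cap = market_cap_billions(2_000_000_000, 50.0)
pe = pe_ratio(50.0, 2.5)
```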
SEC EDGAR | Company Filings
NASDAQ | Historical Quotes
Yahoo Finance | Technology Sector
Wikipedia | List of S&P 500 companies
S&P Dow Jones Indices | S&P 500
[S&P Dow Jones Indices | DJI](https://www.spglobal.com/spdji/en/i...
https://fred.stlouisfed.org/legal/#copyright-pre-approval
View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standard & Poor's 500 Index (hereafter S&P500), Nasdaq Composite Index, and Dow Jones Industrial Average Index (hereafter DJIA). Among them, the S&P500 Index is the best representative index of the US stock market. The Shanghai Composite Index (hereafter SSEC) and Shenzhen Composite Index (hereafter SZSC)
https://www.icpsr.umich.edu/web/ICPSR/studies/3974/terms
This research project explored when governments call elections and how the timing of elections influences the electoral result. In many parliamentary systems, the timing of the next election is at the discretion of the current government. Rather than waiting for the end of their term, leaders are free to call elections when it is advantageous to them and when they expect to win. This project was designed to use game theory to model how leaders decide whether to call elections based on their expectations about future performance. The data collected for this study reflect the timing of the British General Elections. In particular, this study addressed five research questions: (1) When are elections called? (2) What are the electoral implications of the timing of an election? (3) How are election timing and subsequent post-electoral economic performance related? (4) How does the election timing affect the length of the campaign? and (5) How does the London stock market respond to the announcement of elections? The data cover the time span from 1900 to 2001, although most of the files focus on the period from August 1, 1945, to June 13, 2001. Part 1 (Dates of Key Political Events Data) contains the dates of key political events, such as elections, first meetings of parliament, dissolutions, announcements of an election, by-elections, shifts in party allegiances, confidence votes, or changes in Prime Minister. Additional variables in Part 1 include whether there is a minority government or coalition government, percentage share of the vote by party type, number of seats by party type, and election turnout. Part 2 (By-Elections Data) includes the change in seats as a result of by-elections. Variables include the date of the by-election, electoral district, and change in seats by political parties. 
Part 3 (Change in Party Allegiance Data) contains information about the date of the allegiance shift, the electoral district, and defections to and from various political parties. Part 4 (Public Opinion Data) includes Gallup public opinion data on voting intentions, approval of government record, and approval of Prime Minister and opposition leader. Part 5 (Basic Economic Variables) contains basic economic data for the United Kingdom, such as various measures of gross domestic product and change in retail price index. Part 6 (Monthly Inflation Data) contains monthly inflation data as measured by the percentage change in retail price index. Part 7 (Unemployment Data) consists of monthly, quarterly, and yearly unemployment data. Part 8 (Stock Market Data) includes data on the United Kingdom market index, the United States Dow Jones Industrial Average, the Standard and Poor's composite index, the Financial Times 500 stock index, and Datastream's measure of British funds on the London Exchange. Part 9 (Financial Times 30 Share Index Data) contains the Financial Times 30 close and the volume of bargains. Lastly, Part 10 (Newspaper Stories Data) consists of counts of newspaper stories relating to the next general election.
Collected from Yahoo Finance, Investing.com and WSJ, this dataset consists of the following indices ranging from July 17, 2017 to July 22, 2022:
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Context
Predicting stock market movements is a classic challenge in machine learning. While raw Open, High, Low, Close, and Volume (OHLCV) data is the standard starting point, its predictive power is often limited. To build robust models, data scientists require a much richer feature set that captures different aspects of market dynamics, from technical patterns to sentiment hidden in financial news.
This dataset was created to bridge that gap. It provides a highly-enriched, pre-processed collection of features for the Dow Jones Industrial Average (DJIA), designed to accelerate research and modeling for stock price prediction.
Content
The dataset is organized into several files, each representing a distinct category of engineered features. This modular structure allows you to easily select, combine, or test the importance of different feature types.
Description: Each day's top 25 news headlines have been transformed into a sophisticated knowledge graph. These graphs, enriched with data from Wikidata, are then encoded into 128-dimensional vectors using a Graph Convolutional Network (GCN). This file captures the semantic meaning and relationships within the news, providing a powerful non-price-based feature.
Description: Contains fundamental features derived directly from OHLCV data. These are crucial for capturing intraday volatility and price action.
Example Features: intraday_range, body_size, price_change, simple_return, log_return, price_volume_interaction.
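A minimal sketch of how features with these names could be derived from one OHLCV bar and the previous close. The feature names come from the list above, but the exact definitions used in the dataset are not given, so every formula here is an assumption (in particular `price_volume_interaction` as price change times volume).

```python
import math


def price_features(prev_close, bar):
    """Compute basic price features for one day.

    `bar` is a dict with Open, High, Low, Close and Volume keys.
    All formulas are assumed definitions, not the dataset's documented ones.
    """
    price_change = bar["Close"] - prev_close
    return {
        "intraday_range": bar["High"] - bar["Low"],
        "body_size": abs(bar["Close"] - bar["Open"]),
        "price_change": price_change,
        "simple_return": price_change / prev_close,
        "log_return": math.log(bar["Close"] / prev_close),
        "price_volume_interaction": price_change * bar["Volume"],
    }


# Made-up bar for illustration only.
example = price_features(
    100.0,
    {"Open": 101.0, "High": 103.0, "Low": 99.0, "Close": 102.0, "Volume": 1000},
)
```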
Description: A wide array of popular technical indicators calculated using the pandas-ta library. These features are staples of financial analysis and help identify trends, momentum, and volatility.
Example Features: Simple Moving Averages (SMA_20, SMA_50, SMA_200), Exponential Moving Averages (EMA_12, EMA_26), MACD, RSI, Bollinger Bands (BBL, BBM, BBU), On-Balance Volume (OBV), and more.
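To make the moving-average features concrete, here is a plain-Python sketch of a simple and an exponential moving average. It does not use pandas-ta and is not its implementation: the EMA below uses the common smoothing factor 2 / (span + 1) and is seeded with the first price, which is an assumption, and library implementations may seed or adjust differently.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out


def ema(values, span):
    """Exponential moving average seeded with the first value (assumption)."""
    alpha = 2.0 / (span + 1)
    out = []
    for value in values:
        if not out:
            out.append(value)
        else:
            out.append(alpha * value + (1 - alpha) * out[-1])
    return out


# Made-up closing prices for illustration only.
closes = [1.0, 2.0, 3.0, 4.0, 5.0]
sma_3 = sma(closes, 3)
ema_3 = ema(closes, 3)
```

A feature such as SMA_20 would correspond to `sma(closes, 20)` on the real closing-price series.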
Description: This file includes features based on the statistical properties of returns over an optimized rolling window, as well as cyclical time-based features. The optimal window was determined by finding the period with the highest correlation to future returns.
Example Features: rolling_mean, rolling_std (volatility), rolling_skew, rolling_kurt, day_of_week_sin, day_of_week_cos, is_month_end.
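The cyclical and rolling features can be sketched as below. The feature names come from the list above; the encoding period of 7 with Monday as 0, the calendar-based definition of `is_month_end`, and the use of the sample standard deviation are all assumptions.

```python
import math
import statistics
from datetime import date, timedelta


def time_features(day, period=7):
    """Encode the weekday on a circle so the last and first day stay adjacent."""
    angle = 2 * math.pi * day.weekday() / period
    return {
        "day_of_week_sin": math.sin(angle),
        "day_of_week_cos": math.cos(angle),
        # Calendar month end; the dataset may use the last trading day instead.
        "is_month_end": (day + timedelta(days=1)).month != day.month,
    }


def rolling_stats(returns, window):
    """Rolling mean and sample standard deviation; window must be at least 2."""
    out = []
    for i in range(window - 1, len(returns)):
        chunk = returns[i - window + 1 : i + 1]
        out.append({
            "rolling_mean": statistics.mean(chunk),
            "rolling_std": statistics.stdev(chunk),
        })
    return out


monday = time_features(date(2020, 11, 23))
month_end = time_features(date(2020, 11, 30))
stats = rolling_stats([1.0, 2.0, 3.0, 4.0], 3)  # made-up values
```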
Description: More complex and transformational features designed to capture deeper market dynamics.
Example Features: Lagged returns and RSI, quantitative candlestick pattern features, wavelet transform coefficients (to decompose price signals into different frequencies), and the Hurst Exponent (to measure long-term memory in the time series).
Methodology
The features were systematically generated using a series of Python scripts.
News Embeddings: Headlines were processed to extract named entities. These entities were used to build knowledge subgraphs from Wikidata. Finally, a Graph Convolutional Network (GCN) model encoded these graphs into dense vectors.
Tabular Features: All other features were generated from the raw DJIA price and volume data. The process involved several stages, from basic price calculations to advanced transformations. For features requiring a lookback period (e.g., rolling statistics, Hurst exponent), an optimal window length was programmatically determined to maximize its correlation with the target variable.
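One way the window selection described above might work is sketched below: for each candidate window, a rolling mean of returns is correlated with the next day's return, and the window with the largest absolute correlation wins. This is an illustration under stated assumptions, not the dataset's script: the choice of rolling mean as the feature, the next-day return as the target, the absolute Pearson correlation as the score, and all function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rolling_mean(values, window):
    """Rolling mean for every index that has a full window behind it."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def best_window(returns, candidates):
    """Pick the window whose rolling mean correlates most with the next return."""
    best, best_score = None, -1.0
    for window in candidates:
        feature = rolling_mean(returns, window)[:-1]  # known at day t
        target = returns[window:]                     # return on day t + 1
        if len(feature) < 3:
            continue
        score = abs(pearson(feature, target))
        if score > best_score:
            best, best_score = window, score
    return best, best_score


# Made-up alternating returns for illustration only.
toy_returns = [0.01, -0.01] * 10
chosen, score = best_window(toy_returns, [1, 2])
```

Selecting the window on the full sample uses information from the target, so for an honest evaluation the selection would need to be done on training data only.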
Acknowledgements
The raw OHLCV and news data was originally sourced from: https://www.kaggle.com/datasets/aaron7sun/stocknews. We thank them for making the data available.
Inspiration
This dataset is perfect for a variety of financial machine learning tasks:
Can you build a model to predict the next day's market direction (Up/Down)?
Which feature set is the most powerful? The technical indicators, the news embeddings, or a combination of all?
How do advanced features like the Hurst exponent or wavelet coefficients contribute to model performance?
Can you use these features to build a profitable trading strategy (backtesting required)?
This table contains 25 series, with data for years 1956 - present (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (Not all combinations are available): Geography (1 items: Canada ...), Toronto Stock Exchange Statistics (25 items: Standard and Poor's/Toronto Stock Exchange Composite Index; high; Standard and Poor's/Toronto Stock Exchange Composite Index; close; Toronto Stock Exchange; oil and gas; closing quotations; Standard and Poor's/Toronto Stock Exchange Composite Index; low ...).
In 2020, the average lifespan of a company on the Standard and Poor's 500 Index was just over ** years, compared with ** years in 1965. Corporate longevity among S&P 500 companies shows a clear long-term decline, and it is expected to fall even further throughout the 2020s.
Enterprise value to earnings before interest, taxes, depreciation and amortization (EV/EBITDA) is a key ratio for assessing whether a company is undervalued or overvalued relative to a historical industry average. The S&P 500 (Standard & Poor’s) is an index of the 500 largest U.S. publicly traded companies by market capitalization. In 2023, the consumer staples sector displayed the highest EV/EBITDA multiple with *****.
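As a worked illustration of the ratio, the sketch below uses a common simplified definition of enterprise value (market capitalization plus total debt minus cash and equivalents). That simplification, which ignores items such as preferred equity and minority interests, is an assumption, and all figures are made up.

```python
def enterprise_value(market_cap, total_debt, cash_and_equivalents):
    """Simplified enterprise value: equity value plus debt, net of cash."""
    return market_cap + total_debt - cash_and_equivalents


def ev_to_ebitda(market_cap, total_debt, cash_and_equivalents, ebitda):
    """EV/EBITDA multiple; None when EBITDA is not positive."""
    if ebitda <= 0:
        return None
    return enterprise_value(market_cap, total_debt, cash_and_equivalents) / ebitda


# Made-up figures in billions, for illustration only.
ev = enterprise_value(100.0, 30.0, 10.0)
multiple = ev_to_ebitda(100.0, 30.0, 10.0, 15.0)
```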