Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides foundational factor and portfolio return data used in empirical finance and asset pricing research. It contains:
- Fama–French 3-Factor and 5-Factor models
- Size (ME), Book-to-Market (B/M), Operating Profitability (OP), and Investment (Inv) portfolios
- Bivariate portfolios (e.g., 2x3 Size-B/M sorts)
- Industry portfolio returns

All data originate from the Kenneth R. French Data Library and are based on the CRSP and Compustat databases. Data are value-weighted and expressed in percentages.
Some files in this dataset contain header comments describing data sources and methodology (as shown below):
This file was created using the 202508 CRSP database.
The 1-month TBill rate data until 202405 are from Ibbotson Associates.
Starting from 202406, the 1-month TBill rate is from ICE BofA US 1-Month Treasury Bill Index.
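To read such files correctly in Python (pandas), note that these header lines do not start with a comment symbol, so the `comment` parameter of `read_csv`, which ignores lines beginning with a given character such as `#`, does not help here. Instead, locate the column-name line and skip everything before it: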
```python
import io

import pandas as pd

file_path = "F-F_Research_Data_5_Factors_2x3.csv"

# The file opens with free-text notes that carry no comment marker,
# so find the line holding the column names and start reading there.
with open(file_path) as f:
    lines = f.readlines()

skip_rows = None
for i, line in enumerate(lines):
    if "Mkt-RF" in line:
        skip_rows = i
        break
if skip_rows is None:
    raise ValueError(f"No 'Mkt-RF' header line found in {file_path}")

# The file is comma-separated; skipinitialspace drops the padding after commas.
df = pd.read_csv(io.StringIO("".join(lines[skip_rows:])), skipinitialspace=True)
print(df.head())

# Alternative when the number of note lines is known (three in the example above):
# df = pd.read_csv(file_path, skiprows=3, skipinitialspace=True)
```
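In the Kenneth French CSV files, the monthly table is typically followed by an annual table with its own title line and a repeated header row, so a frame read as above mixes both. The sketch below keeps only rows whose date is a six-digit YYYYMM stamp and returns a numeric frame with a DatetimeIndex. The in-memory `SAMPLE` text is a made-up stand-in for the file layout (not real factor values), and `monthly_block` is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd

# Toy stand-in for the file layout (invented numbers, not real factor data):
# a monthly block, then an "Annual Factors" block with its own header.
SAMPLE = """,Mkt-RF,SMB,HML,RMW,CMA,RF
196307,  1.00,  -0.50,  0.25,  0.10,  -0.20,  0.30
196308,  2.00,   0.50, -0.25,  0.00,   0.20,  0.30

 Annual Factors: January-December
,Mkt-RF,SMB,HML,RMW,CMA,RF
  1964,  10.00,   1.00,   2.00,   3.00,   4.00,   3.50
"""


def monthly_block(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep only the monthly rows of a frame read from a French-library CSV."""
    # The date column has an empty header, which pandas names "Unnamed: 0".
    df = raw.rename(columns={raw.columns[0]: "Date"})
    dates = df["Date"].astype(str).str.strip()
    # Monthly rows carry six-digit YYYYMM stamps; annual rows have four digits,
    # and section titles or repeated header rows match neither.
    is_monthly = dates.str.fullmatch(r"\d{6}")
    out = df.loc[is_monthly].drop(columns="Date").apply(pd.to_numeric)
    out.index = pd.DatetimeIndex(
        pd.to_datetime(dates[is_monthly], format="%Y%m"), name="Date"
    )
    return out


raw = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
monthly = monthly_block(raw)
print(monthly)
```

With a real file, `monthly_block(df)` would be applied to the `df` read in the previous step.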
| Column | Description |
|---|---|
| Mkt-RF | Market excess return |
| SMB | Small minus Big (size factor) |
| HML | High minus Low (book-to-market factor) |
| RMW | Robust minus Weak (profitability factor) |
| CMA | Conservative minus Aggressive (investment factor) |
| RF | Risk-free rate (1-month Treasury Bill) |
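Because the table defines Mkt-RF as an excess return and RF as the risk-free rate, and the values are in percent, the total market return for a month is (Mkt-RF + RF) / 100. A minimal sketch, assuming a frame with the column names above; the toy numbers are invented and `add_market_return` is a hypothetical helper.

```python
import pandas as pd


def add_market_return(factors_pct: pd.DataFrame) -> pd.DataFrame:
    """Convert percent returns to decimals and add the total market return."""
    out = factors_pct / 100.0
    # Mkt-RF is the market return in excess of RF, so adding RF back
    # recovers the total market return for the same month.
    out["Mkt"] = out["Mkt-RF"] + out["RF"]
    return out


# Made-up monthly values in percent, laid out like the table above.
toy = pd.DataFrame(
    {"Mkt-RF": [1.0, -2.0], "SMB": [0.5, 0.1], "HML": [-0.3, 0.2],
     "RMW": [0.1, 0.0], "CMA": [0.2, -0.1], "RF": [0.3, 0.3]},
    index=pd.to_datetime(["1963-07-01", "1963-08-01"]),
)
dec = add_market_return(toy)
# Growth of one unit invested in the market, compounding month by month.
market_growth = (1.0 + dec["Mkt"]).cumprod()
print(market_growth)
```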
Datasets containing the daily, monthly and annual SMB, HML and momentum factors for the UK market for 1980OCT-2015JUN (daily from 1988OCT to 2015JUN), together with datasets containing the Fama-French and momentum portfolios used to create the SMB, HML and UMD factors and other benchmark portfolios. For the benchmark portfolios, equal- and value-weighted return files are available, along with a file giving the number of portfolios per year and the cutoff points used to create the portfolios.
The twin aims of this research project are, first, to provide a more satisfactory model of the cost of capital and asset pricing in the UK and, second, to facilitate the creation and maintenance of a high-quality, survivorship-bias-free, standardised and regularly updated set of specific financial data for free use by academics, researchers and potentially also regulatory bodies such as the Competition Commission, the Office of Fair Trading (OFT), the Water Services Regulation Authority (OFWAT), the communications regulator (OFCOM) and other regulators. This will build on the Fama-French and momentum portfolios and factors for the UK market established by Gregory, Tharyan and Huang (2009). Our first objective is to expand this dataset to include the full range of portfolios available for the US. Second, we will expand the available factor and portfolio data to encompass ongoing developments in the literature on returns and asset pricing. Third, we will undertake a comprehensive range of asset pricing model tests to develop a more convincing model of the cross-section of UK stock returns. Lastly, we will develop the UK literature on the implied cost of capital (ICC). With reference to the latter, we will both test alternative models of the implied cost of capital and examine the role of implied, rather than realised, returns in asset pricing tests.
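The SMB, HML and UMD factors are built from the portfolio files described above. Under the standard Fama and French (1993) 2x3 construction and its momentum analogue, each factor is the equal-weighted average of the portfolio returns on one side minus the average on the other side. The sketch assumes that layout and hypothetical column names (SL, SM, SH, BL, BM, BH for size and book-to-market; SU, BU, SD, BD for size and momentum); the breakpoints and details used by Gregory, Tharyan and Huang (2009) for the UK may differ.

```python
import pandas as pd


def size_bm_factors(p: pd.DataFrame) -> pd.DataFrame:
    """SMB and HML from six size/book-to-market portfolio returns (2x3 sort).

    Hypothetical columns: S/B = small/big, L/M/H = low/medium/high B/M.
    """
    smb = (p["SL"] + p["SM"] + p["SH"]) / 3 - (p["BL"] + p["BM"] + p["BH"]) / 3
    hml = (p["SH"] + p["BH"]) / 2 - (p["SL"] + p["BL"]) / 2
    return pd.DataFrame({"SMB": smb, "HML": hml})


def umd_factor(m: pd.DataFrame) -> pd.Series:
    """UMD from size/momentum portfolios: U = winners (up), D = losers (down)."""
    return (m["SU"] + m["BU"]) / 2 - (m["SD"] + m["BD"]) / 2


# Toy one-period returns in percent, invented for illustration only.
six = pd.DataFrame({"SL": [1.0], "SM": [2.0], "SH": [3.0],
                    "BL": [0.0], "BM": [1.0], "BH": [2.0]})
mom = pd.DataFrame({"SU": [3.0], "BU": [1.0], "SD": [-1.0], "BD": [0.0]})
factors = size_bm_factors(six)
factors["UMD"] = umd_factor(mom)
print(factors)
```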
This dataset was created by Irwindeep
CC0 1.0 Universal https://creativecommons.org/publicdomain/zero/1.0/
This dataset is an intermediate result of research with Professor Jay M. Chung, Soongsil University. It consists of six columns: Mrt-Rf, HML, SMB, CMA, RMW, and RF.
RF is the return on South Korea's 91-day Monetary Stabilization Bond, which serves as the risk-free rate for the South Korean stock market.
The other factors are calculated using the same or a similar method to Fama and French (2015). I will leave a link to the original paper: Link
These data can be used to evaluate the performance or style of South Korean stock portfolios with a linear regression that uses Mrt-Rf, SMB, HML, RMW, and CMA as independent variables and the monthly portfolio return minus RF as the dependent variable. I will make an introductory notebook in both English and Korean.
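The regression described above is R_p - RF = a + b1·Mrt-Rf + b2·SMB + b3·HML + b4·RMW + b5·CMA + e, where the intercept a measures performance and the slopes describe style. A minimal OLS sketch with NumPy on simulated data (not the Korean factors themselves); `factor_regression` is a hypothetical helper, and it reports no standard errors, for which a package such as statsmodels would normally be used.

```python
import numpy as np


def factor_regression(excess_ret: np.ndarray, factors: np.ndarray):
    """OLS of excess portfolio returns on factor returns with an intercept.

    excess_ret: (T,) portfolio return minus RF; factors: (T, K) factor returns.
    Returns (alpha, betas, r_squared).
    """
    X = np.column_stack([np.ones(len(excess_ret)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    resid = excess_ret - X @ coef
    centered = excess_ret - excess_ret.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return coef[0], coef[1:], r2


# Simulated columns standing in for Mrt-Rf, SMB, HML, RMW, CMA (percent).
rng = np.random.default_rng(0)
T = 240
factors = rng.normal(0.0, 3.0, size=(T, 5))
true_betas = np.array([1.0, 0.5, -0.3, 0.2, 0.1])
excess = 0.1 + factors @ true_betas + rng.normal(0.0, 0.5, size=T)
alpha, betas, r2 = factor_regression(excess, factors)
print(alpha, betas, r2)
```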
The dataset's caveat is as follows.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using 5-minute high-frequency data from the Chinese stock market, we employ a non-parametric method to estimate the realized jumps of Fama-French portfolios and investigate whether the estimated positive, negative and signed realized jumps can forecast or explain cross-sectional stock returns. The Fama-MacBeth regression results show not only that the realized jump components and the continuous volatility are compensated with a risk premium, but also that the negative, positive and signed jump risks can, to some extent, explain the returns of the stock portfolios. Therefore, we should pay close attention to both downside and upside tail risk.
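The Fama-MacBeth procedure mentioned above runs one cross-sectional OLS per period and then averages the slope estimates, using their time-series standard error for t-statistics. A sketch on simulated data, assuming the regressors (in the study, jump and volatility measures) are already estimated for each period; it omits refinements such as the Shanken or Newey-West corrections, and `fama_macbeth` is a hypothetical name.

```python
import numpy as np


def fama_macbeth(returns: np.ndarray, regressors: np.ndarray):
    """Fama-MacBeth two-step estimates.

    returns: (T, N) portfolio returns; regressors: (T, N, K) characteristics.
    Returns (mean_coefficients, t_statistics) for the intercept and K slopes.
    """
    T, N = returns.shape
    K = regressors.shape[2]
    gammas = np.empty((T, K + 1))
    for t in range(T):
        # One cross-sectional regression per period.
        X = np.column_stack([np.ones(N), regressors[t]])
        gammas[t], *_ = np.linalg.lstsq(X, returns[t], rcond=None)
    mean = gammas.mean(axis=0)
    se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
    return mean, mean / se


rng = np.random.default_rng(1)
T, N, K = 120, 25, 2
regressors = rng.normal(size=(T, N, K))
true_gamma = np.array([0.2, 0.5, -0.3])  # intercept, then two premia
noise = rng.normal(0.0, 0.1, size=(T, N))
returns = true_gamma[0] + regressors @ true_gamma[1:] + noise
premia, t_stats = fama_macbeth(returns, regressors)
print(premia, t_stats)
```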
Performance of the S&P 500 and of 20 Fama-French portfolios formed on size and book-to-market.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics for portfolio returns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of the jump components for s1b1-s1b5.
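Jump components of this kind are estimated non-parametrically from intraday (here 5-minute) portfolio returns. The exact estimator behind these statistics is not given here; one common construction splits realized variance (RV) into a jump-robust bipower variation (BV) and a jump part max(RV - BV, 0), and measures signed jumps as the difference between positive and negative realized semivariances, in the spirit of Patton and Sheppard's signed jump variation. The sketch applies this to one day of returns; the returns are toy numbers and `jump_components` is a hypothetical helper.

```python
import numpy as np


def jump_components(r: np.ndarray) -> dict:
    """Split one day's realized variance into continuous, jump and signed parts."""
    r = np.asarray(r, dtype=float)
    rv = float(np.sum(r ** 2))                       # realized variance
    # Bipower variation (pi/2) * sum |r_i||r_{i-1}| is robust to jumps.
    bv = float(np.pi / 2 * np.sum(np.abs(r[1:]) * np.abs(r[:-1])))
    jump = max(rv - bv, 0.0)                         # truncated at zero
    rs_pos = float(np.sum(r[r > 0] ** 2))            # positive semivariance
    rs_neg = float(np.sum(r[r < 0] ** 2))            # negative semivariance
    sj = rs_pos - rs_neg                             # signed jump variation
    return {"RV": rv, "BV": bv, "J": jump, "RS+": rs_pos, "RS-": rs_neg,
            "SJ": sj, "SJ+": max(sj, 0.0), "SJ-": min(sj, 0.0)}


# Toy 5-minute returns for one portfolio-day (invented numbers).
day = np.array([0.01, -0.02, 0.03])
components = jump_components(day)
print(components)
```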
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Significant number of months with different regression equations.
ABSTRACT: The purpose of this work is to present the Weighted Forward Search (FSW) method for detecting outliers in asset pricing data. This new estimator, based on an algorithm that downweights the most anomalous observations in the dataset, is tested on both simulated and empirical asset pricing data. The impact of outliers on the estimation of asset pricing models is assessed under different scenarios, and the results are evaluated with associated statistical tests based on this new approach. Our proposal provides an alternative procedure for the robust estimation of portfolio betas, allowing concurrent asset pricing models to be compared. The algorithm, which is both efficient and robust to outliers, is used to provide robust estimates of the models’ parameters, which are compared with those of the traditional econometric estimation methods commonly used in the literature. In particular, the precision of the alphas increases substantially when the Forward Search (FS) method is used. We use Monte Carlo simulations as well as the well-known dataset of equity factor returns provided by Prof. Kenneth French, consisting of the 25 Fama-French portfolios of the United States equity market, with single-factor and three-factor models estimated on monthly and annual data. Our results indicate that, with monthly returns, the marginal rejection of the Fama-French three-factor model is influenced by the presence of outliers in the portfolios. With annual data, the use of robust methods increases the rejection level of null alphas in the Capital Asset Pricing Model (CAPM) and the Fama-French three-factor model, with more efficient estimates in the absence of outliers and consistent alphas when outliers are present.
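Rejection of null alphas across the 25 portfolios is usually judged with the joint test of Gibbons, Ross and Shanken (1989), W = (T - N - L)/N · α'Σ⁻¹α / (1 + μ'Ω⁻¹μ), with Σ the residual covariance, μ and Ω the factor means and covariance, and W ~ F(N, T - N - L) under normal errors. The sketch below uses plain OLS, not the FS or FSW estimators of the study, and simulated data; `grs_statistic` is a hypothetical name, and a p-value could be obtained with `scipy.stats.f.sf(W, df1, df2)`.

```python
import numpy as np


def grs_statistic(excess_ret: np.ndarray, factors: np.ndarray):
    """GRS test of H0: all N intercepts are zero in time-series factor regressions.

    excess_ret: (T, N) test-asset excess returns; factors: (T, L) factor returns.
    Returns (W, N, T - N - L); under i.i.d. normal errors W ~ F(N, T - N - L).
    """
    T, N = excess_ret.shape
    L = factors.shape[1]
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    alpha = coef[0]                                   # (N,) intercepts
    resid = excess_ret - X @ coef
    sigma = resid.T @ resid / T                       # residual covariance (MLE)
    mu = factors.mean(axis=0)
    demeaned = factors - mu
    omega = demeaned.T @ demeaned / T                 # factor covariance (MLE)
    quad_alpha = alpha @ np.linalg.solve(sigma, alpha)
    quad_mu = mu @ np.linalg.solve(omega, mu)
    W = (T - N - L) / N * quad_alpha / (1.0 + quad_mu)
    return float(W), N, T - N - L


rng = np.random.default_rng(2)
T, N, L = 240, 5, 3
f = rng.normal(0.5, 4.0, size=(T, L))                 # simulated factors
betas = rng.normal(1.0, 0.3, size=(L, N))
eps = rng.normal(0.0, 2.0, size=(T, N))
W0, df1, df2 = grs_statistic(f @ betas + eps, f)       # true alphas are zero
W1, _, _ = grs_statistic(1.0 + f @ betas + eps, f)     # true alphas all equal 1
print(W0, W1, df1, df2)
```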
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
We compile all return data from Kenneth French's website and all macroeconomic data from the OECD statistical data warehouse for the period from January 1990 to December 2018. The return and macroeconomic data cover the following countries: Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom. The dataset comprises the following series:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-sectional regression results.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Using all stocks listed on the London Stock Exchange for the period from January 1989 to December 2018, the dataset comprises the following series:
We have produced these series using the following data from Thomson Reuters Datastream: (i) total return index (RI series), (ii) market value (MV series), (iii) market-to-book equity (PTBV series), (iv) total assets (WC02999 series), (v) return on equity (WC08301 series), (vi) tax rate (WC08346 series), (vii) primary SIC codes, (viii) turnover by volume (VO series), and (ix) the market price (P series). Following Griffin et al. (2010), we use the generic rules provided by the authors for excluding non-common equity securities from Datastream data.
REFERENCES:
Amihud, Y. (2002). Illiquidity and stock returns: Cross-section and time-series effects. Journal of Financial Markets, 5, 31–56.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.
Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116, 1–22.
Griffin, J. M., Kelly, P., and Nardari, F. (2010). Do market efficiency measures yield correct inferences? A comparison of developed and emerging markets. Review of Financial Studies, 23, 3225–3277.
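The series above start from Datastream's total return index (RI), and the references include Amihud (2002), whose illiquidity ratio averages |return| / traded value over daily observations. A minimal sketch, assuming aligned daily pandas Series for RI, the market price (P) and turnover by volume (VO); the units of VO and any scaling constant should be checked against Datastream's documentation and Amihud (2002), and both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def returns_from_ri(ri: pd.Series) -> pd.Series:
    """Simple returns from a total return index, which reinvests dividends."""
    return (ri / ri.shift(1) - 1.0).dropna()


def amihud_illiquidity(ri: pd.Series, price: pd.Series, volume: pd.Series) -> float:
    """Average of |return| / traded value over the window passed in."""
    ret = ri / ri.shift(1) - 1.0
    traded_value = price * volume  # units depend on how P and VO are reported
    # Zero-volume days give inf or NaN ratios; drop them before averaging.
    ratio = (ret.abs() / traded_value).replace([np.inf, -np.inf], np.nan)
    return float(ratio.dropna().mean())


# Toy daily data (invented): the last day has no trading volume.
ri = pd.Series([100.0, 101.0, 99.99, 99.99])
price = pd.Series([10.0, 10.0, 10.0, 10.0])
volume = pd.Series([100.0, 100.0, 50.0, 0.0])
daily_returns = returns_from_ri(ri)
illiq = amihud_illiquidity(ri, price, volume)
print(daily_returns.tolist(), illiq)
```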
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Using all stocks listed on the Tokyo Stock Exchange and macroeconomic data for Japan, the dataset comprises the following series:
We have produced all return series using the following data from Datastream: (i) total return index (RI series), (ii) market value (MV series), (iii) market-to-book equity (PTBV series), (iv) total assets (WC02999 series), (v) return on equity (WC08301 series), (vi) price-to-cash flow ratio (PC series), and (vii) dividend yield (DY series). We have used the generic rules suggested by Griffin, Kelly, and Nardari (2010) for excluding non-common equity securities from Datastream data. We also exclude stocks with fewer than twelve observations in the period from July 1992 to June 2018. Accordingly, our sample comprises a total of 5,312 stocks.
REFERENCES:
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.
Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116, 1–22.
Griffin, J. M., Kelly, P., and Nardari, F. (2010). Do market efficiency measures yield correct inferences? A comparison of developed and emerging markets. Review of Financial Studies, 23, 3225–3277.
Hou, K., Xue, C., and Zhang, L. (2014). Digesting anomalies: An investment approach. Review of Financial Studies, 28, 650–705.
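The sample filter described for the Tokyo data (July 1992 to June 2018, at least twelve observations per stock) can be expressed with a pandas groupby. A sketch assuming a long-format panel with hypothetical columns `stock`, `date` and `ret`, counting non-missing monthly returns inside the window; the authors' exact counting rule is not stated, and `filter_sample` is a hypothetical helper.

```python
import pandas as pd


def filter_sample(panel: pd.DataFrame, start: str = "1992-07-01",
                  end: str = "2018-06-30", min_obs: int = 12) -> pd.DataFrame:
    """Keep the sample window and drop stocks with fewer than min_obs returns."""
    in_window = panel[(panel["date"] >= pd.Timestamp(start))
                      & (panel["date"] <= pd.Timestamp(end))]
    # count() ignores missing returns, so only actual observations are counted.
    counts = in_window.groupby("stock")["ret"].count()
    keep = counts.index[counts >= min_obs]
    return in_window[in_window["stock"].isin(keep)]


# Toy panel: stock "A" has 12 monthly returns, stock "B" only 11.
dates = pd.date_range("1992-07-01", periods=12, freq="MS")
panel = pd.concat(
    [pd.DataFrame({"stock": "A", "date": dates, "ret": 0.01}),
     pd.DataFrame({"stock": "B", "date": dates[:11], "ret": 0.02})],
    ignore_index=True,
)
sample = filter_sample(panel)
print(sample["stock"].unique())
```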
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression results under different linear combinations of the jump components.
We compile raw data from the Datastream database for all stocks traded on the Spanish equity market. Specifically, we compile the following data series: (i) total return index (RI series), (ii) market value (MV series), (iii) market-to-book equity (PTBV series), (iv) total assets (WC02999 series), (v) return on equity (WC08301 series), (vi) dividend yield (DY series), (vii) price-to-earnings ratio (PE series), and (viii) effective tax rate (WC08346 series). We use the filters suggested by Griffin, Kelly, and Nardari (2010) for the Datastream database to exclude assets other than ordinary shares from our sample. Our sample thus comprises 443 companies, including all firms that started trading within the period under study as well as those that were delisted. As a proxy for the risk-free rate, we use the three-month Treasury Bill rate for Spain, as provided by the OECD. Accordingly, the dataset comprises the following series:
REFERENCES:
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.
Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116, 1–22.
Griffin, J. M., Kelly, P., and Nardari, F. (2010). Do market efficiency measures yield correct inferences? A comparison of developed and emerging markets. Review of Financial Studies, 23, 3225–3277.
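Excess returns require the risk-free rate at the same frequency as the stock returns. Assuming the OECD three-month Treasury Bill rate for Spain is quoted as an annualized percentage (its quoting convention should be checked), the sketch converts it to a monthly decimal rate either by compounding or by simple division by 12, both of which are used in practice. `monthly_rf` is a hypothetical helper and the inputs are invented.

```python
import numpy as np


def monthly_rf(annual_pct, compound: bool = True):
    """Convert an annualized percentage rate to a monthly decimal rate."""
    y = np.asarray(annual_pct, dtype=float) / 100.0
    if compound:
        # (1 + y)^(1/12) - 1 compounds back to y over twelve months.
        return (1.0 + y) ** (1.0 / 12.0) - 1.0
    # Simple convention: one twelfth of the annual rate.
    return y / 12.0


# Toy inputs: two months of annualized T-bill rates (percent) and
# the matching monthly stock returns (decimals).
rf = monthly_rf([3.0, 2.4])
stock_ret = np.array([0.02, -0.01])
excess = stock_ret - rf
print(excess)
```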
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Using all stocks listed on the Tokyo Stock Exchange and macroeconomic data for Japan, the dataset comprises the following series:
We have produced all return series using the following data from Datastream: (i) total return index (RI series), (ii) market value (MV series), (iii) market-to-book equity (PTBV series), (iv) total assets (WC02999 series), (v) return on equity (WC08301 series), (vi) price-to-earnings ratio (PE series), and (vii) industry (SECTOR series). We have used the generic rules suggested by Griffin, Kelly, and Nardari (2010) for excluding non-common equity securities from Datastream data. We also exclude stocks with fewer than twelve observations. Accordingly, our sample comprises a total of 5,212 stocks.
REFERENCES:
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33, 3–56.
Fama, E. F. and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116, 1–22.
Griffin, J. M., Kelly, P., and Nardari, F. (2010). Do market efficiency measures yield correct inferences? A comparison of developed and emerging markets. Review of Financial Studies, 23, 3225–3277.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regressions of megatrend portfolios on pure factor portfolios via OLS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data contain the ESG Corporate Ranking by RAEX from December 2020 to February 2022 and its dynamics, the RSPP indices "Vector of sustainable development" and "Responsibility and openness", data for the Fama-French model, the companies' market capitalization and assets, and daily stock closing prices.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regressions of megatrend portfolios on pure factor portfolios via GMM-IVd.