A daily dataset for an emerging stock market (the Chinese CSI 300 dataset), comprising 300 stocks and 5,088 time steps drawn from the CSMAR database. We construct the dataset from a pool of CSI 300 index stocks over 21 years, from 01/02/2000 to 12/31/2020. Rather than using all stocks in the market, we select stocks that have at some point belonged to the major market index CSI 300 and filter out those with missing price data over the period.
For each trading day, we use fundamental price features as the stock features: open price, close price, and volume. Additionally, we normalize price features such as open price and close price by taking their logarithm.
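The logarithmic normalization described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function name `log_normalize`, the dictionary keys, and the sample values are hypothetical, the natural logarithm is assumed because the description does not state a base, and volume is left untransformed because the description only names open and close prices.

```python
import math

def log_normalize(rows):
    """Return copies of the rows with open/close replaced by their natural logs.

    Assumptions: natural log (base not stated in the description) and
    volume left as-is (only price features are said to be normalized).
    """
    normalized = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["open"] = math.log(row["open"])
        new_row["close"] = math.log(row["close"])
        normalized.append(new_row)
    return normalized

# Hypothetical sample records for one stock on two trading days.
sample = [
    {"open": 10.0, "close": 10.5, "volume": 1200},
    {"open": 10.4, "close": 10.1, "volume": 900},
]
normalized = log_normalize(sample)
```

Working in log space turns multiplicative price changes into additive differences, which is a common reason for this kind of transform.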
The stocks are randomly split into five non-overlapping sub-datasets. For each sub-dataset, the first 90% of trading days are used as training data, the following 5% as validation data, and the remaining 5% as test data.
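The splitting scheme above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the helper names `split_stocks` and `split_days`, the fixed seed, the equal-sized subsets, and the rounding at the split boundaries are all hypothetical choices that the description does not specify.

```python
import random

def split_stocks(stock_ids, n_subsets=5, seed=0):
    """Randomly partition stock ids into non-overlapping subsets.

    Assumption: subsets are as equal in size as possible; the seed is
    hypothetical and only makes the shuffle reproducible.
    """
    ids = list(stock_ids)
    random.Random(seed).shuffle(ids)
    # Striding over the shuffled list guarantees disjoint subsets.
    return [ids[i::n_subsets] for i in range(n_subsets)]

def split_days(n_days, train_frac=0.90, val_frac=0.05):
    """Chronological split: first 90% train, next 5% validation, rest test.

    Assumption: boundaries are rounded down to whole trading days.
    """
    train_end = int(n_days * train_frac)
    val_end = int(n_days * (train_frac + val_frac))
    return range(0, train_end), range(train_end, val_end), range(val_end, n_days)

subsets = split_stocks(range(300))        # 300 stocks, as in the description
train, val, test = split_days(5088)       # 5,088 time steps, as in the description
```

Keeping the day split chronological rather than random means the validation and test periods always come after the training period, so no future prices leak into training.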
Baidu Search Index is a big data analytics tool developed by Baidu to track changes in keyword search popularity within its search engine. By analyzing trends in the Baidu Search Index for specific keywords, users can effectively monitor public interest in topics, companies, or brands.
As an ecosystem partner of Baidu Index, Datago has direct access to keyword search index data from Baidu's database and uses this information to build the BSIA-Consumer database. The database covers popular brands that are actively searched by Chinese consumers, along with their commonly used names. By tracking Baidu Index search trends for these keywords, Datago maps them to their corresponding publicly listed stocks.
The database covers over 1,100 consumer stocks and 3,000+ brand keywords across China, the United States, Europe, and Japan, with a particular focus on popular sectors like luxury goods and vehicles. Through its analysis of Chinese consumer search interest, this database offers investors a unique perspective on market sentiment, consumer preferences, and brand influence, including:
Brand Influence Tracking – By leveraging Baidu Search Index data, investors can assess the level of consumer interest in various brands, helping to evaluate their influence and trends within the Chinese market.
Consumer Stock Mapping – BSIA-Consumer links brand keywords to their associated consumer stocks, enabling investor analysis driven by consumer interest.
Coverage of Popular Consumer Goods – BSIA-Consumer focuses specifically on trending sectors such as luxury goods and vehicles, offering insight into these industries.
Coverage: 1000+ consumer stocks
History: 2016-01-01
Update Frequency: Daily