47 datasets found
  1. Weibo NER Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated May 14, 2021
    Cite
    Nanyun Peng; Mark Dredze (2021). Weibo NER Dataset [Dataset]. https://paperswithcode.com/dataset/weibo-ner
    Authors
    Nanyun Peng; Mark Dredze
    Description

    The Weibo NER dataset is a Chinese Named Entity Recognition dataset drawn from the social media website Sina Weibo.

  2. Number of social media users in China 2018-2027

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Number of social media users in China 2018-2027 [Dataset]. https://www.statista.com/statistics/277586/number-of-social-network-users-in-china/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    China
    Description

    In 2022, there were around **** billion social media users in China. Despite Facebook, YouTube, and Twitter being blocked in the country, local social networking sites such as WeChat and Weibo have been attracting millions of users, making China the world’s biggest social media market.

    What is the role of social media in China? Around ** percent of the Chinese population uses the internet. Social networking plays a huge role among netizens, especially the younger generation. Like their Western equivalents, Chinese social media serve not only as a way to communicate online but also as a main source of news and entertainment, a means of e-payment, a shopping advisor, and a dating channel. In 2021, over ** percent of surveyed social media users said that what they appreciated most about social networks was keeping in touch with friends and family and sharing their life moments and thoughts.

    What are the most popular social media platforms?

    WeChat (Weixin in Chinese) is by far the most commonly used social app in the country, used for everything from texting and calling to photo and video sharing, dating, financial services, gaming, shopping, ride hailing, and more. However, the Chinese social media scene is diverse and dynamic, so it is not just about WeChat. The instant messaging app Tencent QQ, microblogging site Weibo, video sharing app Youku Tudou, short-form video app Douyin (the Chinese counterpart of TikTok), photo editing and sharing app Meitu, restaurant recommendation and food ordering platform Meituan, Quora equivalent Zhihu, and dating app Momo are just a few of the most popular Chinese social media platforms.

  3. Replication Data for: Capturing Clicks: How the Chinese Government Uses...

    • dataverse.harvard.edu
    Updated Apr 18, 2020
    Cite
    Jennifer Pan (2020). Replication Data for: Capturing Clicks: How the Chinese Government Uses Clickbait to Compete for Visibility [Dataset]. http://doi.org/10.7910/DVN/TALJOT
    Dataset provided by
    Harvard Dataverse
    Authors
    Jennifer Pan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    China
    Description

    The proliferation of social media and digital technologies has made it necessary for governments to expand their focus beyond propaganda content in order to disseminate propaganda effectively. We identify a strategy of using clickbait to increase the visibility of political propaganda. We show that such a strategy is used across China by combining ethnography with a computational analysis of a novel dataset of the titles of 197,303 propaganda posts made by 213 Chinese city-level governments on WeChat. We find that Chinese propagandists face intense pressures to demonstrate their effectiveness on social media because their work is heavily quantified (measured, analyzed, and ranked) with metrics such as views and likes. Propagandists use both clickbait and non-propaganda content (e.g., lifestyle tips) to capture clicks, but rely more heavily on clickbait because it does not decrease space available for political propaganda. Government propagandists use clickbait at a rate commensurate with commercial and celebrity social media accounts. Clickbait is associated with more views and likes, and greater reach of government propaganda outlets and messages. These results reveal how the advertising-based business model and affordances of social media influence political propaganda and how government strategies to control information are moving beyond censorship, propaganda, and disinformation.

  4. Hong Kong SAR, China Internet Usage: Social Media Market Share: All...

    • ceicdata.com
    Updated May 25, 2024
    Cite
    CEICdata.com (2024). Hong Kong SAR, China Internet Usage: Social Media Market Share: All Platforms: Mixi [Dataset]. https://www.ceicdata.com/en/hong-kong/internet-usage-social-media-market-share
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 18, 2024 - May 25, 2024
    Area covered
    Hong Kong
    Description

    Internet Usage: Social Media Market Share: All Platforms: Mixi data was reported at 0.000 % on 25 May 2024, unchanged from the previous value of 0.000 % for 24 May 2024. The series is updated daily and had a median of 0.000 % from May 2024 to 25 May 2024, with 8 observations. It reached an all-time high of 0.060 % on 22 May 2024 and a record low of 0.000 % on 25 May 2024. The Internet Usage: Social Media Market Share: All Platforms: Mixi series remains in active status in CEIC and is reported by Statcounter Global Stats. The data is categorized under Global Database’s Hong Kong SAR (China) – Table HK.SC.IU: Internet Usage: Social Media Market Share.

  5. Data in the paper titled "#WuhanDiary and #WuhanLockdown: gendered posting...

    • datahub.hku.hk
    txt
    Updated May 31, 2023
    Cite
    King Wa Fu; Sara Davies; Karen Ann Grépin; Clare Wenham; Connie Gan; Feng Shuo (2023). Data in the paper titled "#WuhanDiary and #WuhanLockdown: gendered posting patterns and behaviours on Weibo during the COVID-19 pandemic" [Dataset]. http://doi.org/10.25442/hku.19487396.v1
    Dataset provided by
    HKU Data Repository
    Authors
    King Wa Fu; Sara Davies; Karen Ann Grépin; Clare Wenham; Connie Gan; Feng Shuo
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Social media can be both a source of information and misinformation during health emergencies. During the COVID-19 pandemic, social media became a ubiquitous tool for communication and a rich source of data that researchers can use to analyse users’ experiences, knowledge and sentiments. Research on social media posts during COVID-19 has, to date, identified the persistence of traditional gendered norms and experiences. Yet these studies are mostly based on Western social media platforms. Little is known about gendered experiences of lockdown communicated on non-Western social media platforms. Using data from Weibo, China’s leading social media platform, we examine gendered user patterns and sentiment during the first wave of the pandemic between 1 January 2020 and 1 July 2020. We find that Weibo posts by self-identified women and men conformed to some gendered norms identified on other social media platforms during the COVID-19 pandemic (posting patterns and keyword usage) but not to others (sentiment). This insight may be important for targeted public health messaging on social media during future health emergencies.

    To cite: Gan CCR, Feng SA, Feng H, et al. #WuhanDiary and #WuhanLockdown: gendered posting patterns and behaviours on Weibo during the COVID-19 pandemic. BMJ Global Health 2022;0:e008149. doi:10.1136/bmjgh-2021-008149

  6. Replication data for: How Censorship in China Allows Government Criticism...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 4, 2017
    Cite
    Gary King; Jennifer Pan; Margaret E. Roberts (2017). Replication data for: How Censorship in China Allows Government Criticism but Silences Collective Expression [Dataset]. http://doi.org/10.7910/DVN1/22691
    Dataset provided by
    Harvard Dataverse
    Authors
    Gary King; Jennifer Pan; Margaret E. Roberts
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.1/customlicense?persistentId=doi:10.7910/DVN1/22691

    Area covered
    China
    Description

    We offer the first large-scale, multiple-source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future and, as such, seems to clearly expose government intent. Notes: Please see our follow-up article published in Science, "Reverse-Engineering Censorship In China: Randomized Experimentation And Participant Observation." See also: Automated Text Analysis

  7. Covid19_ChineseSocialMedia_Hotspots

    • kaggle.com
    Updated Apr 21, 2020
    Cite
    Hirsch (2020). Covid19_ChineseSocialMedia_Hotspots [Dataset]. https://www.kaggle.com/hirschsun/covid19-chinesesocialmedia-hotspots/metadata
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hirsch
    Description

    Context

    From the beginning of 2020 to April 8 (the day Wuhan reopened), this dataset summarizes the social media hotspots and what people focused on in mainland China, as well as the development of the epidemic during this period. The dataset consists of four .csv files covering major social media platforms in mainland China: Sina Weibo, TikTok (Douyin), Toutiao, and Douban.

    Sina Weibo

    Sina Weibo is a platform based on fostering user relationships to share, disseminate, and receive information. Through either the website or the mobile app, users can publicly upload pictures and videos for instant sharing, and other users can comment with text, pictures, and videos, or use a multimedia instant messaging service. The company initially invited a large number of celebrities to join the platform and has since invited many media personalities, government departments, businesses, and non-governmental organizations to open accounts for publishing and communicating information. To prevent the impersonation of celebrities, Sina Weibo uses verification symbols: celebrity accounts have an orange letter "V" and organizations' accounts have a blue letter "V". Sina Weibo has more than 500 million registered users; of these, 313 million are monthly active users, 85% use the Weibo mobile app, 70% are college-aged, 50.10% are male, and 49.90% are female. Users post over 100 million messages each day. With 90 million followers, actress Xie Na holds the record for the most followers on the platform. Despite fierce competition among Chinese social media platforms, Sina Weibo has proven to be the most popular; part of this success may be attributable to the wider use of mobile technologies in China. [https://en.wikipedia.org/wiki/Sina_Weibo]

    Douyin

    Douyin, whose international counterpart is TikTok, is a short-video social app for mobile phones. Users can record 15-second short videos, lip-sync easily, and apply built-in special effects, and other users can leave comments on the videos. Incubated by Toutiao, Douyin launched in September 2016 and was positioned as a short music-video community for young Chinese users. The app focuses on vertical, user-generated music videos, and its user base has grown rapidly since 2017. In June 2018, Douyin reached 500 million monthly active users worldwide and 150 million daily active users in China. [https://zh.wikipedia.org/wiki/%E6%8A%96%E9%9F%B3]

    Toutiao

    Toutiao, or Jinri Toutiao, is a Chinese news and information content platform and a core product of the Beijing-based company ByteDance. By analyzing the features of content, users, and users’ interactions with content, the company's algorithmic models generate a tailored content feed for each user. Toutiao is one of China's largest mobile platforms for content creation, aggregation, and distribution underpinned by machine learning techniques, with 120 million daily active users as of September 2017. [https://en.wikipedia.org/wiki/Toutiao]

    Douban

    Douban.com (Chinese: 豆瓣; pinyin: Dòubàn), launched on March 6, 2005, is a Chinese social networking service website that allows registered users to record information and create content related to film, books, music, recent events, and activities in Chinese cities. It is considered one of the most influential Web 2.0 websites in China. Douban also owns an internet radio station, which ranked No. 1 in the iOS App Store in 2012. Douban was formerly open to both registered and unregistered users. For registered users, the site recommends potentially interesting books, movies, and music, in addition to serving as a social networking site, like WeChat and Weibo, and as a record keeper; for unregistered users, the site is a place to find ratings and reviews of media. Douban had about 200 million registered users as of 2013. The site serves pan-Chinese users, and its content is in Chinese. It covers works and media in Chinese and in foreign languages. Some Chinese authors and critics register their official personal pages on the site. [https://en.wikipedia.org/wiki/Douban]

    Content

    The Weibo realTimeHotSearchList can be regarded as a platform for gathering celebrity gossip, social life, and major news. For this file, I collected the top 50 topics of the hot search list every 12 hours, giving 100 hot topics each day. These topics were translated into English with Google Translate, although the translation quality is not ideal because of word segmentation problems and differences in linguistic background. I created a new column ['Coron-Related ( 1 yes, 0 not ) '] to mark topics related to the novel coronavirus: related topics are marked 1, and unrelated topics are left empty or marked 0. The Google translation is extremely inaccurate (so maybe google the Chinese title to confirm is the best bet...
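    Because unrelated topics in the hot-search file may be either empty or 0, the COVID flag needs normalizing before any per-day aggregation. The following is a hypothetical Python sketch: the flag column name is copied from the description, but the `date` column and the exact header strings of the released CSV are assumptions to check against the actual file.

```python
import csv
from collections import defaultdict

# Hypothetical header names: FLAG_COLUMN is copied from the dataset description,
# DATE_COLUMN is a guess. Check both against the real CSV before use.
FLAG_COLUMN = "Coron-Related ( 1 yes, 0 not ) "
DATE_COLUMN = "date"
EXPECTED_PER_DAY = 50 * 2  # top 50 topics, collected every 12 hours


def load_rows(path):
    """Read the CSV into a list of dicts keyed by header name."""
    # utf-8-sig also handles files that start with a byte-order mark
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def covid_share_by_day(rows):
    """Map each day to (number of topics, share flagged as COVID-related).

    Unrelated topics are 'marked empty or 0', so any value other than "1"
    (including a missing cell) counts as not related.
    """
    totals = defaultdict(int)
    related = defaultdict(int)
    for row in rows:
        day = row[DATE_COLUMN]
        totals[day] += 1
        if (row.get(FLAG_COLUMN) or "").strip() == "1":
            related[day] += 1
    return {day: (n, related[day] / n) for day, n in totals.items()}


def incomplete_days(summary):
    """Days whose topic count differs from the expected 100."""
    return sorted(day for day, (n, _) in summary.items() if n != EXPECTED_PER_DAY)
```

    With 50 topics collected every 12 hours, a complete day should have 100 rows, so `incomplete_days` offers a quick way to spot missed snapshots before comparing daily shares.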

  8. Supporting data for: "The Role of “State Endorsers” in Extending Chinese...

    • datahub.hku.hk
    zip
    Updated Sep 19, 2023
    Cite
    Marie Brockling; Haohan Lily Hu; King Wa Fu (2023). Supporting data for: "The Role of “State Endorsers” in Extending Chinese Propaganda: Evaluating the Reach of Pro-Regime YouTubers" [Dataset]. http://doi.org/10.25442/hku.19548076.v3
    Dataset provided by
    HKU Data Repository
    Authors
    Marie Brockling; Haohan Lily Hu; King Wa Fu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Supporting data for the paper "The Role of 'State Endorsers' in Extending Chinese Propaganda: Evaluating the Reach of Pro-Regime YouTubers" published in the International Journal of Communication (IJOC) in 2023. A previous version of this paper was presented at the 72nd Annual International Communication Association (ICA) Conference on 26-30 May 2022. The dataset contains the code and raw data used in the project, as well as all the resulting graphics. Please feel free to reach out with any questions. Also, please let me know if you use the data and send me a link to your publication.

  9. Social media images of China's terraces

    • figshare.com
    zip
    Updated Apr 19, 2025
    Cite
    Song Chen (2025). Social media images of China's terraces [Dataset]. http://doi.org/10.6084/m9.figshare.28813259.v2
    Dataset provided by
    figshare
    Authors
    Song Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset contains geotagged social media images of China's terraces, sourced from the Sina Weibo microblogging platform (https://weibo.com). Geotagged images were collected using Weibo cookies and Python-based scraping tools (available at: https://github.com/dataabc/weibo-search). The search keyword was "terraces", and the collection period spanned July 2022 to June 2024. Only images with clear geographic information located within China were included. Images of poor quality (e.g., synthesized from multiple images, excessively cluttered, or blurry) and irrelevant content such as advertisements, paintings, or text were removed.

    The images are classified into seven categories representing different types of cultural ecosystem services (CES): landscape, species, structures, indoor, food, activities, and posing. Specifically: (1) Landscape images depict open natural landscapes, such as rice terraces, often with a visible sky. (2) Species images are close-up shots of animals or plants. (3) Structures images mainly feature man-made structures, often traditional houses. (4) Indoor images show the interiors of buildings, including dining rooms, bedrooms, etc. (5) Food images depict food, dishes, and beverages. (6) Activities images capture people physically interacting with the environment, including group photos and folkloric activities. (7) Posing images show people looking into the camera.

    The dataset includes a subset of 2,720 randomly selected and manually labeled images, approximately 5% of all collected images. Landscape images were the most numerous (1,347), followed by activities (480), structures (408), food (153), posing (146), species (111), and indoor (75). These images can be used to train classification models. All code used for model training and testing is available at: https://github.com/chen7092/Deep-learning-for-cultural-ecosystem-services-of-terraces.
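    The labeled subset is strongly imbalanced, with landscape images making up about half of it. As a sketch, assuming one wants to sanity-check the counts and derive class weights for training, the snippet below confirms that the per-category counts add up to 2,720, ranks the categories, estimates the full collection size from the "approximately 5%" figure, and computes inverse-frequency ("balanced") class weights. The weighting scheme is a common choice for imbalanced classification and is an assumption, not the approach used in the linked repository.

```python
# Per-category counts of the 2,720 manually labeled images, as listed above.
LABELED_COUNTS = {
    "landscape": 1347,
    "activities": 480,
    "structures": 408,
    "food": 153,
    "posing": 146,
    "species": 111,
    "indoor": 75,
}


def summarize(counts, sample_fraction=0.05):
    """Return (total, shares, categories ranked by size, implied collection size)."""
    total = sum(counts.values())
    shares = {k: v / total for k, v in counts.items()}
    ranked = sorted(counts, key=counts.get, reverse=True)
    # "approximately 5%" makes this a rough estimate, not an exact count.
    implied_total = round(total / sample_fraction)
    return total, shares, ranked, implied_total


def balanced_class_weights(counts):
    """Inverse-frequency weights total / (n_classes * count); rare classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


total, shares, ranked, implied_total = summarize(LABELED_COUNTS)
weights = balanced_class_weights(LABELED_COUNTS)
```

    Under this scheme an indoor image weighs about 18 times as much as a landscape image (1,347 / 75), which counteracts the dominance of the landscape class during training.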

  10. Explaining the Trust Paradox: How Foreign Media Strengthens Government...

    • zenodo.org
    bin, csv
    Updated Jun 20, 2025
    Cite
    Zhilong Zhao (2025). Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China [Dataset]. http://doi.org/10.5281/zenodo.15703594
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhilong Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This repository contains the replication-ready survey dataset used in Zhao & Liu (2025): *Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China*.

    * `data.csv` – Cleaned respondent-level dataset (N = 3,788; 110 variables) used for all analyses reported in the article. Personally identifiable information has been removed in compliance with the CNSDA license.
    * `codebook.pdf` – Variable names, wording, scales, and basic descriptive statistics. *(to be added by uploader if needed)*

    The dataset originates from the publicly available "2021 Internet Users' Social Awareness Survey" released by the Chinese National Survey Data Archive (CNSDA). We performed basic cleaning (variable renaming, numeric recoding, and removal of direct identifiers). Cleaning scripts are available upon request.
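    Since the description gives exact dimensions for `data.csv` (N = 3,788 respondents, 110 variables), a downloaded copy can be checked against them before replication. This Python sketch assumes the file is comma-separated with a single header row; the encoding is also an assumption.

```python
import csv

# Dimensions stated in the description for data.csv.
EXPECTED_ROWS = 3788
EXPECTED_COLUMNS = 110


def count_shape(lines):
    """Return (data rows, columns) for CSV text, assuming the first row is a header."""
    reader = csv.reader(lines)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return n_rows, len(header)


def check_data_csv(path="data.csv"):
    """Return (matches_description, n_rows, n_cols) for the file at path."""
    # utf-8-sig also accepts files that begin with a byte-order mark
    with open(path, newline="", encoding="utf-8-sig") as f:
        n_rows, n_cols = count_shape(f)
    return (n_rows, n_cols) == (EXPECTED_ROWS, EXPECTED_COLUMNS), n_rows, n_cols
```

    Using the `csv` module rather than splitting on commas matters here, because quoted survey answers may themselves contain commas.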

  11. Explaining the Trust Paradox: How Foreign Media Strengthens Government...

    • zenodo.org
    zip
    Updated Jun 26, 2025
    Cite
    Zhilong Zhao (2025). Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China [Dataset]. http://doi.org/10.5281/zenodo.15745886
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhilong Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset accompanies the manuscript “Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness—and for Whom—in China” (Zhao & Liu, 2025, submitted to Political Communication). It contains the fully de-identified, replication-ready microdata from the 2021 Internet Users’ Social Awareness Survey (N = 3,788), as well as metadata and documentation files.

    Contents:

    • data.csv: Cleaned, de-identified respondent-level dataset (all direct identifiers, IP addresses, and timestamps removed).
    • README.md: Detailed description of variables, data provenance, and usage instructions.
    • LICENSE.txt: Licensing information (CC-BY-4.0 for data, MIT for code).
    • CITATION.cff: Citation metadata for automated referencing.

    Provenance:

    The original survey was conducted by the Chinese National Survey Data Archive (CNSDA). This version has been processed to ensure full compliance with data protection and journal transparency requirements.

  12. Gansu Online Media Dataset for Research on Language Life in Ethnic Areas

    • scidb.cn
    Updated Oct 12, 2024
    Cite
    Zhu Dengyun; Huang Rui; Wan Fucheng (2024). Gansu Online Media Dataset for Research on Language Life in Ethnic Areas [Dataset]. http://doi.org/10.57760/sciencedb.14567
    Dataset provided by
    Science Data Bank
    Authors
    Zhu Dengyun; Huang Rui; Wan Fucheng
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    Gansu
    Description

    This study constructed a dataset of online media in Gansu Province from 2013 to 2022, with data from six major online media platforms in Linxia Hui Autonomous Prefecture and Gannan Tibetan Autonomous Prefecture, including the Linxia Prefecture Government Website, Ethnic Daily, China Linxia Website, Shambhala Online, and China Gannan Website. Spanning a decade, the dataset covers a wide range of social, cultural, and linguistic aspects of the ethnic areas in Gansu, and all the data are Chinese-language news reports and commentaries. Neologisms were extracted from each year's data and analyzed for word frequency, part of speech, word length, cohesion, degree of freedom, and neologism probability. The dataset was constructed with strict quality control, including manual proofreading, noise filtering, deduplication, and language annotation, to ensure the accuracy and completeness of the data. It provides important baseline data for studying language use, social and cultural dynamics, and the development of bilingual education in ethnic areas, and it has broad potential applications in policy analysis, public opinion monitoring, and language policy research.
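    The description names cohesion and degree of freedom as neologism features without defining them. One common formulation in Chinese new-word discovery, assumed here and not necessarily the one the authors used, takes cohesion as the minimum, over every way of splitting a candidate into two parts, of p(word) / (p(left) * p(right)), and degree of freedom as the smaller of the entropies of the characters immediately to the left and right of the candidate. A minimal Python sketch of that formulation:

```python
import math
from collections import Counter


def ngram_counts(text, max_n):
    """Count every (overlapping) substring of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def cohesion(word, counts, total):
    """Minimum over split points of p(word) / (p(left) * p(right)).

    All probabilities share one denominator (total), a common simplification.
    counts must include every substring of word (max_n >= len(word)), and word
    must occur in the corpus, otherwise a division by zero occurs.
    """
    if len(word) < 2:
        raise ValueError("cohesion needs a candidate of at least two characters")

    def p(s):
        return counts[s] / total

    return min(p(word) / (p(word[:k]) * p(word[k:])) for k in range(1, len(word)))


def entropy(counter):
    """Shannon entropy (natural log) of a frequency table; 0.0 if empty."""
    n = sum(counter.values())
    return -sum(c / n * math.log(c / n) for c in counter.values()) if n else 0.0


def freedom(word, text):
    """Smaller of the left- and right-neighbor entropies of word in text."""
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)
    return min(entropy(left), entropy(right))
```

    High cohesion means the characters co-occur far more often than chance predicts, while high freedom means the string appears in varied contexts; a genuine new word usually scores high on both, whereas a fragment of a longer word tends to have low freedom on one side.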

  13. Data from: Impact of ByteDance crisis communication strategies on different...

    • dataverse.harvard.edu
    Updated Sep 22, 2023
    Cite
    ShaoPeng Che (2023). Impact of ByteDance crisis communication strategies on different social media users [Dataset]. http://doi.org/10.7910/DVN/DXSSZH
    Dataset provided by
    Harvard Dataverse
    Authors
    ShaoPeng Che
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    On July 30, 2020, US President Donald Trump announced his plan to use executive orders or emergency economic powers to ban TikTok and disagreed with Microsoft’s acquisition of TikTok in the US. ByteDance, TikTok’s parent company, subsequently issued several Chinese-language crisis communications on Toutiao, a ByteDance-owned platform that provides information to Chinese users. However, third-party users reposted these announcements on other Chinese social media platforms, sometimes rephrasing or reformatting them. These third-party users included both well-known influencers and general users. For example, the discussions became more salient on Sina Weibo, China’s largest online social media platform, than on any other platform, including Toutiao. Comparing crisis communications across social media platforms is therefore necessary. The full dataset contains 50,702 data points; to keep manual labeling manageable, 8,793 of them were selected by stratified random sampling.
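    The description does not say which strata were used to draw the 8,793 labeled points from the 50,702 total. The Python sketch below shows proportional stratified random sampling under the assumption that records are stratified by a key such as platform; the stratum key, the `seed` parameter, and the largest-remainder rounding are all illustrative choices, not the author's documented procedure.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    Largest-remainder rounding makes the allocations sum exactly to sample_size.
    """
    population = sum(stratum_sizes.values())
    quotas = {s: sample_size * n / population for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}  # floor, since quotas are >= 0
    shortfall = sample_size - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - alloc[name], reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc


def stratified_sample(records, stratum_of, sample_size, seed=0):
    """Proportional stratified random sample; requires sample_size <= len(records)."""
    groups = defaultdict(list)
    for record in records:
        groups[stratum_of(record)].append(record)
    alloc = proportional_allocation({s: len(g) for s, g in groups.items()}, sample_size)
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    sample = []
    for s, group in groups.items():
        sample.extend(rng.sample(group, alloc[s]))
    return sample
```

    For the figures above, a call such as `stratified_sample(posts, lambda p: p["platform"], 8793)` (with hypothetical `posts` records carrying a `platform` field) would keep each platform's share of the labeled subset close to its share of the full dataset.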

  14. China Retail Investor Sentiment Analytics | Alternative Data | Social Media...

    • datarade.ai
    .json, .csv
    Updated Apr 1, 2024
    Cite
    Datago Technology Limited (2024). China Retail Investor Sentiment Analytics | Alternative Data | Social Media | China, Hong Kong, US stocks | Intra-day Update [Dataset]. https://datarade.ai/data-products/china-retail-investor-sentiment-analytics-alternative-data-datago-technology-limited
    Dataset authored and provided by
    Datago Technology Limited
    Area covered
    United States, Hong Kong, China
    Description

    China Retail Investor Sentiment Analytics provides sentiment analytics on Chinese retail investors based on two stock forums, Guba (GACRIS dataset) and Xueqiu (XACRIS dataset), the most popular stock forums in China, with coverage starting in 2007.

    Using in-house NLP models that are specifically optimized for Chinese stock forum posts and trained on proprietary, manually labeled and cross-checked training data, the dataset provides accurate text analytics of post content, including but not limited to quality, sentiment, and relevant stocks with relevance scores. In addition to aggregated statistics on stock sentiment and popularity, the dataset provides rich, fine-grained information for each user and post at the record level. For example, it reports the registration time and number of followers of each user, and the replies, reads, and province of publication of each post. Moreover, this metadata has been processed in a point-in-time (PIT) manner since 2019.

    The dataset can help clients easily capture sentiment and popularity among millions of Chinese retail investors. It also gives clients the flexibility to build novel analytics, such as studying the sentiment (conformity/divergence) of users with different levels of influence or posts with different levels of popularity, or simply filtering out posts by users who are too active, positive, or negative within a time window when aggregating statistics.

    Coverage: All A-share and Hong Kong stocks, 300+ popular US stocks
    Update Frequency: Daily or intra-day

  15. Weiboscope Open Data

    • datahub.hku.hk
    txt
    Updated May 30, 2023
    Cite
    King Wa Fu (2023). Weiboscope Open Data [Dataset]. http://doi.org/10.25442/hku.16674565.v1
    Dataset provided by
    HKU Data Repository
    Authors
    King Wa Fu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Welcome to the Open Weiboscope Data Access website. Weiboscope is a data collection and visualization project developed by the research team at the Journalism and Media Studies Centre, The University of Hong Kong (JMSC). One objective of the project is to make censored Sina Weibo posts by a selected group of Chinese microbloggers publicly accessible. This enables academic use of the data to better understand social media in China and to make the Chinese media system more transparent.

    Since January 2011, the project has regularly sampled the timelines of more than 350,000 Chinese microbloggers who have more than 1,000 followers. The methodology is detailed in an IEEE Internet Computing article (Fu, Chan, Chau, 2013). In addition, since 2012 we have sampled Sina Weibo accounts at random and stored the most recent timeline of each sampled account in the dataset. This sampling approach is reported in a PLOS ONE article (Fu, Chau, 2013).

    This site contains all the Weiboscope data collected in 2012, which we are delighted to share as open access. For ethical reasons the data are anonymized: real user IDs and message IDs are replaced by pseudo-IDs. When using the data, please cite the paper below.

    King-wa Fu, CH Chan, Michael Chau. Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and Impact Evaluation of the 'Real Name Registration' Policy. IEEE Internet Computing. 2013; 17(3): 42-50. http://doi.ieeecomputersociety.org/10.1109/MIC.2013.28

    Data set statistics:

    • Number of weibo messages: 226,841,122
    • Number of deleted messages: 10,865,955
    • Number of censored ("Permission Denied") messages: 86,083
    • Number of unique weibo users: 14,387,628

    Enquiries: send questions or comments to weiboscope@gmail.com. The project is funded by the University of Hong Kong Seed Funding Program for Basic Research.

    Citation: Fu KW, Chan CH, Chau M. Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and the Real-Name Registration Policy. Internet Computing, IEEE. 2013; 17(3): 42-50.
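
    The statistics listed for the 2012 Weiboscope data can be turned into rates directly. The following minimal Python sketch uses illustrative variable names. The censored-among-deleted ratio rests on an assumption that this page does not state: that "Permission Denied" messages are counted among the deleted messages.

```python
# Figures as listed on the Open Weiboscope page for the 2012 data.
TOTAL_MESSAGES = 226_841_122
DELETED_MESSAGES = 10_865_955
CENSORED_MESSAGES = 86_083  # messages returning "Permission Denied"
UNIQUE_USERS = 14_387_628


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


deleted_pct = percent(DELETED_MESSAGES, TOTAL_MESSAGES)
censored_pct = percent(CENSORED_MESSAGES, TOTAL_MESSAGES)
# Assumption: censored messages are a subset of the deleted ones.
censored_of_deleted_pct = percent(CENSORED_MESSAGES, DELETED_MESSAGES)
messages_per_user = TOTAL_MESSAGES / UNIQUE_USERS

if __name__ == "__main__":
    print(f"Deleted:  {deleted_pct:.2f}% of all messages")
    print(f"Censored: {censored_pct:.3f}% of all messages")
    print(f"Censored among deleted: {censored_of_deleted_pct:.2f}%")
    print(f"Messages per user: {messages_per_user:.2f}")
```

    Run as written, this reports about 4.79% of messages deleted and about 0.038% censored. Under the subset assumption, about 0.79% of deletions are censored. There are roughly 15.77 messages per unique user.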

  16. Visualisations for China’s news media tweeting, competing with US source

    • researchdata.edu.au
    • commons.datacite.org
    Updated May 9, 2018
    Joyce Nip; Chao Sun (2018). Visualisations for China’s news media tweeting, competing with US source [Dataset]. http://doi.org/10.4227/11/5af26eb0e787d
    Dataset updated
    May 9, 2018
    Dataset provided by
    The University of Sydney
    Authors
    Joyce Nip; Chao Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2016 - Oct 24, 2017
    Area covered
    China, United States
    Description

    This dataset includes six figures used in a journal paper that examines China's recent initiative on international social media and assesses its effectiveness in counteracting Western dominance in international communication. The figures are interactive D3.js visualisations written in JavaScript and displayed in HTML files. The dataset is uploaded so that readers of the paper can access this tool and better understand the methodology described in the paper.

  17. Impacts of Generative AI on Social Capital in China - Survey Data

    • figshare.com
    xlsx
    Updated Aug 26, 2024
    Yin Liu; ZAOZAO ZHANG; Yingkai Wu (2024). Impacts of Generative AI on Social Capital in China - Survey Data [Dataset]. http://doi.org/10.6084/m9.figshare.26830381.v1
    Available download formats: xlsx
    Dataset updated
    Aug 26, 2024
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Yin Liu; ZAOZAO ZHANG; Yingkai Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset comprises responses from 1,053 online participants surveyed across China, collected through social media and email distributions. It includes detailed questions about the usage and perceptions of generative AI, categorized into task-oriented and social-oriented applications. The dataset features demographic variables (age, gender, education, location) alongside indicators measuring both offline and online social capital. Analysis shows that generative AI users generally have higher social capital than non-users. Task-oriented usage is associated with higher offline social capital but lower online social capital, whereas socially oriented usage is associated with higher social capital in both spheres. The data can inform research on how technology use interacts with social structures and may help guide future technology policy and integration strategies.

  18. TapTap User Feedback Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Datasimple (2025). TapTap User Feedback Dataset [Dataset]. https://www.opendatabay.com/data/consumer/1dcc1820-93e8-4499-a0bc-09774d5b03ac
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Reviews & Ratings
    Description

    This dataset provides user reviews for mobile games, collected from TapTap, a popular mobile game community and distribution platform in China. Its primary purpose is to facilitate sentiment analysis of Chinese game reviews. The reviews are mostly in Chinese and cover the 1,000 most recent comments for 20 popular games up until April 5, 2025. Each entry includes the user's rating, the text content of the review, the number of likes the review received, the publication timestamp, the device model used (where available), the name of the game reviewed, and a sentiment label. User identifiers have been removed to protect privacy.

    Columns

    • rating: User-provided star rating for the game, ranging from 1 to 5.
    • review_content: The text content of the user review, written in Chinese.
    • likes: The number of likes received by the review on TapTap.
    • publish_time: The date and time the review was published, formatted as YYYY-MM-DD HH:MM:SS. This column has been converted to datetime format.
    • device_model: The device model reported by the user when posting the review. This column may contain inconsistencies and 'unknown' values.
    • game_name: The name of the mobile game being reviewed. There are 20 distinct game names in the dataset.
    • sentiment: A binary sentiment label, where 0 indicates negative sentiment and 1 indicates positive sentiment. This label is directly converted from the star rating (1-2 stars typically map to 0, and 3-5 stars map to 1).
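
    The column types and the star-rating-to-sentiment rule described above can be expressed as a small loader. This is a minimal sketch using only the Python standard library. It assumes the data is a CSV file with exactly the column names listed. The helper names (`rating_to_sentiment`, `load_reviews`), the in-memory sample rows, and the treatment of blank device models as 'unknown' are hypothetical illustrations, not part of the dataset. The source says ratings only "typically" map to labels this way, so the loader flags rows whose stored label disagrees with the rule instead of overwriting it.

```python
import csv
import io
from datetime import datetime


def rating_to_sentiment(rating: int) -> int:
    """Binary label from a 1-5 star rating: 1-2 stars -> 0, 3-5 stars -> 1."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return 1 if rating >= 3 else 0


def load_reviews(fileobj):
    """Parse review rows and flag labels that disagree with the star-rating rule."""
    reviews = []
    for row in csv.DictReader(fileobj):
        rating = int(row["rating"])
        sentiment = int(row["sentiment"])
        reviews.append({
            "rating": rating,
            "review_content": row["review_content"],
            "likes": int(row["likes"]),
            # publish_time is documented as YYYY-MM-DD HH:MM:SS.
            "publish_time": datetime.strptime(row["publish_time"], "%Y-%m-%d %H:%M:%S"),
            # Assumption: blank device models are treated like 'unknown'.
            "device_model": row["device_model"].strip() or "unknown",
            "game_name": row["game_name"],
            "sentiment": sentiment,
            "label_matches_rule": sentiment == rating_to_sentiment(rating),
        })
    return reviews


# Hypothetical sample rows for illustration only (not taken from the dataset).
SAMPLE_CSV = """rating,review_content,likes,publish_time,device_model,game_name,sentiment
5,(review text),12,2025-04-05 10:30:00,unknown,Game A,1
2,(review text),0,2024-11-20 08:15:42,,Game B,0
"""

if __name__ == "__main__":
    for review in load_reviews(io.StringIO(SAMPLE_CSV)):
        print(review["game_name"], review["rating"], review["sentiment"], review["label_matches_rule"])
```

    For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is whatever local file name you saved the CSV under.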

    Distribution

    The dataset is typically provided in CSV format and contains reviews for 20 distinct mobile games. The source describes it as covering the 1,000 most recent comments, but other summary figures suggest a larger count. The rating field has around 39,592 entries, and sentiment labels total approximately 39,985 across the positive and negative categories. Even if the 1,000 figure applies per game, 20 games would give at most 20,000 reviews. The number of likes varies, with a large majority in the 0-142.75 range. User identifiers have been removed for privacy.

    Usage

    This dataset is ideal for conducting sentiment analysis on Chinese mobile game reviews. It can also be used for understanding user feedback trends, identifying common issues or praises within game reviews, and developing natural language processing models tailored to Chinese text.

    Coverage

    The dataset's geographic scope is China, as the data are collected from TapTap, a platform popular there. The reviews span June 6, 2017 to April 5, 2025, the date of the latest comments. Device model information is included, though it contains inconsistencies and 'unknown' entries. No demographic information about the users is available.

    License

    CC-BY

    Who Can Use It

    This dataset is suitable for a variety of users, including:

    • Data scientists and machine learning engineers: for building and testing sentiment analysis models for Chinese text.
    • Game developers: to gain insights into player satisfaction, identify areas for improvement in their games, and understand market reception in China.
    • Market researchers: for analysing trends in the mobile gaming industry and understanding consumer behaviour in the Chinese market.
    • Academics and students: for research projects involving natural language processing, data analysis, and social media sentiment.

    Dataset Name Suggestions

    • TapTap Chinese Mobile Game Reviews
    • Chinese Mobile Game User Sentiment Dataset
    • TapTap Game Review Analysis Data
    • Mobile Gaming Reviews (Chinese)
    • TapTap User Feedback Dataset

    Attributes

    Original Data Source: TapTap Mobile Game Reviews (Chinese)

  19. Database of Chinese Elite Politics Networking

    • dataverse.harvard.edu
    • dataone.org
    Updated Dec 2, 2022
    Daqi (Reinhardt) Fang (2022). Database of Chinese Elite Politics Networking [Dataset]. http://doi.org/10.7910/DVN/ZN8VOZ
    Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 2, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Daqi (Reinhardt) Fang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    China
    Description

    This is an open-access database in which scholars can find primary sources on the networks of Chinese politics. If you want to update or revise the data, please contact me (Reinhardt114514@outlook.com). Contributor: Daqi (Reinhardt) Fang, student at Hangzhou Yungu School.

  20. Replication Data for: Accountability from Cyberspace? Scandal Exposure on...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    LI, YIRAN (2023). Replication Data for: Accountability from Cyberspace? Scandal Exposure on the Internet and Official Governance in China [Dataset]. http://doi.org/10.7910/DVN/WF7TLZ
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    LI, YIRAN
    Description

    This article explores the effects of social media on government accountability under authoritarian regimes, examining whether online discussion has a disciplining effect on officials involved in scandals. We use a unique dataset of scandals discussed on Chinese microblogs to systematically study their effects on the government response process and on the disciplining of officials. We find that the government employs clear strategies: higher levels of online discussion lead to quicker government responses and more severe punishment of the officials involved. Scandals involving sexual and economic elements, which initially attract more attention, receive quicker responses and more severe punishments. The results remain robust when rainfall is used as an instrumental variable to mitigate endogeneity. Our findings highlight the accountability mechanism facilitated by social media and its empowering role.

Nanyun Peng; Mark Dredze (2021). Weibo NER Dataset [Dataset]. https://paperswithcode.com/dataset/weibo-ner

Weibo NER Dataset

190 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 14, 2021
Authors
Nanyun Peng; Mark Dredze
Description

The Weibo NER dataset is a Chinese Named Entity Recognition dataset drawn from the social media website Sina Weibo.
