47 datasets found

Weibo NER Dataset
paperswithcode.com
opendatalab.com
Updated May 14, 2021
Cite
Nanyun Peng; Mark Dredze (2021). Weibo NER Dataset [Dataset]. https://paperswithcode.com/dataset/weibo-ner
Dataset updated
May 14, 2021
Authors
Nanyun Peng; Mark Dredze
Description
The Weibo NER dataset is a Chinese Named Entity Recognition dataset drawn from the social media website Sina Weibo.
Number of social media users in China 2018-2027
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Number of social media users in China 2018-2027 [Dataset]. https://www.statista.com/statistics/277586/number-of-social-network-users-in-china/
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
China
Description
In 2022, there were around **** billion social media users in China. Despite Facebook, YouTube, and Twitter being blocked in the country, local social networking sites such as WeChat and Weibo have been attracting millions of users, making China the world’s biggest social media market.

What is the role of social media in China?

Around ** percent of the Chinese population uses the internet. Social networking plays a huge role among netizens, especially the younger generation. Chinese social media, just like its Western equivalents, serves not only as a way to communicate online but also as one of the main sources of news and entertainment, e-payments, shopping advisors, and dating channels. In 2021, over ** percent of surveyed social media users said they mostly appreciated that social networks help them keep in touch with friends and family and share their life moments and thoughts.

What are the most popular social media platforms?

WeChat (Weixin in Chinese) is by far the most commonly used social app in the country, used for anything from texting and calling to photo and video sharing, dating, financial services, game-playing, shopping, ride hailing, and so on. However, the Chinese social media scene is diverse and dynamic, so it is not just about WeChat. Instant messaging app Tencent QQ, microblogging site Weibo, video sharing app Youku Tudou, short-form video app Douyin (aka TikTok), photo editing and sharing app Meitu, restaurant recommendation and food ordering platform Meituan, Quora equivalent Zhihu, and dating app Momo are just a few of the most popular Chinese social media platforms.
Replication Data for: Capturing Clicks: How the Chinese Government Uses...
dataverse.harvard.edu
Updated Apr 18, 2020
Cite
Jennifer Pan (2020). Replication Data for: Capturing Clicks: How the Chinese Government Uses Clickbait to Compete for Visibility [Dataset]. http://doi.org/10.7910/DVN/TALJOT
Unique identifier
https://doi.org/10.7910/DVN/TALJOT
Dataset updated
Apr 18, 2020
Dataset provided by
Harvard Dataverse
Authors
Jennifer Pan
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
China
Description
The proliferation of social media and digital technologies has made it necessary for governments to expand their focus beyond propaganda content in order to disseminate propaganda effectively. We identify a strategy of using clickbait to increase the visibility of political propaganda. We show that such a strategy is used across China by combining ethnography with a computational analysis of a novel dataset of the titles of 197,303 propaganda posts made by 213 Chinese city-level governments on WeChat. We find that Chinese propagandists face intense pressures to demonstrate their effectiveness on social media because their work is heavily quantified---measured, analyzed, and ranked---with metrics such as views and likes. Propagandists use both clickbait and non-propaganda content (e.g., lifestyle tips) to capture clicks, but rely more heavily on clickbait because it does not decrease space available for political propaganda. Government propagandists use clickbait at a rate commensurate with commercial and celebrity social media accounts. Clickbait is associated with more views and likes, and greater reach of government propaganda outlets and messages. These results reveal how the advertising-based business model and affordances of social media influence political propaganda and how government strategies to control information are moving beyond censorship, propaganda, and disinformation.
Hong Kong SAR, China Internet Usage: Social Media Market Share: All...
ceicdata.com
Updated May 25, 2024
Cite
Hong Kong SAR, China Internet Usage: Social Media Market Share: All Platforms: Mixi [Dataset]. https://www.ceicdata.com/en/hong-kong/internet-usage-social-media-market-share
Dataset updated
May 25, 2024
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
May 18, 2024 - May 25, 2024
Area covered
Hong Kong
Description
Internet Usage: Social Media Market Share: All Platforms: Mixi data was reported at 0.000 % on 25 May 2024, unchanged from 0.000 % on 24 May 2024. The data is updated daily, with a median of 0.000 % from May 2024 to 25 May 2024 across 8 observations. It reached an all-time high of 0.060 % on 22 May 2024 and a record low of 0.000 % on 25 May 2024. The series remains in active status in CEIC and is reported by Statcounter Global Stats. It is categorized under Global Database’s Hong Kong SAR (China) – Table HK.SC.IU: Internet Usage: Social Media Market Share.
Data in the paper titled "#WuhanDiary and #WuhanLockdown: gendered posting...
datahub.hku.hk
txt
Updated May 31, 2023
Cite
King Wa Fu; Sara Davies; Karen Ann Grépin; Clare Wenham; Connie Gan; Feng Shuo (2023). Data in the paper titled "#WuhanDiary and #WuhanLockdown: gendered posting patterns and behaviours on Weibo during the COVID-19 pandemic" [Dataset]. http://doi.org/10.25442/hku.19487396.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25442/hku.19487396.v1
Dataset updated
May 31, 2023
Dataset provided by
HKU Data Repository
Authors
King Wa Fu; Sara Davies; Karen Ann Grépin; Clare Wenham; Connie Gan; Feng Shuo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Social media can be both a source of information and misinformation during health emergencies. During the COVID-19 pandemic, social media became a ubiquitous tool for people to communicate and represents a rich source of data researchers can use to analyse users’ experiences, knowledge and sentiments. Research on social media posts during COVID-19 has identified, to date, the perpetuity of traditional gendered norms and experiences. Yet these studies are mostly based on Western social media platforms. Little is known about gendered experiences of lockdown communicated on non-Western social media platforms. Using data from Weibo, China’s leading social media platform, we examine gendered user patterns and sentiment during the first wave of the pandemic between 1 January 2020 and 1 July 2020. We find that Weibo posts by self-identified women and men conformed with some gendered norms identified on other social media platforms during the COVID-19 pandemic (posting patterns and keyword usage) but not all (sentiment). This insight may be important for targeted public health messaging on social media during future health emergencies.

To cite: Gan CCR, Feng SA, Feng H, et al. #WuhanDiary and #WuhanLockdown: gendered posting patterns and behaviours on Weibo during the COVID-19 pandemic. BMJ Global Health 2022;0:e008149. doi:10.1136/bmjgh-2021-008149
Replication data for: How Censorship in China Allows Government Criticism...
dataverse.harvard.edu
search.dataone.org
Updated Apr 4, 2017
Cite
Gary King; Jennifer Pan; Margaret E. Roberts (2017). Replication data for: How Censorship in China Allows Government Criticism but Silences Collective Expression [Dataset]. http://doi.org/10.7910/DVN1/22691
Unique identifier
https://doi.org/10.7910/DVN1/22691
Dataset updated
Apr 4, 2017
Dataset provided by
Harvard Dataverse
Authors
Gary King; Jennifer Pan; Margaret E. Roberts
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.1/customlicense?persistentId=doi:10.7910/DVN1/22691
Area covered
China
Description
We offer the first large scale, multiple source analysis of the outcome of what may be the most extensive effort to selectively censor human expression ever implemented. To do this, we have devised a system to locate, download, and analyze the content of millions of social media posts originating from nearly 1,400 different social media services all over China before the Chinese government is able to find, evaluate, and censor (i.e., remove from the Internet) the large subset they deem objectionable. Using modern computer-assisted text analytic methods that we adapt to and validate in the Chinese language, we compare the substantive content of posts censored to those not censored over time in each of 85 topic areas. Contrary to previous understandings, posts with negative, even vitriolic, criticism of the state, its leaders, and its policies are not more likely to be censored. Instead, we show that the censorship program is aimed at curtailing collective action by silencing comments that represent, reinforce, or spur social mobilization, regardless of content. Censorship is oriented toward attempting to forestall collective activities that are occurring now or may occur in the future --- and, as such, seem to clearly expose government intent. Notes: Please see our followup article published in Science, "Reverse-Engineering Censorship In China: Randomized Experimentation And Participant Observation." See also: Automated Text Analysis
Supporting data for: "The Role of “State Endorsers” in Extending Chinese...
datahub.hku.hk
zip
Updated Sep 19, 2023
Cite
Marie Brockling; Haohan Lily Hu; King Wa Fu (2023). Supporting data for: "The Role of “State Endorsers” in Extending Chinese Propaganda: Evaluating the Reach of Pro-Regime YouTubers" [Dataset]. http://doi.org/10.25442/hku.19548076.v3
Available download formats: zip
Unique identifier
https://doi.org/10.25442/hku.19548076.v3
Dataset updated
Sep 19, 2023
Dataset provided by
HKU Data Repository
Authors
Marie Brockling; Haohan Lily Hu; King Wa Fu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Supporting data for the paper "The Role of 'State Endorsers' in Extending Chinese Propaganda: Evaluating the Reach of Pro-Regime YouTubers" published in the International Journal of Communication (IJOC) in 2023. A previous version of this paper was presented at the 72nd Annual International Communication Association (ICA) Conference on 26-30 May 2022. The dataset contains the code and raw data used in the project, as well as all the resulting graphics. Please feel free to reach out with any questions. Also please let me know if you use the data and send me a link to your publication.
Social media images of China's terraces
figshare.com
zip
Updated Apr 19, 2025
Cite
Song Chen (2025). Social media images of China's terraces [Dataset]. http://doi.org/10.6084/m9.figshare.28813259.v2
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28813259.v2
Dataset updated
Apr 19, 2025
Dataset provided by
figshare
Authors
Song Chen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
China
Description
This dataset contains geotagged social media images of China's terraces, sourced from the Sina Weibo microblogging platform (https://weibo.com). Geotagged images were collected using Weibo cookies and Python-based scraping tools (available at: https://github.com/dataabc/weibo-search). The search keyword used was "terraces", and the collection timeframe spanned from July 2022 to June 2024. We included only images with clear geographic information located within China. Images of poor quality (e.g., synthesized from multiple images, excessively cluttered, or blurry) and irrelevant content such as advertisements, paintings, or text were removed.

The images are classified into seven distinct categories representing different types of cultural ecosystem services (CES): landscape, species, structures, indoor, food, activities, and posing. Specifically: (1) Landscape images depict open natural landscapes, such as rice terraces, often with a visible sky. (2) Species images consist of close-up shots of animals or plants. (3) Structures images mainly feature man-made structures, often traditional houses. (4) Indoor images show the interiors of buildings, including dining rooms, bedrooms, etc. (5) Food images depict food, dishes, and beverages. (6) Activities images capture people physically interacting with the environment, including group photos and folkloric activities. (7) Posing images show people looking into the camera.

This dataset includes a subset of 2,720 randomly selected and manually labeled images, accounting for approximately 5% of the total collected images. Among them, landscape images were the most numerous (1,347), followed by activities (480), structures (408), food (153), posing (146), species (111), and indoor (75). These images can be used for training classification models.

All code used for model training and testing is available at: https://github.com/chen7092/Deep-learning-for-cultural-ecosystem-services-of-terraces.
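As a rough illustration of how imbalanced the labeled subset is, the sketch below tabulates the per-category counts reported in the description and derives each category's share. The inverse-frequency class weights are an assumption added here for illustration (a common heuristic for imbalanced classification); the description does not say whether the dataset authors used any weighting, and the variable names are hypothetical.

```python
# Label counts as reported in the dataset description above.
label_counts = {
    "landscape": 1347,
    "activities": 480,
    "structures": 408,
    "food": 153,
    "posing": 146,
    "species": 111,
    "indoor": 75,
}

total = sum(label_counts.values())

# Share of each category within the manually labeled subset.
shares = {name: count / total for name, count in label_counts.items()}

# Inverse-frequency weights: an assumed heuristic, not the authors' stated method.
num_classes = len(label_counts)
class_weights = {
    name: total / (num_classes * count) for name, count in label_counts.items()
}

# Print categories from most to least frequent.
for name in sorted(label_counts, key=label_counts.get, reverse=True):
    print(
        f"{name:<11} n={label_counts[name]:>5}  "
        f"share={shares[name]:.3f}  weight={class_weights[name]:.2f}"
    )
```

The counts sum to the 2,720 labeled images stated in the description, and landscape alone accounts for roughly half of them, which is why some form of rebalancing may be worth considering when training on this subset.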
Explaining the Trust Paradox: How Foreign Media Strengthens Government...
zenodo.org
bin, csv
Updated Jun 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhilong Zhao; Zhilong Zhao (2025). Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China [Dataset]. http://doi.org/10.5281/zenodo.15703594
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.15703594
Dataset updated
Jun 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zhilong Zhao; Zhilong Zhao
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
China
Description
This repository contains the replication-ready survey dataset used in Zhao & Liu (2025): *Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China*.

* `data.csv` – Cleaned respondent-level dataset (N = 3,788; 110 variables) used for all analyses reported in the article. Personally identifiable information has been removed in compliance with the CNSDA license.
* `codebook.pdf` – Variable names, wording, scales, and basic descriptive statistics. *(to be added by uploader if needed)*

The dataset originates from the publicly available "2021 Internet Users' Social Awareness Survey" released by the Chinese National Survey Data Archive (CNSDA). We performed basic cleaning (variable renaming, numeric recoding, and removal of direct identifiers). Cleaning scripts are available upon request.
Explaining the Trust Paradox: How Foreign Media Strengthens Government...
zenodo.org
zip
Updated Jun 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhilong Zhao; Zhilong Zhao (2025). Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness in China [Dataset]. http://doi.org/10.5281/zenodo.15745886
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15745886
Dataset updated
Jun 26, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zhilong Zhao; Zhilong Zhao
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
China
Description
This dataset accompanies the manuscript “Explaining the Trust Paradox: How Foreign Media Strengthens Government Confidence via Political–Economic Awareness—and for Whom—in China” (Zhao & Liu, 2025, submitted to Political Communication). It contains the fully de-identified, replication-ready microdata from the 2021 Internet Users’ Social Awareness Survey (N = 3,788), as well as metadata and documentation files.

Contents:

data.csv: Cleaned, de-identified respondent-level dataset (all direct identifiers, IP addresses, and timestamps removed).

README.md: Detailed description of variables, data provenance, and usage instructions.

LICENSE.txt: Licensing information (CC-BY-4.0 for data, MIT for code).

CITATION.cff: Citation metadata for automated referencing.

Provenance:

The original survey was conducted by the Chinese National Survey Data Archive (CNSDA). This version has been processed to ensure full compliance with data protection and journal transparency requirements.
Gansu Online Media Dataset for Research on Language Life in Ethnic Areas
scidb.cn
Updated Oct 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhu Dengyun; Huang Rui; Wan Fucheng (2024). Gansu Online Media Dataset for Research on Language Life in Ethnic Areas [Dataset]. http://doi.org/10.57760/sciencedb.14567
Unique identifier
https://doi.org/10.57760/sciencedb.14567
Dataset updated
Oct 12, 2024
Dataset provided by
Science Data Bank
Authors
Zhu Dengyun; Huang Rui; Wan Fucheng
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Area covered
Gansu
Description
This study constructed a dataset of online media in Gansu Province from 2013 to 2022, with data from six major online media platforms in Linxia Hui Autonomous Prefecture and Gannan Tibetan Autonomous Prefecture, including Linxia Prefecture Government Website, Ethnic Daily, China Linxia Website, Shambhala Online, and China Gannan Website. The dataset covers a wide range of social, cultural, and linguistic aspects of the ethnic areas in Gansu over a decade, and all the data are Chinese-language news reports and commentaries. Neologism extraction was carried out for each year's dataset, and the extracted neologisms were analyzed for their characteristics in terms of word frequency, lexicality, word number, cohesion, degrees of freedom, and neologism probability. The dataset was constructed with strict quality control measures, including manual proofreading, noise filtering, de-emphasis processing and language annotation, to ensure the accuracy and completeness of the data. This dataset provides important foundational data for the study of language use, social and cultural dynamics, and bilingual education development in ethnic areas, and is broadly applicable to policy analysis, social opinion monitoring, and language policy research.
Data from: Impact of ByteDance crisis communication strategies on different...
dataverse.harvard.edu
Updated Sep 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ShaoPeng Che (2023). Impact of ByteDance crisis communication strategies on different social media users [Dataset]. http://doi.org/10.7910/DVN/DXSSZH
Unique identifier
https://doi.org/10.7910/DVN/DXSSZH
Dataset updated
Sep 22, 2023
Dataset provided by
Harvard Dataverse
Authors
ShaoPeng Che
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
On July 30, 2020, US President Donald Trump announced his plan to use executive orders or emergency economic powers to ban TikTok and disagreed with Microsoft’s acquisition of TikTok in the US. ByteDance, TikTok’s parent company, subsequently conducted several Chinese crisis communications on Toutiao, a platform owned by ByteDance that provides information to Chinese people. However, these announcements were reposted, and sometimes rephrased or reformatted, by third-party users on other Chinese social media platforms. These third-party users included both well-known influencers and general users. For example, the discussions became more salient on Sina Weibo, China’s largest online social media platform, than on any other platform, including Toutiao. Therefore, comparing crisis communications across different social media platforms is necessary. The entire dataset contains 50,702 data points. To keep manual labeling efficient, 8,793 data points were drawn from the dataset by stratified random sampling.
China Retail Investor Sentiment Analytics | Alternative Data | Social Media...
datarade.ai
.json, .csv
Updated Apr 1, 2024
Cite
Datago Technology Limited (2024). China Retail Investor Sentiment Analytics | Alternative Data | Social Media | China, Hong Kong, US stocks | Intra-day Update [Dataset]. https://datarade.ai/data-products/china-retail-investor-sentiment-analytics-alternative-data-datago-technology-limited
Available download formats: .json, .csv
Dataset updated
Apr 1, 2024
Dataset authored and provided by
Datago Technology Limited
Area covered
Hong Kong, United States, China
Description
China Retail Investor Sentiment Analytics provides sentiment analytics of Chinese retail investors based on 2 stock forums, Guba (GACRIS dataset) and Xueqiu (XACRIS dataset), the most popular stock forums in China from 2007.

Using in-house NLP models that are specifically optimized for Chinese stock forum posts and trained on proprietary, manually labeled and cross-checked training data, the dataset provides accurate text analytics of post content, including but not limited to quality, sentiment, and relevant stocks with relevance scores. In addition to aggregated statistics of stock sentiment and popularity, the dataset provides rich, fine-grained information for each user and post at the record level. For example, it reports the registration time and number of followers for each user, and the replies, readings, and province of publication for each post. Moreover, this metadata has been processed in a point-in-time (PIT) manner since 2019.

The dataset could help clients easily capture the sentiment and popularity among millions of Chinese retail investors. It also offers flexibility for clients to build custom analytics, such as studying the sentiment (conformity/divergence) of users with different levels of influence or posts of different hotness, or simply filtering out posts published by users who are too active/positive/negative in a time window when aggregating the statistics.

Coverage: All A-share and Hong Kong stocks, 300+ popular US stocks
Update Frequency: Daily or intra-day
Weiboscope Open Data
datahub.hku.hk
txt
Updated May 30, 2023
Cite
King Wa Fu (2023). Weiboscope Open Data [Dataset]. http://doi.org/10.25442/hku.16674565.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25442/hku.16674565.v1
Dataset updated
May 30, 2023
Dataset provided by
HKU Data Repository
Authors
King Wa Fu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
Welcome to the Open Weiboscope Data Access website. Weiboscope is a data collection and visualization project developed by the research team at the Journalism and Media Studies Centre, The University of Hong Kong (JMSC). One of the objectives of the project is to make censored Sina Weibo posts of a selected group of Chinese microbloggers publicly accessible, enabling academic use of the data to better understand social media in China and to make the Chinese media system more transparent.

Since January 2011, the project has been regularly sampling the timelines of more than 350,000 Chinese microbloggers who have more than 1,000 followers. The methodology is detailed in an IEEE Internet Computing article (Fu, Chan, Chau, 2013). In addition, we have sampled Sina Weibo accounts randomly since 2012, and the samples' most recent timelines were collected and stored in the dataset. Our sampling approach is reported in a PLOS ONE article (Fu, Chau, 2013).

This site contains all the Weiboscope data collected in the year 2012. We are delighted to share the data for open access. For ethical reasons, the data are anonymized, i.e. real user and message IDs are replaced by pseudo IDs. When using the data, please cite the paper below.

King-wa Fu, CH Chan, Michael Chau. Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and Impact Evaluation of the 'Real Name Registration' Policy. IEEE Internet Computing. 2013; 17(3): 42-50. http://doi.ieeecomputersociety.org/10.1109/MIC.2013.28

Data Set Statistics:
Number of weibo messages: 226841122
Number of deleted messages: 10865955
Number of censored ('Permission Denied') messages: 86083
Number of unique weibo users: 14387628

Enquiry: Send your question/comment to weiboscope@gmail.com.

The project is funded by the University of Hong Kong Seed Funding Program for Basic Research.

Citation: Fu KW, Chan CH, Chau M. Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and the Real-Name Registration Policy. Internet Computing, IEEE. 2013; 17(3): 42-50.
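To put the reported data set statistics in proportion, the following minimal sketch derives the share of deleted and censored messages and the average number of messages per user. The four figures are copied from the description; the derived ratios and all variable names are illustrative additions, not values published by the Weiboscope project.

```python
# Figures copied from the "Data Set Statistics" in the description above.
stats = {
    "messages": 226_841_122,
    "deleted": 10_865_955,
    "censored": 86_083,  # 'Permission Denied' messages
    "users": 14_387_628,
}

# Derived ratios (illustrative; not reported by the project itself).
rates = {
    "deleted": stats["deleted"] / stats["messages"],
    "censored": stats["censored"] / stats["messages"],
}
messages_per_user = stats["messages"] / stats["users"]

print(f"deleted share:     {rates['deleted']:.4%}")
print(f"censored share:    {rates['censored']:.4%}")
print(f"messages per user: {messages_per_user:.2f}")
```

These ratios treat every message equally; they say nothing about how deletion or censorship is distributed across users or topics.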
Supporting data for "A Meta-Intervention: Quantifying the Impact of Social...
datahub.hku.hk
Updated May 23, 2025
Cite
Mingzhe Quan (2025). Supporting data for "A Meta-Intervention: Quantifying the Impact of Social Media Information on Adherence to Non-Pharmaceutical Interventions" [Dataset]. http://doi.org/10.25442/hku.29068061.v1
Unique identifier
https://doi.org/10.25442/hku.29068061.v1
Dataset updated
May 23, 2025
Dataset provided by
HKU Data Repository
Authors
Mingzhe Quan
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
This dataset supports a research project in the field of digital medicine, which aims to quantify the impact of disseminating scientific information on social media (as a form of "meta-intervention") on public adherence to Non-Pharmaceutical Interventions (NPIs) during health crises such as the COVID-19 pandemic. The research encompasses multiple sub-studies and pilot experiments, drawing data from various global and China-specific social media platforms.

The data included in this submission has been collected from several sources:

From Sina Weibo and Tencent WeChat, 189 online poll datasets were collected, involving a total of 1,391,706 participants. These participants are users of Sina Weibo or Tencent WeChat.

From Twitter, 187 tweets published by scientists (verified with a blue checkmark) related to COVID-19 were collected.

From Xiaohongshu and Bilibili, textual content from 143 user posts/videos concerning COVID-19, along with associated user comments and specific user responses to a question, were gathered.

It is important to note that while the broader research project also utilized a 3TB Reddit corpus hosted on Academic Torrents (academictorrents.com), this specific Reddit dataset is publicly available directly from Academic Torrents and is not included in this particular DataHub submission.

The submitted dataset comprises publicly available data, formatted as Excel files (.xlsx), and includes the following:

Filename: scientists' discourse (source from screenshot of tweets)
Description: This file contains screenshots of tweets published by scientists on Twitter concerning COVID-19 research, its current status, and related topics. It also includes a coded analysis of the textual content from these tweets. Specific details regarding the coding scheme can be found in the readme.txt file.

Filename: The links of online polls (Weibo & WeChat)
Description: This data file includes information from online polls conducted on Weibo and WeChat after December 7, 2022. These polls, often initiated by verified users (who may or may not be science popularizers), aimed to track the self-reported proportion of participants testing positive for COVID-19 (via PCR or rapid antigen test) or remaining negative, particularly during periods of rapid Omicron infection spread. The file contains links to the original polls, links to the social media accounts that published these polls, and relevant metadata about both the poll-creating accounts and the online polls themselves.

Filename: Online posts & comments (From Xiaohongshu & Bilibili)
Description: This file contains textual content from COVID-19 related posts and videos published by users on the Xiaohongshu and Bilibili platforms. It also includes user-generated comments reacting to these posts/videos, as well as user responses to a specific question posed within the context of the original content.

Key Features of this Dataset:

Data Type: Mixed, including textual data, screenshots of social media posts, web links to original sources, and coded metadata.
Source Platforms: Twitter (global), Weibo/WeChat (primarily China), Xiaohongshu (China), and Bilibili (video-sharing platform, primarily China).
Use Case: This dataset is intended for the analysis of public discourse, the dissemination of scientific information, and user engagement patterns across different cultural contexts and social media platforms, particularly in relation to public health information.
Visualisations for China’s news media tweeting, competing with US source
researchdata.edu.au
commons.datacite.org
Updated May 9, 2018
Cite
Joyce Nip; Chao Sun (2018). Visualisations for China’s news media tweeting, competing with US source [Dataset]. http://doi.org/10.4227/11/5af26eb0e787d
Unique identifier
https://doi.org/10.4227/11/5af26eb0e787d
Dataset updated
May 9, 2018
Dataset provided by
The University of Sydney
Authors
Joyce Nip; Chao Sun
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 1, 2016 - Oct 24, 2017
Area covered
United States, China
Description
This dataset includes six figures used in a journal paper that examines China's recent initiative on international social media and assesses its effectiveness in counteracting Western dominance in international communication. The figures are interactive D3.js visualisations written in JavaScript and displayed in HTML files. The dataset was uploaded so that readers of the paper can access this tool and better understand the methodology described in the paper.
Impacts of Generative AI on Social Capital in China - Survey Data
figshare.com
xlsx
Updated Aug 26, 2024
Yin Liu; ZAOZAO ZHANG; Yingkai Wu (2024). Impacts of Generative AI on Social Capital in China - Survey Data [Dataset]. http://doi.org/10.6084/m9.figshare.26830381.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.26830381.v1
Dataset updated
Aug 26, 2024
Dataset provided by
Figshare, http://figshare.com/
Authors
Yin Liu; ZAOZAO ZHANG; Yingkai Wu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
This dataset comprises responses from 1,053 online participants surveyed across China. Collected through social media and email distributions, it includes detailed queries about the usage and perceptions of generative AI, categorized into task-oriented and social-oriented applications. The dataset features demographic variables (age, gender, education, location) alongside indicators measuring both offline and online social capital. Analysis reveals that generative AI users generally possess higher social capital than non-users, with task-oriented usage enhancing offline social capital yet detracting from online interactions. Conversely, socially-oriented usage boosts social capital across both spheres. This data, rich with insights on the interplay between technology use and social structures, is pivotal for understanding technological impacts on societal dynamics and could guide future technology policy and integration strategies.
Replication Data for: Effects of local government social media use on...
dataone.org
dataverse.harvard.edu
+1 more
Updated Sep 25, 2024
Jiang, Hanchen; Tang, Xiao (2024). Replication Data for: Effects of local government social media use on citizen compliance during a crisis [Dataset]. http://doi.org/10.7910/DVN/9SVDSC
Unique identifier
https://doi.org/10.7910/DVN/9SVDSC
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Jiang, Hanchen; Tang, Xiao
Description
Improving citizen compliance is a major goal of public administration, especially during crises. Although social media are widely used by government agencies across the globe, it remains unclear whether the use of social media can help local governments improve citizen compliance, especially during crises. Based on an original daily panel dataset of 189 cities in China during COVID-19, this study provides empirical evidence for the positive effect that crisis-related social media posts published by local government agencies have on citizen compliance. In addition, this effect is mediated by the topic of prevention measures in social media posts, and is stronger in cities with higher GDP per capita, better educated citizens and wider internet coverage. The findings imply that social media is an efficient and low-cost tool to help local government agencies achieve public administration objectives during crises, and that its efficacy depends largely on regional socioeconomic status.
TapTap User Feedback Dataset
opendatabay.com
Updated Jul 3, 2025
Datasimple (2025). TapTap User Feedback Dataset [Dataset]. https://www.opendatabay.com/data/consumer/1dcc1820-93e8-4499-a0bc-09774d5b03ac
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Reviews & Ratings
Description
This dataset provides user reviews for mobile games, collected from TapTap, a popular mobile game community and distribution platform in China. Its primary purpose is to facilitate sentiment analysis of Chinese game reviews. The reviews are mostly in Chinese and cover the 1,000 most recent comments for 20 popular games up until April 5, 2025. Each entry includes the user's rating, the text content of the review, the number of likes the review received, the publication timestamp, the device model used (where available), the name of the game reviewed, and a sentiment label. User identifiers have been removed to protect privacy.

Columns

rating: User-provided star rating for the game, ranging from 1 to 5.

review_content: The text content of the user review, written in Chinese.

likes: The number of likes received by the review on TapTap.

publish_time: The date and time the review was published, formatted as YYYY-MM-DD HH:MM:SS. This column has been converted to datetime format.

device_model: The device model reported by the user when posting the review. This column may contain inconsistencies and 'unknown' values.

game_name: The name of the mobile game being reviewed. There are 20 distinct game names in the dataset.

sentiment: A binary sentiment label, where 0 indicates negative sentiment and 1 indicates positive sentiment. This label is directly converted from the star rating (1-2 stars typically map to 0, and 3-5 stars map to 1).
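
The rating-to-sentiment conversion and the publish_time format described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dataset author's actual code: the function names and the sample rows are hypothetical, and the thresholds follow the description's "typically" wording (1-2 stars map to 0, 3-5 stars map to 1), so individual rows in the real file may deviate.

```python
from datetime import datetime


def rating_to_sentiment(rating: int) -> int:
    """Map a 1-5 star rating to a binary label (assumed rule: 1-2 -> 0, 3-5 -> 1)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    return 0 if rating <= 2 else 1


def parse_publish_time(value: str) -> datetime:
    """Parse a timestamp in the documented YYYY-MM-DD HH:MM:SS format."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Hypothetical rows for illustration only; these are not taken from the dataset.
sample_rows = [
    {"rating": 1, "publish_time": "2024-01-15 08:30:00"},
    {"rating": 3, "publish_time": "2025-04-05 23:59:59"},
    {"rating": 5, "publish_time": "2017-06-06 12:00:00"},
]

labels = [rating_to_sentiment(row["rating"]) for row in sample_rows]
timestamps = [parse_publish_time(row["publish_time"]) for row in sample_rows]
print(labels)
```

Because the label is derived from the star rating rather than from the review text, a model trained on it learns to predict the rating bucket; comparing a recomputed label against the sentiment column is a quick consistency check before training.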

Distribution

The dataset is typically provided in CSV format. It contains reviews for 20 distinct mobile games. While the source mentions it covers the 1,000 most recent comments, other details indicate larger counts for specific fields, with ratings having around 39,592 unique entries and sentiment labels totalling approximately 39,985 entries across positive and negative categories. The number of likes varies, with a large majority in the 0-142.75 range. The dataset structure has user identifiers removed for privacy.

Usage

This dataset is ideal for conducting sentiment analysis on Chinese mobile game reviews. It can also be used for understanding user feedback trends, identifying common issues or praises within game reviews, and developing natural language processing models tailored to Chinese text.

Coverage

The dataset's geographic scope is China, as the data is collected from the TapTap platform, which is popular there. The time range for reviews spans from June 6, 2017, up to April 5, 2025, according to the latest comments. Device model information is included, though it contains inconsistencies and 'unknown' entries. No specific demographic information about the users is available.

License

CC-BY

Who Can Use It

This dataset is suitable for a variety of users, including:

* Data scientists and machine learning engineers: For building and testing sentiment analysis models for Chinese text.

* Game developers: To gain insights into player satisfaction, identify areas for improvement in their games, and understand market reception in China.

* Market researchers: For analysing trends in the mobile gaming industry and understanding consumer behaviour in the Chinese market.

* Academics and students: For research projects involving natural language processing, data analysis, and social media sentiment.

Dataset Name Suggestions

TapTap Chinese Mobile Game Reviews

Chinese Mobile Game User Sentiment Dataset

TapTap Game Review Analysis Data

Mobile Gaming Reviews (Chinese)

TapTap User Feedback Dataset

Attributes

Original Data Source: TapTap Mobile Game Reviews (Chinese)
Database of Chinese Elite Politics Networking
dataverse.harvard.edu
dataone.org
Updated Dec 2, 2022
Daqi (Reinhardt) Fang (2022). Database of Chinese Elite Politics Networking [Dataset]. http://doi.org/10.7910/DVN/ZN8VOZ
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/ZN8VOZ
Dataset updated
Dec 2, 2022
Dataset provided by
Harvard Dataverse
Authors
Daqi (Reinhardt) Fang
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
China
Description
This is an open-access database in which scholars can find first-hand sources on networking in Chinese politics. If you want to update or revise the data, please contact me (Reinhardt114514@outlook.com). Contributor: Daqi (Reinhardt) Fang, student at Hangzhou Yungu School