Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 41.575 NA in 2023. This stayed constant from the previous number of 41.575 NA for 2022. China SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, with a median of 41.575 NA from Dec 2015 to 2023, across 9 observations. The data reached an all-time high of 47.492 NA in 2015 and a record low of 33.992 NA in 2016. China SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains in active status in CEIC and is reported by the World Bank. The data is categorized under Global Database’s China – Table CN.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources: (i) sources generated by the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met. Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation: weighted average.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator's Job Openings Data for China: A Comprehensive Resource for Employment Insights
Techsalerator's Job Openings Data for China offers a detailed and essential resource for businesses, job seekers, and labor market analysts. This dataset provides an in-depth view of job openings across various industries in China, collating information from numerous sources such as company websites, job boards, and recruitment agencies.
To access Techsalerator’s Job Openings Data for China, please contact info@techsalerator.com with your specific data requirements. We will provide a customized quote based on the data fields and records you need, with delivery available within 24 hours. Ongoing access options can also be discussed.
Techsalerator’s dataset serves as a valuable tool for tracking employment trends and job opportunities in China, empowering businesses, job seekers, and analysts to make informed decisions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MCGD_Data_V2.2 contains all the data that we have collected on locations in modern China, plus a number of locations outside of China that we encounter frequently in historical sources on China. All further updates will appear under the name "MCGD_Data" with a time stamp (e.g., MCGD_Data2023-06-21).
This dataset and all other datasets released by ENP-China are also available on GitLab: https://gitlab.com/enpchina/IndexesEnp
Altogether there are 464,970 entries. The data include the name of locations and their variants in Chinese, pinyin, and any recorded transliteration; the name of the province in Chinese and in pinyin; Province ID; the latitude and longitude; the Name ID and Location ID; and NameID_Legacy. The Name IDs all start with H followed by seven digits. This is the internal ID system of MCGD (the NameID_Legacy column records the Name IDs in their original format depending on the source). Location IDs that start with "DH" are data points extracted from China Historical GIS (Harvard University); those that start with "D" are locations extracted from the data points in Geonames; those that have only digits (8 digits) are data points we have added from various map sources.
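The ID conventions described above can be checked programmatically. The following is a minimal Python sketch, not part of the MCGD distribution: the function names and the example IDs are hypothetical, and the only rules encoded are the ones stated in the description (Name IDs are "H" plus seven digits; Location IDs start with "DH", start with "D", or consist of exactly eight digits).

```python
import re

# Name IDs: the letter H followed by exactly seven digits (per the description).
NAME_ID_PATTERN = re.compile(r"H\d{7}")
# Location IDs added from map sources: exactly eight digits.
MAP_SOURCE_PATTERN = re.compile(r"\d{8}")


def is_valid_name_id(name_id: str) -> bool:
    """Return True if the string follows the MCGD Name ID format."""
    return NAME_ID_PATTERN.fullmatch(name_id) is not None


def location_id_source(location_id: str) -> str:
    """Infer the origin of a record from its Location ID prefix."""
    # "DH" must be tested before "D", because every "DH" ID also starts with "D".
    if location_id.startswith("DH"):
        return "China Historical GIS"
    if location_id.startswith("D"):
        return "Geonames"
    if MAP_SOURCE_PATTERN.fullmatch(location_id):
        return "map sources"
    return "unknown"


# Hypothetical IDs, for illustration only.
for example in ("DH12345", "D987654", "12345678", "X42"):
    print(example, "->", location_id_source(example))
```

Checking the "DH" prefix first matters because the two prefixes overlap; reversing the order would label every China Historical GIS record as Geonames.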
One of the main features of the MCGD Main Dataset is the systematic collection and compilation of place names from non-Chinese language historical sources. Locations were designated in transliteration systems that are hardly comprehensible today, which makes it very difficult to find the actual locations they correspond to. This dataset allows for the conversion from these obsolete transliterations to the current names and geocoordinates.
From June 2021 onward, we have adopted a different file naming system to keep track of versions. From MCGD_Data_V1 we have moved to MCGD_Data_V2. In June 2022, we introduced time stamps, which result in the following naming convention: MCGD_Data_YYYY.MM.DD.
UPDATES
MCGD_Data2025_02_28 includes a major change: all the locations listed under Beijing, Shanghai, Tianjin, and Chongqing (北京, 上海, 天津, 重慶) are duplicated and also listed under the name of the provinces to which they originally belonged before the creation of the four special municipalities after 1949. This is meant to facilitate the matching of data from historical sources. Each location has a unique NameID. Altogether there are 472,818 entries.
MCGD_Data2025_02_27 includes an update on locations extracted from Minguo zhengfu ge yuanhui keyuan yishang zhiyuanlu 國民政府各院部會科員以上職員錄 (Directory of staff members and above in the ministries and committees of the National Government; Nanjing: Guomin zhengfu wenguanchu yinzhuju 國民政府文官處印鑄局, 1944). We also made corrections in the Prov_Py and Prov_Zh columns, as there were some misalignments between the pinyin name and the name in Chinese characters. The file now includes 465,128 entries.
MCGD_Data2024_03_23 includes an update on locations in Taiwan from the Asia Directories. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat Long columns as "Unknown").
MCGD_Data2023.12.22 contains all the data that we have collected on locations in China, whatever the period. Altogether there are 465,603 entries (of which 187 are place names without geocoordinates, labelled in the Lat Long columns as "Unknown"). The dataset also includes locations outside of China for the purpose of matching such locations to the place names extracted from historical sources. For example, one may need to locate individuals born outside of China. Rather than maintaining two separate files, we made the decision to incorporate all the place names found in historical sources in the gazetteer. Such place names can easily be removed by selecting all the entries where the 'Province' data is missing.
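The filtering step mentioned above (dropping entries whose province data is missing) could look like the following Python sketch. It is an illustration under assumptions: the province columns are taken to be Prov_Zh and Prov_Py (the column names mentioned in the update notes), a missing value is assumed to be an empty or whitespace-only string, and the sample rows are invented. Rows are plain dictionaries, such as those produced by the standard library's csv.DictReader.

```python
def drop_entries_without_province(rows, province_columns=("Prov_Zh", "Prov_Py")):
    """Keep only rows in which at least one province column is filled in.

    Assumption: a missing province is stored as an empty string or is absent.
    """
    kept = []
    for row in rows:
        has_province = any((row.get(col) or "").strip() for col in province_columns)
        if has_province:
            kept.append(row)
    return kept


# Invented sample rows, for illustration only.
sample_rows = [
    {"Name_Py": "Suzhou", "Prov_Zh": "江蘇", "Prov_Py": "Jiangsu"},
    {"Name_Py": "Paris", "Prov_Zh": "", "Prov_Py": ""},
    {"Name_Py": "Hangzhou", "Prov_Zh": "浙江", "Prov_Py": "Zhejiang"},
]

china_only = drop_entries_without_province(sample_rows)
print([row["Name_Py"] for row in china_only])
```

If the actual file encodes missing provinces differently (for example with a placeholder string), the test inside the loop would need to be adjusted accordingly.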
https://www.mordorintelligence.com/privacy-policy
The Study of Data Center Water Consumption in China Report is Segmented by Source of Water Procurement (Potable Water, Non-Potable Water, Alternate Sources), by Data Center Type (Enterprise, Colocation, Cloud Service Providers), and by Data Center Size (Mega, Massive, Large, Medium, Small). The Market Sizes and Forecasts are Provided in Terms of Volume (Billion Liters).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data was reported at 7,332.098 RMB bn in 2021. This records an increase from the previous number of 6,801.915 RMB bn for 2020. China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data is updated yearly, with a median of 853.707 RMB bn from Dec 1992 to 2021, across 30 observations. The data reached an all-time high of 7,332.098 RMB bn in 2021 and a record low of 78.232 RMB bn in 1992. China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data remains in active status in CEIC and is reported by the National Bureau of Statistics. The data is categorized under China Premium Database’s National Accounts – Table CN.AD: Flow of Funds Accounts: Physical Transaction: Domestic Sector.
https://www.datainsightsmarket.com/privacy-policy
The Chinese domestic database market, fueled by robust government initiatives promoting digitalization and a growing need for data security, is experiencing significant growth. While precise figures for market size and CAGR are not provided, observing global database market trends and considering China's substantial digital economy, a reasonable estimate would place the 2025 market size at approximately USD 5 billion, with a projected Compound Annual Growth Rate (CAGR) of 15-20% for the forecast period 2025-2033. This growth is driven by several key factors. Firstly, the increasing adoption of smart government initiatives and the digital transformation of various industries necessitate robust and reliable domestic database solutions. Secondly, escalating concerns over data sovereignty and national security are propelling demand for domestically developed databases to mitigate risks associated with reliance on foreign technologies. Furthermore, the expansion of cloud computing and the rise of open-source database solutions contribute to market diversification and broader adoption. Key market segments include applications like smart government affairs, information security, and industrial digitalization. The types of databases deployed are diverse, encompassing traditional, open-source, and cloud-based solutions. Leading players like Alibaba OceanBase, Tencent Cloud Computing, and Huawei GaussDB are driving innovation and competition, while emerging companies are also contributing to the vibrant market ecosystem. While challenges like technological maturity and competition with established international players exist, strong government support, rising digitalization needs, and the focus on data security position the Chinese domestic database market for continued robust expansion in the coming years.
The market's future growth depends on continued innovation, improvements in data security measures, and the successful integration of domestic solutions into crucial national infrastructure projects.
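To see what the quoted growth range would imply, the compound-growth arithmetic can be sketched in Python. This is only an illustration of the formula value = base × (1 + rate)^years applied to the description's own rough estimate (about USD 5 billion in 2025, CAGR of 15-20%); it assumes eight compounding years between 2025 and 2033 and is not a forecast from the report.

```python
def project_value(base_value: float, annual_growth_rate: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return base_value * (1 + annual_growth_rate) ** years


base_2025 = 5.0          # USD billion, the rough estimate quoted in the description
years = 2033 - 2025      # assumption: eight compounding periods

low_2033 = project_value(base_2025, 0.15, years)   # lower end of the CAGR range
high_2033 = project_value(base_2025, 0.20, years)  # upper end of the CAGR range

print(f"Implied 2033 market size: USD {low_2033:.1f} to {high_2033:.1f} billion")
```

Under these assumptions the 2033 figure lands at roughly three to four times the 2025 estimate, which shows how sensitive the end value is to the assumed rate.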
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
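The two recommended steps can be sketched in Python as follows. This is a minimal illustration, not Project Tycho tooling: only the attribute name PartOfCumulativeCountSeries comes from the description, while the other field names (PeriodStartDate, PeriodEndDate, CountValue), the boolean encoding of the flag, the weekly interval length, and the sample records are assumptions made for the example.

```python
from datetime import date, timedelta


def split_series(records):
    """Separate cumulative records from fixed-interval records using the flag."""
    cumulative = [r for r in records if r["PartOfCumulativeCountSeries"]]
    fixed = [r for r in records if not r["PartOfCumulativeCountSeries"]]
    return cumulative, fixed


def missing_weekly_starts(fixed_records):
    """List the weekly start dates that have no record in a fixed-interval series.

    Assumption: intervals are seven days long and aligned to the first start date.
    """
    if not fixed_records:
        return []
    starts = sorted(r["PeriodStartDate"] for r in fixed_records)
    observed = set(starts)
    missing = []
    current = starts[0]
    while current <= starts[-1]:
        if current not in observed:
            missing.append(current)
        current += timedelta(days=7)
    return missing


# Invented sample records, for illustration only.
records = [
    {"PeriodStartDate": date(2000, 1, 1), "PeriodEndDate": date(2000, 1, 7),
     "CountValue": 4, "PartOfCumulativeCountSeries": False},
    {"PeriodStartDate": date(2000, 1, 8), "PeriodEndDate": date(2000, 1, 14),
     "CountValue": 0, "PartOfCumulativeCountSeries": False},
    {"PeriodStartDate": date(2000, 1, 22), "PeriodEndDate": date(2000, 1, 28),
     "CountValue": 2, "PartOfCumulativeCountSeries": False},
    {"PeriodStartDate": date(2000, 1, 1), "PeriodEndDate": date(2000, 1, 7),
     "CountValue": 4, "PartOfCumulativeCountSeries": True},
    {"PeriodStartDate": date(2000, 1, 1), "PeriodEndDate": date(2000, 1, 14),
     "CountValue": 4, "PartOfCumulativeCountSeries": True},
]

cumulative, fixed = split_series(records)
gaps = missing_weekly_starts(fixed)
print(len(cumulative), "cumulative records;", len(fixed), "fixed-interval records")
print("Weeks with no reported count:", gaps)
```

Note that the week with a reported count of zero is treated as observed, while the week with no record at all is reported as a gap, which mirrors the distinction drawn in the first step.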
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Chinese Chain of Thought prompt-response dataset, a meticulously curated collection containing 3,000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Large Language Models (LLMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Chinese language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Chinese speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Chinese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
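As an illustration of how the annotation fields listed above might be used, here is a short Python sketch that loads records from JSON and summarizes them. The key names (id, prompt_complexity, prompt_category, and so on) and the sample records are hypothetical; the actual files may name and encode these fields differently.

```python
import json
from collections import Counter

# Invented records with assumed key names, for illustration only.
SAMPLE_JSON = """
[
  {"id": "cot-0001", "prompt": "3 + 4 = ?", "prompt_type": "instruction",
   "prompt_complexity": "easy", "prompt_category": "arithmetic",
   "response": "7", "rationale": "Adding 3 and 4 gives 7.",
   "response_type": "numerical", "rich_text": false},
  {"id": "cot-0002", "prompt": "Is 9 a prime number?", "prompt_type": "instruction",
   "prompt_complexity": "easy", "prompt_category": "arithmetic",
   "response": "False", "rationale": "9 equals 3 times 3, so it is not prime.",
   "response_type": "text", "rich_text": false},
  {"id": "cot-0003", "prompt": "What comes next: 2, 6, 18, ?", "prompt_type": "instruction",
   "prompt_complexity": "hard", "prompt_category": "patterns",
   "response": "54", "rationale": "Each term is three times the previous one.",
   "response_type": "numerical", "rich_text": false}
]
"""


def count_by_field(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


def filter_records(records, **criteria):
    """Return the records whose fields match every given key/value pair."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


records = json.loads(SAMPLE_JSON)
complexity_counts = count_by_field(records, "prompt_complexity")
hard_numerical = filter_records(records, prompt_complexity="hard", response_type="numerical")

print(dict(complexity_counts))
print([r["id"] for r in hard_numerical])
```

A summary like this is a quick way to check the balance of difficulty levels and response types before using the data for training.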
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Chinese text is free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Chinese Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
https://www.technavio.com/content/privacy-notice
Alternative Data Market Size 2025-2029
The alternative data market size is forecast to increase by USD 60.32 billion, at a CAGR of 52.5% from 2024 to 2029. Increased availability and diversity of data sources will drive the alternative data market.
Major Market Trends & Insights
North America dominated the market and accounted for 56% of the growth during the forecast period.
By Type - Credit and debit card transactions segment was valued at USD 228.40 billion in 2023
By End-user - BFSI segment accounted for the largest market revenue share in 2023
Market Size & Forecast
Market Opportunities: USD 6.00 million
Market Future Opportunities: USD 60,318.00 million
CAGR from 2024 to 2029: 52.5%
Market Summary
The market represents a dynamic and rapidly expanding landscape, driven by the increasing availability and diversity of data sources. With the rise of alternative data-driven investment strategies, businesses and investors are increasingly relying on non-traditional data to gain a competitive edge. Core technologies, such as machine learning and natural language processing, are transforming the way alternative data is collected, analyzed, and utilized. Despite its potential, the market faces challenges related to data quality and standardization. According to a recent study, alternative data accounts for only 10% of the total data used in financial services, yet 45% of firms surveyed reported issues with data quality.
Service types, including data providers, data aggregators, and data analytics firms, are addressing these challenges by offering solutions to ensure data accuracy and reliability. Regional mentions, such as North America and Europe, are leading the adoption of alternative data, with Europe projected to grow at a significant rate due to increasing regulatory support for alternative data usage. The market's continuous evolution is influenced by various factors, including technological advancements, changing regulations, and emerging trends in data usage.
How is the Alternative Data Market Segmented?
The alternative data industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Type
- Credit and debit card transactions
- Social media
- Mobile application usage
- Web scraped data
- Others
End-user
- BFSI
- IT and telecommunication
- Retail
- Others
Geography
- North America
  - US
  - Canada
  - Mexico
- Europe
  - France
  - Germany
  - Italy
  - UK
- APAC
  - China
  - India
  - Japan
- Rest of World (ROW)
By Type Insights
The credit and debit card transactions segment is estimated to witness significant growth during the forecast period.
Alternative data derived from credit and debit card transactions plays a significant role in offering valuable insights for market analysts, financial institutions, and businesses. This data category is segmented into credit card and debit card transactions. Credit card transactions serve as a rich source of information on consumers' discretionary spending, revealing their luxury spending tendencies and credit management skills. Debit card transactions, on the other hand, shed light on essential spending habits, budgeting strategies, and daily expenses, providing insights into consumers' practical needs and lifestyle choices. Market analysts and financial institutions utilize this data to enhance their strategies and customer experiences.
Natural language processing (NLP) and sentiment analysis tools help extract valuable insights from this data. Anomaly detection systems enable the identification of unusual spending patterns, while data validation techniques ensure data accuracy. Risk management frameworks and hypothesis testing methods are employed to assess potential risks and opportunities. Data visualization dashboards and machine learning models facilitate data exploration and trend analysis. Data quality metrics and signal processing methods ensure data reliability and accuracy. Data governance policies and real-time data streams enable timely access to data. Time series forecasting, clustering techniques, and high-frequency data analysis provide insights into trends and patterns.
Model training datasets and model evaluation metrics are essential for model development and performance assessment. Data security protocols are crucial to protect sensitive financial information. Economic indicators and compliance regulations play a role in the context of this market. Unstructured data analysis, data cleansing pipelines, and statistical significance are essential for deriving meaningful insights from this data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we used remote sensing data from multiple sources (time-series of Landsat and Sentinel images) to map the impervious surface area (ISA) at five-year intervals from 1990 to 2015, and then converted the results into a standardized dataset of the built-up area for 433 Chinese cities with 300,000 inhabitants or more, which were listed in the United Nations (UN) World Urbanization Prospects (WUP) database (including Mainland China, Hong Kong, Macao and Taiwan). We employed a range of spectral indices to generate the 1990–2015 ISA maps in urban areas based on remotely sensed data acquired from multiple sources. In this process, various types of auxiliary data were used to create the desired products for urban areas through manual segmentation of peri-urban and rural areas together with reference to several freely available products of urban extent derived from ISA data using automated urban–rural segmentation methods. After that, following the well-established rules adopted by the UN, we carried out the conversion to the standardized built-up area products from the 1990–2015 ISA maps in urban areas, which conformed to the definition of urban agglomeration area (UAA). Finally, we implemented data postprocessing to guarantee the spatial accuracy and temporal consistency of the final product. The standardized urban built-up area dataset (SUBAD–China) introduced here is the first product using the same definition of UAA adopted by the WUP database for 433 county and higher-level cities in China.
The comparisons made with contemporary data produced by the National Bureau of Statistics of China, the World Bank and UN-Habitat indicate that our results have a high spatial accuracy and good temporal consistency and thus can be used to characterize the process of urban expansion in China. The SUBAD–China contains 2,598 vector files in shapefile format containing data for all of China's cities listed in the WUP database that have different urban sizes and income levels with populations over 300,000. Alongside it, we also provide the distribution of validation points for the 1990–2010 ISA products of these 433 Chinese cities in shapefile format and the confusion matrices between classified data and reference data during different time periods as a Microsoft Excel Open XML Spreadsheet (XLSX) file. Furthermore, the standardized built-up area products for such cities will be consistently updated and refined to ensure the quality of their spatiotemporal coverage and accuracy. The production of this dataset together with the usage of population counts derived from the WUP database will close some of the data gaps in the calculation of SDG11.3.1 and benefit other downstream applications relevant to a combined analysis of the spatial and socio-economic domains in urban areas.
https://www.mordorintelligence.com/privacy-policy
The China Data Center Market Report is Segmented by Data Center Size (Large, Massive, Medium, Mega, and Small), Tier Type (Tier 1 and 2, Tier 3, and Tier 4), Data Center Type (Hyperscale/Self-Built, Enterprise/Edge, and Colocation), End User (BFSI, IT and ITES, E-Commerce, Government, Manufacturing, Media and Entertainment, and More), and Hotspot. The Market Forecasts are Provided in Terms of IT Load Capacity (MW).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Understanding the food trading network is crucial for optimizing resource allocation, maintaining stability in the food system, and safeguarding food security. However, although China is one of the world's largest agricultural producers, publicly available data on its interprovincial physical food trade are scarce. Here, we developed a dataset of interprovincial physical food flows in mainland China for the period 2000-2022, using the trade gravity model with the integration of food supply, food demand, and transportation data. This dataset includes 15 key types of plant-based and animal-based food products and represents the longest time series covering the most extensive variety of food products to date on China's food trade patterns. The dataset reveals changes in dietary structures and trade patterns across regions within China since the start of the 21st century. This work provides a methodological framework and dataset that could support studies on virtual resource flows, agri-environmental impact assessment, and food policy formulation across various food categories and regions.
https://www.icpsr.umich.edu/web/ICPSR/studies/3012/terms
The purpose of this project was to measure and estimate the distribution of personal income in both rural and urban areas of the People's Republic of China. The principal investigators based their definition of income on cash payments and on a broad range of additional components: payments in kind valued at market prices, agricultural output produced for self-consumption valued at market prices, the value of food and other direct subsidies, and the imputed value of housing services. The rural component of this collection consists of two data files, one in which the individual is the unit of analysis (Part 1) and a second in which the household is the unit of analysis (Part 2). Individual rural respondents reported on their employment status, level of education, Communist Party membership, type of employer (e.g., public, private, or foreign), type of economic sector in which they were employed, occupation, whether they held a second job, retirement status, monthly pension, monthly wage, and other sources of income. Demographic variables include relationship to householder, gender, age, and student status. Rural households reported extensively on the character of the household and residence. Information was elicited on type of terrain surrounding the house, geographic position, type of house, and availability of electricity. Also reported were sources of household income (e.g., farming, industry, government, rents, and interest), taxes paid, value of farm, total amount and type of cultivated land, financial assets and debts, quantity and value of various crops, amount of grain purchased or provided by a collective, use of chemical fertilizers, gasoline, and oil, quantity and value of agricultural machinery, and all household expenditures (e.g., food, fuel, medicine, education, transportation, and electricity). 
The urban component of this collection also consists of two data files, one in which the individual is the unit of analysis (Part 3) and a second in which the household is the unit of analysis (Part 4). Individual urban respondents reported on their economic status within the household, Communist Party membership, sex, age, nature of employment, and relationship to the household head. Information was collected on all types and sources of income from each member of the household whether working, nonworking, or retired, all revenue received by owners of private or individual enterprises, and all in-kind payments (e.g., food, durable goods, and nondurable goods). Urban households reported total income (including salaries, interest on savings and bonds, dividends, rent, leases, alimony, gifts, and boarding fees), all types and values of food subsidies received, and total debt. Information was also gathered on household accommodations and living conditions, including number of rooms, total living area in square meters, availability and cost of running water, sanitary facilities, heating and air-conditioning equipment, kitchen availability, location of residence, ownership of home, and availability of electricity and telephone. Households reported on all their expenditures including amounts spent on food items such as wheat, rice, edible oils, pork, beef and mutton, poultry, fish and seafood, sugar, and vegetables by means of coupons in state-owned stores and at free market prices. Information was also collected on rents paid by the households, fuel available, type of transportation used, and availability and use of medical and child care. The Chinese Household Income Project collected data in 1988, 1995, 2002, and 2007. ICPSR holds data from the first three collections, and information about these can be found on the series description page. Data collected in 2007 are available through the China Institute for Income Distribution.
The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.
The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from nationally representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions. It includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.
The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.
Tibet was excluded from the sample. The excluded areas represent less than 1 percent of the total population of China.
Individual
Observation data/ratings [obs]
In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.
In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
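The two selection steps described above (choosing primary sampling units with probability proportional to population size, then randomly picking one eligible household member) can be sketched as follows. This is a minimal illustration using systematic PPS sampling, which is one common way to implement PPS; the source does not say which variant Gallup uses. The function names, unit names, and population figures are hypothetical.

```python
import random


def pps_systematic_sample(units, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    units: list of (name, population) pairs.
    A unit larger than the sampling interval can be selected more than once.
    """
    total = sum(size for _, size in units)
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    targets = [start + i * step for i in range(n)]

    selected = []
    cumulative = 0.0
    idx = 0
    for name, size in units:
        cumulative += size
        # Pick this unit for every target that falls inside its size range.
        while idx < n and targets[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected


def select_respondent(eligible_members, rng):
    """Randomly pick one eligible household member with equal probability."""
    return rng.choice(eligible_members)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
psus = [("PSU-A", 5000), ("PSU-B", 1200), ("PSU-C", 800), ("PSU-D", 3000)]
sampled_psus = pps_systematic_sample(psus, 2, rng)
respondent = select_respondent(["member 1", "member 2", "member 3"], rng)
print(sampled_psus, respondent)
```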
In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.
The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).
For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.
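The threshold rules described for the phone-based economies can be summarized as two small decision functions. This is a sketch of the stated rules only; the function names and return labels are hypothetical, and the percentages are assumed to be supplied by the caller.

```python
LANDLINE_THRESHOLD_PCT = 20.0     # dual frame is used at or above this landline share
PENETRATION_THRESHOLD_PCT = 80.0  # within-household selection applies at or above this


def choose_phone_frame(landline_share_pct: float) -> str:
    """Dual-frame random digit dialing where landline presence is 20% or higher."""
    if landline_share_pct >= LANDLINE_THRESHOLD_PCT:
        return "dual_frame_rdd"
    return "mobile_rdd"


def within_household_selection(line_type: str, penetration_pct: float) -> str:
    """Landline respondents in high-penetration economies get a further random
    selection step; all other respondents are interviewed without one."""
    if line_type == "landline" and penetration_pct >= PENETRATION_THRESHOLD_PCT:
        return "latest_birthday_or_enumeration"
    return "none"


print(choose_phone_frame(35.0), within_household_selection("landline", 85.0))
```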
Sample size for China is 3500.
Mobile telephone
Questionnaires are available on the website.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.
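For a rough sense of scale, the margin of error for a proportion can be approximated with the textbook formula for a simple random sample, optionally inflated by a design effect. This is a sketch under that simplifying assumption and does not reproduce the published country-specific margins, which should be taken from the Methodology table cited above. The function name and the design effect value are hypothetical; the sample size of 3,500 is the figure stated for China.

```python
import math


def margin_of_error(p: float, n: int, design_effect: float = 1.0, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated from n interviews.

    design_effect > 1 widens the interval to reflect clustering and weighting.
    """
    return z * math.sqrt(design_effect * p * (1.0 - p) / n)


# Worst case (p = 0.5) with the stated China sample size and no design effect.
moe_china = margin_of_error(0.5, 3500)
# Same case with an assumed, purely illustrative design effect of 1.5.
moe_china_deff = margin_of_error(0.5, 3500, design_effect=1.5)
print(round(moe_china, 4), round(moe_china_deff, 4))
```

Under the simple-random-sample assumption, the worst-case margin for 3,500 interviews is roughly 1.7 percentage points; any design effect above 1 makes it wider.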
This dataset covers consumption-driven water use, external contributions and transmission pathways of water pressure, and future changes in implicit water use in the water-energy-food (WEF) system for 31 provinces of China. The main sources of data are the China Environmental Statistics Yearbook, Water Resources Bulletin, China Energy Statistics Yearbook, and the World Resources Institute. The dataset mainly includes three types of data: (1) Embodied blue water use in WEF nexus by province in China; (2) External contribution of water stress and energy-food competition for water in WEF nexus by province in China; (3) Embodied water use statistics for high and low water pressure pathways in the WEF sector by province in China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the China median household income by race. The dataset can be utilized to understand the racial distribution of China income.
The dataset will have the following datasets when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of China median household income by race.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensively capturing the dynamics of industrial heat sources (IHS) in China at high spatiotemporal resolution is of great significance for the green, high-quality, and sustainable development of industry under the "dual carbon" goals. At present, dynamic data on industrial heat sources that cover large regions and long time series at high spatiotemporal resolution are still lacking, and measures such as structural adjustment and capacity reduction in China have not been effectively tracked and monitored in space. This work uses a long time series of 375 m NPP VIIRS (United States Suomi National Polar-orbiting Partnership, Visible Infrared Imaging Radiometer Suite) active fire/hotspot data (ACF). An improved K-means industrial heat source identification method, combined with POI topology analysis and the high-resolution remote sensing image features of different types of factories and mines, is used to identify and classify industrial heat sources in China from 2021 to 2023. The result is the first freely and publicly available vector-format dataset of industrial heat sources in China from 2021 to 2023 that includes type information. The dynamic remote sensing monitoring results provide an independent scientific basis for China's response to the upgrading of the "structural adjustment and capacity reduction" industrial model, domestic and international carbon tax trading, improvement of the atmospheric environment, and other sustainable development processes.
The results show three things. First, extending the time span to 2012-2023, combined with industrial heat source category recognition based on POI topology analysis, can effectively reveal the spatiotemporal evolution of different types of industrial heat sources during the critical period of industrial transformation and upgrading. Second, the improved K-means recognition method maintains recognition accuracy (98.14%) while effectively improving the number, accuracy, average particle size, and spatial coverage of recognized industrial heat sources. Third, the dataset includes 20 characteristic parameters, such as factory and mine locations, annual operating conditions, and categories, which fully record the radiation flux characteristics and production activity intensity of different types of industrial heat sources and provide richer data support for industrial carbon emission estimation and regional economic development assessment.
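To illustrate the general idea of grouping repeated hotspot detections into candidate heat source sites, the sketch below runs plain k-means (Lloyd's algorithm) on point coordinates. It is not the improved method the dataset authors describe, whose details are not given here; the function name, seeding rule, and coordinates are hypothetical and chosen only for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on (x, y) points.

    The first k points are used as initial centroids (an arbitrary choice
    made to keep the sketch deterministic).
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments


# Made-up hotspot coordinates (longitude, latitude) around two separate sites.
hotspots = [
    (116.30, 39.90), (121.47, 31.23),
    (116.31, 39.91), (121.48, 31.24),
    (116.29, 39.89), (121.46, 31.22),
]
centroids, assignments = kmeans(hotspots, k=2)
print(assignments)
```

Each resulting cluster would then be a candidate site to classify, for example by comparing it with nearby points of interest.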
https://www.futurebeeai.com/policies/ai-data-license-agreement
The Chinese Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Chinese language, advancing the field of artificial intelligence.
This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Chinese. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Chinese speakers, and references were taken from diverse sources like books, news articles, websites, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.
To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph answers. The answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
This fully labeled Chinese Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
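A short sketch of how records carrying these annotation fields might be loaded and filtered is shown below. The field names come from the list above, but the record layout and every value in the sample are assumptions made for illustration; the actual files may be structured differently.

```python
import json
from collections import Counter

# Hypothetical sample records; values are invented for illustration.
sample_json = """
[
  {"id": "q1", "language": "zh", "domain": "science", "question_length": 18,
   "prompt_type": "instruction", "question_category": "fact-based",
   "question_type": "direct", "complexity": "easy",
   "answer_type": "single sentence", "rich_text": false},
  {"id": "q2", "language": "zh", "domain": "history", "question_length": 42,
   "prompt_type": "continuation", "question_category": "opinion-based",
   "question_type": "direct", "complexity": "hard",
   "answer_type": "paragraph", "rich_text": false},
  {"id": "q3", "language": "zh", "domain": "technology", "question_length": 25,
   "prompt_type": "instruction", "question_category": "fact-based",
   "question_type": "multiple-choice", "complexity": "easy",
   "answer_type": "single word", "rich_text": true}
]
"""

records = json.loads(sample_json)

# Count records per complexity level.
by_complexity = Counter(r["complexity"] for r in records)

# Select the ids of easy, fact-based questions.
easy_fact_ids = [
    r["id"]
    for r in records
    if r["complexity"] == "easy" and r["question_category"] == "fact-based"
]
print(by_complexity, easy_fact_ids)
```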
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the questions and answers in Chinese are grammatically accurate, without any word or grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Chinese Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This dataset provides a bilingual list of academic disciplines, extracted from various sources in Chinese and translated into English. It includes a broad range of terms beyond strictly defined academic disciplines, encompassing subfields, interdisciplinary categories, and related academic domains. The dataset is designed to be reusable for various applications, such as term matching across corpora and extracting relevant terms from new datasets.
Structure: The dataset consists of multiple columns, each representing different levels of classification for academic disciplines. The primary columns include:
Discipline (Chinese Simplified & Traditional): The name of the discipline in Chinese.
Discipline (English): The corresponding English translation of the discipline.
Discipline Level 2 (Chinese & English): A more specific categorization within a broader academic category.
Discipline Level 1 (Chinese & English): A higher-level classification grouping multiple related disciplines.
Discipline Level 0 (Chinese & English): The broadest classification, representing major academic fields.
Level1_code: A numerical or coded identifier for Level 1 disciplines, which may be useful for structured data processing.
Purpose & Applications:
Term Matching: The dataset can be used to match extracted terms from other corpora, ensuring consistency across multilingual sources.
Hierarchical Classification: The multi-level structure allows users to analyze disciplines at different granularities.
Corpus Analysis & Text Mining: The dataset facilitates term extraction and standardization in computational text analysis projects.
Cross-Linguistic Comparisons: Researchers can use the bilingual nature of the dataset to study the relationship between Chinese and English academic terminologies.
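The term matching use described above can be sketched as a lookup from Chinese discipline names to their English translations, applied to a piece of text. This is a minimal illustration: the function name, the three sample pairs, and the sample sentence are hypothetical, and a real workflow would load the pairs from the dataset's own columns.

```python
def find_disciplines(text, zh_to_en):
    """Return English names of the disciplines whose Chinese term occurs in text,
    ordered by where each term first appears."""
    hits = []
    for zh, en in zh_to_en.items():
        position = text.find(zh)
        if position != -1:
            hits.append((position, en))
    return [en for _, en in sorted(hits)]


# Illustrative pairs only; not taken from the dataset.
zh_to_en = {
    "物理学": "Physics",
    "化学": "Chemistry",
    "历史学": "History",
}

sentence = "她在大学学习物理学和历史学。"
matched = find_disciplines(sentence, zh_to_en)
print(matched)
```

Simple substring matching like this can over-match short terms inside longer words, so a production pipeline would likely add word segmentation or longest-match rules.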
Potential Use Cases:
Automated classification of academic articles based on discipline.
Developing bilingual glossaries for research institutions.
Improving machine learning models for academic domain recognition.
Data Quality Considerations:
Some terms may appear at multiple levels, reflecting differences in classification across sources.
The dataset structure should be checked for delimiter consistency before processing in automated systems.
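One way to run the delimiter consistency check mentioned above is to parse the file and flag rows whose field count differs from the header. The sketch below assumes a comma-delimited text export; the function name, column names, and sample rows are hypothetical.

```python
import csv
import io


def inconsistent_rows(text, delimiter=","):
    """Return (expected_field_count, line_numbers_with_a_different_count).

    The first row is treated as the header and defines the expected count.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    expected = len(rows[0])
    bad = [i for i, row in enumerate(rows, start=1) if len(row) != expected]
    return expected, bad


# Made-up sample in which line 3 uses ';' instead of ','.
sample = (
    "discipline_zh,discipline_en,level1_code\n"
    "物理学,Physics,07\n"
    "化学;Chemistry;07\n"
    "历史学,History,06\n"
)
expected_fields, bad_lines = inconsistent_rows(sample)
print(expected_fields, bad_lines)
```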