95 datasets found
  1. China CN: SPI: Pillar 4 Data Sources Score: Scale 0-100

    • ceicdata.com
    Updated Apr 4, 2021
    Cite
    CEICdata.com (2021). China CN: SPI: Pillar 4 Data Sources Score: Scale 0-100 [Dataset]. https://www.ceicdata.com/en/china/governance-policy-and-institutions/cn-spi-pillar-4-data-sources-score-scale-0100
    Dataset updated
    Apr 4, 2021
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2016 - Dec 1, 2019
    Area covered
    China
    Variables measured
    Money Market Rate
    Description

    China SPI: Pillar 4 Data Sources Score: Scale 0-100 data was reported at 41.575 NA in 2023. This stayed constant from the previous number of 41.575 NA for 2022. China SPI: Pillar 4 Data Sources Score: Scale 0-100 data is updated yearly, averaging 41.575 NA from Dec 2015 (Median) to 2023, with 9 observations. The data reached an all-time high of 47.492 NA in 2015 and a record low of 33.992 NA in 2016. China SPI: Pillar 4 Data Sources Score: Scale 0-100 data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s China – Table CN.World Bank.WDI: Governance: Policy and Institutions. The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met. Source: Statistical Performance Indicators, The World Bank (https://datacatalog.worldbank.org/dataset/statistical-performance-indicators). Aggregation method: Weighted average.

  2. Job Posting Data in China

    • kaggle.com
    zip
    Updated Sep 13, 2024
    Cite
    Techsalerator (2024). Job Posting Data in China [Dataset]. https://www.kaggle.com/datasets/techsalerator/job-posting-data-in-china
    Available download formats: zip (12790179 bytes)
    Dataset updated
    Sep 13, 2024
    Authors
    Techsalerator
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    China
    Description

    Techsalerator's Job Openings Data for China: A Comprehensive Resource for Employment Insights

    Techsalerator's Job Openings Data for China offers a detailed and essential resource for businesses, job seekers, and labor market analysts. This dataset provides an in-depth view of job openings across various industries in China, collating information from numerous sources such as company websites, job boards, and recruitment agencies.

    Key Data Fields

    • Job Posting Date: Captures the listing date for each job opening, keeping job seekers and HR professionals up-to-date with the latest opportunities and market trends.
    • Job Title: Details the specific role being advertised, helping categorize and filter job openings by industry and career focus.
    • Company Name: Lists the hiring organizations, enabling job seekers to focus their applications and helping businesses monitor competitors and industry trends.
    • Job Location: Specifies the geographic location of the job within China, aiding job seekers in finding regional opportunities and assisting employers in evaluating regional labor markets.
    • Job Description: Provides comprehensive information about the responsibilities, qualifications, and skills required, offering clarity to both candidates and recruiters.

    Top 5 Job Categories in China

    1. Information Technology (IT): A booming sector with high demand for software developers, data scientists, and cybersecurity experts due to the rapid growth of China's digital economy.
    2. Manufacturing: Significant demand for engineers, production managers, and skilled laborers in one of the world’s largest manufacturing hubs.
    3. Finance and Banking: High demand for financial analysts, investment managers, and compliance officers as China’s financial sector continues to expand.
    4. Healthcare: Roles for doctors, nurses, and healthcare administrators driven by the increasing demand for healthcare services due to population growth and aging.
    5. E-Commerce and Retail: Opportunities for logistics managers, supply chain analysts, and digital marketing specialists, reflecting China's leadership in the global e-commerce market.

    Top 5 Employers in China

    1. Alibaba Group: A leading e-commerce company with frequent openings in logistics, IT, marketing, and management roles.
    2. Tencent: A technology giant offering positions in software development, gaming, and cloud computing.
    3. China National Petroleum Corporation (CNPC): Major employer in the energy sector with roles in engineering, management, and technical services.
    4. China Construction Bank: One of the largest banks in China, regularly hiring in areas like banking operations, financial analysis, and customer service.
    5. Huawei Technologies: A global telecommunications company offering roles in R&D, engineering, sales, and project management.

    Accessing Techsalerator’s Data

    To access Techsalerator’s Job Openings Data for China, please contact info@techsalerator.com with your specific data requirements. We will provide a customized quote based on the data fields and records you need, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Job Posting Date
    • Job Title
    • Company Name
    • Job Location
    • Job Description
    • Application Deadline
    • Job Type (Full-time, Part-time, Contract)
    • Salary Range
    • Required Qualifications
    • Contact Information

    Techsalerator’s dataset serves as a valuable tool for tracking employment trends and job opportunities in China, empowering businesses, job seekers, and analysts to make informed decisions.

  3. Modern China Geospatial Database - Main Dataset

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Feb 28, 2025
    Cite
    Christian Henriot (2025). Modern China Geospatial Database - Main Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5735393
    Dataset updated
    Feb 28, 2025
    Dataset provided by
    Aix-Marseille University
    Authors
    Christian Henriot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    MCGD_Data_V2.2 contains all the data that we have collected on locations in modern China, plus a number of locations outside of China that we encounter frequently in historical sources on China. All further updates will appear under the name "MCGD_Data" with a time stamp (e.g., MCGD_Data2023-06-21)

    You can also have access to this dataset and all the datasets that the ENP-China makes available on GitLab: https://gitlab.com/enpchina/IndexesEnp

    Altogether there are 464,970 entries. The data include the name of locations and their variants in Chinese, pinyin, and any recorded transliteration; the name of the province in Chinese and in pinyin; Province ID; the latitude and longitude; the Name ID and Location ID, and NameID_Legacy. The Name IDs all start with H followed by seven digits. This is the internal ID system of MCGD (the NameID_Legacy column records the Name IDs in their original format depending on the source). Locations IDs that start with "DH" are data points extracted from China Historical GIS (Harvard University); those that start with "D" are locations extracted from the data points in Geonames; those that have only digits (8 digits) are data points we have added from various map sources.
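
    The ID conventions described above can be checked mechanically. The following is a minimal sketch, not part of the MCGD distribution: the function names are hypothetical and the sample IDs are made up for illustration. Only the prefix and digit rules come from the description.

```python
import re

# Name IDs: "H" followed by exactly seven digits (per the description above).
NAME_ID_PATTERN = re.compile(r"H\d{7}")
# Location IDs added from map sources: exactly eight digits.
MAP_SOURCE_PATTERN = re.compile(r"\d{8}")


def is_mcgd_name_id(name_id: str) -> bool:
    """Return True if the string follows the internal MCGD Name ID format."""
    return NAME_ID_PATTERN.fullmatch(name_id) is not None


def location_source(location_id: str) -> str:
    """Guess the origin of a Location ID from its prefix.

    "DH" must be tested before "D", because every "DH" ID also starts with "D".
    """
    if location_id.startswith("DH"):
        return "China Historical GIS"
    if location_id.startswith("D"):
        return "Geonames"
    if MAP_SOURCE_PATTERN.fullmatch(location_id):
        return "map sources"
    return "unrecognized"


# Made-up sample IDs, for illustration only.
samples = ["DH000123", "D1796236", "10000042", "X99"]
sources = [location_source(s) for s in samples]
```

    Testing the "DH" prefix first matters: reversing the two checks would label every China Historical GIS point as a Geonames point.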

    One of the main features of the MCGD Main Dataset is the systematic collection and compilation of place names from non-Chinese language historical sources. Locations were designated in transliteration systems that are hardly comprehensible today, which makes it very difficult to find the actual locations they correspond to. This dataset allows for the conversion from these obsolete transliterations to the current names and geocoordinates.

    From June 2021 onward, we have adopted a different file naming system to keep track of versions. From MCGD_Data_V1 we have moved to MCGD_Data_V2. In June 2022, we introduced time stamps, which result in the following naming convention: MCGD_Data_YYYY.MM.DD.

    UPDATES

    MCGD_Data2025_02_28 includes a major change: all the locations listed under Beijing, Shanghai, Tianjin, and Chongqing (北京, 上海, 天津, 重慶) are duplicated and also listed under the name of the provinces to which they originally belonged before the creation of the four special municipalities after 1949. This is meant to facilitate the matching of data from historical sources. Each location has a unique NameID. Altogether there are 472,818 entries.

    MCGD_Data2025_02_27 includes an update on locations extracted from Minguo zhengfu ge yuanhui keyuan yishang zhiyuanlu 國民政府各院部會科員以上職員錄 (Directory of staff members and above in the ministries and committees of the National Government) (Nanjing: Guomin zhengfu wenguanchu yinzhuju 國民政府文官處印鑄局, 1944). We also made corrections in the Prov_Py and Prov_Zh columns as there were some misalignments between the pinyin name and the name in Chinese characters. The file now includes 465,128 entries.

    MCGD_Data2024_03_23 includes an update on locations in Taiwan from the Asia Directories. Altogether there are 465,603 entries (of which 187 place names without geocoordinates, labelled in the Lat Long columns as "Unknown").

    MCGD_Data2023.12.22 contains all the data that we have collected on locations in China, whatever the period. Altogether there are 465,603 entries (of which 187 place names without geocoordinates, labelled in the Lat Long columns as "Unknown"). The dataset also includes locations outside of China for the purpose of matching such locations to the place names extracted from historical sources. For example, one may need to locate individuals born outside of China. Rather than maintaining two separate files, we made the decision to incorporate all the place names found in historical sources in the gazetteer. Such place names can easily be removed by selecting all the entries where the 'Province' data is missing.
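
    The filtering step mentioned above (dropping entries without province data, and setting aside entries whose coordinates are "Unknown") might look like the sketch below. It assumes each row is a dict keyed by column name; the keys "Prov_Py", "Lat", and "Long" are assumptions based on the column names mentioned in this description, and the sample rows are invented.

```python
def has_province(row: dict) -> bool:
    """True when the (assumed) province column holds a non-empty value."""
    return bool((row.get("Prov_Py") or "").strip())


def has_coordinates(row: dict) -> bool:
    """True unless either coordinate column is labelled "Unknown"."""
    return "Unknown" not in (row.get("Lat"), row.get("Long"))


# Invented rows for illustration; real rows would come from the MCGD file.
rows = [
    {"Name": "A", "Prov_Py": "Jiangsu", "Lat": "32.06", "Long": "118.79"},
    {"Name": "B", "Prov_Py": "", "Lat": "48.85", "Long": "2.35"},
    {"Name": "C", "Prov_Py": "Zhejiang", "Lat": "Unknown", "Long": "Unknown"},
]

# Keep only locations inside China (province present) ...
china_rows = [r for r in rows if has_province(r)]
# ... and, of those, only the ones that can be placed on a map.
mappable_rows = [r for r in china_rows if has_coordinates(r)]
```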

  4. China Data Center Water Consumption Market Size & Share Analysis - Industry...

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Jun 30, 2025
    Cite
    Mordor Intelligence (2025). China Data Center Water Consumption Market Size & Share Analysis - Industry Research Report - Growth Trends 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/study-of-data-center-water-consumption-in-china
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2021 - 2030
    Area covered
    China
    Description

    The Study of Data Center Water Consumption in China Report is Segmented by Source of Water Procurement (Potable Water, Non-Potable Water, Alternate Sources), by Data Center Type (Enterprise, Colocation, Cloud Service Providers), and by Data Center Size (Mega, Massive, Large, Medium, Small). The Market Sizes and Forecasts are Provided in Terms of Volume (Billion Liters).

  5. China CN: Flow of Funds: Domestic Sector: Source: Social Transfers In Kind

    • ceicdata.com
    Updated Dec 15, 2020
    Cite
    CEICdata.com (2020). China CN: Flow of Funds: Domestic Sector: Source: Social Transfers In Kind [Dataset]. https://www.ceicdata.com/en/china/flow-of-funds-accounts-physical-transaction-domestic-sector/cn-flow-of-funds-domestic-sector-source-social-transfers-in-kind
    Dataset updated
    Dec 15, 2020
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2010 - Dec 1, 2021
    Area covered
    China
    Variables measured
    Flow of Fund Account
    Description

    China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data was reported at 7,332.098 RMB bn in 2021. This records an increase from the previous number of 6,801.915 RMB bn for 2020. China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data is updated yearly, averaging 853.707 RMB bn from Dec 1992 (Median) to 2021, with 30 observations. The data reached an all-time high of 7,332.098 RMB bn in 2021 and a record low of 78.232 RMB bn in 1992. China Flow of Funds: Domestic Sector: Source: Social Transfers In Kind data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s National Accounts – Table CN.AD: Flow of Funds Accounts: Physical Transaction: Domestic Sector.

  6. Chinese Domestic Databases Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 25, 2025
    Cite
    Data Insights Market (2025). Chinese Domestic Databases Report [Dataset]. https://www.datainsightsmarket.com/reports/chinese-domestic-databases-1964334
    Available download formats: doc, ppt, pdf
    Dataset updated
    Apr 25, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global, China
    Variables measured
    Market Size
    Description

    The Chinese domestic database market, fueled by robust government initiatives promoting digitalization and a growing need for data security, is experiencing significant growth. While precise figures for market size and CAGR are not provided, observing global database market trends and considering China's substantial digital economy, a reasonable estimate would place the 2025 market size at approximately $5 billion USD, with a projected Compound Annual Growth Rate (CAGR) of 15-20% for the forecast period 2025-2033. This growth is driven by several key factors. Firstly, the increasing adoption of smart government initiatives and the digital transformation of various industries necessitates robust and reliable domestic database solutions. Secondly, escalating concerns over data sovereignty and national security are propelling demand for domestically developed databases to mitigate risks associated with reliance on foreign technologies. Furthermore, the expansion of cloud computing and the rise of open-source database solutions contribute to market diversification and broader adoption. Key market segments include applications like smart government affairs, information security, and industrial digitalization. The types of databases deployed are diverse, encompassing traditional, open-source, and cloud-based solutions. Leading players like Alibaba OceanBase, Tencent Cloud Computing, and Huawei GaussDB are driving innovation and competition, while emerging companies are also contributing to the vibrant market ecosystem. While challenges like technological maturity and the competitive landscape with established international players exist, the strong government support, rising digitalization needs, and the focus on data security position the Chinese domestic database market for continued robust expansion in the coming years. The market's future growth depends on continued innovation, improvements in data security measures, and the successful integration of domestic solutions into crucial national infrastructure projects.
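
    To see what the estimate above implies, the compound-growth arithmetic can be sketched as follows. This only restates the description's own figures (a $5 billion base in 2025 and a 15-20% CAGR); treating 2025 to 2033 as eight compounding years is an assumption, and the function name is hypothetical.

```python
def project_market_size(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base * (1 + cagr) ** years


BASE_2025_USD_BN = 5.0      # estimate quoted in the description
YEARS = 2033 - 2025         # assumption: eight compounding periods

low_2033 = project_market_size(BASE_2025_USD_BN, 0.15, YEARS)
high_2033 = project_market_size(BASE_2025_USD_BN, 0.20, YEARS)

print(f"Implied 2033 market size: {low_2033:.1f} to {high_2033:.1f} USD bn")
```

    Under these assumptions the implied 2033 range is roughly USD 15 to 21 billion, about three to four times the 2025 estimate.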

  7. Counts of Dengue reported in CHINA: 1979-2010

    • zenodo.org
    • tycho.pitt.edu
    • +1more
    json, xml, zip
    Updated Jun 3, 2024
    Cite
    Willem Van Panhuis; Anne Cross; Donald Burke (2024). Counts of Dengue reported in CHINA: 1979-2010 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/cn.38362002
    Available download formats: json, zip, xml
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Anne Cross; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1979 - Dec 31, 2010
    Area covered
    China
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
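
    The second step above, separating cumulative from fixed-interval series, can be sketched as follows. Only the attribute name "PartOfCumulativeCountSeries" comes from the description. The other column names ("PeriodStartDate", "PeriodEndDate", "CountValue"), the "1"/"0" flag encoding, and the sample rows are assumptions made for illustration.

```python
def split_series(rows):
    """Split rows into (cumulative, fixed_interval) lists using the flag."""
    cumulative, fixed = [], []
    for row in rows:
        # Assumption: the flag is encoded as "1" for cumulative rows.
        is_cumulative = str(row["PartOfCumulativeCountSeries"]).strip() == "1"
        (cumulative if is_cumulative else fixed).append(row)
    return cumulative, fixed


def cumulative_to_increments(rows):
    """Turn one cumulative series into per-interval counts.

    Assumes every row shares the same start date and that end dates are
    ISO strings (YYYY-MM-DD), which sort correctly as plain text.
    """
    ordered = sorted(rows, key=lambda r: r["PeriodEndDate"])
    increments, previous = [], 0
    for row in ordered:
        count = int(row["CountValue"])
        increments.append((row["PeriodEndDate"], count - previous))
        previous = count
    return increments


# Invented sample rows.
sample_rows = [
    {"PeriodStartDate": "1990-01-01", "PeriodEndDate": "1990-01-14",
     "CountValue": "5", "PartOfCumulativeCountSeries": "1"},
    {"PeriodStartDate": "1990-01-01", "PeriodEndDate": "1990-01-07",
     "CountValue": "3", "PartOfCumulativeCountSeries": "1"},
    {"PeriodStartDate": "1990-01-01", "PeriodEndDate": "1990-01-07",
     "CountValue": "3", "PartOfCumulativeCountSeries": "0"},
]

cumulative_rows, fixed_rows = split_series(sample_rows)
weekly_increments = cumulative_to_increments(cumulative_rows)
```

    Keeping the two kinds of series apart avoids double counting: summing cumulative rows together with fixed-interval rows would count the same cases more than once.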

  8. Counts of Dengue without warning signs reported in CHINA: 1979-2009

    • tycho.pitt.edu
    • data.niaid.nih.gov
    Updated Apr 1, 2018
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Dengue without warning signs reported in CHINA: 1979-2009 [Dataset]. https://www.tycho.pitt.edu/dataset/CN.722862003
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1979 - 2009
    Area covered
    China
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
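
    The missing-data step can be illustrated for a fixed-interval weekly series. This is a minimal sketch under stated assumptions: the input is a hypothetical mapping from interval start date to reported count, intervals are exactly seven days long, and the function name is invented. Intervals absent from the data are added with None, while reported zeros are kept as zeros.

```python
from datetime import date, timedelta


def fill_missing_weeks(counts, first, last):
    """Return (start_date, count) for every week from first to last.

    Weeks with no reported value get None, so they stay distinguishable
    from weeks where a count of zero was actually reported.
    """
    filled = []
    current = first
    while current <= last:
        filled.append((current, counts.get(current)))
        current += timedelta(days=7)
    return filled


# Invented example: the week starting 1990-01-08 was never reported.
reported = {
    date(1990, 1, 1): 4,
    date(1990, 1, 15): 0,
}
series = fill_missing_weeks(reported, date(1990, 1, 1), date(1990, 1, 15))
```

    Using None rather than 0 for the added intervals keeps "no report" separate from "zero cases reported", which is the distinction the description draws.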

  9. Chinese Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Chinese Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/chinese-chain-of-thought-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Chinese Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Chinese language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Chinese speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Chinese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
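
    A record in the JSON export could be checked against the annotation details listed above with a sketch like the one below. The key names are hypothetical snake_case renderings of those annotation details, since the description does not give the actual field names, and the sample record is invented.

```python
import json

# Hypothetical keys, one per annotation detail named in the description.
REQUIRED_FIELDS = {
    "id", "prompt", "prompt_type", "prompt_complexity", "prompt_category",
    "domain", "response", "rationale", "response_type", "rich_text",
}


def missing_fields(record: dict) -> set:
    """Return the required annotation keys that a record lacks."""
    return REQUIRED_FIELDS - set(record)


# Invented, deliberately incomplete record.
raw = '{"id": "1", "prompt": "2 + 3 = ?", "response": "5", "rationale": "2 plus 3 equals 5."}'
record = json.loads(raw)
gaps = missing_fields(record)
```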

    Quality and Accuracy

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Chinese version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Chinese Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  10. Alternative Data Market Analysis North America, Europe, APAC, South America,...

    • technavio.com
    pdf
    Updated Jan 17, 2025
    Cite
    Technavio (2025). Alternative Data Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, Canada, China, UK, Mexico, Germany, Japan, India, Italy, France - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/alternative-data-market-industry-analysis
    Available download formats: pdf
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Canada, United States
    Description

    Alternative Data Market Size 2025-2029

    The alternative data market is forecast to grow by USD 60.32 billion, at a CAGR of 52.5% from 2024 to 2029. Increased availability and diversity of data sources will drive the alternative data market.

    Major Market Trends & Insights

    North America dominated the market and accounted for 56% of the growth during the forecast period.
    By Type - Credit and debit card transactions segment was valued at USD 228.40 billion in 2023
    By End-user - BFSI segment accounted for the largest market revenue share in 2023
    

    Market Size & Forecast

    Market Opportunities: USD 6.00 million
    Market Future Opportunities: USD 60318.00 million
    CAGR from 2024 to 2029 : 52.5%
    

    Market Summary

    The market represents a dynamic and rapidly expanding landscape, driven by the increasing availability and diversity of data sources. With the rise of alternative data-driven investment strategies, businesses and investors are increasingly relying on non-traditional data to gain a competitive edge. Core technologies, such as machine learning and natural language processing, are transforming the way alternative data is collected, analyzed, and utilized. Despite its potential, the market faces challenges related to data quality and standardization. According to a recent study, alternative data accounts for only 10% of the total data used in financial services, yet 45% of firms surveyed reported issues with data quality.
    Service providers, including data providers, data aggregators, and data analytics firms, are addressing these challenges by offering solutions to ensure data accuracy and reliability. Regions such as North America and Europe are leading the adoption of alternative data, with Europe projected to grow at a significant rate due to increasing regulatory support for alternative data usage. The market's continuous evolution is influenced by various factors, including technological advancements, changing regulations, and emerging trends in data usage.
    

    What will be the Size of the Alternative Data Market during the forecast period?


    How is the Alternative Data Market Segmented?

    The alternative data industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    Type
    
      Credit and debit card transactions
      Social media
      Mobile application usage
      Web scraped data
      Others
    
    
    End-user
    
      BFSI
      IT and telecommunication
      Retail
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        France
        Germany
        Italy
        UK
    
    
      APAC
    
        China
        India
        Japan
    
    
      Rest of World (ROW)
    

    By Type Insights

    The credit and debit card transactions segment is estimated to witness significant growth during the forecast period.

    Alternative data derived from credit and debit card transactions plays a significant role in offering valuable insights for market analysts, financial institutions, and businesses. This data category is segmented into credit card and debit card transactions. Credit card transactions serve as a rich source of information on consumers' discretionary spending, revealing their luxury spending tendencies and credit management skills. Debit card transactions, on the other hand, shed light on essential spending habits, budgeting strategies, and daily expenses, providing insights into consumers' practical needs and lifestyle choices. Market analysts and financial institutions utilize this data to enhance their strategies and customer experiences.

    Natural language processing (NLP) and sentiment analysis tools help extract valuable insights from this data. Anomaly detection systems enable the identification of unusual spending patterns, while data validation techniques ensure data accuracy. Risk management frameworks and hypothesis testing methods are employed to assess potential risks and opportunities. Data visualization dashboards and machine learning models facilitate data exploration and trend analysis. Data quality metrics and signal processing methods ensure data reliability and accuracy. Data governance policies and real-time data streams enable timely access to data. Time series forecasting, clustering techniques, and high-frequency data analysis provide insights into trends and patterns.

    Model training datasets and model evaluation metrics are essential for model development and performance assessment. Data security protocols are crucial to protect sensitive financial information. Economic indicators and compliance regulations play a role in the context of this market. Unstructured data analysis, data cleansing pipelines, and statistical significance are essential for deriving meaningful insights from this data. New

  11. Data from: A standardized dataset of built-up areas of China’s cities with...

    • scidb.cn
    Updated Jul 7, 2021
    Cite
    Jiang Huiping; Sun Zhongchang; Guo Huadong; Du Wenjie; Xing Qiang; Cai Guoyin (2021). A standardized dataset of built-up areas of China’s cities with populations over 300,000 for the period 1990–2015 [Dataset]. http://doi.org/10.11922/sciencedb.j00076.00004
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 7, 2021
    Dataset provided by
    Science Data Bank
    Authors
    Jiang Huiping; Sun Zhongchang; Guo Huadong; Du Wenjie; Xing Qiang; Cai Guoyin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Here we used remote sensing data from multiple sources (time series of Landsat and Sentinel images) to map the impervious surface area (ISA) at five-year intervals from 1990 to 2015, and then converted the results into a standardized dataset of the built-up area for 433 Chinese cities with 300,000 inhabitants or more, which were listed in the United Nations (UN) World Urbanization Prospects (WUP) database (including Mainland China, Hong Kong, Macao and Taiwan). We employed a range of spectral indices to generate the 1990–2015 ISA maps in urban areas based on remotely sensed data acquired from multiple sources. In this process, various types of auxiliary data were used to create the desired products for urban areas through manual segmentation of peri-urban and rural areas together with reference to several freely available products of urban extent derived from ISA data using automated urban–rural segmentation methods. After that, following the well-established rules adopted by the UN, we carried out the conversion to the standardized built-up area products from the 1990–2015 ISA maps in urban areas, which conformed to the definition of urban agglomeration area (UAA). Finally, we implemented data postprocessing to guarantee the spatial accuracy and temporal consistency of the final product.

    The standardized urban built-up area dataset (SUBAD–China) introduced here is the first product using the same definition of UAA adopted by the WUP database for 433 county and higher-level cities in China. The comparisons made with contemporary data produced by the National Bureau of Statistics of China, the World Bank and UN-Habitat indicate that our results have a high spatial accuracy and good temporal consistency and thus can be used to characterize the process of urban expansion in China.

    SUBAD–China contains 2,598 vector files in shapefile format containing data for all of China's cities listed in the WUP database that have populations over 300,000, across different urban sizes and income levels. Along with it, we also provide the distribution of validation points for the 1990–2010 ISA products of these 433 Chinese cities in shapefile format, and the confusion matrices between classified data and reference data during different time periods as a Microsoft Excel Open XML Spreadsheet (XLSX) file.

    Furthermore, the standardized built-up area products for these cities will be consistently updated and refined to ensure the quality of their spatiotemporal coverage and accuracy. The production of this dataset, together with the usage of population counts derived from the WUP database, will close some of the data gaps in the calculation of SDG 11.3.1 and benefit other downstream applications relevant to a combined analysis of the spatial and socio-economic domains in urban areas.

  12. China Data Center Market Size & Share Analysis - Industry Research Report -...

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Oct 13, 2025
    Cite
    Mordor Intelligence (2025). China Data Center Market Size & Share Analysis - Industry Research Report - Growth Trends [Dataset]. https://www.mordorintelligence.com/industry-reports/china-internet-data-center-market
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Oct 13, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    China
    Description

    The China Data Center Market Report is segmented by Data Center Size (Large, Massive, Medium, Mega, and Small), Tier Type (Tier 1 and 2, Tier 3, and Tier 4), Data Center Type (Hyperscale/Self-Built, Enterprise/Edge, and Colocation), End User (BFSI, IT and ITES, E-Commerce, Government, Manufacturing, Media and Entertainment, and More), and Hotspot. The market forecasts are provided in terms of IT load capacity (MW).

  13. Data from: A dataset of interprovincial food trade flows in China

    • figshare.com
    zip
    Updated May 28, 2025
    Cite
    Lan Luo; Zhencheng Xing; Yifan Liu; Xiang Liu; Lingling Jiang; Yi Peng; Haibo Zhang; Haikun Wang (2025). A dataset of interprovincial food trade flows in China [Dataset]. http://doi.org/10.6084/m9.figshare.28013150.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 28, 2025
    Dataset provided by
    figshare
    Figshare http://figshare.com/
    Authors
    Lan Luo; Zhencheng Xing; Yifan Liu; Xiang Liu; Lingling Jiang; Yi Peng; Haibo Zhang; Haikun Wang
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    China
    Description

    Understanding the food trading network is crucial for optimizing resource allocation, maintaining stability in the food system, and safeguarding food security. However, although China is one of the world's largest agricultural producers, there is a dearth of publicly available data on its interprovincial physical food trade. Here, we developed a dataset of interprovincial physical food flows in mainland China for the period 2000-2022, using the trade gravity model with the integration of food supply, food demand, and transportation data. This dataset includes 15 key types of plant-based and animal-based food products and represents the longest time series covering the most extensive variety of food products to date on China's food trade patterns. The dataset reveals changes in dietary structures and trade patterns across regions within China since the start of the 21st century. This work provides a methodological framework and dataset that could support studies on virtual resource flows, agri-environmental impact assessment, and food policy formulation across various food categories and regions.
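To illustrate how a gravity model can combine supply, demand, and transport data, here is a minimal sketch of a generic origin-constrained form. It is not the dataset authors' specification: the function name `gravity_flows`, the distance-decay exponent `beta`, and all numbers are illustrative assumptions. Each exporting province's supply is split across destinations in proportion to destination demand divided by transport cost raised to `beta`.

```python
def gravity_flows(supply, demand, cost, beta=1.0):
    """Allocate each origin's supply across destinations.

    flow[i][j] = supply[i] * w_ij / sum_k(w_ik), with w_ij = demand[j] / cost[i][j] ** beta.
    Row sums therefore equal the origin's supply (origin-constrained form).
    """
    flows = []
    for i, s_i in enumerate(supply):
        weights = [d_j * cost[i][j] ** (-beta) for j, d_j in enumerate(demand)]
        total = sum(weights)
        flows.append([s_i * w / total for w in weights])
    return flows


# Hypothetical toy data: 2 exporting provinces, 3 importing provinces.
supply = [100.0, 50.0]         # surplus available to ship
demand = [80.0, 40.0, 30.0]    # deficit at each destination
cost = [[1.0, 2.0, 4.0],       # transport cost from origin i to destination j
        [3.0, 1.0, 2.0]]

flows = gravity_flows(supply, demand, cost)
for i, row in enumerate(flows):
    print(f"origin {i}:", [round(x, 2) for x in row])
```

A doubly constrained variant would also force column sums to match demand, typically through iterative proportional fitting; this sketch leaves that out for brevity.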

  14. Data from: Chinese Household Income Project, 1995

    • icpsr.umich.edu
    ascii, delimited, sas +2
    Updated Jul 28, 2010
    Cite
    Riskin, Carl; Renwei, Zhao; Shi, Li (2010). Chinese Household Income Project, 1995 [Dataset]. http://doi.org/10.3886/ICPSR03012.v2
    Explore at:
    Available download formats: spss, stata, sas, delimited, ascii
    Dataset updated
    Jul 28, 2010
    Dataset provided by
    Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
    Authors
    Riskin, Carl; Renwei, Zhao; Shi, Li
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/3012/terms

    Time period covered
    1995
    Area covered
    China (Peoples Republic)
    Description

    The purpose of this project was to measure and estimate the distribution of personal income in both rural and urban areas of the People's Republic of China. The principal investigators based their definition of income on cash payments and on a broad range of additional components: payments in kind valued at market prices, agricultural output produced for self-consumption valued at market prices, the value of food and other direct subsidies, and the imputed value of housing services. The rural component of this collection consists of two data files, one in which the individual is the unit of analysis (Part 1) and a second in which the household is the unit of analysis (Part 2). Individual rural respondents reported on their employment status, level of education, Communist Party membership, type of employer (e.g., public, private, or foreign), type of economic sector in which they were employed, occupation, whether they held a second job, retirement status, monthly pension, monthly wage, and other sources of income. Demographic variables include relationship to householder, gender, age, and student status. Rural households reported extensively on the character of the household and residence. Information was elicited on type of terrain surrounding the house, geographic position, type of house, and availability of electricity. Also reported were sources of household income (e.g., farming, industry, government, rents, and interest), taxes paid, value of farm, total amount and type of cultivated land, financial assets and debts, quantity and value of various crops, amount of grain purchased or provided by a collective, use of chemical fertilizers, gasoline, and oil, quantity and value of agricultural machinery, and all household expenditures (e.g., food, fuel, medicine, education, transportation, and electricity). 
The urban component of this collection also consists of two data files, one in which the individual is the unit of analysis (Part 3) and a second in which the household is the unit of analysis (Part 4). Individual urban respondents reported on their economic status within the household, Communist Party membership, sex, age, nature of employment, and relationship to the household head. Information was collected on all types and sources of income from each member of the household whether working, nonworking, or retired, all revenue received by owners of private or individual enterprises, and all in-kind payments (e.g., food, durable goods, and nondurable goods). Urban households reported total income (including salaries, interest on savings and bonds, dividends, rent, leases, alimony, gifts, and boarding fees), all types and values of food subsidies received, and total debt. Information was also gathered on household accommodations and living conditions, including number of rooms, total living area in square meters, availability and cost of running water, sanitary facilities, heating and air-conditioning equipment, kitchen availability, location of residence, ownership of home, and availability of electricity and telephone. Households reported on all their expenditures including amounts spent on food items such as wheat, rice, edible oils, pork, beef and mutton, poultry, fish and seafood, sugar, and vegetables by means of coupons in state-owned stores and at free market prices. Information was also collected on rents paid by the households, fuel available, type of transportation used, and availability and use of medical and child care. The Chinese Household Income Project collected data in 1988, 1995, 2002, and 2007. ICPSR holds data from the first three collections, and information about these can be found on the series description page. Data collected in 2007 are available through the China Institute for Income Distribution.

  15. Global Financial Inclusion (Global Findex) Database 2021 - China

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Dec 16, 2022
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2022). Global Financial Inclusion (Global Findex) Database 2021 - China [Dataset]. https://microdata.worldbank.org/index.php/catalog/4627
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2021 - 2022
    Area covered
    China
    Description

    Abstract

    The fourth edition of the Global Findex offers a lens into how people accessed and used financial services during the COVID-19 pandemic, when mobility restrictions and health policies drove increased demand for digital services of all kinds.

    The Global Findex is the world's most comprehensive database on financial inclusion. It is also the only global demand-side data source allowing for global and regional cross-country analysis to provide a rigorous and multidimensional picture of how adults save, borrow, make payments, and manage financial risks. Global Findex 2021 data were collected from national representative surveys of about 128,000 adults in more than 120 economies. The latest edition follows the 2011, 2014, and 2017 editions, and it includes a number of new series measuring financial health and resilience and contains more granular data on digital payment adoption, including merchant and government payments.

    The Global Findex is an indispensable resource for financial service practitioners, policy makers, researchers, and development professionals.

    Geographic coverage

    Tibet was excluded from the sample. The excluded areas represent less than 1 percent of the total population of China.

    Analysis unit

    Individual

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    In most developing economies, Global Findex data have traditionally been collected through face-to-face interviews. Surveys are conducted face-to-face in economies where telephone coverage represents less than 80 percent of the population or where in-person surveying is the customary methodology. However, because of ongoing COVID-19 related mobility restrictions, face-to-face interviewing was not possible in some of these economies in 2021. Phone-based surveys were therefore conducted in 67 economies that had been surveyed face-to-face in 2017. These 67 economies were selected for inclusion based on population size, phone penetration rate, COVID-19 infection rates, and the feasibility of executing phone-based methods where Gallup would otherwise conduct face-to-face data collection, while complying with all government-issued guidance throughout the interviewing process. Gallup takes both mobile phone and landline ownership into consideration. According to Gallup World Poll 2019 data, when face-to-face surveys were last carried out in these economies, at least 80 percent of adults in almost all of them reported mobile phone ownership. All samples are probability-based and nationally representative of the resident adult population. Phone surveys were not a viable option in 17 economies that had been part of previous Global Findex surveys, however, because of low mobile phone ownership and surveying restrictions. Data for these economies will be collected in 2022 and released in 2023.

    In economies where face-to-face surveys are conducted, the first stage of sampling is the identification of primary sampling units. These units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day and, where possible, on different days. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households. Each eligible household member is listed, and the hand-held survey device randomly selects the household member to be interviewed. For paper surveys, the Kish grid method is used to select the respondent. In economies where cultural restrictions dictate gender matching, respondents are randomly selected from among all eligible adults of the interviewer's gender.
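The first-stage selection rule described above (probability proportional to population size, falling back to simple random sampling when sizes are unknown) can be sketched as follows. This is an illustrative simplification, not Gallup's procedure: the function name `select_psus` and the toy populations are assumptions, and the PPS branch draws with replacement for brevity, whereas field designs usually use systematic PPS without replacement.

```python
import random


def select_psus(units, k, sizes=None, seed=None):
    """Pick k primary sampling units.

    With `sizes` (a dict of unit -> population), draw with probability
    proportional to size (with replacement). Without it, fall back to
    simple random sampling without replacement.
    """
    rng = random.Random(seed)
    names = list(units)
    if sizes is None:
        return rng.sample(names, k)
    weights = [sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=k)


# Hypothetical sampling units and populations.
psus = ["A", "B", "C", "D"]
population = {"A": 50_000, "B": 5_000, "C": 20_000, "D": 0}

pps_draw = select_psus(psus, k=10, sizes=population, seed=1)
srs_draw = select_psus(psus, k=2, seed=1)
print(pps_draw)
print(srs_draw)
```

Under the PPS branch a unit with zero recorded population is never drawn, and larger units appear more often on average.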

    In traditionally phone-based economies, respondent selection follows the same procedure as in previous years, using random digit dialing or a nationally representative list of phone numbers. In most economies where mobile phone and landline penetration is high, a dual sampling frame is used.

    The same respondent selection procedure is applied to the new phone-based economies. Dual frame (landline and mobile phone) random digit dialing is used where landline presence and use are 20 percent or higher based on historical Gallup estimates. Mobile phone random digit dialing is used in economies with limited to no landline presence (less than 20 percent).
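The 20 percent threshold described above amounts to a simple decision rule. A minimal sketch follows; the function name `phone_sampling_frame` and the returned labels are hypothetical, and only the threshold itself comes from the text.

```python
LANDLINE_THRESHOLD = 0.20  # from the text: 20 percent landline presence and use


def phone_sampling_frame(landline_share: float) -> str:
    """Return the phone sampling frame implied by a landline share in [0, 1]."""
    if not 0.0 <= landline_share <= 1.0:
        raise ValueError("landline_share must be between 0 and 1")
    if landline_share >= LANDLINE_THRESHOLD:
        return "dual-frame RDD (landline + mobile)"
    return "mobile-only RDD"


for share in (0.05, 0.20, 0.45):
    print(f"{share:.0%} landline -> {phone_sampling_frame(share)}")
```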

    For landline respondents in economies where mobile phone or landline penetration is 80 percent or higher, random selection of respondents is achieved by using either the latest birthday or household enumeration method. For mobile phone respondents in these economies or in economies where mobile phone or landline penetration is less than 80 percent, no further selection is performed. At least three attempts are made to reach a person in each household, spread over different days and times of day.

    Sample size for China is 3500.

    Mode of data collection

    Mobile telephone

    Research instrument

    Questionnaires are available on the website.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country-specific margins of error, please refer to the Methodology section and corresponding table in Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar. 2022. The Global Findex Database 2021: Financial Inclusion, Digital Payments, and Resilience in the Age of COVID-19. Washington, DC: World Bank.

  16. Research Dataset on Water pressure in China's water-energy-food nexus from...

    • scidb.cn
    Updated Sep 18, 2025
    Cite
    Cui Shixi (2025). Research Dataset on Water pressure in China's water-energy-food nexus from the perspective of regional trade [Dataset]. http://doi.org/10.57760/sciencedb.j00100.00029
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Cui Shixi
    Area covered
    China
    Description

    This dataset covers consumption-driven water use, the external contributions and transmission pathways of water pressure, and future changes in implicit water use in the water-energy-food (WEF) system for 31 provinces across China. The main sources of data are the China Environmental Statistics Yearbook, Water Resources Bulletin, China Energy Statistics Yearbook, and the World Resources Institute. The dataset mainly includes three types of data: (1) Embodied blue water use in the WEF nexus by province in China; (2) External contribution of water stress and energy-food competition for water in the WEF nexus by province in China; (3) Embodied water use statistics for high and low water pressure pathways in the WEF sector by province in China.

  17. Dataset for China, TX Census Bureau Income Distribution by Race

    • neilsberg.com
    Updated Jan 3, 2024
    Cite
    Neilsberg Research (2024). Dataset for China, TX Census Bureau Income Distribution by Race [Dataset]. https://www.neilsberg.com/research/datasets/80c17315-9fc2-11ee-b48f-3860777c1fe6/
    Explore at:
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Texas, China
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the China median household income by race. The dataset can be utilized to understand the racial distribution of China income.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • China, TX median household income breakdown by race between 2011 and 2021
    • Median Household Income by Racial Categories in China, TX (2021, in 2022 inflation-adjusted dollars)

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of China, TX median household income by race.

  18. Industrial heat sources in China during 2012 and 2023

    • scidb.cn
    Updated Aug 15, 2025
    Cite
    Ma Caihong (2025). Industrial heat sources in China during 2012 and 2023 [Dataset]. http://doi.org/10.57760/sciencedb.nbsdc.00238
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 15, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Ma Caihong
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Comprehensive, high-spatiotemporal-resolution information on the dynamics of industrial heat sources (IHS) in China is of great significance for the green, high-quality, and sustainable development of industry under the "dual carbon" goals. At present, dynamic data on industrial heat sources covering large regions and long time series at high spatiotemporal resolution are still lacking, and measures such as structural adjustment and capacity reduction in China have not been effectively tracked and monitored spatially. This work uses a long time series of 375 m NPP VIIRS (United States Suomi National Polar-orbiting Partnership, Visible Infrared Imaging Radiometer Suite) active fire/hotspot data (ACF) and an improved K-means industrial heat source identification method, combined with POI topology analysis and high-resolution remote sensing image features of different types of factories and mines, to identify and classify industrial heat sources in China from 2021 to 2023. A dataset of industrial heat sources in China from 2021 to 2023, including type information in vector format, is constructed for the first time and made freely available to the public. The dynamic remote sensing monitoring results of industrial heat sources in China from 2021 to 2023 provide an independent scientific basis for China's response to the upgrading of the "structural adjustment and capacity reduction" industrial model, domestic and international carbon tax trading, improvement of the atmospheric environment, and other sustainable development processes.

    The results show that: (1) extending the time span to 2012-2023, combined with industrial heat source category recognition based on POI topology analysis, can effectively reveal the spatiotemporal evolution of different types of industrial heat sources during the critical period of industrial transformation and upgrading; (2) the improved K-means industrial heat source recognition method, while ensuring recognition accuracy (98.14%), effectively improves the number, accuracy, average particle size, and spatial coverage of recognized industrial heat sources; and (3) the dataset includes 20 characteristic parameters, such as factory and mine locations, annual operating conditions, and categories, which fully record the radiation flux characteristics and production activity intensity of different types of industrial heat sources, providing richer data support for industrial carbon emission estimation and regional economic development assessment.

  19. Chinese Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Chinese Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/chinese-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    The Chinese Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Chinese language, advancing the field of artificial intelligence.

    Dataset Content:

    This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Chinese. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Chinese speakers, and references were taken from diverse sources such as books, news articles, websites, and other reliable references.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity:

    To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.

    Answer Formats:

    To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph answers. Answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details:

    This fully labeled Chinese Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
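As a minimal sketch of how the annotation fields listed above might be used, the following Python snippet loads CSV rows and filters them by field value. The column names come from the description; the sample rows and the exact value spellings (for example `fact-based` or `easy`) are assumptions made for illustration and may not match the real files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The column names are taken from the description,
# but the values and their exact spelling are assumptions.
SAMPLE_CSV = """id,language,domain,question_length,prompt_type,question_category,question_type,complexity,answer_type,rich_text
1,Chinese,science,12,instruction,fact-based,direct,easy,single-word,
2,Chinese,history,30,continuation,opinion-based,direct,hard,paragraph,table
3,Chinese,geography,18,instruction,fact-based,true/false,medium,single-word,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, **criteria):
    """Keep only the rows whose fields equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


rows = load_rows(SAMPLE_CSV)
fact_based = filter_rows(rows, question_category="fact-based")
by_complexity = Counter(r["complexity"] for r in rows)

print(len(fact_based), dict(by_complexity))
```

For a real file, the same functions could be applied to the text read from disk; the JSON variant would need `json.load` instead of `csv.DictReader`.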

    Quality and Accuracy:

    The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the questions and the answers are written in grammatically accurate Chinese, free of spelling and grammatical errors. No copyrighted, toxic, or harmful content was used in building this dataset.

    Continuous Updates and Customization:

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License:

    The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Chinese Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.

  20. List of academic disciplines in Chinese and English for data transformation

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Feb 10, 2025
    Cite
    Christian Henriot (2025). List of academic disciplines in Chinese and English for data transformation [Dataset]. http://doi.org/10.5281/zenodo.14845567
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Henriot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 10, 2025
    Description

    Overview: This dataset provides a bilingual list of academic disciplines, extracted from various sources in Chinese and translated into English. It includes a broad range of terms beyond strictly defined academic disciplines, encompassing subfields, interdisciplinary categories, and related academic domains. The dataset is designed to be reusable for various applications, such as term matching across corpora and extracting relevant terms from new datasets.

    Structure: The dataset consists of multiple columns, each representing different levels of classification for academic disciplines. The primary columns include:

    • Discipline (Chinese Simplified & Traditional): The name of the discipline in Chinese.

    • Discipline (English): The corresponding English translation of the discipline.

    • Discipline Level 2 (Chinese & English): A more specific categorization within a broader academic category.

    • Discipline Level 1 (Chinese & English): A higher-level classification grouping multiple related disciplines.

    • Discipline Level 0 (Chinese & English): The broadest classification, representing major academic fields.

    • Level1_code: A numerical or coded identifier for Level 1 disciplines, which may be useful for structured data processing.

    Purpose & Applications:

    • Term Matching: The dataset can be used to match extracted terms from other corpora, ensuring consistency across multilingual sources.

    • Hierarchical Classification: The multi-level structure allows users to analyze disciplines at different granularities.

    • Corpus Analysis & Text Mining: The dataset facilitates term extraction and standardization in computational text analysis projects.

    • Cross-Linguistic Comparisons: Researchers can use the bilingual nature of the dataset to study the relationship between Chinese and English academic terminologies.

    Potential Use Cases:

    • Automated classification of academic articles based on discipline.

    • Developing bilingual glossaries for research institutions.

    • Improving machine learning models for academic domain recognition.

    Data Quality Considerations:

    • Some terms may appear at multiple levels, reflecting differences in classification across sources.

    • The dataset structure should be checked for delimiter consistency before processing in automated systems.
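The delimiter check mentioned above could be sketched as follows: parse the file with an assumed delimiter and report every row whose field count differs from the header's. The column names `discipline_zh` and `discipline_en` and the sample rows are hypothetical; only `Level1_code` appears in the description.

```python
import csv
import io

# Hypothetical sample. The third line deliberately uses ';' instead of ','
# to simulate an inconsistent delimiter.
SAMPLE_CSV = (
    "discipline_zh,discipline_en,Level1_code\n"
    "数学,Mathematics,01\n"
    "历史学;History;06\n"
    "物理学,Physics,07\n"
)


def inconsistent_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows that do not match the header width."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    return [
        (line_no, len(row))
        for line_no, row in enumerate(reader, start=2)
        if len(row) != expected
    ]


problems = inconsistent_rows(SAMPLE_CSV)
print(problems)
```

An empty result suggests the assumed delimiter splits every row into the same number of fields; it does not prove the values themselves are correct. Line numbers assume one record per physical line, so quoted fields containing line breaks would shift them.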

