5 datasets found
  1. Data from: S1 Data -

    • plos.figshare.com
    zip
    Updated Oct 11, 2023
    Cite
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzing customer characteristics and giving early warning of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services and save substantial operating costs. Data cleaning, oversampling, data standardization, and other preprocessing operations were performed in Python on a 900,000-entry telecom data set of customer personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Two classic ensemble learning models, Random Forest (RF) and Adaboost, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that the first four models perform better on recall, precision, F1 score, and other indicators, and that the RF-Adaboost dual-ensemble model performs best. On positive samples, the recall rates of BPNN, RF, Adaboost, and the RF-Adaboost dual-ensemble model are 79%, 90%, 89%, and 93%, respectively; the precision rates are 97%, 99%, 98%, and 99%; and the F1 scores are 87%, 95%, 94%, and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1%, and 6% higher than the reference. The churn predictions provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
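The F1 scores quoted in this description can be sanity-checked against the quoted precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function name and the dictionary are assumptions, and because the inputs are already rounded to whole percentages, the recomputed values may differ from the reported ones by about one percentage point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the description, as fractions.
reported = {
    "BPNN": (0.97, 0.79, 0.87),
    "RF": (0.99, 0.90, 0.95),
    "Adaboost": (0.98, 0.89, 0.94),
    "RF-Adaboost": (0.99, 0.93, 0.96),
}

for name, (precision, recall, f1_reported) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {f1_reported:.2f}")
```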

  2. RxNorm Data

    • kaggle.com
    • bioregistry.io
    zip
    Updated Mar 20, 2019
    Cite
    National Library of Medicine (2019). RxNorm Data [Dataset]. https://www.kaggle.com/datasets/nlm-nih/nlm-rxnorm
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    National Library of Medicine
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    RxNorm is a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm

    RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/

    Content

    RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

    This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.

    The following tables are included in the RxNorm dataset:

    • RXNCONSO contains concept and source information

    • RXNREL contains information regarding relationships between entities

    • RXNSAT contains attribute information

    • RXNSTY contains semantic information

    • RXNSAB contains source information

    • RXNCUI contains retired rxcui codes

    • RXNATOMARCHIVE contains archived data

    • RXNCUICHANGES contains concept changes

    Update Frequency: Monthly

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://www.nlm.nih.gov/research/umls/rxnorm/

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm

    https://cloud.google.com/bigquery/public-data/rxnorm

    Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

    Banner Photo by @freestocks from Unsplash.

    Inspiration

    What are the RXCUI codes for the ingredients of a list of drugs?

    Which ingredients have the most variety of dose forms?

    In what dose forms is the drug phenylephrine found?

    What are the ingredients of the drug labeled with the generic code number 072718?
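The first question above (RXCUI codes for a list of ingredients) can be sketched in plain Python over rows shaped like RXNCONSO records. This is only an illustration under stated assumptions: the lowercase keys `rxcui`, `str`, and `tty` mirror the RXNCONSO columns of the same names, the term type `IN` marks ingredient concepts, and the sample rows, including their RXCUI values, are made-up placeholders rather than real RxNorm data.

```python
from collections import defaultdict

# Placeholder rows shaped like RXNCONSO records; the RXCUI values are made up.
SAMPLE_RXNCONSO = [
    {"rxcui": "0000001", "str": "example ingredient a", "tty": "IN"},
    {"rxcui": "0000002", "str": "example ingredient b", "tty": "IN"},
    {"rxcui": "0000003", "str": "example ingredient a 10 MG Oral Tablet", "tty": "SCD"},
]


def ingredient_rxcuis(rows, names):
    """Map each requested ingredient name to the RXCUIs of matching IN rows."""
    wanted = {name.lower() for name in names}
    found = defaultdict(set)
    for row in rows:
        name = row["str"].lower()
        # Keep only ingredient-level concepts whose name was requested.
        if row["tty"] == "IN" and name in wanted:
            found[name].add(row["rxcui"])
    return {name: sorted(ids) for name, ids in found.items()}


print(ingredient_rxcuis(SAMPLE_RXNCONSO, ["Example Ingredient A"]))
```

Against the real tables, the same filter would typically be expressed as a query on RXNCONSO; the in-memory version here only shows the shape of the lookup.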

  3. Partial data display of 000300.SH.

    • plos.figshare.com
    xls
    Updated Jan 13, 2025
    Cite
    Yiming Qian (2025). Partial data display of 000300.SH. [Dataset]. http://doi.org/10.1371/journal.pone.0316955.t001
    Available download formats: xls
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Yiming Qian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To address the limitations of existing stock price prediction models in handling real-time data streams—such as poor scalability, declining predictive performance due to dynamic changes in data distribution, and difficulties in accurately forecasting non-stationary stock prices—this paper proposes an incremental learning-based enhanced Transformer framework (IL-ETransformer) for online stock price prediction. This method leverages a multi-head self-attention mechanism to deeply explore the complex temporal dependencies between stock prices and feature factors. Additionally, a continual normalization mechanism is employed to stabilize the data stream, enhancing the model’s adaptability to dynamic changes. To ensure that the model retains prior knowledge while integrating new information, a time series elastic weight consolidation (TSEWC) algorithm is introduced to enable efficient incremental training with incoming data. Experiments conducted on five publicly available datasets demonstrate that the proposed method not only effectively captures the temporal information in the data but also fully exploits the correlations among multi-dimensional features, significantly improving stock price prediction accuracy. Notably, the method shows robust performance in coping with non-stationary and frequently changing financial market data.
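The description does not spell out the TSEWC algorithm, so the sketch below shows only the standard elastic weight consolidation (EWC) penalty that such methods build on, not the paper's time series variant. The function names, the flat parameter lists, and the `lam` weighting are assumptions chosen for illustration.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params        : current parameter values
    anchor_params : parameter values learned on earlier data (theta*)
    fisher        : per-parameter importance weights (diagonal Fisher estimate)
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(new_data_loss, params, anchor_params, fisher, lam=1.0):
    """Loss on incoming data plus the penalty for drifting from old knowledge."""
    return new_data_loss + ewc_penalty(params, anchor_params, fisher, lam)


# The first parameter moved by 1.0 with importance 2.0; the second did not move.
print(total_loss(0.5, [1.0, 2.0], [0.0, 2.0], [2.0, 5.0]))
```

Parameters with large importance weights are pulled back toward their earlier values, which is how this family of methods retains prior knowledge during incremental training.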

  4. The P2P-IEEE 14 bus system data set

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jan 24, 2020
    Cite
    Tiago Sousa; Tiago Soares; Pierre Pinson; Fabio Moret; Thomas Baroche; Etienne Sorin (2020). The P2P-IEEE 14 bus system data set [Dataset]. http://doi.org/10.5281/zenodo.1220935
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tiago Sousa; Tiago Soares; Pierre Pinson; Fabio Moret; Thomas Baroche; Etienne Sorin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set models the IEEE 14-bus system for studies on P2P electricity markets, including real consumption, solar, and wind power data from Australia. The data set has a 30-minute time step and covers one year, from July 2012 to June 2013.

    The transmission system comprises 14 buses and 20 lines, and its characteristics are based on [1]. The original number of generators was increased to 8 generators, i.e. 1 coal-based generator, 2 gas-based generators, 3 wind turbines and 2 PV plants. The data set uses the original number of 11 loads.

    Bus 1 represents the upstream connection to the main grid, where the generator is assumed to have infinite power. The market price from the Australian Energy Market Operator is used for this generator, over the same period from July 2012 to June 2013 [4]. This data set assumes a tariff of $10/MWh for using the main grid, and energy imported or exported at bus 1 has to account for this extra cost. Thus, the export price equals the market price minus the grid tariff, and the import price equals the market price plus the grid tariff.
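The pricing rule at bus 1 can be written as a minimal sketch. The function and constant names are assumptions for illustration; only the $10/MWh tariff and the plus/minus rule come from the description.

```python
GRID_TARIFF = 10.0  # $/MWh, the main-grid tariff stated in the description


def import_price(market_price: float, tariff: float = GRID_TARIFF) -> float:
    """Price paid for energy imported at bus 1: market price plus the tariff."""
    return market_price + tariff


def export_price(market_price: float, tariff: float = GRID_TARIFF) -> float:
    """Price received for energy exported at bus 1: market price minus the tariff."""
    return market_price - tariff


# Illustrative market price of 50 $/MWh (not taken from the data set).
print(import_price(50.0), export_price(50.0))
```

The gap between the two prices is always twice the tariff, regardless of the market price.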

    The wind production has been based on the data set from [2]. The time resolution has been converted from 5 minutes to 30 minutes. The authors would like to acknowledge that the data set in [2] was processed by Stefanos Delikaraoglou and Jethro Dowell. The solar production and load consumption are taken from [3]. The load consumption is split into fixed and flexible consumption per time-step. Since there is no access to the total capacity of the flexible consumption, we split the daily flexible consumption over each time-step. In this way, the maximum consumption is equal to the fixed consumption plus twice this flexible consumption per time-step. The minimum consumption is equal to the fixed consumption in each time-step.
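The consumption bounds described above can be sketched per time step as follows. The function name and argument names are assumptions; the rule itself (minimum equal to the fixed consumption, maximum equal to the fixed consumption plus twice the flexible consumption) is taken from the description.

```python
def consumption_bounds(fixed: float, flexible: float) -> tuple[float, float]:
    """Return (minimum, maximum) consumption for one time step.

    fixed    : fixed consumption in this time step
    flexible : flexible consumption allocated to this time step
    """
    minimum = fixed
    maximum = fixed + 2.0 * flexible
    return minimum, maximum


# Illustrative values only (not taken from the data set).
print(consumption_bounds(fixed=3.0, flexible=1.5))
```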

    The wind, solar and load data sets have been normalized, i.e. values relative to rated power. Then, these normalized sequences were multiplied by the capacity of each element. The data is intended for use in studies related to consumer-centric electricity markets, e.g.:

    • Validate new market designs or business models;
    • Assess the impact of new grid operation strategies;
    • Test the effect of strategic behavior by producers or consumers.
  5. Global Data Cleanroom Software Market Research Report: By Data Type...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Data Cleanroom Software Market Research Report: By Data Type (First-Party Data, Second-Party Data, Third-Party Data), By Deployment Model (Cloud-based, On-premises, Hybrid), By Data Privacy Regulations (GDPR, CCPA, LGPD), By Industry Vertical (Retail, Finance, Healthcare, Manufacturing), By Data Cleansing Features (Data Standardization, Data Deduplication, Data Enrichment) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/de/reports/data-cleanroom-software-market
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 4.31 (USD Billion)
    MARKET SIZE 2024: 5.1 (USD Billion)
    MARKET SIZE 2032: 19.6 (USD Billion)
    SEGMENTS COVERED: Data Type, Deployment Model, Data Privacy Regulations, Industry Vertical, Data Cleansing Features, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Rising Demand for Data Privacy; Increased Collaboration Across Industries; Advancements in Cloud Computing; Growing Need for Data Governance; Emergence of AI and Machine Learning
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Oracle, LiveRamp, InfoSum, Dun & Bradstreet, Talend, Verisk, Informatica, IBM, Acxiom, AdAdapted, Experian, Salesforce, Snowflake, SAP, Precisely
    MARKET FORECAST PERIOD: 2024 - 2032
    KEY MARKET OPPORTUNITIES: Increasing adoption of cloud-based data analytics; Rising demand for data privacy and security; Growing need for data collaboration and sharing; Expansion of the digital advertising market; Technological advancements in data cleaning and matching
    COMPOUND ANNUAL GROWTH RATE (CAGR): 18.32% (2024 - 2032)
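The stated growth rate can be cross-checked against the 2024 and 2032 market sizes with the usual compound annual growth rate formula. The sketch below is illustrative: the function name is an assumption, and because the market sizes are rounded, the result should land close to, but not necessarily exactly on, the reported 18.32%.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes in USD billion, as listed above for 2024 and 2032.
rate = cagr(start_value=5.1, end_value=19.6, years=2032 - 2024)
print(f"Implied CAGR 2024-2032: {rate:.2%}")
```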

