Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyzing customers' characteristics and providing early warning of customer churn with machine learning algorithms can help enterprises deliver targeted marketing strategies and personalized services and save substantial operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations were performed in Python on a data set of 900,000 telecom customers' personal characteristics and historical behavior. Appropriate model parameters were selected to build a BPNN (Back Propagation Neural Network). Two classic ensemble learning models, Random Forest (RF) and Adaboost, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models, together with four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)), were applied to the customer churn data. The results show that the four models perform better in terms of recall, precision, F1 score and other indicators, with the RF-Adaboost dual-ensemble model performing best. The recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model on positive samples are 79%, 90%, 89% and 93% respectively, the precision rates are 97%, 99%, 98% and 99%, and the F1 scores are 87%, 95%, 94% and 96%. The three indicators of the RF-Adaboost dual-ensemble model are 10%, 1% and 6% higher than the reference. The churn prediction results provide strong data support for telecom companies to adopt appropriate retention strategies for pre-churn customers and reduce customer churn.
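As a hedged illustration of the pipeline described above, the following scikit-learn sketch combines oversampling, standardization and an Adaboost ensemble built on Random Forest base learners. The file name, label column, sampling strategy and all hyperparameters are illustrative assumptions, not the authors' actual settings.

```python
# Hedged sketch of the preprocessing + RF-Adaboost dual-ensemble pipeline
# described above. The file name, label column, sampling strategy and all
# hyperparameters are illustrative assumptions, not the authors' settings.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("telecom_churn.csv")        # hypothetical, already-cleaned numeric data
X = df.drop(columns=["churn"]).values        # "churn" label column is assumed
y = df["churn"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority (churn) class and standardize features, mirroring
# the "oversampling + data standardization" preprocessing mentioned above.
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Dual ensemble: Adaboost boosting over Random Forest base learners
# (use base_estimator= instead of estimator= on scikit-learn < 1.2).
model = AdaBoostClassifier(
    estimator=RandomForestClassifier(n_estimators=50, random_state=42),
    n_estimators=20, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```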
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
RxNorm is a US-specific medical terminology that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm
RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/
RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.
This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.
The following tables are included in the RxNorm dataset:
RXNCONSO contains concept and source information
RXNREL contains information regarding relationships between entities
RXNSAT contains attribute information
RXNSTY contains semantic information
RXNSAB contains source information
RXNCUI contains retired RXCUI codes
RXNATOMARCHIVE contains archived data
RXNCUICHANGES contains concept changes
Update Frequency: Monthly
https://www.nlm.nih.gov/research/umls/rxnorm/
https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm
https://cloud.google.com/bigquery/public-data/rxnorm
Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.
What are the RXCUI codes for the ingredients of a list of drugs?
Which ingredients have the most variety of dose forms?
In what dose forms is the drug phenylephrine found?
What are the ingredients of the drug labeled with the generic code number 072718?
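As a hedged example of how such questions might be answered against the BigQuery copy of RXNCONSO, the sketch below uses the google-cloud-bigquery client to list ingredient RXCUIs matching a drug name. The exact table name in the bigquery-public-data:nlm_rxnorm dataset and the upper-case column names are assumptions here; releases may carry a suffix, so check the BigQuery console for the table actually published.

```python
# Hedged sketch: listing ingredient RXCUIs from the BigQuery copy of RXNCONSO.
# The table name below (and the upper-case column names) are assumptions about
# how the bigquery-public-data:nlm_rxnorm dataset is laid out; check the
# BigQuery console for the actual (possibly release-suffixed) table name.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT DISTINCT RXCUI, STR
FROM `bigquery-public-data.nlm_rxnorm.rxnconso`   -- assumed table name
WHERE SAB = 'RXNORM'
  AND TTY = 'IN'                                  -- IN = ingredient term type
  AND LOWER(STR) LIKE '%phenylephrine%'
"""
for row in client.query(sql).result():
    print(row["RXCUI"], row["STR"])
```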
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To address the limitations of existing stock price prediction models in handling real-time data streams—such as poor scalability, declining predictive performance due to dynamic changes in data distribution, and difficulties in accurately forecasting non-stationary stock prices—this paper proposes an incremental learning-based enhanced Transformer framework (IL-ETransformer) for online stock price prediction. This method leverages a multi-head self-attention mechanism to deeply explore the complex temporal dependencies between stock prices and feature factors. Additionally, a continual normalization mechanism is employed to stabilize the data stream, enhancing the model’s adaptability to dynamic changes. To ensure that the model retains prior knowledge while integrating new information, a time series elastic weight consolidation (TSEWC) algorithm is introduced to enable efficient incremental training with incoming data. Experiments conducted on five publicly available datasets demonstrate that the proposed method not only effectively captures the temporal information in the data but also fully exploits the correlations among multi-dimensional features, significantly improving stock price prediction accuracy. Notably, the method shows robust performance in coping with non-stationary and frequently changing financial market data.
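The paper's TSEWC algorithm is not reproduced here; as background, the sketch below shows the standard elastic weight consolidation penalty that such time-series variants build on, in PyTorch. The diagonal Fisher estimate, the names, and the regularization strength are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of a standard elastic weight consolidation (EWC)
# penalty, the mechanism that time-series variants such as TSEWC build on.
# This is NOT the paper's TSEWC algorithm; the diagonal Fisher estimate and
# the regularization strength lam are illustrative assumptions.
import torch

def fisher_diagonal(model, data_loader, loss_fn):
    """Estimate the diagonal Fisher information from squared gradients."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, old_params, fisher, lam=100.0):
    """Quadratic penalty keeping parameters close to the previous optimum."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# Incremental update on a new data chunk (sketch):
#   total_loss = task_loss + ewc_penalty(model, old_params, fisher)
```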
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set models the IEEE 14-bus system for studies on P2P electricity markets, including real consumption, solar and wind power data from Australia. The data set has a 30-minute time step over one year, i.e. from July 2012 to June 2013.
The transmission system comprises 14 buses and 20 lines, and its characteristics are based on [1]. The original number of generators was increased to 8, i.e. 1 coal-based generator, 2 gas-based generators, 3 wind turbines and 2 PV plants. The data set keeps the original 11 loads.
Bus 1 represents the upstream connection to the main grid, where the generator is assumed to have infinite power. The market price from the Australian Energy Market Operator is used for this generator, over the same period from July 2012 to June 2013 [4]. The data set assumes a tariff of 10 $/MWh for using the main grid, which the energy imported and exported at bus 1 has to account for. Thus, the export price is equal to the market price minus this grid tariff, while the import price is equal to the market price plus this grid tariff.
The wind production is based on the data set from [2], with the time resolution converted from 5 minutes to 30 minutes. The authors would like to acknowledge that the data set in [2] was processed by Stefanos Delikaraoglou and Jethro Dowell. The solar production and load consumption are taken from [3]. The load consumption is split into fixed and flexible consumption per time step. Since the total capacity of the flexible consumption is not available, the daily flexible consumption is split evenly over the time steps. In this way, the maximum consumption is equal to the fixed consumption plus twice this per-time-step flexible consumption, and the minimum consumption is equal to the fixed consumption in each time step.
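A minimal Python sketch of the pricing and consumption rules just described, assuming 48 half-hour steps per day; the 10 $/MWh tariff and the fixed-plus-twice-flexible rule come from the description above, while the example figures and names are placeholders.

```python
# Sketch of the bus-1 pricing and the flexible-consumption bounds described
# above. The 10 $/MWh tariff and the "fixed plus twice the per-step flexible
# consumption" rule come from the text; the example figures are placeholders.
GRID_TARIFF = 10.0  # $/MWh for using the main grid at bus 1

def bus1_prices(market_price):
    """Import/export prices at bus 1 given the AEMO market price ($/MWh)."""
    return market_price + GRID_TARIFF, market_price - GRID_TARIFF

def consumption_bounds(fixed, daily_flexible, n_steps=48):
    """Per-time-step (minimum, maximum) consumption for 30-minute steps."""
    flexible = daily_flexible / n_steps      # daily flexibility spread evenly
    return fixed, fixed + 2.0 * flexible

print(bus1_prices(55.0))                     # e.g. a market price of 55 $/MWh
print(consumption_bounds(1.2, 9.6))          # e.g. fixed 1.2, flexible 9.6 per day
```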
The wind, solar and load data sets have been normalized, i.e. given as values relative to rated power, and these normalized sequences were then multiplied by the capacity of each element. The data is intended for use in studies related to consumer-centric electricity markets.
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024
HISTORICAL DATA | 2019 - 2024
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023 | 4.31 (USD Billion)
MARKET SIZE 2024 | 5.1 (USD Billion)
MARKET SIZE 2032 | 19.6 (USD Billion)
SEGMENTS COVERED | Data Type, Deployment Model, Data Privacy Regulations, Industry Vertical, Data Cleansing Features, Regional
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS | Rising Demand for Data Privacy, Increased Collaboration Across Industries, Advancements in Cloud Computing, Growing Need for Data Governance, Emergence of AI and Machine Learning
MARKET FORECAST UNITS | USD Billion
KEY COMPANIES PROFILED | Oracle, LiveRamp, InfoSum, Dun & Bradstreet, Talend, Verisk, Informatica, IBM, Acxiom, AdAdapted, Experian, Salesforce, Snowflake, SAP, Precisely
MARKET FORECAST PERIOD | 2024 - 2032
KEY MARKET OPPORTUNITIES | Increasing adoption of cloud-based data analytics, Rising demand for data privacy and security, Growing need for data collaboration and sharing, Expansion of the digital advertising market, Technological advancements in data cleaning and matching
COMPOUND ANNUAL GROWTH RATE (CAGR) | 18.32% (2024 - 2032)
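As a quick sanity check, the quoted CAGR is consistent with the 2024 and 2032 market size figures above:

```python
# Sanity check: CAGR implied by growth from 5.1 (2024) to 19.6 (2032) USD Billion.
cagr = (19.6 / 5.1) ** (1 / 8) - 1
print(f"{cagr:.2%}")   # about 18.3%, consistent with the quoted 18.32% (2024 - 2032)
```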