5 datasets found

Data from: S1 Data -
plos.figshare.com
zip
Updated Oct 11, 2023
Cite
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang (2023). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0292466.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0292466.s001
Dataset updated
Oct 11, 2023
Dataset provided by
PLOS ONE
Authors
Yancong Zhou; Wenyue Chen; Xiaochen Sun; Dandan Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analyzing customers' characteristics and issuing early warnings of customer churn with machine learning algorithms can help enterprises provide targeted marketing strategies and personalized services and save substantial operating costs. Data cleaning, oversampling, data standardization and other preprocessing operations were performed in Python on a data set of personal characteristics and historical behavior for 900,000 telecom customers. A BPNN (Back Propagation Neural Network) was built with appropriately selected model parameters. Two classic ensemble learning models, Random Forest (RF) and Adaboost, were introduced, and an Adaboost dual-ensemble learning model with RF as the base learner was proposed. These four models and four other classical machine learning models (decision tree, naive Bayes, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)) were each used to analyze the customer churn data. The results show that these four models perform better in terms of recall, precision, F1 score and other indicators, and that the RF-Adaboost dual-ensemble model performs best. On positive samples, the recall rates of BPNN, RF, Adaboost and the RF-Adaboost dual-ensemble model are 79%, 90%, 89% and 93%, respectively; their precision rates are 97%, 99%, 98% and 99%; and their F1 scores are 87%, 95%, 94% and 96%. For the RF-Adaboost dual-ensemble model, the three indicators are 10%, 1% and 6% higher than the reference. The churn predictions provide strong data support for telecom companies to adopt appropriate retention strategies for customers at risk of churning and to reduce customer churn.
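
The F1 scores above are the harmonic mean of precision and recall. As a rough consistency check, the sketch below recomputes F1 from the reported (rounded) precision and recall values; it is not the authors' evaluation code, and because the inputs are rounded to whole percentages, the recomputed values can differ from the reported F1 by up to about one point.

```python
# Sketch: recompute F1 as the harmonic mean of precision and recall.
# Inputs are the rounded percentages quoted in the description, not raw model outputs.

def f1_score(precision: float, recall: float) -> float:
    """Return 2PR / (P + R), or 0.0 when both precision and recall are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) on positive samples, as reported in the description
reported = {
    "BPNN": (0.97, 0.79),
    "RF": (0.99, 0.90),
    "Adaboost": (0.98, 0.89),
    "RF-Adaboost": (0.99, 0.93),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

The recomputed values come out near 0.87, 0.94, 0.93 and 0.96, within a point of the reported 87%, 95%, 94% and 96%.
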
RxNorm Data
kaggle.com
bioregistry.io
zip
Updated Mar 20, 2019
Cite
National Library of Medicine (2019). RxNorm Data [Dataset]. https://www.kaggle.com/datasets/nlm-nih/nlm-rxnorm
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
National Library of Medicine
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

RxNorm is the name of a US-specific terminology in medicine that contains all medications available on the US market. Source: https://en.wikipedia.org/wiki/RxNorm

RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary. Source: https://www.nlm.nih.gov/research/umls/rxnorm/

Content

RxNorm was created by the U.S. National Library of Medicine (NLM) to provide a normalized naming system for clinical drugs, defined as the combination of {ingredient + strength + dose form}. In addition to the naming system, the RxNorm dataset also provides structured information such as brand names, ingredients, drug classes, and so on, for each clinical drug. Typical uses of RxNorm include navigating between names and codes among different drug vocabularies and using information in RxNorm to assist with health information exchange/medication reconciliation, e-prescribing, drug analytics, formulary development, and other functions.

This public dataset includes multiple data files originally released in RxNorm Rich Release Format (RXNRRF) that are loaded into BigQuery tables. The data is updated and archived on a monthly basis.

The following tables are included in the RxNorm dataset:

RXNCONSO contains concept and source information

RXNREL contains information regarding relationships between entities

RXNSAT contains attribute information

RXNSTY contains semantic information

RXNSAB contains source info

RXNCUI contains retired RXCUI codes

RXNATOMARCHIVE contains archived data

RXNCUICHANGES contains concept changes

Update Frequency: Monthly
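
The RRF files behind these tables are pipe-delimited text, one row per line with a trailing "|". The sketch below parses RXNCONSO rows and keeps RxNorm semantic clinical drugs (term type SCD), which correspond to the {ingredient + strength + dose form} names described above. It works on raw RRF rows rather than the BigQuery tables; the column order follows our understanding of the RXNCONSO layout in the RxNorm technical documentation (verify before relying on it), and the helper example_row and its RXCUI values are hypothetical, made up only for illustration.

```python
# Sketch: parse RXNCONSO.RRF rows and keep RxNorm semantic clinical drugs (TTY "SCD").
# Column order is assumed from the RxNorm technical documentation; sample rows are made up.

RXNCONSO_COLUMNS = [
    "RXCUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "RXAUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line: str) -> dict:
    """Split one RRF row into a column -> value dict, dropping the empty field after the trailing '|'."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields = fields[:-1]
    return dict(zip(RXNCONSO_COLUMNS, fields))


def clinical_drugs(lines):
    """Yield (RXCUI, name) for rows whose source is RXNORM and term type is SCD."""
    for line in lines:
        row = parse_rrf_line(line)
        if row.get("SAB") == "RXNORM" and row.get("TTY") == "SCD":
            yield row["RXCUI"], row["STR"]


def example_row(**values) -> str:
    """Hypothetical helper: build a made-up RRF row; unspecified columns are left empty."""
    return "|".join(values.get(col, "") for col in RXNCONSO_COLUMNS) + "|"


sample = [
    example_row(RXCUI="1000001", LAT="ENG", SAB="RXNORM", TTY="SCD", STR="exampledrug 10 MG Oral Tablet"),
    example_row(RXCUI="1000002", LAT="ENG", SAB="RXNORM", TTY="IN", STR="exampledrug"),
]
print(list(clinical_drugs(sample)))  # [('1000001', 'exampledrug 10 MG Oral Tablet')]
```

Only the SCD row survives the filter; the ingredient-level row (TTY "IN") is skipped.
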

Fork this kernel to get started with this dataset.

Acknowledgements

https://www.nlm.nih.gov/research/umls/rxnorm/

https://bigquery.cloud.google.com/dataset/bigquery-public-data:nlm_rxnorm

https://cloud.google.com/bigquery/public-data/rxnorm

Dataset Source: Unified Medical Language System RxNorm. The dataset is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset. This dataset uses publicly available data from the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services; NLM is not responsible for the dataset, does not endorse or recommend this or any other dataset.

Banner Photo by @freestocks from Unsplash.

Inspiration

What are the RXCUI codes for the ingredients of a list of drugs?

Which ingredients have the greatest variety of dose forms?

In what dose forms is the drug phenylephrine found?

What are the ingredients of the drug labeled with the generic code number 072718?
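
As a sketch of the second question, the snippet below ranks ingredients by the number of distinct dose forms they appear in. It assumes that (ingredient, dose form) pairs have already been extracted from the RxNorm tables (for example through the relationships in RXNREL, which is not shown here); the ingredient names in the example are hypothetical placeholders.

```python
# Sketch: rank ingredients by how many distinct dose forms they appear in.
# Assumes (ingredient, dose_form) pairs were already extracted from RxNorm; data below is made up.
from collections import defaultdict


def dose_form_variety(pairs):
    """Return (ingredient, distinct dose-form count) sorted by count descending, then by name."""
    forms = defaultdict(set)
    for ingredient, dose_form in pairs:
        forms[ingredient].add(dose_form)  # sets ignore repeated (ingredient, dose form) pairs
    counts = [(ingredient, len(dfs)) for ingredient, dfs in forms.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


pairs = [
    ("ingredient A", "Oral Tablet"),
    ("ingredient A", "Oral Solution"),
    ("ingredient A", "Oral Tablet"),
    ("ingredient B", "Nasal Spray"),
]
print(dose_form_variety(pairs))  # [('ingredient A', 2), ('ingredient B', 1)]
```
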
Partial data display of 000300.SH.
plos.figshare.com
xls
Updated Jan 13, 2025
Cite
Yiming Qian (2025). Partial data display of 000300.SH. [Dataset]. http://doi.org/10.1371/journal.pone.0316955.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316955.t001
Dataset updated
Jan 13, 2025
Dataset provided by
PLOS ONE
Authors
Yiming Qian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To address the limitations of existing stock price prediction models in handling real-time data streams—such as poor scalability, declining predictive performance due to dynamic changes in data distribution, and difficulties in accurately forecasting non-stationary stock prices—this paper proposes an incremental learning-based enhanced Transformer framework (IL-ETransformer) for online stock price prediction. This method leverages a multi-head self-attention mechanism to deeply explore the complex temporal dependencies between stock prices and feature factors. Additionally, a continual normalization mechanism is employed to stabilize the data stream, enhancing the model’s adaptability to dynamic changes. To ensure that the model retains prior knowledge while integrating new information, a time series elastic weight consolidation (TSEWC) algorithm is introduced to enable efficient incremental training with incoming data. Experiments conducted on five publicly available datasets demonstrate that the proposed method not only effectively captures the temporal information in the data but also fully exploits the correlations among multi-dimensional features, significantly improving stock price prediction accuracy. Notably, the method shows robust performance in coping with non-stationary and frequently changing financial market data.
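
The description does not give the exact form of the TSEWC algorithm. For orientation only, the sketch below computes the standard elastic weight consolidation (EWC) penalty that such methods build on: the loss on new data is augmented by (λ/2) Σᵢ Fᵢ (θᵢ − θ*ᵢ)², where θ* are the parameters learned on earlier data and Fᵢ are per-parameter importance weights (typically a diagonal Fisher information estimate). This is plain EWC on flat parameter lists, not the paper's time-series variant, and all numbers in the example are hypothetical.

```python
# Sketch of the standard EWC penalty (not the paper's TSEWC variant).
# params: current parameters; anchor_params: parameters after earlier training;
# fisher: per-parameter importance weights; lam: penalty strength. Values are hypothetical.

def ewc_penalty(params, anchor_params, fisher, lam):
    """Return (lam / 2) * sum_i fisher_i * (params_i - anchor_params_i) ** 2."""
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have the same length")
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher))


def ewc_loss(new_data_loss, params, anchor_params, fisher, lam):
    """Loss on incoming data plus the quadratic pull toward previously learned parameters."""
    return new_data_loss + ewc_penalty(params, anchor_params, fisher, lam)


# The first parameter moved by 1.0 but has low importance; the second did not move.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [0.5, 10.0], lam=4.0))  # 1.0
```

Parameters with large importance weights are penalized heavily for drifting, which is how EWC-style training retains prior knowledge while fitting new data.
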
The P2P-IEEE 14 bus system data set
zenodo.org
explore.openaire.eu
zip
Updated Jan 24, 2020
Cite
Tiago Sousa; Tiago Soares; Pierre Pinson; Fabio Moret; Thomas Baroche; Etienne Sorin (2020). The P2P-IEEE 14 bus system data set [Dataset]. http://doi.org/10.5281/zenodo.1220935
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1220935
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tiago Sousa; Tiago Soares; Pierre Pinson; Fabio Moret; Thomas Baroche; Etienne Sorin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set models the IEEE 14-bus system for studies on P2P electricity markets, using real consumption, solar and wind power data from Australia. It has a 30-minute time step over one year, from July 2012 to June 2013.

The transmission system comprises 14 buses and 20 lines, and its characteristics are based on [1]. The number of generators was increased from the original to 8: 1 coal-based generator, 2 gas-based generators, 3 wind turbines and 2 PV plants. The data set keeps the original 11 loads.

Bus 1 represents the upstream connection to the main grid, where the generator is assumed to have infinite power. This generator uses the market price from the Australian Energy Market Operator over the same period, July 2012 to June 2013 [4]. The data set assumes a tariff of $10/MWh for using the main grid, and energy imported or exported at bus 1 must account for this extra cost. Thus, the export price equals the market price minus the grid tariff, and the import price equals the market price plus the grid tariff.
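
The bus 1 pricing rule can be written as a two-line calculation. In this sketch the $10/MWh tariff comes from the description, while the function name bus1_prices and the $50/MWh market price in the example are hypothetical.

```python
# Sketch of the bus 1 price rule: export = market - tariff, import = market + tariff.
GRID_TARIFF = 10.0  # $/MWh, as stated in the data set description


def bus1_prices(market_price: float) -> tuple:
    """Return (export_price, import_price) in $/MWh for a given market price."""
    export_price = market_price - GRID_TARIFF
    import_price = market_price + GRID_TARIFF
    return export_price, import_price


print(bus1_prices(50.0))  # (40.0, 60.0) for a hypothetical $50/MWh market price
```

The $20/MWh gap between import and export prices means that trading with the main grid is always costlier than trading at the market price.
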

The wind production is based on the data set from [2], with the time resolution converted from 5 minutes to 30 minutes. The authors would like to acknowledge that the data set in [2] was processed by Stefanos Delikaraoglou and Jethro Dowell. The solar production and load consumption are taken from [3]. The load consumption is split into fixed and flexible consumption per time step. Since there is no access to the total capacity of the flexible consumption, we split the daily flexible consumption over the time steps of each day. The maximum consumption in each time step is then the fixed consumption plus twice the flexible consumption per time step, and the minimum consumption is the fixed consumption.
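
The consumption bounds can be sketched for one day as follows. The description says the daily flexible consumption is split over the time steps but not how; this sketch assumes an equal split across the 48 half-hour steps. The function name and the example values are hypothetical, and units are whatever the input data use.

```python
# Sketch: per-time-step consumption bounds for one day of 30-minute steps.
# Assumption: the daily flexible consumption is split equally over the 48 steps.
STEPS_PER_DAY = 48  # 24 h / 30 min


def consumption_bounds(fixed, daily_flexible):
    """Return (minimum, maximum) consumption lists for one day.

    fixed: 48 fixed-consumption values; daily_flexible: total flexible consumption for the day.
    """
    if len(fixed) != STEPS_PER_DAY:
        raise ValueError(f"expected {STEPS_PER_DAY} fixed-consumption values")
    flex_per_step = daily_flexible / STEPS_PER_DAY  # equal split (assumption)
    minimum = list(fixed)  # minimum = fixed consumption
    maximum = [f + 2 * flex_per_step for f in fixed]  # maximum = fixed + 2 x flexible per step
    return minimum, maximum


low, high = consumption_bounds([1.0] * STEPS_PER_DAY, daily_flexible=24.0)
print(low[0], high[0])  # 1.0 2.0
```
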

The wind, solar and load data sets were normalized, i.e. expressed relative to rated power, and these normalized sequences were then multiplied by the capacity of each element. The data is intended for use in studies related to consumer-centric electricity markets, e.g.:

Validate new market designs or business models;

Assess the impact of new grid operation strategies;

Test the effect of strategic behavior by producers or consumers.
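
The normalization step described above amounts to multiplying a per-unit profile by each element's capacity. A minimal sketch, with a hypothetical function name, profile and 20 MW capacity:

```python
# Sketch: turn a normalized (per-unit of rated power) profile into absolute values.
def scale_profile(normalized, capacity):
    """Multiply each per-unit value by the element's capacity."""
    return [value * capacity for value in normalized]


# Hypothetical 3-step per-unit profile for a 20 MW element
print(scale_profile([0.0, 0.25, 0.5], 20.0))  # [0.0, 5.0, 10.0]
```
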

Global Data Cleanroom Software Market Research Report: By Data Type...

wiseguyreports.com

Updated Jul 23, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Data Cleanroom Software Market Research Report: By Data Type (First-Party Data, Second-Party Data, Third-Party Data), By Deployment Model (Cloud-based, On-premises, Hybrid), By Data Privacy Regulations (GDPR, CCPA, LGPD), By Industry Vertical (Retail, Finance, Healthcare, Manufacturing), By Data Cleansing Features (Data Standardization, Data Deduplication, Data Enrichment) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/de/reports/data-cleanroom-software-market

Dataset updated

Jul 23, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 7, 2024

Area covered

Global

Description

BASE YEAR: 2024
HISTORICAL DATA: 2019 - 2024
REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023: USD 4.31 billion
MARKET SIZE 2024: USD 5.1 billion
MARKET SIZE 2032: USD 19.6 billion
SEGMENTS COVERED: Data Type, Deployment Model, Data Privacy Regulations, Industry Vertical, Data Cleansing Features, Regional
COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS: Rising demand for data privacy; increased collaboration across industries; advancements in cloud computing; growing need for data governance; emergence of AI and machine learning
MARKET FORECAST UNITS: USD billion
KEY COMPANIES PROFILED: Oracle, LiveRamp, InfoSum, Dun & Bradstreet, Talend, Verisk, Informatica, IBM, Acxiom, AdAdapted, Experian, Salesforce, Snowflake, SAP, Precisely
MARKET FORECAST PERIOD: 2024 - 2032
KEY MARKET OPPORTUNITIES: Increasing adoption of cloud-based data analytics; rising demand for data privacy and security; growing need for data collaboration and sharing; expansion of the digital advertising market; technological advancements in data cleaning and matching
COMPOUND ANNUAL GROWTH RATE (CAGR): 18.32% (2024 - 2032)
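
The stated CAGR can be checked against the 2024 and 2032 market sizes with the usual formula CAGR = (end / start)^(1 / years) - 1. The sketch below uses only figures from the report summary; because the market sizes are rounded, the result is close to, but not exactly, the reported 18.32%.

```python
# Sketch: compound annual growth rate from start and end values.
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Return (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# USD 5.1 billion (2024) to USD 19.6 billion (2032), i.e. 8 years
rate = cagr(5.1, 19.6, 2032 - 2024)
print(f"{rate:.2%}")
```
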
