52 datasets found

Data Warehousing and Data Mining (Old), 7th Semester, Computer Science and...
paper.erudition.co.in
html
Updated Nov 23, 2025
Cite
Einetic (2025). Data Warehousing and Data Mining (Old), 7th Semester, Computer Science and Engineering, MAKAUT | Erudition Paper [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering/7/data-warehousing-and-data-mining
Explore at:
Available download formats: html
Dataset updated
Nov 23, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of Data Warehousing and Data Mining (Old), 7th Semester, Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology
Module II
paper.erudition.co.in
html
Updated Nov 23, 2025
Cite
Einetic (2025). Module II [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering/7/data-warehousing-and-data-mining
Explore at:
Available download formats: html
Dataset updated
Nov 23, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Module II of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
Musical Chords and Image Descriptors from Film Fantasia (Disney)
figshare.com
txt
Updated Apr 10, 2020
Cite
Lucía Martín-Gómez (2020). Musical Chords and Image Descriptors from Film Fantasia (Disney) [Dataset]. http://doi.org/10.6084/m9.figshare.12110712.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12110712.v2
Dataset updated
Apr 10, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Lucía Martín-Gómez
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
FANTASIA

This repository contains the data related to image descriptors and sounds associated with a selection of frames of the film Fantasia, produced by Disney.

About

This repository contains data used in a doctoral thesis for the automatic composition of descriptive music. The information is extracted from the fragment of The Nutcracker from the film Fantasia (Disney, 1940) using SIFT and BoVW, color quantization and CENS.

Data

- Attributes 1-50: weighted vector of visual words.
- Attributes 51-59: red, green and blue values for three RGB colors.
- Note1, note2 and note3: MIDI notes related to each frame of the film.

License

Data is available under MIT License. To make use of the data the article must be cited.
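The attribute layout described above can be sketched as a small row splitter. This is only an illustration: it assumes each frame is one row of 62 numeric values in the listed order, and that attributes 51-59 are stored as three consecutive (red, green, blue) triples. Neither the file delimiter nor that ordering is confirmed by the description, and `FrameRecord` and `split_row` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FrameRecord(NamedTuple):
    """One film frame, split according to the described attribute layout."""
    visual_words: List[float]        # attributes 1-50: weighted visual-word vector
    colors: List[Tuple[float, ...]]  # attributes 51-59: three colors (assumed R, G, B order)
    notes: Tuple[int, ...]           # note1, note2, note3: MIDI note numbers


def split_row(row: Sequence[float]) -> FrameRecord:
    """Split a 62-value row into visual words, colors, and MIDI notes."""
    if len(row) != 62:
        raise ValueError(f"expected 62 values, got {len(row)}")
    visual_words = list(row[:50])
    rgb = row[50:59]
    # Assumption: the nine color values are grouped as three consecutive triples.
    colors = [tuple(rgb[i:i + 3]) for i in range(0, 9, 3)]
    notes = tuple(int(n) for n in row[59:62])
    return FrameRecord(visual_words, colors, notes)


if __name__ == "__main__":
    record = split_row(list(range(62)))
    print(len(record.visual_words), record.colors, record.notes)
```

If the actual file groups the color values differently (for example all reds first), only the `colors` line would need to change.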
Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...
zenodo.org
bin, png, zip
Updated Jul 12, 2024
Cite
Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ibai Tamayo; Julián Librero; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
Explore at:
Available download formats: bin, png, zip
Unique identifier
https://doi.org/10.5281/zenodo.7778291
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ibai Tamayo; Julián Librero; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Enrique Bernal-Delgado
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Technical notes and documentation on the common data model of the project CONCEPT-DM2.

This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

Aims of the CONCEPT-DM2 project:

General aim: To analyse chronic care effectiveness and the efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real-world data (RWD) from five Spanish Regional Health Systems.

Main specific aims:

To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations

To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines

To assess whether adherence to clinical guidelines influences important health outcomes, such as cardiovascular hospitalizations.

To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.

Cohort definition: All patients with a Type 2 Diabetes code in the clinical health records

Inclusion criteria: patients who, on 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022, had an active health card (active TIS - tarjeta sanitaria activa) and a type 2 diabetes code (T2D; DM2 in Spanish) in the primary care clinical records (CIAP2 T90 when the CIAP code system is used)

Exclusion criteria:

patients with no contact with the health system from 01/01/2017 to 31/12/2022

patients that had a T1D (DM1) code opened after the T2D code during the follow-up.

Study period. From 01/01/2017 to 31/12/2022
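The cohort rules above can be read as a simple eligibility filter. The sketch below is one possible interpretation, not the project's implementation: the `Patient` fields are hypothetical and do not come from the CONCEPT-DM2 data model, and "active health card on 01/01/2017 or during follow-up" is approximated as any card-validity interval overlapping the study period.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

STUDY_START = date(2017, 1, 1)
STUDY_END = date(2022, 12, 31)


@dataclass
class Patient:
    # All field names are illustrative assumptions.
    t2d_code_date: Optional[date] = None   # date the T2D code was opened
    t1d_code_date: Optional[date] = None   # date a T1D code was opened, if any
    card_active: List[Tuple[date, date]] = field(default_factory=list)
    contacts: List[date] = field(default_factory=list)


def in_study(day: date) -> bool:
    return STUDY_START <= day <= STUDY_END


def is_eligible(p: Patient) -> bool:
    # Inclusion: a T2D code recorded no later than the end of follow-up.
    if p.t2d_code_date is None or p.t2d_code_date > STUDY_END:
        return False
    # Inclusion: health card active at some point in the study period.
    if not any(start <= STUDY_END and end >= STUDY_START
               for start, end in p.card_active):
        return False
    # Exclusion: no contact with the health system during the study period.
    if not any(in_study(day) for day in p.contacts):
        return False
    # Exclusion: T1D code opened after the T2D code during follow-up.
    if (p.t1d_code_date is not None and in_study(p.t1d_code_date)
            and p.t1d_code_date > p.t2d_code_date):
        return False
    return True
```

A patient with a T1D code opened before the T2D code stays in the cohort under this reading, since the exclusion only mentions T1D codes opened afterwards.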

Files included in this publication:

Datamodel_CONCEPT_DM2_diagram.png

Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)

Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)

sample_data1_dm_patient.csv

sample_data2_dm_param.csv

sample_data3_dm_patient.csv

sample_data4_dm_param.csv

sample_data5_dm_patient.csv

sample_data6_dm_param.csv

sample_data7_dm_param.csv

sample_data8_dm_param.csv

Datamodel_CONCEPT_DM2_explanation.pptx
Data for: Epidemiological landscape of Batrachochytrium dendrobatidis and...
datadryad.org
search.dataone.org
zip
Updated Dec 13, 2023
Cite
M. Delia Basanta; Julián A. Velasco; Constantino González-Salazar (2023). Data for: Epidemiological landscape of Batrachochytrium dendrobatidis and its impact on amphibian diversity at global scale [Dataset]. http://doi.org/10.5061/dryad.83bk3j9zv
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.83bk3j9zv
Dataset updated
Dec 13, 2023
Dataset provided by
Dryad
Authors
M. Delia Basanta; Julián A. Velasco; Constantino González-Salazar
Time period covered
Nov 26, 2023
Description
1. Title of Dataset: Epidemiological landscape of Batrachochytrium dendrobatidis and its impact on amphibian diversity at global scale

https://doi.org/10.5061/dryad.83bk3j9zv

2. Authors Information

M. Delia Basanta Department of Biology, University of Nevada Reno. Reno, Nevada, USA. delibasanta@gmail.com

Julián A. Velasco Instituto de Ciencias de la Atmósfera y Cambio Climático, Universidad Nacional Autónoma de México. Ciudad de México, México. javelasco@atmosfera.unam.mx

Constantino González-Salazar. Instituto de Ciencias de la Atmósfera y Cambio Climático, Universidad Nacional Autónoma de México. Ciudad de México, México. cgsalazar@atmosfera.unam.mx

3. Date of data collection (single date, range, approximate date): 2019-2022

4. Geographic location of data collection: Global

DATA & FILE OVERVIEW

1. File List:

Table S1.xls

Supplementary information S1.docx

Table S2.xlsx

Table S3.xlsx

Table S4.xlsx

DATA-SPECIFIC INFORMATION F...
Data from: Characterizing and classifying neuroendocrine neoplasms through...
datacatalog.mskcc.org
search.dataone.org
Updated Sep 19, 2023
Cite
Nanayakkara, Jina; Yang, Xiaojing; Tyryshkin, Kathrin; Wong, Justin J.M.; Vanderbeck, Kaitlin; Ginter, Paula S.; Scognamiglio, Theresa; Chen, Yao-Tseng; Panarelli, Nicole; Cheung, Nai-Kong; Dijk, Frederike; Ben-Dov, Iddo Z.; Kim, Michelle Kang; Singh, Simron; Morozov, Pavel; Max, Klaas E. A.; Tuschl, Thomas; Renwick, Neil (2023). Characterizing and classifying neuroendocrine neoplasms through microRNA sequencing and data mining [Dataset]. http://doi.org/10.5061/dryad.fn2z34tqj
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.fn2z34tqj
Dataset updated
Sep 19, 2023
Dataset provided by
MSK Library
Authors
Nanayakkara, Jina; Yang, Xiaojing; Tyryshkin, Kathrin; Wong, Justin J.M.; Vanderbeck, Kaitlin; Ginter, Paula S.; Scognamiglio, Theresa; Chen, Yao-Tseng; Panarelli, Nicole; Cheung, Nai-Kong; Dijk, Frederike; Ben-Dov, Iddo Z.; Kim, Michelle Kang; Singh, Simron; Morozov, Pavel; Max, Klaas E. A.; Tuschl, Thomas; Renwick, Neil
Description
From Dryad entry:

"Abstract
Neuroendocrine neoplasms (NENs) are clinically diverse and incompletely characterized cancers that are challenging to classify. MicroRNAs (miRNAs) are small regulatory RNAs that can be used to classify cancers. Recently, a morphology-based classification framework for evaluating NENs from different anatomic sites was proposed by experts, with the requirement of improved molecular data integration. Here, we compiled 378 miRNA expression profiles to examine NEN classification through comprehensive miRNA profiling and data mining. Following data preprocessing, our final study cohort included 221 NEN and 114 non-NEN samples, representing 15 NEN pathological types and five site-matched non-NEN control groups. Unsupervised hierarchical clustering of miRNA expression profiles clearly separated NENs from non-NENs. Comparative analyses showed that miR-375 and miR-7 expression is substantially higher in NEN cases than non-NEN controls. Correlation analyses showed that NENs from diverse anatomic sites have convergent miRNA expression programs, likely reflecting morphologic and functional similarities. Using machine learning approaches, we identified 17 miRNAs to discriminate 15 NEN pathological types and subsequently constructed a multi-layer classifier, correctly identifying 217 (98%) of 221 samples and overturning one histologic diagnosis. Through our research, we have identified common and type-specific miRNA tissue markers and constructed an accurate miRNA-based classifier, advancing our understanding of NEN diversity.

Methods
Sequencing-based miRNA expression profiles from 378 clinical samples, comprising 239 neuroendocrine neoplasm (NEN) cases and 139 site-matched non-NEN controls, were used in this study. Expression profiles were either compiled from published studies (n=149) or generated through small RNA sequencing (n=229). Prior to sequencing, total RNA was isolated from formalin-fixed paraffin-embedded (FFPE) tissue blocks or fresh-frozen (FF) tissue samples. Small RNA cDNA libraries were sequenced on HiSeq 2500 Illumina platforms using an established small RNA sequencing (Hafner et al., 2012 Methods) and sequence annotation pipeline (Brown et al., 2013 Front Genet) to generate miRNA expression profiles. Scaling our existing approach to miRNA-based NEN classification (Panarelli et al., 2019 Endocr Relat Cancer; Ren et al., 2017 Oncotarget), we constructed and cross-validated a multi-layer classifier for discriminating NEN pathological types based on selected miRNAs.

Usage notes
Diagnostic histopathology and small RNA cDNA library preparation information for all samples are presented in Table S1 of the associated manuscript."
Module IV
paper.erudition.co.in
html
Updated Nov 23, 2025
Cite
Einetic (2025). Module IV [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering/7/data-warehousing-and-data-mining
Explore at:
Available download formats: html
Dataset updated
Nov 23, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Module IV of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
Synthetic temporal dataset for temporal trend analysis and retrieval
search.dataone.org
data.niaid.nih.gov
Updated Jul 31, 2025
Cite
Jing Ao; Kara Schatz; Rada Chirkova (2025). Synthetic temporal dataset for temporal trend analysis and retrieval [Dataset]. http://doi.org/10.5061/dryad.q573n5trf
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.q573n5trf
Dataset updated
Jul 31, 2025
Dataset provided by
Dryad Digital Repository
Authors
Jing Ao; Kara Schatz; Rada Chirkova
Time period covered
May 7, 2024
Description
This repository contains a synthetic, temporal data set that was generated by the authors by sampling values from the Gaussian distribution. The dataset contains eight nontemporal dimensions, a temporal dimension, and a numerical measure attribute. The data set was generated according to the scheme and procedure detailed in this source paper: Kaufmann, M., Fischer, P.M., May, N., Tonder, A., Kossmann, D. (2014). TPC-BiH: A Benchmark for Bitemporal Databases. In: Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham.

The data set can be used for analyzing and locating temporal trends of interest, where a temporal trend is generated by selecting the desired values of the nontemporal dimensions, and then selecting the corresponding values of the temporal dimension and the numerical measure attribute. Locating temporal trends of interest, e.g., unusual trends, is a common task in many applications and domains. It can also be o...

# Synthetic temporal dataset for temporal trend analysis and retrieval

https://doi.org/10.5061/dryad.q573n5trf

The data set can be used for analyzing and locating temporal trends of interest, where a temporal trend is generated by selecting the desired values of the nontemporal dimensions, and then selecting the corresponding values of the temporal dimension and the numerical measure attribute. Locating temporal trends of interest, e.g., unusual trends, is a common task in many applications and domains. It can also be of interest to understand which nontemporal dimensions are associated with the temporal trends of interest. To this end, the data set can be used for analyzing and locating temporal trends in the data cube induced by the data set, e.g., retrieving outlier temporal trends using an outlier detector.
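The idea of a temporal trend, as described above, can be sketched in a few lines. This is a toy illustration rather than the authors' generator: it does not follow the TPC-BiH scheme, the row layout `(dims, t, measure)` is an assumption, and averaging the measure per time point is one arbitrary way to collapse several rows into a trend.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[Tuple[int, ...], int, float]  # (nontemporal dims, time, measure)


def generate_rows(n_rows: int, n_dims: int = 8, n_values: int = 3,
                  n_times: int = 5, seed: int = 0) -> List[Row]:
    """Toy generator: categorical dimension values plus a Gaussian measure."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        dims = tuple(rng.randrange(n_values) for _ in range(n_dims))
        t = rng.randrange(n_times)
        measure = rng.gauss(0.0, 1.0)  # sampled from a standard Gaussian
        rows.append((dims, t, measure))
    return rows


def temporal_trend(rows: List[Row],
                   selection: Dict[int, int]) -> List[Tuple[int, float]]:
    """Fix some nontemporal dimensions (index -> value) and return the
    mean measure per time point, ordered by time."""
    buckets: Dict[int, List[float]] = {}
    for dims, t, measure in rows:
        if all(dims[i] == value for i, value in selection.items()):
            buckets.setdefault(t, []).append(measure)
    return [(t, sum(vals) / len(vals)) for t, vals in sorted(buckets.items())]


if __name__ == "__main__":
    data = generate_rows(1000)
    # Trend for rows where dimension 0 equals 1 and dimension 3 equals 2.
    print(temporal_trend(data, {0: 1, 3: 2}))
```

Each distinct `selection` corresponds to one cell of the data cube, so an outlier detector would be run over the trends returned for many such selections.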

We generated the synthetic temporal data set [1], which contains up to 8 nontemporal dimensions, one temporal dimension, and a nume...
Examples of natural language in (a) mortality dataset, (b) aeromedical...
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Neva J. Bull; Bridget Honan; Neil J. Spratt; Simon Quilty (2023). Examples of natural language in (a) mortality dataset, (b) aeromedical retrieval dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0284965.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0284965.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Neva J. Bull; Bridget Honan; Neil J. Spratt; Simon Quilty
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Examples of natural language in (a) mortality dataset, (b) aeromedical retrieval dataset.
Zenodo Open Metadata snapshot - Training dataset for records and communities...
zenodo.org
data.niaid.nih.gov
application/gzip, bin
Updated Dec 15, 2022
Cite
Zenodo team (2022). Zenodo Open Metadata snapshot - Training dataset for records and communities classifier building [Dataset]. http://doi.org/10.5281/zenodo.7438358
Explore at:
Available download formats: bin, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7438358
Dataset updated
Dec 15, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Zenodo team
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.

The datasets are gzip-compressed JSON Lines files, where each line is a JSON object representing a Zenodo record or community.

Records dataset

Filename: zenodo_open_metadata_{ date of export }.jsonl.gz

Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date

which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.

In addition, some terms have been altered:

The term files contains a list of dictionaries containing filetype, size, and filename only.

The term license contains a short Zenodo ID of the license (e.g. "cc-by").

Communities dataset

Filename: zenodo_community_metadata_{ date of export }.jsonl.gz

Each object contains the terms: id, title, description, curation_policy, page

which correspond to the fields with the same name available in Zenodo's community creation form.

Notes for all datasets

For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.

Top-level terms that were missing in the metadata may contain a null value.

A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
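A file in the format described above can be read one line at a time with the Python standard library. The sketch below is a minimal illustration; the helper names `read_jsonl_gz` and `split_by_spam` are hypothetical, and the only field it relies on is the boolean `spam` term mentioned in the notes.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def read_jsonl_gz(path: str) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line of a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def split_by_spam(objects: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate objects flagged as spam from the rest.

    A missing or null `spam` value is treated as not spam here,
    which is an assumption of this sketch.
    """
    spam, not_spam = [], []
    for obj in objects:
        (spam if obj.get("spam") else not_spam).append(obj)
    return spam, not_spam
```

Because the reader is a generator, the full export never has to fit in memory; the uncompressed 200-line samples can be read the same way by swapping `gzip.open` for the built-in `open`.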
Wind Turbine Accident News (1980-2013)
data.mendeley.com
Updated Nov 27, 2017
Cite
Gurdal Ertek (2017). Wind Turbine Accident News (1980-2013) [Dataset]. http://doi.org/10.17632/jkjvmn9tz3.1
Explore at:
Unique identifier
https://doi.org/10.17632/jkjvmn9tz3.1
Dataset updated
Nov 27, 2017
Authors
Gurdal Ertek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set includes 216 news items on 240 wind turbine accidents between the years 1980 and 2013. The analysis of this data set and the insights obtained are reported in the following research paper:

Asian, S., Ertek, G., Haksoz, C., Pakter, S. and Ulun, S., 2017. Wind turbine accidents: A data mining study. IEEE Systems Journal, 11(3), pp.1567-1578.

As of now, the most extensive data available on the Internet on wind turbine accidents is published by the Caithness Windfarm Information Forum (CWIF), a UK-based grassroots organization opposing wind turbine installations.

While the Caithness list is impressive in magnitude, its quality and reliability are open to discussion for the following reason:

Many of the web links to the news sources are not valid, and some of the accidents appear in multiple lines of the data.

Other online sources, even those containing much more data, exhibit similar deficiencies.

So, there are problems when it comes to using the Caithness data or other data in research studies. To this end, we collected data on wind turbine accidents ourselves, also using the data from Caithness and we share our collected data on this page (please click the link at the top of the page to download the data).

The data we collected consists of three folders and an MS Excel file.

The folder News.txt contains the accident news, with each news item in a separate text file.

The folder News.doc contains the news, with each news item in a separate MS Word file.

Finally, the folder News.doc.with.notes contains the news, with each news item in a separate MS Word file, but with extensive comments explaining how the database in the MS Excel file was constructed.

The MS Excel file News.Database.xlsx contains the structured data created based on a detailed reading of the accident news text.

The MS Excel file is the file that was analyzed in our research paper.
ATP 312, NOTES ON THE PETROLEUM PROSPECTS, FOR DOMINION MINING AND OIL NL
data.qld.gov.au
Updated May 10, 2023
Cite
Geological Survey of Queensland (2023). ATP 312, NOTES ON THE PETROLEUM PROSPECTS, FOR DOMINION MINING AND OIL NL [Dataset]. https://www.data.qld.gov.au/dataset/cr011048
Explore at:
Dataset updated
May 10, 2023
Dataset authored and provided by
Geological Survey of Queensland
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
URL: https://geoscience.data.qld.gov.au/dataset/cr011048

ATP 312, NOTES ON THE PETROLEUM PROSPECTS, FOR DOMINION MINING AND OIL NL
Vale: A Leader in the Mining Industry (Forecast)
kappasignal.com
Updated Jun 3, 2023
Cite
KappaSignal (2023). Vale: A Leader in the Mining Industry (Forecast) [Dataset]. https://www.kappasignal.com/2023/06/vale-leader-in-mining-industry.html
Explore at:
Dataset updated
Jun 3, 2023
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Vale: A Leader in the Mining Industry

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative trading Buy/Sell strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Practice-Based Evidence: Profiling the Safety of Cilostazol by Text-Mining...
plos.figshare.com
xlsx
Updated Jun 3, 2023
Cite
Nicholas J. Leeper; Anna Bauer-Mehren; Srinivasan V. Iyer; Paea LePendu; Cliff Olson; Nigam H. Shah (2023). Practice-Based Evidence: Profiling the Safety of Cilostazol by Text-Mining of Clinical Notes [Dataset]. http://doi.org/10.1371/journal.pone.0063499
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0063499
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Nicholas J. Leeper; Anna Bauer-Mehren; Srinivasan V. Iyer; Paea LePendu; Cliff Olson; Nigam H. Shah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF).

Methods and Results

We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1∶5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients.

Conclusions

This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover ‘natural experiments’ such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
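The "no association" reading of the reported odds ratios can be checked mechanically: an odds ratio of 1 means equal odds in both groups, so a confidence interval that contains 1 is consistent with no association. The sketch below only restates the figures quoted in the abstract; the helper name `ci_includes_null` is hypothetical, and the confidence level of the intervals is not stated in the abstract.

```python
def ci_includes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """Return True when the interval [lower, upper] contains the null value."""
    return lower <= null_value <= upper


# (odds ratio, CI lower bound, CI upper bound) as quoted in the abstract
reported = {
    "stroke": (1.13, 0.82, 1.55),
    "myocardial infarction": (1.00, 0.71, 1.39),
    "death": (0.86, 0.63, 1.18),
}

for outcome, (odds_ratio, lower, upper) in reported.items():
    verdict = "includes 1" if ci_includes_null(lower, upper) else "excludes 1"
    print(f"{outcome}: OR = {odds_ratio:.2f}, CI [{lower}, {upper}] {verdict}")

# 1:5 matching: five controls for each of the 232 treated patients
treated = 232
controls = treated * 5
```

The control-group size follows directly from the matching ratio, since 232 treated patients with five matched controls each gives the 1,160 controls reported.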
Anglo American: A Mining Titan's Next Chapter (AAL) (Forecast)
kappasignal.com
Updated Nov 18, 2024
Cite
KappaSignal (2024). Anglo American: A Mining Titan's Next Chapter (AAL) (Forecast) [Dataset]. https://www.kappasignal.com/2024/11/anglo-american-mining-titans-next.html
Explore at:
Dataset updated
Nov 18, 2024
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Anglo American: A Mining Titan's Next Chapter (AAL)

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative trading Buy/Sell strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Data from: BZ:TSXV Benz Mining Corp. (Forecast)
kappasignal.com
Updated May 21, 2023
Cite
KappaSignal (2023). BZ:TSXV Benz Mining Corp. (Forecast) [Dataset]. https://www.kappasignal.com/2023/05/bztsxv-benz-mining-corp.html
Explore at:
Dataset updated
May 21, 2023
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

BZ:TSXV Benz Mining Corp.

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative buy/sell trading strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
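As an illustration of the kind of technical-indicator feature listed above, a simple moving average can be derived from a series of daily closing prices. This is a minimal sketch only: the function name `simple_moving_average`, the sample prices, and the 3-day window are assumptions for illustration and are not taken from the dataset.

```python
def simple_moving_average(closes, window):
    """Return the trailing simple moving average of `closes`.

    Positions that do not yet have `window` observations are None.
    `closes` is a hypothetical list of daily closing prices.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    running_sum = 0.0
    for i, price in enumerate(closes):
        running_sum += price
        if i >= window:
            # Drop the price that just left the trailing window.
            running_sum -= closes[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


# Hypothetical closing prices, not real market data.
sample_closes = [10.0, 11.0, 12.0, 13.0, 14.0]
sma_3 = simple_moving_average(sample_closes, window=3)
print(sma_3)
```

The leading `None` values mark days without enough history; a modelling pipeline would typically drop or impute those rows during preprocessing.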
Harmony Mining: Digging for Gold or Digging a Hole? (HMY) (Forecast)
kappasignal.com
Updated Nov 25, 2025
KappaSignal (2025). Harmony Mining: Digging for Gold or Digging a Hole? (HMY) (Forecast) [Dataset]. https://www.kappasignal.com/2024/02/harmony-mining-digging-for-gold-or.html
Dataset updated
Nov 25, 2025
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Harmony Mining: Digging for Gold or Digging a Hole? (HMY)

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price-to-earnings (P/E) ratio, dividend yield, earnings per share (EPS), price/earnings-to-growth ratio, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price-to-sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR, Bollinger Bands, Fibonacci levels, Williams percent range, commodity channel index)

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative buy/sell trading strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Data from: Stillwater Complex, Montana—logs of core drilled by Stillwater...
catalog.data.gov
data.usgs.gov
Updated Nov 26, 2025
U.S. Geological Survey (2025). Stillwater Complex, Montana—logs of core drilled by Stillwater Mining Company and Anaconda Copper Corp. in the Stillwater Mine area, 1983 to 1989 [Dataset]. https://catalog.data.gov/dataset/stillwater-complex-montanalogs-of-core-drilled-by-stillwater-mining-company-and-anaconda-c
Dataset updated
Nov 26, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
This dataset includes TIFF (Tagged Image File Format) images of graphic drill core logs showing associated drill core information, a TIFF image of the explanation for the lithology and structure sections of the logs, an Esri shapefile of the locations of the drill holes, and 12 .csv files of tabular data that were compiled from handwritten drill core logs. The drill core is from the Stillwater Mine area of the Stillwater Complex, Montana and was drilled from 1983 to 1989 by the Stillwater Mining Company and Anaconda Copper Corp. The data shown in the graphic drill logs and contained within the .csv files includes lithologic, structure, percent recovery, grain size, sulfide, nickel, copper, platinum, and palladium mineralization information. The graphic drill logs were created using Golden software's Strater 5 drill core visualization software and are provided with both logarithmic and linear scales where applicable. The graphic drill logs are plotted using the depth recorded in the drill logs and do not reflect stratigraphic true thickness. All instances of question marks ("?") represent original data as written by the geologist. In areas where the hand-written notes were unreadable, the notation of "[unreadable]" was used. See USGS SIR 2014-5183 (https://pubs.usgs.gov/sir/2014/5183/) for report and spatial data relating to the Stillwater Complex.
Data from: The extent and consequences of p-hacking in science
figshare.mq.edu.au
search.dataone.org
Updated Jun 13, 2023
Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions (2023). Data from: The extent and consequences of p-hacking in science [Dataset]. http://doi.org/10.5061/dryad.79d43
Available download formats: bin
Unique identifier
https://doi.org/10.5061/dryad.79d43
Dataset updated
Jun 13, 2023
Dataset provided by
Macquarie University
Authors
Megan L. Head; Luke Holman; Rob Lanfear; Andrew T. Kahn; Michael D. Jennions
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.

Usage Notes

Data from: The extent and consequences of p-hacking in science

This zip file consists of three parts:

1. Data obtained from text-mining and associated analysis files.

2. Data obtained from previously published meta-analyses and associated analysis files.

3. Analysis files used to conduct meta-analyses of the data.

Read me files are contained within this zip file.

FILES_FOR_DRYAD.zip
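One common way to look for p-hacking in a collection of reported p-values is to compare how many fall in two adjacent bins just below 0.05, using a one-sided binomial test: an excess in the bin nearest 0.05 is the suspicious pattern. The sketch below shows that idea only. The bin edges (0.04, 0.045, 0.05), the function names, and the sample p-values are assumptions for illustration, and this is not presented as the procedure in the authors' analysis files.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_hacking_bin_test(p_values, lower=0.04, mid=0.045, upper=0.05):
    """Compare counts in (lower, mid) and [mid, upper) with a one-sided test.

    Returns (low_count, high_count, tail_probability). The tail probability
    is None when neither bin contains any p-values.
    """
    low_count = sum(1 for p in p_values if lower < p < mid)
    high_count = sum(1 for p in p_values if mid <= p < upper)
    total = low_count + high_count
    if total == 0:
        return low_count, high_count, None
    # Under no p-hacking, a value in either bin is assumed equally likely.
    return low_count, high_count, binomial_upper_tail(high_count, total)


# Hypothetical p-values, not taken from the dataset.
sample = [0.041, 0.046, 0.047, 0.048, 0.049, 0.2]
print(p_hacking_bin_test(sample))
```

A small tail probability indicates more p-values just under 0.05 than the equal-probability assumption would predict; with only a handful of values, as in the sample, the test has very little power.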
BMP Areas Unsuitable for Mining - Petitioned
pa-geo-data-pennmap.hub.arcgis.com
Updated Mar 11, 2025
PA Department of Environmental Protection (2025). BMP Areas Unsuitable for Mining - Petitioned [Dataset]. https://pa-geo-data-pennmap.hub.arcgis.com/datasets/PADEP-1::bmp-areas-unsuitable-for-mining-petitioned
Dataset updated
Mar 11, 2025
Dataset authored and provided by
PA Department of Environmental Protection
Description
OBJECTID: ObjectID

SHAPE: ESRI Geometry Field

NAME_PETITION: Name assigned to the area petitioned to be designated as unsuitable for mining.

PETITIONER: The entity that submitted the petition.

PETITIONID: An identification number assigned to the area petitioned to be designated as unsuitable for mining.

DATE_RECEIVED: The date the Department received the petition to designate the area as unsuitable for mining.

COUNTY: The county the area is located in.

PETITIONSTATUS: Current status of the petition review.

DATE_FINAL: The date the Department made a final action on the petition.

ACRES_PETITIONED: Acreage of area petitioned to be designated as unsuitable for mining.

ACRES_DESIGNATED: Acreage of area designated as unsuitable for mining during review.

ACRES_COAL: Acreage of coal field extents inside the area designated as unsuitable for mining.

ACRES_GIS: Acreage of area calculated in GIS using PA Albers Equal Area Conic projection.

SQMILE_GIS: Square miles of area calculated in GIS using PA Albers Equal Area Conic projection.

NOTES: Additional notes.

SHAPE.AREA: GIS area in native map units.

SHAPE.LEN: Length/perimeter in native map units.
