https://paper.erudition.co.in/terms
Question Paper Solutions of Data Warehousing and Data Mining (Old), 7th Semester, Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Module II of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
FANTASIA
This repository contains the data related to image descriptors and sounds associated with a selection of frames of the film Fantasia, produced by Disney.
About
This repository contains data used in a doctoral thesis for the automatic composition of descriptive music. The information is extracted from the fragment of The Nutcracker from the film Fantasia (Disney, 1940) using SIFT and BoVW, color quantization and CENS.
Data
- Attributes 1-50: weighted vector of visual words.
- Attributes 51-59: red, green and blue values for three RGB colors.
- Note1, note2 and note3: MIDI notes related to each frame of the film.
License
Data is available under MIT License. To make use of the data the article must be cited.
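The attribute layout described above can be sketched in Python as a small row splitter. This is only an illustration under stated assumptions: the repository's actual file format and column order are not given here, so the sketch assumes each frame is a flat row of 62 values (attributes 1-59 followed by note1, note2 and note3), and that attributes 51-59 are stored as three consecutive (R, G, B) triples. The names `FrameRecord` and `split_frame_row` are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FrameRecord(NamedTuple):
    """One film frame, split into the three attribute groups (hypothetical type)."""
    visual_words: List[float]                 # attributes 1-50
    colors: List[Tuple[float, float, float]]  # attributes 51-59 as three triples
    notes: Tuple[int, int, int]               # note1, note2, note3 (MIDI)


def split_frame_row(row: Sequence[float]) -> FrameRecord:
    """Split a flat 62-value row into visual words, colors and MIDI notes.

    Assumption: values are ordered as attributes 1-59, then the three notes,
    and the nine color values form three consecutive (R, G, B) triples.
    """
    if len(row) != 62:
        raise ValueError(f"expected 62 values, got {len(row)}")
    visual_words = [float(v) for v in row[:50]]
    rgb = [float(v) for v in row[50:59]]
    colors = [(rgb[i], rgb[i + 1], rgb[i + 2]) for i in range(0, 9, 3)]
    notes = (int(row[59]), int(row[60]), int(row[61]))
    return FrameRecord(visual_words, colors, notes)
```

If the real files use a different column order or a header row, only the slicing indices need to change.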
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technical notes and documentation on the common data model of the project CONCEPT-DM2.
This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.
Aims of the CONCEPT-DM2 project:
General aim: To analyse chronic care effectiveness and the efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real-world data (RWD) from five Spanish Regional Health Systems.
Main specific aims:
Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.
Cohort definition: All patients with a Type 2 Diabetes code in their clinical health records.
Files included in this publication:
https://doi.org/10.5061/dryad.83bk3j9zv
M. Delia Basanta Department of Biology, University of Nevada Reno. Reno, Nevada, USA. delibasanta@gmail.com
Julián A. Velasco Instituto de Ciencias de la Atmósfera y Cambio Climático, Universidad Nacional Autónoma de México. Ciudad de México, México. javelasco@atmosfera.unam.mx
Constantino González-Salazar Instituto de Ciencias de la Atmósfera y Cambio Climático, Universidad Nacional Autónoma de México. Ciudad de México, México. cgsalazar@atmosfera.unam.mx
From Dryad entry:
"Abstract
Neuroendocrine neoplasms (NENs) are clinically diverse and incompletely characterized cancers that are challenging to classify. MicroRNAs (miRNAs) are small regulatory RNAs that can be used to classify cancers. Recently, a morphology-based classification framework for evaluating NENs from different anatomic sites was proposed by experts, with the requirement of improved molecular data integration. Here, we compiled 378 miRNA expression profiles to examine NEN classification through comprehensive miRNA profiling and data mining. Following data preprocessing, our final study cohort included 221 NEN and 114 non-NEN samples, representing 15 NEN pathological types and five site-matched non-NEN control groups. Unsupervised hierarchical clustering of miRNA expression profiles clearly separated NENs from non-NENs. Comparative analyses showed that miR-375 and miR-7 expression is substantially higher in NEN cases than non-NEN controls. Correlation analyses showed that NENs from diverse anatomic sites have convergent miRNA expression programs, likely reflecting morphologic and functional similarities. Using machine learning approaches, we identified 17 miRNAs to discriminate 15 NEN pathological types and subsequently constructed a multi-layer classifier, correctly identifying 217 (98%) of 221 samples and overturning one histologic diagnosis. Through our research, we have identified common and type-specific miRNA tissue markers and constructed an accurate miRNA-based classifier, advancing our understanding of NEN diversity.
Methods
Sequencing-based miRNA expression profiles from 378 clinical samples, comprising 239 neuroendocrine neoplasm (NEN) cases and 139 site-matched non-NEN controls, were used in this study. Expression profiles were either compiled from published studies (n=149) or generated through small RNA sequencing (n=229). Prior to sequencing, total RNA was isolated from formalin-fixed paraffin-embedded (FFPE) tissue blocks or fresh-frozen (FF) tissue samples. Small RNA cDNA libraries were sequenced on HiSeq 2500 Illumina platforms using an established small RNA sequencing (Hafner et al., 2012 Methods) and sequence annotation pipeline (Brown et al., 2013 Front Genet) to generate miRNA expression profiles. Scaling our existing approach to miRNA-based NEN classification (Panarelli et al., 2019 Endocr Relat Cancer; Ren et al., 2017 Oncotarget), we constructed and cross-validated a multi-layer classifier for discriminating NEN pathological types based on selected miRNAs.
Usage notes
Diagnostic histopathology and small RNA cDNA library preparation information for all samples are presented in Table S1 of the associated manuscript."
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Module IV of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
This repository contains a synthetic temporal data set that the authors generated by sampling values from the Gaussian distribution. The data set contains eight nontemporal dimensions, a temporal dimension, and a numerical measure attribute. It was generated according to the scheme and procedure detailed in this source paper: Kaufmann, M., Fischer, P.M., May, N., Tonder, A., Kossmann, D. (2014). TPC-BiH: A Benchmark for Bitemporal Databases. In: Performance Characterization and Benchmarking. TPCTC 2013. Lecture Notes in Computer Science, vol 8391. Springer, Cham.
Synthetic temporal dataset for temporal trend analysis and retrieval
https://doi.org/10.5061/dryad.q573n5trf
The data set can be used for analyzing and locating temporal trends of interest, where a temporal trend is generated by selecting the desired values of the nontemporal dimensions, and then selecting the corresponding values of the temporal dimension and the numerical measure attribute. Locating temporal trends of interest, e.g., unusual trends, is a common task in many applications and domains. It can also be of interest to understand which nontemporal dimensions are associated with the temporal trends of interest. To this end, the data set can be used for analyzing and locating temporal trends in the data cube induced by the data set, e.g., retrieving outlier temporal trends using an outlier detector.
We generated the synthetic temporal data set [1], which contains up to 8 nontemporal dimensions, one temporal dimension, and a nume...
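As a rough illustration of how a temporal trend is generated by fixing values of the nontemporal dimensions, the following Python sketch filters rows and returns the matching (time, measure) pairs in temporal order. The column names (`d1`, `d2`, `t`, `measure`) and the function name `temporal_trend` are hypothetical; the actual column names in the data set are not stated here.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def temporal_trend(
    rows: Iterable[Mapping[str, Any]],
    selection: Mapping[str, Any],
    time_key: str = "t",            # hypothetical name of the temporal dimension
    measure_key: str = "measure",   # hypothetical name of the measure attribute
) -> List[Tuple[Any, Any]]:
    """Return (time, measure) pairs for rows matching the selected dimension values."""
    points = [
        (row[time_key], row[measure_key])
        for row in rows
        if all(row.get(dim) == value for dim, value in selection.items())
    ]
    # Sort by the temporal dimension so the result reads as a trend over time.
    return sorted(points, key=lambda point: point[0])


# Usage with a tiny made-up table (values are illustrative only).
rows: List[Dict[str, Any]] = [
    {"d1": "a", "d2": "x", "t": 2, "measure": 5.0},
    {"d1": "a", "d2": "x", "t": 1, "measure": 4.0},
    {"d1": "b", "d2": "x", "t": 1, "measure": 9.0},
]
trend = temporal_trend(rows, {"d1": "a", "d2": "x"})
```

Each distinct selection corresponds to one cell of the data cube; an outlier detector could then be run over the collection of trends produced this way.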
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples of natural language in (a) mortality dataset, (b) aeromedical retrieval dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.
The datasets are gzip-compressed JSON-lines files, where each line is a JSON object representation of a Zenodo record or community.
Records dataset
Filename: zenodo_open_metadata_{ date of export }.jsonl.gz
Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date
which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.
In addition, some terms have been altered:
Communities dataset
Filename: zenodo_community_metadata_{ date of export }.jsonl.gz
Each object contains the terms: id, title, description, curation_policy, page
which correspond to the fields with the same name available in Zenodo's community creation form.
Notes for all datasets
For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.
Top-level terms that were missing in the metadata may contain a null value.
A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
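A minimal Python sketch of reading one of these files follows, assuming only what is described above: gzip-compressed JSON lines, a boolean `spam` term on each object, and possibly null top-level terms. The function names are hypothetical, and the file path is left to the caller because the export date in the filename varies.

```python
import gzip
import json
from typing import Any, Dict, Iterator, List, Optional


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def non_spam_titles(path: str) -> List[Optional[str]]:
    """Collect titles of objects not marked as spam.

    Titles may be None, since missing top-level terms can hold a null value.
    """
    return [
        record.get("title")
        for record in iter_records(path)
        if not record.get("spam")
    ]
```

Reading line by line keeps memory use flat, which matters for the full export. The uncompressed 200-line sample could be read the same way with the built-in `open` in place of `gzip.open`.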
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes 216 news items on 240 wind turbine accidents between 1980 and 2013. The analysis of this data set and the insights obtained are reported in the following research paper:
Asian, S., Ertek, G., Haksoz, C., Pakter, S. and Ulun, S., 2017. Wind turbine accidents: A data mining study. IEEE Systems Journal, 11(3), pp.1567-1578.
As of now, the most extensive data available on the Internet on wind turbine accidents is published by the Caithness Windfarm Information Forum (CWIF), a UK-based grassroots organization opposing wind turbine installations.
While the Caithness list is impressive in magnitude, the quality and reliability of the list are open to discussion for the following reason:
Although other online sources contain much more data, they exhibit similar deficiencies.
So, there are problems when it comes to using the Caithness data or other data in research studies. To this end, we collected data on wind turbine accidents ourselves, also using the data from Caithness, and we share our collected data on this page (please click the link at the top of the page to download the data).
The data we collected consist of three folders and an MS Excel file.
The folder News.txt contains the accident news, with each news item in a separate text file:
The folder News.doc contains the news, with each news item in a separate MS Word file:
Finally, the folder News.doc.with.notes contains the news, with each news item in a separate MS Word file, but with extensive comments explaining how the database in the MS Excel file was constructed:
The MS Excel file News.Database.xlsx contains the structured data created from a detailed reading of the accident news text:
The MS Excel file is the file that was analyzed in our research paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
URL: https://geoscience.data.qld.gov.au/dataset/cr011048
ATP 312, NOTES ON THE PETROLEUM PROSPECTS, FOR DOMINION MINING AND OIL NL
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Peripheral arterial disease (PAD) is a growing problem with few available therapies. Cilostazol is the only FDA-approved medication with a class I indication for intermittent claudication, but carries a black box warning due to concerns for increased cardiovascular mortality. To assess the validity of this black box warning, we employed a novel text-analytics pipeline to quantify the adverse events associated with Cilostazol use in a clinical setting, including patients with congestive heart failure (CHF).
Methods and Results
We analyzed the electronic medical records of 1.8 million subjects from the Stanford clinical data warehouse spanning 18 years using a novel text-mining/statistical analytics pipeline. We identified 232 PAD patients taking Cilostazol and created a control group of 1,160 PAD patients not taking this drug using 1:5 propensity-score matching. Over a mean follow up of 4.2 years, we observed no association between Cilostazol use and any major adverse cardiovascular event including stroke (OR = 1.13, CI [0.82, 1.55]), myocardial infarction (OR = 1.00, CI [0.71, 1.39]), or death (OR = 0.86, CI [0.63, 1.18]). Cilostazol was not associated with an increase in any arrhythmic complication. We also identified a subset of CHF patients who were prescribed Cilostazol despite its black box warning, and found that it did not increase mortality in this high-risk group of patients.
Conclusions
This proof of principle study shows the potential of text-analytics to mine clinical data warehouses to uncover ‘natural experiments’ such as the use of Cilostazol in CHF patients. We envision this method will have broad applications for examining difficult to test clinical hypotheses and to aid in post-marketing drug safety surveillance. Moreover, our observations argue for a prospective study to examine the validity of a drug safety warning that may be unnecessarily limiting the use of an efficacious therapy.
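The "no association" reading of the reported odds ratios can be checked with a short Python sketch: an odds ratio whose confidence interval contains 1 is not statistically distinguishable from no effect at that confidence level. The numbers below are copied from the abstract; the abstract does not state the confidence level, and the helper name `interval_includes_one` is hypothetical.

```python
from typing import Dict, Tuple

# (odds ratio, CI lower bound, CI upper bound), as reported in the abstract.
reported: Dict[str, Tuple[float, float, float]] = {
    "stroke": (1.13, 0.82, 1.55),
    "myocardial infarction": (1.00, 0.71, 1.39),
    "death": (0.86, 0.63, 1.18),
}


def interval_includes_one(low: float, high: float) -> bool:
    """True when the confidence interval for an odds ratio contains 1."""
    return low <= 1.0 <= high


# For each outcome, True means the interval is compatible with no association.
no_association = {
    outcome: interval_includes_one(low, high)
    for outcome, (_odds_ratio, low, high) in reported.items()
}

# 1:5 matching of 232 treated patients gives the stated control-group size.
matched_controls = 232 * 5
```

All three intervals span 1, which is consistent with the abstract's statement that no association was observed.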
This dataset includes TIFF (Tagged Image File Format) images of graphic drill core logs showing associated drill core information, a TIFF image of the explanation for the lithology and structure sections of the logs, an Esri shapefile of the locations of the drill holes, and 12 .csv files of tabular data that were compiled from handwritten drill core logs. The drill core is from the Stillwater Mine area of the Stillwater Complex, Montana and was drilled from 1983 to 1989 by the Stillwater Mining Company and Anaconda Copper Corp. The data shown in the graphic drill logs and contained within the .csv files include lithologic, structure, percent recovery, grain size, sulfide, nickel, copper, platinum, and palladium mineralization information. The graphic drill logs were created using Golden Software's Strater 5 drill core visualization software and are provided with both logarithmic and linear scales where applicable. The graphic drill logs are plotted using the depth recorded in the drill logs and do not reflect stratigraphic true thickness. All instances of question marks ("?") represent original data as written by the geologist. In areas where the handwritten notes were unreadable, the notation "[unreadable]" was used. See USGS SIR 2014-5183 (https://pubs.usgs.gov/sir/2014/5183/) for the report and spatial data relating to the Stillwater Complex.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant. Here, we use text-mining to demonstrate that p-hacking is widespread throughout science. We then illustrate how one can test for p-hacking when performing a meta-analysis and show that, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured. This result suggests that p-hacking probably does not drastically alter scientific consensuses drawn from meta-analyses.
Usage Notes
Data from: The extent and consequences of p-hacking in science
This zip file consists of three parts. 1. Data obtained from text-mining and associated analysis files. 2. Data obtained from previously published meta-analyses and associated analysis files. 3. Analysis files used to conduct meta-analyses of the data. Read me files are contained within this zip file.
FILES_FOR_DRYAD.zip
OBJECTID ObjectID
SHAPE ESRI Geometry Field
NAME_PETITION Name assigned to the area petitioned to be designated as unsuitable for mining.
PETITIONER The entity that submitted the petition.
PETITIONID An identification number assigned to the area petitioned to be designated as unsuitable for mining.
DATE_RECEIVED The date the Department received the petition to designate the area as unsuitable for mining.
COUNTY The county the area is located in.
PETITIONSTATUS Current status of the petition review.
DATE_FINAL The date the Department made a final action on the petition.
ACRES_PETITIONED Acreage of area petitioned to be designated as unsuitable for mining.
ACRES_DESIGNATED Acreage of area designated as unsuitable for mining during review.
ACRES_COAL Acreage of coal field extents inside the area designated as unsuitable for mining.
ACRES_GIS Acreage of area calculated in GIS using PA Albers Equal Area Conic projection.
SQMILE_GIS Square miles of area calculated in GIS using PA Albers Equal Area Conic projection.
NOTES Additional notes.
SHAPE.AREA GIS Area in native map units
SHAPE.LEN Length/Perimeter in native map units