39 datasets found
  1. Raw data outputs 1-18

    • bridges.monash.edu
    • researchdata.edu.au
    xlsx
    Updated May 30, 2023
    Cite
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data outputs 1-18

    • Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
    • Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
    • Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
    • Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
    • Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
    • Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
    • Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
    • Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
    • Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
    • Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs.
    • Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
    • Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
    • Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
    • Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining.
    • Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
    • Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
    • Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
    • Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

  2. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
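
    The gain ratio criterion named in the decision tree settings can be illustrated with a minimal sketch. It assumes the textbook definition (information gain divided by split information); the function names and toy labels are hypothetical, and Orange's own implementation may differ in detail.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, groups):
    """Gain ratio of a split that partitions `labels` into `groups`.

    Textbook definition: information gain divided by split information.
    Returns 0.0 when the split puts every sample in one group.
    """
    n = len(labels)
    groups = [g for g in groups if g]  # ignore empty branches
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    gain = entropy(labels) - remainder
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups)
    return gain / split_info if split_info > 0 else 0.0


# Toy labels (hypothetical, not taken from the dataset).
labels = ["control", "control", "treated", "treated"]
perfect_split = [["control", "control"], ["treated", "treated"]]
useless_split = [["control", "treated"], ["control", "treated"]]

print(gain_ratio(labels, perfect_split))
print(gain_ratio(labels, useless_split))
```

    A split that separates the classes perfectly scores highest, while a split that leaves each branch as mixed as the parent scores zero.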

  3. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that took place over a period of time. The retailer will use the results to grow the business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most useful when you want to find associations between different objects in a set, such as frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.

    • Rule: bought computer mouse => bought mouse mat
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.89

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
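
    The arithmetic of this example can be checked with a short sketch. It uses the standard definitions, in which confidence divides the rule's support by the probability of the antecedent and lift divides confidence by the probability of the consequent. The function name and its parameters are hypothetical, and Python is used here only for illustration (the analysis itself was done in R).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    All arguments are transaction counts.
    """
    support = n_both / n_total                    # P(antecedent & consequent)
    confidence = n_both / n_antecedent            # P(consequent | antecedent)
    lift = confidence / (n_consequent / n_total)  # confidence / P(consequent)
    return support, confidence, lift


# Counts from the example: 100 customers, 10 bought a mouse,
# 9 bought a mouse mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift well above 1 indicates that the two items are bought together far more often than would be expected if the purchases were independent.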

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean the data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
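
    The conversion step can be sketched as grouping the rows by invoice. This is a minimal illustration in Python, not the author's R code: the column names BillNo and Itemname come from the dataset description above, while the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group item names by invoice number.

    `rows` is an iterable of dicts with 'BillNo' and 'Itemname' keys.
    Returns a dict mapping each BillNo to the set of items on that invoice.
    """
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["BillNo"]].add(row["Itemname"])
    return dict(baskets)


# Hypothetical rows, one per purchased item (not taken from the dataset).
rows = [
    {"BillNo": "100001", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item B"},
    {"BillNo": "100002", "Itemname": "item A"},
    {"BillNo": "100001", "Itemname": "item A"},  # repeat within one invoice
]

transactions = to_transactions(rows)
for bill_no, items in sorted(transactions.items()):
    print(bill_no, sorted(items))
```

    Using a set per invoice collapses repeated lines for the same product, since association rule mining only needs to know whether an item appears in a basket.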

  4. MSR Challenge Excel Figures

    • figshare.com
    png
    Updated Feb 20, 2016
    Cite
    Luke Soldano (2016). MSR Challenge Excel Figures [Dataset]. http://doi.org/10.6084/m9.figshare.2504173.v1
    Available download formats: png
    Dataset updated
    Feb 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luke Soldano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures for the paper "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub" submitted to the MSR 2016 Data Mining Challenge. These figures show the number of available Java projects with certain constraints applied. In particular, these constraints are number of contributors to the repository and number of commits to that repository.

  5. Wind Turbine Accident News (1980-2013)

    • data.mendeley.com
    Updated Nov 27, 2017
    + more versions
    Cite
    Gurdal Ertek (2017). Wind Turbine Accident News (1980-2013) [Dataset]. http://doi.org/10.17632/jkjvmn9tz3.1
    Dataset updated
    Nov 27, 2017
    Authors
    Gurdal Ertek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set includes 216 news items on 240 wind turbine accidents between the years 1980 and 2013. The analysis of this data set and the insights obtained are reported in the following research paper:

    Asian, S., Ertek, G., Haksoz, C., Pakter, S. and Ulun, S., 2017. Wind turbine accidents: A data mining study. IEEE Systems Journal, 11(3), pp.1567-1578.

    As of now, the most extensive data available on the Internet on wind turbine accidents is published by the Caithness Windfarm Information Forum (CWIF), a UK-based grassroots organization opposing wind turbine installations.

    While the Caithness list is impressive in magnitude, the quality and reliability of the list is open to discussion because of the following reason:

    • Many of the web links to the news sources are not valid, and some of the accidents appear in multiple lines of the data.

    Although other online sources contain far more data, the data they provide exhibit similar deficiencies.

    So, there are problems when it comes to using the Caithness data or other data in research studies. To this end, we collected data on wind turbine accidents ourselves, also using the data from Caithness and we share our collected data on this page (please click the link at the top of the page to download the data).

    The data we collected consists of three folders, and a MS Excel file.

    The folder News.txt contains the accident news, with each news item in a separate text file.

    The folder News.doc contains the news, with each news item in a separate MS Word file.

    Finally, the folder News.doc.with.notes contains the news, with each news item in a separate MS Word file, but with extensive comments explaining how the database in the MS Excel file was constructed.

    The MS Excel file News.Database.xlsx contains the structured data created based on the detailed reading of the accident news text.

    The MS Excel file is the file that was analyzed in our research paper.

  6. Additional file 1 of msBiodat analysis tool, big data analysis for...

    • springernature.figshare.com
    bin
    Updated Jun 4, 2023
    Cite
    Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek (2023). Additional file 1 of msBiodat analysis tool, big data analysis for high-throughput experiments [Dataset]. http://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
    Available download formats: bin
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al. which is used in the example of the article. (XLSX 611 kb)

  7. A brief dataset highlighting online learning test scores of Bangladeshi...

    • data.mendeley.com
    Updated Feb 6, 2024
    + more versions
    Cite
    Shabab Rahman (2024). A brief dataset highlighting online learning test scores of Bangladeshi high-school students [Dataset]. http://doi.org/10.17632/g88h8vz9kg.2
    Dataset updated
    Feb 6, 2024
    Authors
    Shabab Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020 during the pandemic. Batches of 45 and 75 students each were used to organize the data, which were then combined to create a single dataset with 399 entries. Two phases of collection took place: on January 17, 2023, and on February 12, 2023. The initial data recording was done using Google Learning Management System's Google Classroom. The data was then exported to local storage by the classroom faculties and then passed onto the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which consists of four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the dataset to score it out of 100 for consistency. It is important to note that for students who did not take the majority of the exams, the institutions did not gather or transmit missing data. The dataset displays a spread with a standard deviation of 20.5 and an average score of 69.547.

  8. 2011–2016 Single Well Aquifer Tests: Pumping Schedules, Water-Level Data in...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). 2011–2016 Single Well Aquifer Tests: Pumping Schedules, Water-Level Data in Aquifer Test Wells, and Analysis Results from Tests Conducted near Long Canyon, Goshute Valley, Northeastern Nevada [Dataset]. https://catalog.data.gov/dataset/20112016-single-well-aquifer-tests-pumping-schedules-water-level-data-in-aquifer-test-well
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Goshute Valley, Nevada
    Description

    This dataset presents tabular data and Excel workbooks used to analyze single-well aquifer tests in pumping wells and slug tests in monitoring wells near Long Canyon. The data also include pdf outputs from the analysis program, Aqtesolv (Duffield, 2007). The data are presented in two zipped files, (1) single-well aquifer tests in pumping wells and (2) slug tests in monitoring wells. The slug-test data were supplied by Newmont Mining Corporation and collected by Golder and Associates in 2011. Reference Cited: Duffield, G.M., 2007, AQTESOLV for windows: Version 4.5 User’s Guide, HydroSOLV, Inc. Reston, VA, p. 530, at, http://www.aqtesolv.com/download/aqtw20070719.pdf.

  9. Excel file figs 2, 3, 4, 5, 6, 7 and 10.

    • plos.figshare.com
    xlsx
    Updated Jun 10, 2024
    + more versions
    Cite
    Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini (2024). Excel file figs 2, 3, 4, 5, 6, 7 and 10. [Dataset]. http://doi.org/10.1371/journal.pone.0303229.s001
    Available download formats: xlsx
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We described a method using data mining algorithms to predict medicinal plants’ mode of application. Twenty-one individuals aged 28 to 81 were interviewed. Firstly, data were collected and analyzed based on quantitative indices such as the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Secondly, the data was classified by support vector machines, J48 decision trees, neural networks, and logistic regression. In total, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family among plants, and plant leaves were most frequently used for medicinal purposes. The decoction was the most commonly used preparation method (56%), and therophytes were the most dominant (48.93%) among plants. Regarding the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia auseri Boiss. ranked first based on the CI index. The ICF index demonstrated that metabolic disorders are the most common problems among plants in the Shahrbabak region. Finally, the J48 decision tree algorithm consistently outperforms other methods, achieving 95% accuracy in 10-fold cross-validation and 70–30 data split scenarios. The developed model detects with maximum accuracy how to consume medicinal plants.
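
    Two of the quantitative indices named above can be sketched in a few lines. The description does not state the formulas, so this assumes the definitions commonly used in ethnobotany: RFC = FC / N (informants citing a species over all informants) and ICF = (Nur - Nt) / (Nur - 1) (use reports and taxa within one ailment category). The function names and the input counts are hypothetical, apart from the 21 informants mentioned in the text.

```python
def relative_frequency_of_citation(fc, n_informants):
    """RFC = FC / N: share of informants who cited the species."""
    if n_informants <= 0:
        raise ValueError("n_informants must be positive")
    return fc / n_informants


def informant_consensus_factor(n_use_reports, n_taxa):
    """ICF = (Nur - Nt) / (Nur - 1) for a single ailment category."""
    if n_use_reports <= 1:
        raise ValueError("need more than one use report")
    return (n_use_reports - n_taxa) / (n_use_reports - 1)


# Hypothetical counts: a species cited by 14 of the 21 informants, and an
# ailment category with 50 use reports spread over 8 taxa.
rfc = relative_frequency_of_citation(14, 21)
icf = informant_consensus_factor(50, 8)
print(f"RFC={rfc:.3f} ICF={icf:.3f}")
```

    Both indices range from 0 to 1; values near 1 mean that most informants cite the species (RFC) or agree on a small set of plants for the ailment category (ICF).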

  10. Credit Rating Precision

    • kaggle.com
    zip
    Updated Feb 14, 2024
    Cite
    Prashant Gadhiya (2024). Credit Rating Precision [Dataset]. https://www.kaggle.com/datasets/prashantgadhiya/credit-rating-precision
    Available download formats: zip (1100228 bytes)
    Dataset updated
    Feb 14, 2024
    Authors
    Prashant Gadhiya
    Description

    Within the realm of data mining and analytics, this carefully curated dataset, hosted on Kaggle, stands as an invaluable resource for educational purposes. With a substantial volume of 15,000 records, this dataset is an open-source treasure trove, devoid of copyright restrictions, expressly designed to empower students and analysts in their pursuit of excellence in data mining and analytics. The dataset's primary focus lies in predicting Credit Scores, utilizing a binary variable to distinguish between "good" and "bad" credit ratings. It spans a diverse range of information types, incorporating nominal, continuous, ordinal, and binary variables to provide a comprehensive understanding of creditworthiness. As we embark on this educational journey, the dataset serves as a foundation for building predictive models, including but not limited to Logistics, CHAID, CART, as well as other notable models such as Random Forest, Support Vector Machines (SVM), and Gradient Boosting. By encompassing a broad spectrum of models, we aim to offer students and analysts a holistic view of various data mining techniques and their applications. The overarching goal remains to equip individuals with the skills and knowledge necessary to excel in the dynamic fields of data mining and analytics.

  11. Vrinda Store Analysis with Excel and DashBoard

    • kaggle.com
    zip
    Updated Jul 11, 2024
    Cite
    Balu Desineti (2024). Vrinda Store Analysis with Excel and DashBoard [Dataset]. https://www.kaggle.com/datasets/baludesineti/vrinda-store-analysis-with-excel-and-dashboard
    Available download formats: zip (10546227 bytes)
    Dataset updated
    Jul 11, 2024
    Authors
    Balu Desineti
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Vrinda Store data analysis using advanced Excel. The work on this dataset covers cleaning the data and data mining: removing null values; using HLOOKUP, VLOOKUP, MATCH, INDEX, and pivot tables; and using charts to create a dashboard.

  12. South Fork Cherry River Water Quality

    • conservation-abra.hub.arcgis.com
    Updated Feb 22, 2023
    + more versions
    Cite
    Allegheny-Blue Ridge Alliance (2023). South Fork Cherry River Water Quality [Dataset]. https://conservation-abra.hub.arcgis.com/maps/3b366a6bc44e4392847b71ec82038173
    Dataset updated
    Feb 22, 2023
    Dataset authored and provided by
    Allegheny-Blue Ridge Alliance
    Description

    Purpose: This feature layer describes water quality sampling data performed at several operating coal mines in the South Fork of Cherry watershed, West Virginia.

    Source & Data: Data was downloaded from WV Department of Environmental Protection's ApplicationXtender online database and EPA's ECHO online database between January and April, 2023. There are five data sets here: Surface Water Monitoring Sites, which contains basic information about monitoring sites (name, lat/long, etc.) and NPDES Outlet Monitoring Sites, which contains similar information about outfall discharges surrounding the active mines. Biological Assessment Stations (BAS) contain similar information for pre-project biological sampling. NOV Summary contains locations of Notices of Violation received by South Fork Coal Company from WV Department of Environmental Protection. The Quarterly Monitoring Reports table contains the sampling data for the Surface Water Monitoring Sites, which actually goes as far back as 2018 for some mines. Parameters of concern include iron, aluminum and selenium, among others. A relationship class between Surface Water Monitoring Sites and the Quarterly Monitoring Reports allows access to individual sample results.

    Processing: Notices of Violation were obtained from the WV DEP AppXtender database for Mining and Reclamation Article 3 (SMCRA) Permitting, and Mining and Reclamation NPDES Permitting. Violation data were entered into Excel and loaded into ArcGIS Pro as a CSV text file with Lat/Long coordinates for each Violation. The CSV file was converted to a point feature class. Water quality data were downloaded in PDF format from the WVDEP AppXtender website. Non-searchable PDFs were converted via Optical Character Recognition, so that data could be copied. Sample results were copied and pasted manually to Notepad++, and several columns were re-ordered. Data was grouped by sample station and sorted chronologically. Sample data, contained in the associated table (SW_QM_Reports) were linked back to the monitoring station locations using the Station_ID text field in a geodatabase relationship class. Water monitoring station locations were taken from published Drainage Maps and from water quality reports. A CSV table was created with station Lat/Long locations and loaded into ArcGIS Pro. It was then converted to a point feature class. Stream Crossings and Road Construction Areas were digitized as polygon feature classes from project Drainage and Progress maps that were converted to TIFF image format from PDF and georeferenced. The ArcGIS Pro map - South Fork Cherry River Water Quality, was published as a service definition to ArcGIS Online.

    Symbology:

    • NOV Summary - dark blue, solid point
    • Lost Flats Surface Water Monitoring Sites: Data Available - medium blue point, black outline
    • Lost Flats Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
    • Lost Flats NPDES Outlet Monitoring Sites - orange point, black outline
    • Blue Knob Surface Water Monitoring Sites: Data Available - medium blue point, black outline
    • Blue Knob Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
    • Blue Knob NPDES Outlet Monitoring Sites - orange point, black outline
    • Blue Knob Biological Assessment Stations: Data Available - medium green point, black outline
    • Blue Knob Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
    • Rocky Run Surface Water Monitoring Sites: Data Available - medium blue point, black outline
    • Rocky Run Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
    • Rocky Run NPDES Outlet Monitoring Sites - orange point, black outline
    • Rocky Run Biological Assessment Stations: Data Available - medium green point, black outline
    • Rocky Run Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
    • Rocky Run Stream Crossings: turquoise blue polygon with red outline
    • Rocky Run Haul Road Construction Areas: dark red (40% transparent) polygon with black outline
    • Haul Road No 2 Surface Water Monitoring Sites: Data Available - medium blue point, black outline
    • Haul Road No 2 Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
    • Haul Road No 2 NPDES Outlet Monitoring Sites - orange point, black outline

  13. Mining and Extractivism Records Data for Bibliometric Analysis (Scopus...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Nov 2, 2021
    Cite
    Zampier, Maika (2021). Mining and Extractivism Records Data for Bibliometric Analysis (Scopus database 1992-2020) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5638787
    Dataset updated
    Nov 2, 2021
    Authors
    Zampier, Maika
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset file exported from the Scopus database, and the dataset file exported from biblioshiny as a bibliometrix file in Excel format.

  14. In-Air Hand-Drawn Number and Shape Dataset

    • orda.shef.ac.uk
    zip
    Updated Jul 14, 2025
    Cite
    Basheer Alwaely; Charith Abhayaratne (2025). In-Air Hand-Drawn Number and Shape Dataset [Dataset]. http://doi.org/10.15131/shef.data.7381472.v2
    Available download formats: zip
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Basheer Alwaely; Charith Abhayaratne
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains in-air hand-written numbers and shapes data used in the paper:

    B. Alwaely and C. Abhayaratne, "Graph Spectral Domain Feature Learning With Application to in-Air Hand-Drawn Number and Shape Recognition," in IEEE Access, vol. 7, pp. 159661-159673, 2019, doi: 10.1109/ACCESS.2019.2950643.

    The dataset contains the following:
    - Readme.txt
    - InAirNumberShapeDataset.zip, containing:
      - Number Folder (with 2 subfolders for Matlab and Excel)
      - Shapes Folder (with 2 subfolders for Matlab and Excel)

    The datasets include the in-air drawn number and shape hand movement paths captured by a Kinect sensor. The number sub-dataset includes 500 instances of each number from 0 to 9, for a total of 5000 number instances. Similarly, the shape sub-dataset includes 500 instances of each of 10 different arbitrary 2D shapes, for a total of 5000 shape instances. The dataset provides the X, Y, Z coordinates of the hand movement path data in Matlab (M-file) and Excel formats, with their corresponding labels.

    The creation of this dataset received The University of Sheffield ethics approval under application #023005, granted on 19/10/2018.

  15. LCA Baseline for U.S. Coal Mining and Delivery Data Products

    • osti.gov
    Updated Apr 1, 2025
    Cite
    USDOE Office of Fossil Energy (FE) (2025). LCA Baseline for U.S. Coal Mining and Delivery Data Products [Dataset]. http://doi.org/10.18141/2560300
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    National Energy Technology Laboratory https://netl.doe.gov/
    USDOE Office of Fossil Energy (FE)
    NETL
    Description

    This group of data models includes the 2024 NETL models for the NETL Coal Baseline Lifecycle Model in both openLCA and Excel, along with basin and transportation inventory data files in Excel that support the overall model.

  16. Dynamic Tracking Parameters

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Nov 9, 2018
    Cite
    Naumann, Marcel; Kreiter, Nicole; Sczech, Ronny; Hermann, Andreas; Pal, Arun; Glaß, Hannes; Japtok, Julia (2018). Dynamic Tracking Parameters [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000665073
    Dataset updated
    Nov 9, 2018
    Authors
    Naumann, Marcel; Kreiter, Nicole; Sczech, Ronny; Hermann, Andreas; Pal, Arun; Glaß, Hannes; Japtok, Julia
    Description

    MS Excel result table containing all parameters of the dynamic organelle tracking analysis as described in the main manuscript under Methods, section 'Data mining in CSV result files and assembly of final EXCEL result tables with KNIME'.

  17. Excel Mining Company Limited Export Import Data | Eximpedia

    • eximpedia.app
    Updated Oct 2, 2025
    + more versions
    Cite
    Seair Exim (2025). Excel Mining Company Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Dominica, India, Saint Vincent and the Grenadines, Portugal, Malawi, Suriname, Liechtenstein, Bolivia (Plurinational State of), France, Singapore
    Description

    Excel Mining Company Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  18. Real-world VRP data with realistic non-standard constraints - parameter...

    • narcis.nl
    • data.4tu.nl
    Updated Dec 14, 2018
    Cite
    Emir Žunić (2018). Real-world VRP data with realistic non-standard constraints - parameter setting problem regression input data [Dataset]. http://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8
    Available download formats (media types): application/vnd.ms-excel, text/plain
    Dataset updated
    Dec 14, 2018
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    Emir Žunić
    Description

    This file is in Excel (xls) format and contains data about a regression model for the input and output parameters (constants) that can be used in solving real-world vehicle routing problems with realistic non-standard constraints. All data are real and were obtained experimentally by running a VRP algorithm in the production environment of one of the biggest distribution companies in Bosnia and Herzegovina.

  19. Excel Mining And Infra Services Export Import Data | Eximpedia

    • eximpedia.app
    Updated Feb 14, 2025
    Cite
    Seair Exim (2025). Excel Mining And Infra Services Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Egypt, Wallis and Futuna, Georgia, Botswana, Kenya, Saint Martin (French part), Macedonia (the former Yugoslav Republic of), Greece, Korea (Democratic People's Republic of), Bahamas
    Description

    Excel Mining And Infra Services Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  20. Data from: How are software repositories mined? A systematic literature...

    • data.niaid.nih.gov
    Updated Sep 2, 2021
    Cite
    Anonymized for Review (2021). How are software repositories mined? A systematic literature review of workflows, methodologies, reproducibility, and tools [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5274207
    Dataset updated
    Sep 2, 2021
    Dataset provided by
    Anonymized
    Authors
    Anonymized for Review
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the Excel spreadsheet dataset containing our analysis of papers that perform mining software repositories (MSR) research, drawn from the conferences ICSE, ESEC/FSE, and MSR for the years 2018 - 2020. The data is broken into columns and can be explained at a high level as follows:

    Column   Content
    1        The paper being analyzed
    2        Does the paper state the data they analyzed is available
    3        Does the paper perform some sort of data analysis or sampling using data others have compiled in the past
    4        Does the paper state a timestamp for when they begin their work
    5        Does the paper state the use of systems pre-built to help with MSR work
    6 - 18   Forms of sampling researchers may have employed to select their data
    19       What datasets (if any) were used in the analysis
    20       What tools (if any) were used in the analysis
    21       How they performed their data sampling workflow
    22       How they performed their data filtering workflow
    23       How they performed their data retrieval workflow
    24       Did they create any scripts in each of these workflows
    25 - 33  Did they publish a replication package and what is contained within
    34       Is the paper describing a tool for research or not
    35       Short description of the paper read
    36       A high-level category of the work performed in each paper
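
For anyone working with the spreadsheet programmatically, the column layout above can be captured as a small lookup table. This is only a sketch: the column numbers and descriptions are copied from the table, while the names `COLUMN_GROUPS` and `describe_column` are hypothetical and not part of the dataset.

```python
# Column ranges (1-based, inclusive) and descriptions copied from the table above.
# COLUMN_GROUPS and describe_column are hypothetical helper names.
COLUMN_GROUPS = [
    ((1, 1), "The paper being analyzed"),
    ((2, 2), "Does the paper state the data they analyzed is available"),
    ((3, 3), "Does the paper perform some sort of data analysis or sampling "
             "using data others have compiled in the past"),
    ((4, 4), "Does the paper state a timestamp for when they begin their work"),
    ((5, 5), "Does the paper state the use of systems pre-built to help with MSR work"),
    ((6, 18), "Forms of sampling researchers may have employed to select their data"),
    ((19, 19), "What datasets (if any) were used in the analysis"),
    ((20, 20), "What tools (if any) were used in the analysis"),
    ((21, 21), "How they performed their data sampling workflow"),
    ((22, 22), "How they performed their data filtering workflow"),
    ((23, 23), "How they performed their data retrieval workflow"),
    ((24, 24), "Did they create any scripts in each of these workflows"),
    ((25, 33), "Did they publish a replication package and what is contained within"),
    ((34, 34), "Is the paper describing a tool for research or not"),
    ((35, 35), "Short description of the paper read"),
    ((36, 36), "A high-level category of the work performed in each paper"),
]


def describe_column(number):
    """Return the description for a 1-based spreadsheet column number."""
    for (first, last), description in COLUMN_GROUPS:
        if first <= number <= last:
            return description
    raise ValueError(f"column {number} is outside the documented range 1-36")
```

Keeping the ranges as inclusive pairs mirrors how the description groups columns 6 - 18 and 25 - 33 under a single heading.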

Cite
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1

Raw data outputs 1-18

Available download formats: xlsx
Dataset updated
May 30, 2023
Dataset provided by
Monash University
Authors
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Raw data outputs 1-18

Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on PubMed database-based literature mining.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
