39 datasets found

Raw data outputs 1-18
bridges.monash.edu
researchdata.edu.au
xlsx
Updated May 30, 2023
Cite
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.26180/21259491.v1
Dataset updated
May 30, 2023
Dataset provided by
Monash University
Authors
Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raw data outputs 1-18
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs, as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in the AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs, as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in the breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between dormant and active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and the differentially expressed genes between AML CSCs and GTCs, the differentially expressed genes between dormant and active AML CSCs, or the genes uniquely expressed in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions of AML key CSC genes with each other and with their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of the genes with the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells, as well as with hematopoietic stem cells. These data were generated by PubMed database-based literature mining.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Orange dataset table
figshare.com
xlsx
Updated Mar 4, 2022
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.19146410.v1
Dataset updated
Mar 4, 2022
Dataset provided by
Figshare, http://figshare.com/
Authors
Rui Simões
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM rotenone. The dataset is balanced and contains no missing values, and the data were standardized across features. The small number of samples prevented a full, robust statistical analysis of the results; nevertheless, it allowed relevant hidden patterns and trends to be identified.
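The note that the data were standardized across features can be illustrated with a minimal z-score sketch in Python. It assumes per-feature (column-wise) standardization to zero mean and unit population standard deviation; the description does not say exactly which standardization was applied, and the small matrix below is hypothetical, not the study's measurements.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Z-score each column (feature) of a row-major matrix.

    Uses the population standard deviation; a constant column maps to 0.0.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical 3-sample, 2-feature matrix (not the study's measurements).
z = standardize_columns([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

After this transform every feature column has mean 0 and standard deviation 1, so features measured on very different scales contribute comparably to Euclidean distances.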

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) to instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped in both rows and columns by a two-way hierarchical clustering method using Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments was performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when the majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
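Because the decision tree used gain ratio as its split criterion, a short Python sketch of the textbook definition may help: gain ratio (as introduced for Quinlan's C4.5) is the information gain of a split divided by its split information. This illustrates the formula only; it is not Orange's internal implementation, and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, groups):
    """Gain ratio of splitting `labels` into the branches in `groups`.

    `groups` is a list of label lists, one per branch; empty branches are ignored.
    """
    branches = [g for g in groups if g]
    n = len(labels)
    weights = [len(g) / n for g in branches]
    # Information gain: parent entropy minus the weighted entropy of the branches.
    info_gain = entropy(labels) - sum(w * entropy(g) for w, g in zip(weights, branches))
    # Split information grows with the number of branches and penalizes fragmenting splits.
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical labels: a split that separates the classes perfectly ...
print(gain_ratio(["A", "A", "B", "B"], [["A", "A"], ["B", "B"]]))  # 1.0
# ... and one that leaves each branch as mixed as the parent.
print(gain_ratio(["A", "A", "B", "B"], [["A", "B"], ["A", "B"]]))  # 0.0
```

Normalizing by split information is what distinguishes gain ratio from plain information gain, which tends to favor attributes with many distinct values.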
Market Basket Analysis
kaggle.com
zip
Updated Dec 9, 2021
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Available download formats: zip (23875170 bytes)
Dataset updated
Dec 9, 2021
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with the Apriori algorithm

The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow in the industry and to provide customers with itemset suggestions, so that we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

Introduction

Association rules are most useful when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
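The Apriori algorithm named in this entry's subtitle finds frequent itemsets level by level: it keeps only itemsets whose support (the fraction of transactions containing them) meets a threshold, and it only extends itemsets whose subsets are all frequent. A minimal Python sketch of that idea follows; the function name, the `min_support` value, and the toy baskets are hypothetical, and this is not the R `arules` code used in the analysis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Support is the fraction of transactions that contain the itemset.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        kept = {}
        for c in candidates:
            s = sum(1 for b in baskets if c <= b) / n
            if s >= min_support:
                kept[c] = s
        return kept

    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent


# Hypothetical baskets (not taken from the retail dataset).
baskets = [{"mouse", "mat"}, {"mouse", "mat", "cable"}, {"mouse"}, {"mat", "cable"}]
for itemset, support in apriori(baskets, 0.5).items():
    print(sorted(itemset), support)
```

With a threshold of 0.5, {mouse, mat} and {mat, cable} survive, while {mouse, cable} (support 0.25) is dropped, so the three-item candidate is pruned without being counted.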

An Example of Association Rules

Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat) = 0.80/0.09 ≈ 8.89

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
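The arithmetic of this example can be checked with a few lines of Python, using the standard definitions for a rule A => B: support = P(A and B), confidence = support / P(A), and lift = confidence / P(B). The counts come from the example; the function name is hypothetical.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total
    confidence = n_both / n_antecedent            # = support / P(antecedent)
    lift = confidence / (n_consequent / n_total)  # = confidence / P(consequent)
    return support, confidence, lift


# 100 customers: 10 bought a mouse, 9 bought a mouse mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means mouse buyers are far more likely than average to also buy a mat; lift is symmetric, so the reverse rule has the same lift but a different confidence (8/9 ≈ 0.89).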

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data, so that it is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of the rules

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: .xlsx

Number of rows: 522065

Number of attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.
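The transformation step from the strategy, turning one-row-per-item records into one basket per bill, can be sketched in Python using the BillNo and Itemname attributes listed above. Rows with a missing item name are dropped first, in the spirit of the missing-value cleanup of the R workflow. The function name and sample rows are hypothetical; this is not the author's R code.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group (BillNo, Itemname) pairs into {BillNo: set of item names}.

    Rows whose Itemname is missing (None or blank) are skipped.
    """
    baskets = defaultdict(set)
    for bill_no, item_name in rows:
        if item_name is None or not str(item_name).strip():
            continue  # drop rows with a missing product name
        baskets[bill_no].add(str(item_name).strip())
    return dict(baskets)


# Hypothetical rows, not taken from Assignment-1_Data.
rows = [
    ("100001", "mouse"),
    ("100001", "mouse mat"),
    ("100002", "mouse"),
    ("100002", None),
    ("100003", "cable"),
]
print(to_baskets(rows))
```

Using a set per bill also collapses repeated lines of the same product within one invoice, since association rules only care whether an item appears in a basket, not its Quantity.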

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

Libraries in R

First, we need to load the required libraries. Each library is briefly described here.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

readxl - Read Excel files in R.

plyr - Tools for splitting, applying and combining data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.

knitr - Dynamic report generation in R.

magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

Data Pre-processing

Next, we read Assignment-1_Data.xlsx into R. Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

After that, we clean the data frame by removing missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
MSR Challenge Excel Figures
figshare.com
png
Updated Feb 20, 2016
Cite
Luke Soldano (2016). MSR Challenge Excel Figures [Dataset]. http://doi.org/10.6084/m9.figshare.2504173.v1
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.2504173.v1
Dataset updated
Feb 20, 2016
Dataset provided by
Figshare, http://figshare.com/
Authors
Luke Soldano
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figures for the paper "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub" submitted to the MSR 2016 Data Mining Challenge. These figures show the number of available Java projects with certain constraints applied. In particular, these constraints are number of contributors to the repository and number of commits to that repository.
Wind Turbine Accident News (1980-2013)
data.mendeley.com
Updated Nov 27, 2017
Cite
Gurdal Ertek (2017). Wind Turbine Accident News (1980-2013) [Dataset]. http://doi.org/10.17632/jkjvmn9tz3.1
Unique identifier
https://doi.org/10.17632/jkjvmn9tz3.1
Dataset updated
Nov 27, 2017
Authors
Gurdal Ertek
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset includes 216 news articles on 240 wind turbine accidents between the years 1980 and 2013. The analysis of this dataset and the insights obtained are reported in the following research paper:

Asian, S., Ertek, G., Haksoz, C., Pakter, S. and Ulun, S., 2017. Wind turbine accidents: A data mining study. IEEE Systems Journal, 11(3), pp.1567-1578.

As of now, the most extensive data available on the Internet on wind turbine accidents is published by the Caithness Windfarm Information Forum (CWIF), a UK-based grassroots organization opposing wind turbine installations.

While the Caithness list is impressive in magnitude, its quality and reliability are open to discussion for the following reasons:

Many of the web links to the news sources are not valid, and some of the accidents appear in multiple lines of the data.

Despite their volume, the data available from other online sources exhibit similar deficiencies.

Therefore, there are problems when it comes to using the Caithness data or other data in research studies. To address this, we collected data on wind turbine accidents ourselves, also drawing on the Caithness data, and we share the collected data on this page (please click the link at the top of the page to download the data).

The data we collected consists of three folders and an MS Excel file.

The folder News.txt contains the accident news, with each news item in a separate text file.

The folder News.doc contains the news, with each news item in a separate MS Word file.

Finally, the folder News.doc.with.notes contains the news, with each news item in a separate MS Word file, but with extensive comments explaining how the database in the MS Excel file was constructed.

The MS Excel file News.Database.xlsx contains the structured data created from a detailed reading of the accident news text.

The MS Excel file is the file that was analyzed in our research paper.
Additional file 1 of msBiodat analysis tool, big data analysis for...
springernature.figshare.com
bin
Updated Jun 4, 2023
Cite
Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek (2023). Additional file 1 of msBiodat analysis tool, big data analysis for high-throughput experiments [Dataset]. http://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3645041_D1.v1
Dataset updated
Jun 4, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Pau Muñoz-Torres; Filip Rokć; Robert Belužic; Ivana Grbeša; Oliver Vugrek
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al., which is used in the example in the article (XLSX, 611 kB).
A brief dataset highlighting online learning test scores of Bangladeshi...
data.mendeley.com
Updated Feb 6, 2024
Cite
Shabab Rahman (2024). A brief dataset highlighting online learning test scores of Bangladeshi high-school students [Dataset]. http://doi.org/10.17632/g88h8vz9kg.2
Unique identifier
https://doi.org/10.17632/g88h8vz9kg.2
Dataset updated
Feb 6, 2024
Authors
Shabab Rahman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020, during the pandemic. The data were organized in batches of 45 and 75 students each, which were then combined to create a single dataset with 399 entries. Data collection took place in two phases: on January 17, 2023, and on February 12, 2023. The initial data recording was done in Google Classroom, Google's learning management system. The classroom faculties then exported the data to local storage and passed it on to the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which covers four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the scores to be out of 100 for consistency. Note that for students who did not take the majority of the exams, the institutions did not collect or transmit the missing data. The scores have a mean of 69.547 and a standard deviation of 20.5.
2011–2016 Single Well Aquifer Tests: Pumping Schedules, Water-Level Data in...
catalog.data.gov
data.usgs.gov
Updated Nov 26, 2025
Cite
U.S. Geological Survey (2025). 2011–2016 Single Well Aquifer Tests: Pumping Schedules, Water-Level Data in Aquifer Test Wells, and Analysis Results from Tests Conducted near Long Canyon, Goshute Valley, Northeastern Nevada [Dataset]. https://catalog.data.gov/dataset/20112016-single-well-aquifer-tests-pumping-schedules-water-level-data-in-aquifer-test-well
Dataset updated
Nov 26, 2025
Dataset provided by
U.S. Geological Survey
Area covered
Goshute Valley, Nevada
Description
This dataset presents tabular data and Excel workbooks used to analyze single-well aquifer tests in pumping wells and slug tests in monitoring wells near Long Canyon. The data also include PDF outputs from the analysis program Aqtesolv (Duffield, 2007). The data are presented in two zipped files: (1) single-well aquifer tests in pumping wells and (2) slug tests in monitoring wells. The slug-test data were supplied by Newmont Mining Corporation and collected by Golder and Associates in 2011. Reference Cited: Duffield, G.M., 2007, AQTESOLV for windows: Version 4.5 User’s Guide, HydroSOLV, Inc., Reston, VA, p. 530, at http://www.aqtesolv.com/download/aqtw20070719.pdf.
Excel file figs 2, 3, 4, 5, 6, 7 and 10.
plos.figshare.com
xlsx
Updated Jun 10, 2024
Cite
Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini (2024). Excel file figs 2, 3, 4, 5, 6, 7 and 10. [Dataset]. http://doi.org/10.1371/journal.pone.0303229.s001
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0303229.s001
Dataset updated
Jun 10, 2024
Dataset provided by
PLOS, http://plos.org/
Authors
Hossein Bibak; Farzad Heydari; Mohammad Sadat-Hosseini
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The present study recorded indigenous knowledge of medicinal plants in Shahrbabak, Iran. We described a method using data mining algorithms to predict medicinal plants’ mode of application. Twenty-one individuals aged 28 to 81 were interviewed. First, data were collected and analyzed based on quantitative indices such as the informant consensus factor (ICF), the cultural importance index (CI), and the relative frequency of citation (RFC). Second, the data were classified by support vector machines, J48 decision trees, neural networks, and logistic regression. In total, 141 medicinal plants from 43 botanical families were documented. Lamiaceae, with 18 species, was the dominant family, and leaves were the plant part most frequently used for medicinal purposes. Decoction was the most commonly used preparation method (56%), and therophytes were the dominant life form (48.93%) among the plants. Regarding the RFC index, the most important species are Adiantum capillus-veneris L. and Plantago ovata Forssk., while Artemisia auseri Boiss. ranked first based on the CI index. The ICF index showed that metabolic disorders are the most common health problems treated with plants in the Shahrbabak region. Finally, the J48 decision tree algorithm consistently outperformed the other methods, achieving 95% accuracy in both the 10-fold cross-validation and the 70–30 data split scenarios. The developed model predicts how medicinal plants should be consumed with the highest accuracy of the methods tested.
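The two evaluation schemes mentioned here, 10-fold cross-validation and a 70–30 train/test split, differ only in how sample indices are partitioned. A minimal Python sketch of both index splits follows; the function names, the fixed seed, and the sample count are illustrative assumptions, the folds are plain (not stratified), and no classifier is trained.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Return (train, test) index lists for a single shuffled holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


# Hypothetical sample count, for illustration only.
n = 50
print([len(test) for _, test in k_fold_indices(n, k=10)])  # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
train, test = holdout_split(n)
print(len(train), len(test))  # 35 15
```

In k-fold cross-validation every sample is used for testing exactly once and the reported accuracy is averaged over the k folds, whereas the holdout split tests on a single 30% partition.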
Credit Rating Precision
kaggle.com
zip
Updated Feb 14, 2024
Cite
Prashant Gadhiya (2024). Credit Rating Precision [Dataset]. https://www.kaggle.com/datasets/prashantgadhiya/credit-rating-precision
Available download formats: zip (1100228 bytes)
Dataset updated
Feb 14, 2024
Authors
Prashant Gadhiya
Description
Within the realm of data mining and analytics, this carefully curated dataset, hosted on Kaggle, stands as a valuable resource for educational purposes. With 15,000 records, it is an open-source dataset, free of copyright restrictions, designed to help students and analysts develop their skills in data mining and analytics. The dataset's primary focus is predicting credit scores, using a binary variable that distinguishes between "good" and "bad" credit ratings. It spans a diverse range of variable types, including nominal, continuous, ordinal, and binary variables, to provide a comprehensive picture of creditworthiness. The dataset serves as a foundation for building predictive models, including but not limited to logistic regression, CHAID, and CART, as well as other notable models such as Random Forest, Support Vector Machines (SVM), and Gradient Boosting. By encompassing a broad spectrum of models, we aim to offer students and analysts a holistic view of various data mining techniques and their applications. The overarching goal is to equip individuals with the skills and knowledge needed to excel in the dynamic fields of data mining and analytics.
Vrinda Store Analysis with Excel and DashBoard
kaggle.com
zip
Updated Jul 11, 2024
Cite
Balu Desineti (2024). Vrinda Store Analysis with Excel and DashBoard [Dataset]. https://www.kaggle.com/datasets/baludesineti/vrinda-store-analysis-with-excel-and-dashboard
Available download formats: zip (10546227 bytes)
Dataset updated
Jul 11, 2024
Authors
Balu Desineti
License
CC0 1.0 (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Description
Vrinda Store data analysis using advanced Excel. In this dataset, the data were cleaned and mined by removing null values and using HLOOKUP and VLOOKUP, MATCH, INDEX, and pivot tables, and charts were used to create a beautiful dashboard.
South Fork Cherry River Water Quality
conservation-abra.hub.arcgis.com
Updated Feb 22, 2023
Cite
Allegheny-Blue Ridge Alliance (2023). South Fork Cherry River Water Quality [Dataset]. https://conservation-abra.hub.arcgis.com/maps/3b366a6bc44e4392847b71ec82038173
Dataset updated
Feb 22, 2023
Dataset authored and provided by
Allegheny-Blue Ridge Alliance
Description
Purpose: This feature layer describes water quality sampling performed at several operating coal mines in the South Fork of Cherry watershed, West Virginia.
Source & Data: Data were downloaded from WV Department of Environmental Protection's ApplicationXtender online database and EPA's ECHO online database between January and April 2023. There are five data sets here: Surface Water Monitoring Sites, which contains basic information about monitoring sites (name, lat/long, etc.), and NPDES Outlet Monitoring Sites, which contains similar information about outfall discharges surrounding the active mines. Biological Assessment Stations (BAS) contains similar information for pre-project biological sampling. NOV Summary contains the locations of Notices of Violation received by South Fork Coal Company from WV Department of Environmental Protection. The Quarterly Monitoring Reports table contains the sampling data for the Surface Water Monitoring Sites, which for some mines goes as far back as 2018. Parameters of concern include iron, aluminum and selenium, among others. A relationship class between Surface Water Monitoring Sites and the Quarterly Monitoring Reports allows access to individual sample results.
Processing: Notices of Violation were obtained from the WV DEP AppXtender database for Mining and Reclamation Article 3 (SMCRA) Permitting, and Mining and Reclamation NPDES Permitting. Violation data were entered into Excel and loaded into ArcGIS Pro as a CSV text file with Lat/Long coordinates for each violation. The CSV file was converted to a point feature class. Water quality data were downloaded in PDF format from the WVDEP AppXtender website. Non-searchable PDFs were converted via Optical Character Recognition so that data could be copied. Sample results were copied and pasted manually into Notepad++, and several columns were re-ordered. Data were grouped by sample station and sorted chronologically. Sample data, contained in the associated table (SW_QM_Reports), were linked back to the monitoring station locations using the Station_ID text field in a geodatabase relationship class. Water monitoring station locations were taken from published Drainage Maps and from water quality reports. A CSV table was created with station Lat/Long locations and loaded into ArcGIS Pro. It was then converted to a point feature class. Stream Crossings and Road Construction Areas were digitized as polygon feature classes from project Drainage and Progress maps that were converted from PDF to TIFF image format and georeferenced. The ArcGIS Pro map, South Fork Cherry River Water Quality, was published as a service definition to ArcGIS Online.
Symbology:
- NOV Summary - dark blue, solid point
- Lost Flats Surface Water Monitoring Sites: Data Available - medium blue point, black outline
- Lost Flats Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
- Lost Flats NPDES Outlet Monitoring Sites - orange point, black outline
- Blue Knob Surface Water Monitoring Sites: Data Available - medium blue point, black outline
- Blue Knob Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
- Blue Knob NPDES Outlet Monitoring Sites - orange point, black outline
- Blue Knob Biological Assessment Stations: Data Available - medium green point, black outline
- Blue Knob Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
- Rocky Run Surface Water Monitoring Sites: Data Available - medium blue point, black outline
- Rocky Run Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
- Rocky Run NPDES Outlet Monitoring Sites - orange point, black outline
- Rocky Run Biological Assessment Stations: Data Available - medium green point, black outline
- Rocky Run Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
- Rocky Run Stream Crossings: turquoise blue polygon with red outline
- Rocky Run Haul Road Construction Areas: dark red (40% transparent) polygon with black outline
- Haul Road No 2 Surface Water Monitoring Sites: Data Available - medium blue point, black outline
- Haul Road No 2 Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
- Haul Road No 2 NPDES Outlet Monitoring Sites - orange point, black outline
Mining and Extractivism Records Data for Bibliometric Analysis (Scopus...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Nov 2, 2021
Cite
Zampier, Maika (2021). Mining and Extractivism Records Data for Bibliometric Analysis (Scopus database 1992-2020) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5638787
Dataset updated
Nov 2, 2021
Authors
Zampier, Maika
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset file exported from the Scopus database, and the dataset file exported from biblioshiny as a bibliometrix file in Excel format.
In-Air Hand-Drawn Number and Shape Dataset
orda.shef.ac.uk
zip
Updated Jul 14, 2025
Cite
Basheer Alwaely; Charith Abhayaratne (2025). In-Air Hand-Drawn Number and Shape Dataset [Dataset]. http://doi.org/10.15131/shef.data.7381472.v2
Available download formats: zip
Unique identifier
https://doi.org/10.15131/shef.data.7381472.v2
Dataset updated
Jul 14, 2025
Dataset provided by
The University of Sheffield
Authors
Basheer Alwaely; Charith Abhayaratne
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the in-air hand-written numbers and shapes data used in the paper: B. Alwaely and C. Abhayaratne, "Graph Spectral Domain Feature Learning With Application to in-Air Hand-Drawn Number and Shape Recognition," in IEEE Access, vol. 7, pp. 159661-159673, 2019, doi: 10.1109/ACCESS.2019.2950643.
The dataset contains the following:
- Readme.txt
- InAirNumberShapeDataset.zip, containing:
  - Number folder (with 2 subfolders for Matlab and Excel)
  - Shapes folder (with 2 subfolders for Matlab and Excel)
The datasets include the in-air drawn number and shape hand movement paths captured by a Kinect sensor. The number sub-dataset includes 500 instances for each digit from 0 to 9, for a total of 5000 number instances. Similarly, the shape sub-dataset includes 500 instances for each of 10 different arbitrary 2D shapes, for a total of 5000 shape instances. The dataset provides the X, Y, Z coordinates of the hand movement paths in Matlab (M-file) and Excel formats, together with their corresponding labels.
The creation of this dataset received The University of Sheffield ethics approval under application #023005, granted on 19/10/2018.
LCA Baseline for U.S. Coal Mining and Delivery Data Products
osti.gov
Updated Apr 1, 2025
Cite
USDOE Office of Fossil Energy (FE) (2025). LCA Baseline for U.S. Coal Mining and Delivery Data Products [Dataset]. http://doi.org/10.18141/2560300
Unique identifier
https://doi.org/10.18141/2560300
Dataset updated
Apr 1, 2025
Dataset provided by
National Energy Technology Laboratory, https://netl.doe.gov/
USDOE Office of Fossil Energy (FE)
NETL
Description
This group of data models includes the 2024 NETL models for the NETL Coal Baseline Lifecycle Model in both openLCA and Excel formats, along with the basin and transportation inventory data files in Excel that support the overall model.
Dynamic Tracking Parameters
datasetcatalog.nlm.nih.gov
springernature.figshare.com
Updated Nov 9, 2018
Cite
Naumann, Marcel; Kreiter, Nicole; Sczech, Ronny; Hermann, Andreas; Pal, Arun; Glaß, Hannes; Japtok, Julia (2018). Dynamic Tracking Parameters [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000665073
Dataset updated
Nov 9, 2018
Authors
Naumann, Marcel; Kreiter, Nicole; Sczech, Ronny; Hermann, Andreas; Pal, Arun; Glaß, Hannes; Japtok, Julia
Description
MS Excel result table containing all parameters of the dynamic organelle tracking analysis as described in the main manuscript under Methods, section 'Data mining in CSV result files and assembly of final EXCEL result tables with KNIME'.
Excel Mining Company Limited Export Import Data | Eximpedia
eximpedia.app
Updated Oct 2, 2025
Cite
Seair Exim (2025). Excel Mining Company Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/
Available download formats
.bin, .xml, .csv, .xls
Dataset updated
Oct 2, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
Dominica, India, Saint Vincent and the Grenadines, Portugal, Malawi, Suriname, Liechtenstein, Bolivia (Plurinational State of), France, Singapore
Description
Excel Mining Company Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Real-world VRP data with realistic non-standard constraints - parameter...
narcis.nl
data.4tu.nl
Updated Dec 14, 2018
Cite
Emir Žunić (2018). Real-world VRP data with realistic non-standard constraints - parameter setting problem regression input data [Dataset]. http://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8
Available download formats
media types: application/vnd.ms-excel, text/plain
Unique identifier
https://doi.org/10.4121/uuid:97006624-d6a3-4a29-bffa-e8daf60699d8
Dataset updated
Dec 14, 2018
Dataset provided by
4TU.Centre for Research Data
Authors
Emir Žunić
Description
This file is in Excel (xls) format and contains regression-model data for the input and output parameters (constants) that can be used to solve real-world vehicle routing problems (VRPs) with realistic non-standard constraints. All data are real and were obtained experimentally by running a VRP algorithm in the production environment of one of the largest distribution companies in Bosnia and Herzegovina.
Excel Mining And Infra Services Export Import Data | Eximpedia
eximpedia.app
Updated Feb 14, 2025
Cite
Seair Exim (2025). Excel Mining And Infra Services Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/
Available download formats
.bin, .xml, .csv, .xls
Dataset updated
Feb 14, 2025
Dataset provided by
Eximpedia Export Import Trade Data
Eximpedia PTE LTD
Authors
Seair Exim
Area covered
Egypt, Wallis and Futuna, Georgia, Botswana, Kenya, Saint Martin (French part), Macedonia (the former Yugoslav Republic of), Greece, Korea (Democratic People's Republic of), Bahamas
Description
Excel Mining And Infra Services Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Data from: How are software repositories mined? A systematic literature...
data.niaid.nih.gov
Updated Sep 2, 2021
Cite
Anonymized for Review (2021). How are software repositories mined? A systematic literature review of workflows, methodologies, reproducibility, and tools [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5274207
Dataset updated
Sep 2, 2021
Dataset provided by
Anonymized
Authors
Anonymized for Review
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the Excel spreadsheet dataset containing our analysis of papers performing mining software repositories (MSR) research published at the ICSE, ESEC/FSE, and MSR conferences from 2018 to 2020. The data is organized into columns, which can be summarized at a high level as follows:

Column   Content
1        The paper being analyzed
2        Does the paper state that the data it analyzed is available?
3        Does the paper perform some sort of data analysis or sampling using data that others have compiled in the past?
4        Does the paper state a timestamp for when the work began?
5        Does the paper state the use of pre-built systems to help with MSR work?
6-18     Forms of sampling the researchers may have employed to select their data
19       Which datasets (if any) were used in the analysis
20       Which tools (if any) were used in the analysis
21       How the authors performed their data sampling workflow
22       How the authors performed their data filtering workflow
23       How the authors performed their data retrieval workflow
24       Did the authors create any scripts in each of these workflows?
25-33    Did the authors publish a replication package, and what does it contain?
34       Does the paper describe a research tool?
35       A short description of the paper
36       A high-level category of the work performed in the paper

