Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on PubMed database-based literature mining.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on the itemset that a customer is most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that happened over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used when you want to find associations between different objects in a set, or frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
- rule: bought computer mouse => bought mouse mat
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat) = 0.80/0.09 = 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
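The three measures can be checked with a few lines of code. The sketch below is written in Python for illustration only (the analysis itself is carried out in R), and the function name `rule_metrics` is hypothetical. It uses the standard definitions for a rule A => B: support = P(A & B), confidence = P(A & B)/P(A), and lift = confidence/P(B).

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    All arguments are transaction counts.
    """
    support = n_both / n_total                    # P(A & B)
    confidence = n_both / n_antecedent            # P(A & B) / P(A)
    lift = confidence / (n_consequent / n_total)  # confidence / P(B)
    return support, confidence, lift


# 100 customers, 10 bought a mouse, 9 bought a mat, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than would be expected if the purchases were independent.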
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each library.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
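The idea of grouping items by invoice can be sketched as follows. This is an illustrative Python sketch, not the R code used in the write-up; the invoice numbers, item names, and the function name `to_transactions` are hypothetical, since the actual column names of the dataset are not shown here.

```python
from collections import defaultdict

# Hypothetical rows of a cleaned data frame: (invoice number, item name).
rows = [
    ("1001", "bread"),
    ("1001", "milk"),
    ("1002", "bread"),
    ("1001", "bread"),   # a repeated item within the same invoice
    ("1002", "butter"),
]


def to_transactions(rows):
    """Group items by invoice so that each invoice becomes one transaction."""
    baskets = defaultdict(set)
    for invoice, item in rows:
        baskets[invoice].add(item)  # a set drops repeated items
    return {invoice: sorted(items) for invoice, items in baskets.items()}


transactions = to_transactions(rows)
print(transactions)
# prints: {'1001': ['bread', 'milk'], '1002': ['bread', 'butter']}
```

Each resulting basket is the unit over which support, confidence, and lift are counted.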
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes 216 news items on 240 wind turbine accidents between the years 1980 and 2013. The analysis of this data set and the insights obtained are reported in the following research paper:
Asian, S., Ertek, G., Haksoz, C., Pakter, S. and Ulun, S., 2017. Wind turbine accidents: A data mining study. IEEE Systems Journal, 11(3), pp.1567-1578.
As of now, the most extensive data available on the Internet on wind turbine accidents is published by the Caithness Windfarm Information Forum (CWIF), a UK-based grassroots organization opposing wind turbine installations.
While the Caithness list is impressive in magnitude, the quality and reliability of the list are open to discussion for the following reason:
Although other online sources contain much more data, the data available in them exhibit similar deficiencies.
So, there are problems when it comes to using the Caithness data or other data in research studies. To this end, we collected data on wind turbine accidents ourselves, also using the data from Caithness, and we share our collected data on this page (please click the link at the top of the page to download the data).
The data we collected consists of three folders and an MS Excel file.
The folder News.txt contains the accident news, with each news item in a separate text file:
The folder News.doc contains the news, with each news item in a separate MS Word file:
Finally, the folder News.doc.with.notes contains the news, with each news item in a separate MS Word file, but with extensive comments explaining how the database in the MS Excel file was constructed:
The MS Excel file News.Database.xlsx contains the structured data created based on a detailed reading of the accident news text:
The MS Excel file is the file that was analyzed in our research paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel spreadsheets. XLSX file containing the data from Sousa Abreu et al. which is used in the example of the article. (XLSX 611 kb)
https://creativecommons.org/publicdomain/zero/1.0/
Vrinda Store data analysis using advanced Excel. The dataset was cleaned and null values were removed; HLOOKUP, VLOOKUP, MATCH, INDEX, pivot tables, and charts were then used to create a dashboard.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General description
This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.
1. Is the research article an Open Access publication?
2. Does the research article have a Creative Commons license or a similar license?
3. Does the research article contain a data availability statement?
4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?
5. Does the research article contain supplementary data?
6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?
Variables
The data were compiled in a Microsoft Excel 365 document that includes the following variables.
1. DOI URL of research article
2. Year of publication
3. Research article published with Open Access
4. License for research article
5. Data availability statement in article
6. Supplementary data added to article
7. Persistent identifier for supplementary data
8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC
Visualization
Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications per year, the number of publications published with open access, and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).
File formats and software
The file formats used in this dataset are:
.csv (Text file)
.docx (Microsoft Word 365 file)
.jpg (JPEG image file)
.pdf/A (Portable Document Format for archiving)
.png (Portable Network Graphics image file)
.pptx (Microsoft Power Point 365 file)
.txt (Text file)
.xlsx (Microsoft Excel 365 file)
All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.
MD5 checksums
Here is a list of all files of this dataset and of their MD5 checksums.
1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
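The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal Python sketch using only the standard library; it assumes the file has been downloaded into the current working directory, and the helper names `md5_of_file` and `verify` are illustrative rather than part of the dataset.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Assumes Readme.txt was downloaded next to this script.
    readme = Path("Readme.txt")
    if readme.exists():
        print(verify(readme, "795f171be340c13d78ba8608dafb3e76"))
```

MD5 is adequate for detecting accidental corruption during download; it is not a guarantee against deliberate tampering.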
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020 during the pandemic. The data were organized in batches of 45 and 75 students, which were then combined to create a single dataset with 399 entries. Two phases of collection took place: on January 17, 2023, and on February 12, 2023. The data were initially recorded in Google Classroom, Google's learning management system. The data was then exported to local storage by the classroom faculty and passed on to the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which consists of four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the dataset to score it out of 100 for consistency. It is important to note that for students who did not take the majority of the exams, the institutions did not gather or transmit missing data. The dataset has an average score of 69.547 and a standard deviation of 20.5.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
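The decision tree settings above name gain ratio as the splitting criterion. As an illustration only, the sketch below computes the textbook (C4.5-style) gain ratio, that is, information gain divided by the entropy of the split itself. It is not taken from Orange, whose internal implementation may differ in details, and the function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(labels, branches):
    """Gain ratio of a split.

    labels:   class label of each sample
    branches: branch each sample falls into after the split
    """
    total = len(labels)
    groups = {}
    for label, branch in zip(labels, branches):
        groups.setdefault(branch, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(branches)  # penalizes splits with many small branches
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["control", "control", "treated", "treated"]
print(gain_ratio(labels, ["low", "low", "high", "high"]))  # perfect split
print(gain_ratio(labels, ["a", "b", "a", "b"]))            # uninformative split
```

The first call separates the classes completely and the second carries no information about the class, so the two values mark the extremes a candidate split can reach on this toy data.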
This curated dataset, hosted on Kaggle, is intended as an educational resource for data mining and analytics. It contains 15,000 records, is open source and free of copyright restrictions, and is designed for students and analysts. Its primary focus is predicting credit scores, with a binary variable distinguishing "good" from "bad" credit ratings. It includes nominal, continuous, ordinal, and binary variables describing creditworthiness. The dataset serves as a foundation for building predictive models, including but not limited to logistic regression, CHAID, and CART, as well as Random Forest, Support Vector Machines (SVM), and Gradient Boosting. Covering this range of models is meant to give students and analysts a broad view of data mining techniques and their applications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures for the paper "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub" submitted to the MSR 2016 Data Mining Challenge. These figures show the number of available Java projects with certain constraints applied. In particular, these constraints are number of contributors to the repository and number of commits to that repository.
This dataset presents tabular data and Excel workbooks used to analyze single-well aquifer tests in pumping wells and slug tests in monitoring wells near Long Canyon. The data also include PDF outputs from the analysis program, Aqtesolv (Duffield, 2007). The data are presented in two zipped files: (1) single-well aquifer tests in pumping wells and (2) slug tests in monitoring wells. The slug-test data were supplied by Newmont Mining Corporation and collected by Golder and Associates in 2011. Reference Cited: Duffield, G.M., 2007, AQTESOLV for Windows: Version 4.5 User’s Guide, HydroSOLV, Inc., Reston, VA, p. 530, at http://www.aqtesolv.com/download/aqtw20070719.pdf.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Excel spreadsheet dataset containing our analysis of papers performing mining software repositories (MSR) research from the conferences ICSE, ESEC/FSE, and MSR from the years 2018-2020. The data is broken into columns, which can be explained at a high level as follows:
Column Content
1 The paper being analyzed
2 Does the paper state the data they analyzed is available
3 Does the paper perform some sort of data analysis or sampling using data others have compiled in the past
4 Does the paper state a timestamp for when they begin their work
5 Does the paper state the use of systems pre-built to help with MSR work
6 - 18 Forms of sampling researchers may have employed to select their data
19 What datasets (if any) were used in the analysis
20 What tools (if any) were used in the analysis
21 How they performed their data sampling workflow
22 How they performed their data filtering workflow
23 How they performed their data retrieval workflow
24 Did they create any scripts in each of these workflows
25 - 33 Did they publish a replication package and what is contained within
34 Is the paper describing a tool for research or not
35 Short description of the paper read
36 A high-level category of the work performed in each paper
Purpose:
This feature layer describes water quality sampling data performed at several operating coal mines in the South Fork of Cherry watershed, West Virginia.
Source & Data:
Data was downloaded from WV Department of Environmental Protection's ApplicationXtender online database and EPA's ECHO online database between January and April, 2023.
There are five data sets here: Surface Water Monitoring Sites, which contains basic information about monitoring sites (name, lat/long, etc.) and NPDES Outlet Monitoring Sites, which contains similar information about outfall discharges surrounding the active mines. Biological Assessment Stations (BAS) contain similar information for pre-project biological sampling. NOV Summary contains locations of Notices of Violation received by South Fork Coal Company from WV Department of Environmental Protection. The Quarterly Monitoring Reports table contains the sampling data for the Surface Water Monitoring Sites, which actually goes as far back as 2018 for some mines. Parameters of concern include iron, aluminum and selenium, among others.
A relationship class between Surface Water Monitoring Sites and the Quarterly Monitoring Reports allows access to individual sample results.
Processing:
Notices of Violation were obtained from the WV DEP AppXtender database for Mining and Reclamation Article 3 (SMCRA) Permitting, and Mining and Reclamation NPDES Permitting. Violation data were entered into Excel and loaded into ArcGIS Pro as a CSV text file with Lat/Long coordinates for each Violation. The CSV file was converted to a point feature class.
Water quality data were downloaded in PDF format from the WVDEP AppXtender website. Non-searchable PDFs were converted via Optical Character Recognition, so that data could be copied. Sample results were copied and pasted manually to Notepad++, and several columns were re-ordered. Data was grouped by sample station and sorted chronologically. Sample data, contained in the associated table (SW_QM_Reports), were linked back to the monitoring station locations using the Station_ID text field in a geodatabase relationship class.
Water monitoring station locations were taken from published Drainage Maps and from water quality reports. A CSV table was created with station Lat/Long locations and loaded into ArcGIS Pro. It was then converted to a point feature class.
Stream Crossings and Road Construction Areas were digitized as polygon feature classes from project Drainage and Progress maps that were converted to TIFF image format from PDF and georeferenced.
The ArcGIS Pro map, South Fork Cherry River Water Quality, was published as a service definition to ArcGIS Online.
Symbology:
NOV Summary - dark blue, solid point
Lost Flats Surface Water Monitoring Sites: Data Available - medium blue point, black outline
Lost Flats Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
Lost Flats NPDES Outlet Monitoring Sites - orange point, black outline
Blue Knob Surface Water Monitoring Sites: Data Available - medium blue point, black outline
Blue Knob Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
Blue Knob NPDES Outlet Monitoring Sites - orange point, black outline
Blue Knob Biological Assessment Stations: Data Available - medium green point, black outline
Blue Knob Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
Rocky Run Surface Water Monitoring Sites: Data Available - medium blue point, black outline
Rocky Run Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
Rocky Run NPDES Outlet Monitoring Sites - orange point, black outline
Rocky Run Biological Assessment Stations: Data Available - medium green point, black outline
Rocky Run Biological Assessment Stations: No Data Available - no-fill point, thick medium green outline
Rocky Run Stream Crossings: turquoise blue polygon with red outline
Rocky Run Haul Road Construction Areas: dark red (40% transparent) polygon with black outline
Haul Road No 2 Surface Water Monitoring Sites: Data Available - medium blue point, black outline
Haul Road No 2 Surface Water Monitoring Sites: No Data Available - no-fill point, thick medium blue outline
Haul Road No 2 NPDES Outlet Monitoring Sites - orange point, black outline
This group of data models includes the 2024 NETL models for the NETL Coal Baseline Lifecycle Model in both openLCA and Excel, in addition to basin and transportation inventory data files in Excel supporting the overall model.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains an NVivo file, which is a collection of webpages attributed to a framework developed in the thesis. The Excel file is the analysis of whether the evidence indicates that the sustainability standard programs meet the framework criteria.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Klebsiella pneumoniae is one of the most important pathogens responsible for nosocomial outbreaks worldwide. Epidemiological analyses are useful in determining the extent of an outbreak and in elucidating the sources and the spread of infections. The aim of this study was to investigate the epidemiological spread of K. pneumoniae strains using a MALDI-TOF MS approach.
Methods
Five hundred and thirty-five strains of K. pneumoniae were collected between January 2008 and March 2011 from hospitals in France and Algeria and were identified using MALDI-TOF. Antibiotic resistance patterns were investigated. Clinical and epidemiological data were recorded in an Excel file, including clustering obtained from the MSP dendrogram, and were analyzed using PASW Statistics software.
Results
Antibiotic susceptibility and phenotypic tests of the 535 isolates showed the presence of six resistance profiles distributed unequally between the two countries. The MSP dendrogram revealed five distinct clusters according to an arbitrary cut-off at the distance level of 500. Data mining analysis of the five clusters showed that K. pneumoniae strains isolated in Algerian hospitals were significantly associated with respiratory infections and the ESBL phenotype, whereas those from French hospitals were significantly associated with urinary tract infections and the wild-type phenotype.
Conclusions
MALDI-TOF was found to be a promising tool to identify and differentiate between K. pneumoniae strains according to their phenotypic properties and their epidemiological distribution. This is the first time that MALDI-TOF has been used as a rapid tool for typing K. pneumoniae clinical isolates.
This file is in Excel (xls) format and contains data about a regression model for input and output parameters (constants) that can be used for solving real-world vehicle routing problems with realistic non-standard constraints. All data are real and were obtained experimentally by running a VRP algorithm in the production environment of one of the biggest distribution companies in Bosnia and Herzegovina.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Computer code, algorithm, and data are available to generate results that are reported in the paper and central to the main claims.
- Crystal informatics
Part 1 - Code and README for crystal informatics by ANN
Part 2 - Data of crystal informatics (cubic system) for ANN
- Data mining of synthesis parameters
Part 3 - Code and README for data mining of synthesis parameters
Part 4 - Data for data mining of synthesis parameters
- Operation of the Robotic Scientist platform
Part 5 - An example of Robotic Execution Excel (REE) file
Part 6 - A brief instruction for operation of the Robotic Scientist platform
Part 7 - Code and README for ML prediction by SISSO
Part 8 - Data (an example) of ML prediction for SISSO
Part 9 - Code for reading in situ color characterization results
Part 10 - Data (an example) of in situ color characterization results
- Supplementary information
Part 11 - Supplementary information
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains in-air hand-written numbers and shapes data used in the paper:
B. Alwaely and C. Abhayaratne, "Graph Spectral Domain Feature Learning With Application to in-Air Hand-Drawn Number and Shape Recognition," in IEEE Access, vol. 7, pp. 159661-159673, 2019, doi: 10.1109/ACCESS.2019.2950643.
The dataset contains the following:
- Readme.txt
- InAirNumberShapeDataset.zip containing
  - Number Folder (With 2 sub folders for Matlab and Excel)
  - Shapes Folder (With 2 sub folders for Matlab and Excel)
The datasets include the in-air drawn number and shape hand movement path captured by a Kinect sensor. The number sub dataset includes 500 instances of each number 0 to 9, resulting in a total of 5000 number data instances. Similarly, the shape sub dataset includes 500 instances of each of 10 different arbitrary 2D shapes, resulting in a total of 5000 shape instances. The dataset provides X, Y, Z coordinates of the hand movement path data in Matlab (M-file) and Excel formats and their corresponding labels.
This dataset creation has received The University of Sheffield ethics approval under application #023005 granted on 19/10/2018.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset file exported from the Scopus database, and the dataset file exported from biblioshiny as a bibliometrix file in Excel format.