Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data consist of a collection of legitimate as well as phishing website instances. Each website is represented by a set of features that denote whether the website is legitimate or not. The data can serve as input to a machine-learning pipeline (a minimal sketch follows the variant descriptions below).
In this repository, two variants of the Phishing Dataset are presented.
Full variant - dataset_full.csv
Short description of the full variant dataset:
- Total number of instances: 88,647
- Number of legitimate website instances (labeled as 0): 58,000
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111
Small variant - dataset_small.csv
Short description of the small variant dataset:
- Total number of instances: 58,645
- Number of legitimate website instances (labeled as 0): 27,998
- Number of phishing website instances (labeled as 1): 30,647
- Total number of features: 111
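A minimal sketch of using one of the CSV variants as input to a machine-learning pipeline. It assumes the 0/1 label is stored in the last column; the actual column name is not specified in this description.

```python
# Hedged sketch: train a simple classifier on the full variant of the
# phishing dataset. Assumes dataset_full.csv is in the working directory
# and that the 0/1 label is stored in the last column (the actual column
# name is not specified above).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("dataset_full.csv")
X, y = df.iloc[:, :-1], df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```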
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Medical Diagnostics: This model can be used in the healthcare industry to provide high-speed automated analysis of pathology slides, determining whether cells are normal or abnormal and assisting in the diagnosis of various diseases such as cancer.
Scientific Research: Researchers studying cell biology or genetics can use this AI model for their studies on cellular abnormalities and diseases. This can accelerate breakthroughs in medical science.
Pharmaceutical Applications: Pharmaceutical companies can use this model in the drug discovery and development process. By identifying how different medications affect normal and abnormal cells, they can speed up and enhance their research.
Educational Tool: This AI model could serve as a rich educational tool in courses related to biology, medicine and health sciences, helping students to visualize and understand differences between normal and abnormal cells.
Personalized Medicine: This model can be used to analyze patients' cells to create personalized treatment plans. Understanding an individual's cellular structure could help healthcare professionals tailor treatments to the patient's specific needs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Mendeley is a dataset for classification tasks - it contains YellowLeaves Or HealthyLeaves annotations for 1,348 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
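If you use the Roboflow Python package, downloading a fork of the dataset might look roughly like the sketch below; the workspace, project and version identifiers are placeholders, not the real values for this dataset.

```python
# Hedged sketch using the roboflow pip package (pip install roboflow).
# Workspace/project/version names are placeholders; substitute the values
# shown on the dataset's Roboflow page.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("mendeley")
dataset = project.version(1).download("folder")  # e.g. a classification folder export
print("downloaded to:", dataset.location)
```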
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Cloud-based data repository for storing, publishing and accessing scientific data. Mendeley Data creates a permanent location and issues FORCE11-compliant citations for uploaded data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of validated OCT and Chest X-Ray images described and analyzed in "Deep learning-based classification and referral of treatable human diseases". The OCT images are split into a training set and a testing set of independent patients. OCT images are labeled as (disease)-(randomized patient ID)-(image number by this patient) and split into 4 directories: CNV, DME, DRUSEN, and NORMAL.
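A small sketch of how the directory layout and file-naming convention above could be parsed. The root path and the image extension are assumptions, since they are not stated here.

```python
# Hedged sketch: index the OCT training images by class and patient ID.
# Assumes the four class directories sit under a train/ folder and that files
# follow the (disease)-(patient ID)-(image number) pattern, e.g. CNV-1234-7.jpeg.
from pathlib import Path

root = Path("OCT/train")  # placeholder path
records = []
for class_dir in ["CNV", "DME", "DRUSEN", "NORMAL"]:
    for img in (root / class_dir).glob("*.*"):
        disease, patient_id, image_no = img.stem.split("-")
        records.append({"path": str(img), "label": disease,
                        "patient": patient_id, "image_no": int(image_no)})

print(len(records), "images indexed")
```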
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
June16 Mendeley is a dataset for object detection tasks - it contains Hard Hats annotations for 5,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The included tests were performed at McMaster University in Hamilton, Ontario, Canada by Dr. Phillip Kollmeyer (phillip.kollmeyer@gmail.com). If this data is utilized for any purpose, it should be appropriately referenced.
-A brand new 3 Ah LG HG2 cell was tested in an 8 cu.ft. thermal chamber with a 75 A, 5 V Digatron Firing Circuits Universal Battery Tester channel with a voltage and current accuracy of 0.1% of full scale. These data are used in the design process of an SOC estimator using a deep feedforward neural network (FNN) approach. The data also include a description of data acquisition, data preparation, and the development of an example FNN script.
-Instructions for Downloading and Running the Script:
1-Select download all files from the Mendeley Data page (https://data.mendeley.com/datasets/cp3473x7xv/2).
2-The files will be downloaded as a zip file. Unzip the file to a folder; do not modify the folder structure.
3-Navigate to the folder with "FNN_xEV_Li_ion_SOC_EstimatorScript_March_2020.mlx"
4-Open and run "FNN_xEV_Li_ion_SOC_EstimatorScript_March_2020.mlx"
5-The MATLAB script should run without any modification; if there is an issue, it is likely due to the testing and training data not being in the expected place.
6-The script is set by default to train for 50 epochs and to repeat the training 3 times. This should take 5-10 minutes to execute.
7-To recreate the results in the paper, set the number of epochs to 5500 and the number of repetitions to 10 (a rough Python sketch of this training loop is shown after these steps).
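The original script is a MATLAB live script; a rough Python equivalent of the repeated training it describes (50 epochs, 3 repetitions by default) might look like the sketch below. The network size and the placeholder data are assumptions, not the authors' configuration.

```python
# Hedged Python sketch of the repeated FNN training described above
# (not the authors' MATLAB script). The hidden-layer sizes and the
# synthetic placeholder data are assumptions.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.neural_network import MLPRegressor

EPOCHS = 50        # set to 5500 to mirror the paper's configuration
REPETITIONS = 3    # set to 10 to mirror the paper's configuration

# Placeholder data standing in for normalized measurements
# (e.g. voltage, current, temperature) and the SOC target.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1000, 3)), rng.random(1000)
X_test, y_test = rng.random((200, 3)), rng.random(200)

errors = []
for rep in range(REPETITIONS):
    fnn = MLPRegressor(hidden_layer_sizes=(55, 55), max_iter=EPOCHS,
                       random_state=rep)
    fnn.fit(X_train, y_train)
    errors.append(mean_absolute_error(y_test, fnn.predict(X_test)))

print("mean SOC MAE over repetitions:", float(np.mean(errors)))
```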
-The test data, or similar data, has been used for some publications, including: [1] C. Vidal, P. Kollmeyer, M. Naguib, P. Malysz, O. Gross, and A. Emadi, “Robust xEV Battery State-of-Charge Estimator Design using Deep Neural Networks,” in Proc WCX SAE World Congress Experience, Detroit, MI, Apr 2020 [2] C. Vidal, P. Kollmeyer, E. Chemali and A. Emadi, "Li-ion Battery State of Charge Estimation Using Long Short-Term Memory Recurrent Neural Network with Transfer Learning," 2019 IEEE Transportation Electrification Conference and Expo (ITEC), Detroit, MI, USA, 2019, pp. 1-6.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
odysseywt/Mendeley dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset was collected in the above-mentioned specialist hospitals over a period of three months in fall 2019. It includes CT scans of patients diagnosed with lung cancer in different stages, as well as healthy subjects. IQ-OTH/NCCD slides were marked by oncologists and radiologists in these two centers. The dataset contains a total of 1190 images representing CT scan slices of 110 cases. These cases are grouped into three classes: normal, benign, and malignant. Of these, 40 cases are diagnosed as malignant, 15 as benign, and 55 as normal. The CT scans were originally collected in DICOM format; the scanner used is a Siemens SOMATOM. The CT protocol was 120 kV with a slice thickness of 1 mm; a window width of 350 to 1200 HU and a window center of 50 to 600 HU were used for reading, with breath hold at full inspiration. All images were de-identified before analysis. Written consent was waived by the oversight review board, and the study was approved by the institutional review boards of the participating medical centers. Each scan contains several slices, ranging from 80 to 200, each representing an image of the human chest from different sides and angles. The 110 cases vary in gender, age, educational attainment, area of residence and living status. Some of them are employees of the Iraqi ministries of Transport and Oil, others are farmers and gainers. Most of them come from places in the middle region of Iraq, particularly the provinces of Baghdad, Wasit, Diyala, Salahuddin, and Babylon.
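Since the scans were collected as DICOM and read with the window settings quoted above, a hedged sketch of loading one slice and applying a display window (using pydicom) could look like this; the file path and the specific window values are placeholders.

```python
# Hedged sketch: read one CT slice with pydicom and apply a display window.
# The file path and chosen window values are placeholders within the ranges
# quoted above (width 350-1200 HU, center 50-600 HU).
import numpy as np
import pydicom

ds = pydicom.dcmread("slice_0001.dcm")  # placeholder path
slope = float(getattr(ds, "RescaleSlope", 1))
intercept = float(getattr(ds, "RescaleIntercept", 0))
hu = ds.pixel_array * slope + intercept

width, center = 1200, 50  # example window within the quoted ranges
lo, hi = center - width / 2, center + width / 2
windowed = np.clip(hu, lo, hi)
windowed = ((windowed - lo) / (hi - lo) * 255).astype(np.uint8)
print(windowed.shape, windowed.dtype)
```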
Web application as a free reference manager and academic social network to organize your research, collaborate with others online, and discover the latest research. Automatically generate bibliographies, collaborate easily with other researchers online, easily import papers from other research software, find relevant papers based on what you're reading, access your papers from anywhere online, and read papers on the go with the iPhone app.
The software, Mendeley Desktop, offers:
* Automatic extraction of document details
* Efficient management of your papers
* Sharing and synchronization of your library (or parts of it)
* Additional features: a plug-in for citing your articles in Microsoft Word, OCR (image-to-text conversion, so you can full-text search all your scanned PDFs), etc.
The website, Mendeley Web, complements Mendeley Desktop by offering these features:
* An online backup of your library
* Statistics of all things interesting
* A research network that allows you to keep track of your colleagues' publications, conference participations, awards, etc.
* A recommendation engine for papers that might interest you
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes color images of oral lesions captured using mobile cameras and intraoral cameras. These images can be used for identifying potential oral malignancies by image analysis. These images have been collected in consultation with doctors from different hospitals and colleges in Karnataka, India. This dataset contains two folders - original_data and augmented_data. The first folder contains images of 165 benign lesions and 158 malignant lesions. The second folder contains images created by augmenting the original images. The augmentation techniques used are flipping, rotation and resizing.
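A minimal sketch of the stated augmentation steps (flipping, rotation and resizing) applied to one image with Pillow; the file name and the specific parameter values are illustrative only and not the values used to build augmented_data.

```python
# Hedged sketch of the augmentations named above: flipping, rotation, resizing.
# The input file name and the parameter values are illustrative placeholders.
from PIL import Image

img = Image.open("original_data/benign/lesion_001.jpg").convert("RGB")  # placeholder path

flipped = img.transpose(Image.Transpose.FLIP_LEFT_RIGHT)
rotated = img.rotate(15, expand=True)   # example 15-degree rotation
resized = img.resize((224, 224))        # example target size

for name, out in [("flip", flipped), ("rot15", rotated), ("resize", resized)]:
    out.save(f"augmented_{name}.jpg")
```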
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a list of lyrics from 1950 to 2019 along with music metadata such as sadness, danceability, loudness, acousticness, etc. We also provide some information, such as the lyrics themselves, which can be used for natural language processing.
The audio data was scraped using the Echo Nest® API integrated engine with the spotipy Python package. The spotipy API permits the user to search for specific genres, artists, songs, release dates, etc. To obtain the lyrics, we used the Lyrics Genius® API as the base URL for requesting data based on the song title and artist name.
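A hedged sketch of the kind of retrieval described above, using the spotipy and lyricsgenius Python packages. The credentials, the example query and the selected fields are placeholders; this is not the authors' original scraping pipeline.

```python
# Hedged sketch of combining Spotify audio metadata with Genius lyrics.
# Credentials and the example query are placeholders.
import lyricsgenius
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="SPOTIFY_CLIENT_ID", client_secret="SPOTIFY_CLIENT_SECRET"))
genius = lyricsgenius.Genius("GENIUS_ACCESS_TOKEN")

results = sp.search(q="artist:Queen track:Bohemian Rhapsody", type="track", limit=1)
track = results["tracks"]["items"][0]
features = sp.audio_features(track["id"])[0]  # danceability, acousticness, ...

song = genius.search_song(track["name"], track["artists"][0]["name"])
print(track["name"], features["danceability"], len(song.lyrics) if song else 0)
```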
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Principal components analysis for metrics data (n = 33,683).
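As a loose illustration of the analysis named above, a principal components analysis over a metrics table could be run as in the sketch below; the input file and preprocessing choices are hypothetical, not the published analysis.

```python
# Hedged sketch of a principal components analysis on a metrics table.
# The file name and preprocessing choices are hypothetical.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

metrics = pd.read_csv("metrics.csv")  # placeholder file
scaled = StandardScaler().fit_transform(metrics.select_dtypes("number"))

pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)
```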
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains count matrices and per-cell metadata tables for RNA sequencing of 39,778 single nuclei from healthy primary lung samples of 12 lung adenocarcinoma patients, as well as 17,451 single human bronchiole epithelial cells from 4 donors. All samples were processed using the 10X Genomics Chromium platform with v2 chemistry and sequenced with one sample per lane on an Illumina HiSeq 4000. Reads were aligned to the hg19 reference genome version 1.2.0 obtained from 10X Genomics. Data processing was performed using Seurat3. The metadata table includes patient ID, sex, age, smoking status, and cell type, as well as QC statistics (number of genes, number of cells, ratio of mitochondrial reads).
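The original processing used Seurat3 in R; for readers working in Python, loading the count matrix and metadata might look roughly like the hedged sketch below. The file names, matrix orientation and QC thresholds are assumptions about the distributed files.

```python
# Hedged sketch: load a count matrix plus per-cell metadata with scanpy.
# File names, orientation and thresholds are placeholders; the original
# analysis used Seurat3 in R.
import pandas as pd
import scanpy as sc

adata = sc.read_csv("lung_counts.csv").T  # assuming a genes x cells matrix
meta = pd.read_csv("lung_cell_metadata.csv", index_col=0)
adata.obs = meta.loc[adata.obs_names]

sc.pp.filter_cells(adata, min_genes=200)   # example QC threshold
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
print(adata)
```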
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents a large-scale collection of millions of Twitter posts related to the coronavirus pandemic in Spanish. The collection was built by monitoring public posts written in Spanish containing a diverse set of hashtags related to COVID-19, as well as tweets shared by official Argentinian government offices, such as ministries and secretariats at different levels. Data was collected between March and June 2020 using the Twitter API, and will be periodically updated.
In addition to tweet IDs, the dataset includes information about mentions, retweets, media, URLs, hashtags, replies, users and content-based user relations, allowing the observation of the dynamics of the shared information. Data is presented in different tables that can be analysed separately or combined.
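Because the collection shares tweet IDs rather than full tweet objects, re-hydrating them through the Twitter API is a typical first step; a hedged tweepy sketch is shown below, where the bearer token and the input file are placeholders.

```python
# Hedged sketch: hydrate shared tweet IDs through the Twitter API v2 with tweepy.
# The bearer token and the IDs file are placeholders.
import tweepy

client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

with open("tweet_ids.txt") as f:
    ids = [line.strip() for line in f if line.strip()]

tweets = []
for i in range(0, len(ids), 100):  # the endpoint accepts up to 100 IDs per call
    resp = client.get_tweets(ids=ids[i:i + 100],
                             tweet_fields=["created_at", "lang", "public_metrics"])
    tweets.extend(resp.data or [])

print(len(tweets), "tweets hydrated")
```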
The dataset aims to serve as a source for studying several effects of the coronavirus on people through social media, including the impact of public policies, the perception of risk and related disease consequences, the adoption of guidelines, the emergence, dynamics and propagation of disinformation and rumours, the formation of communities and other social phenomena, and the evolution of health-related indicators (such as fear, stress, sleep disorders, or changes in children's behaviour), among other possibilities. In this sense, the dataset can be useful for multi-disciplinary researchers in fields such as data science, social network analysis, social computing, medical informatics, and the social sciences, among others.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains urban and rural LST, DEM, and NDVI data for annual, summer, and winter daytime and nighttime for all census tracts in US urbanized areas, as well as the mean values for the entire urbanized area.
METADATA
DEM: Digital Elevation Model
NDVI: Normalized Difference Vegetation Index
LST: Land Surface Temperature
_urb: Urban values (all urban pixels within urbanized areas)
_rur: Rural reference (Spatial mean of the non-urban, non-water pixels within the region of interest)
Regions of Interest:
_CT: Spatial mean of pixels intersecting the Census Tract clipped to the urbanized area (one value per census tract). This should be equal to _CT_act for census tracts that are completely within the urbanized areas.
_all: Spatial mean of all pixels intersecting the urbanized area, as defined by the US census (one value for one urbanized area)
_CT_act: Spatial mean of all available pixels intersecting the Census Tract (one value per census tract) [This should be equal to the previous values I calculated]
For the UHI: The ideal configuration is LST_urb_all - LST_rur_all for the entire urbanized area (from the US_Urbanized file) and LST_urb_CT_act - LST_rur_all for individual census tracts within the urbanized areas (from the census file).
For the equity analysis: Either _CT or _CT_act can be used if we are only concerned with spatial variation. Using _CT_act leads to a mismatch between the census data for tracts crossing the urban boundary and the remotely sensed data. Using _CT leads to a mismatch between the UHI analysis and the equity analysis.
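Following the configuration above, the UHI intensities can be computed directly from the listed columns; the pandas sketch below is hedged, since the file names and exact column names are assumptions based on the naming scheme described in the metadata.

```python
# Hedged sketch: compute UHI intensity from the columns described above.
# File names and exact column names are assumptions based on the naming
# scheme (LST_urb_all, LST_rur_all, LST_urb_CT_act).
import pandas as pd

urbanized = pd.read_csv("US_Urbanized.csv")  # one row per urbanized area
tracts = pd.read_csv("census_tracts.csv")    # one row per census tract

# Urbanized-area UHI: urban mean minus the rural reference.
urbanized["UHI_all"] = urbanized["LST_urb_all"] - urbanized["LST_rur_all"]

# Tract-level UHI: tract mean minus the rural reference, assuming both
# columns are present in the census file as described above.
tracts["UHI_CT"] = tracts["LST_urb_CT_act"] - tracts["LST_rur_all"]

print(urbanized["UHI_all"].describe())
print(tracts["UHI_CT"].describe())
```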
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The DAWN (Detection in Adverse Weather Nature) dataset consists of real-world images collected under various adverse weather conditions. This dataset emphasizes a diverse traffic environment (urban, highway and freeway) as well as a rich variety of traffic flow. The DAWN dataset comprises a collection of 1000 images from real-traffic environments, divided into four sets of weather conditions: fog, snow, rain and sandstorms. The dataset is annotated with object bounding boxes for autonomous driving and video surveillance scenarios. These data help in interpreting the effects of adverse weather conditions on the performance of vehicle detection systems, and are needed by researchers working in the fields of autonomous vehicles and intelligent visual traffic surveillance. All the rights of the DAWN dataset are reserved and commercial use/distribution of this database is strictly prohibited.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset was acquired from one of the multispecialty hospitals in India. With over 14 common features, it is one of the heart disease datasets available so far for research purposes. This dataset consists of 1000 subjects with 12 features. It will be useful for building early-stage heart disease detection systems as well as for generating predictive machine learning models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of anonymized and deidentified panoramic dental X-rays of 116 patients, taken at Noor Medical Imaging Center, Qom, Iran. The subjects cover a wide range of dental conditions from healthy, to partial and complete edentulous cases. The mandibles of all cases are manually segmented by two dentists. This dataset is used as the basis for the article by Abdi et al [1].
[1] A. H. Abdi, S. Kasaei, and M. Mehdizadeh, “Automatic segmentation of mandible in panoramic x-ray,” J. Med. Imaging, vol. 2, no. 4, p. 44003, 2015.