17 datasets found

Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset
figshare.com
application/x-rar
Updated May 30, 2023
Cite
Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van (2023). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6635642.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.6635642.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains 8970 malware and 1000 benign binary files. The malware files are divided into 5 types: Locker (300), Mediyes (1450), Winwebsec (4400), Zbot (2100), Zeroaccess (690). All malware files were collected from https://virusshare.com/ and malicia-project.com. Benign executable files were taken from the installation folders of legitimate software from different categories; the applications can be downloaded from https://download.cnet.com/windows/. All files were verified with VirusTotal (https://www.virustotal.com) to make sure each file belongs to its type. Note: This dataset includes malware, so it can harm your computer.
The files on your computer
kaggle.com
Updated Jan 15, 2017
Cite
cogs (2017). The files on your computer [Dataset]. https://www.kaggle.com/cogitoe/crab/metadata
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 15, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
cogs
Description
Dataset: The files on your computer.

Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.

e.g. (Win) C:\> crab C:\some\path\MyProject or (Mac) $ crab /some/path/MyProject

You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension

SELECT extension, count(*) FROM files GROUP BY extension;

e.g. List the 5 biggest directories

SELECT parentpath, sum(bytes)/1e9 as GB FROM files GROUP BY parentpath ORDER BY sum(bytes) DESC LIMIT 5;

Crab provides a virtual table, fileslines, which exposes file contents to SQL

e.g. Count TODO and FIXME entries in any .c files, recursively

SELECT fullpath, count(*) FROM fileslines WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c' and (data like '%TODO%' or data like '%FIXME%') GROUP BY fullpath;

There are also functions to run programs or shell commands on any subset of files, or on lines within files, e.g. (Mac) unzip all the .zip files, recursively

SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/') FROM files WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';

(Here -n tells unzip not to overwrite anything, and -d specifies target directory)

There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file

SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data) FROM fileslines WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt' ORDER BY data;

In place of the interactive prompt you can run queries in batch mode. E.g. here is a one-liner that returns the full path of all the files in the current directory

C:\> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"

Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.

Crab is free for personal use, $5/mo commercial

See more details here (Mac): http://etia.co.uk/ or here (Win): http://etia.co.uk/win/about/

An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.

To scan your own files, and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): http://etia.co.uk/download/ or this page (Win): http://etia.co.uk/win/download/

Content

FILES TABLE

The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'

COLUMNS
fileid (int) primary key -- files table row number, a unique id for each item
name (text) -- item name e.g. 'Hei.ttf'
bytes (int) -- item size in bytes e.g. 7502752
depth (int) -- how far scan recursed to find the item, starts at 0
accessed (text) -- datetime item was accessed
modified (text) -- datetime item was modified
basename (text) -- item name without path or extension, e.g. 'Hei'
extension (text) -- item extension including the dot, e.g. '.ttf'
type (text) -- item type, 'f' for file or 'd' for directory
mode (text) -- further type info and permissions, e.g. 'drwxr-xr-x'
parentpath (text) -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath (text) unique -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'

PATHS
1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
2) Directory paths all have a '/' on the end.
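The FILES table can also be queried outside the Crab shell with any SQLite client. The following is a minimal sketch using Python's standard `sqlite3` module. It builds a small in-memory table with a subset of the documented columns rather than opening the uploaded database, and the sample rows and helper names (`build_demo_db`, `count_by_extension`, `bytes_by_directory`) are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (name, bytes, extension, type, parentpath).
# Column names follow the FILES table description; the values are invented.
SAMPLE_ROWS = [
    ("Hei.ttf", 7502752, ".ttf", "f", "/Library/Fonts/"),
    ("Arial.ttf", 773236, ".ttf", "f", "/Library/Fonts/"),
    ("notes.txt", 1200, ".txt", "f", "/Users/demo/"),
]


def build_demo_db():
    """Create an in-memory database with a cut-down 'files' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "fileid INTEGER PRIMARY KEY, name TEXT, bytes INTEGER, "
        "extension TEXT, type TEXT, parentpath TEXT, fullpath TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO files (name, bytes, extension, type, parentpath, fullpath) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        # fullpath is parentpath concatenated with the item name
        [(n, b, e, t, p, p + n) for n, b, e, t, p in SAMPLE_ROWS],
    )
    return conn


def count_by_extension(conn):
    """Same idea as the 'count files by extension' query shown earlier."""
    rows = conn.execute(
        "SELECT extension, count(*) FROM files GROUP BY extension"
    )
    return dict(rows.fetchall())


def bytes_by_directory(conn):
    """Total bytes per directory, largest first."""
    rows = conn.execute(
        "SELECT parentpath, sum(bytes) FROM files "
        "GROUP BY parentpath ORDER BY sum(bytes) DESC"
    )
    return rows.fetchall()


conn = build_demo_db()
ext_counts = count_by_extension(conn)
dir_sizes = bytes_by_directory(conn)
```

To try this against the uploaded example, one would presumably replace `":memory:"` with the path to the downloaded SQLite file and skip the table creation. The virtual table and the `exec`/`writeln` functions are only available inside the Crab shell, so they are not used here.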

FILESLINES TABLE

The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.

This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.

COLUMNS
linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
data (text) -- data content of the files, one entry for each line

FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.

Example Gutenberg data

An example SQLite database (Mac data), database.sqlite, has been uploaded for you to play with. It includes an example files table...
Aluminum alloy industrial materials defect
figshare.com
zip
Updated Dec 3, 2024
Cite
Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27922929.v3
Dataset updated
Dec 3, 2024
Dataset provided by
figshare
Authors
Ying Han; Yugang Wang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset used in this study comes from the preliminary competition dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We filtered the dataset, removing images that do not meet the requirements of our experiment. All data have been split for training and testing. The images are all 2560×1920 pixels. Before training, all defects need to be labeled using labelimg and saved as json files. Then, all json files are converted to txt files. Finally, the organized defect dataset is used for detection and classification.

Description of the data and file structure

This is a project based on the YOLOv8 enhanced algorithm for aluminum defect classification and detection tasks. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions allow users to run the code in this repository on a Windows + CUDA GPU system.

Files and variables

File: defeat_dataset.zip

Description:

Setup

Please follow the steps below to set up the project.

Download Project Repository

Download the project repository defeat_dataset.zip. Unzip it and navigate to the project folder; it should contain a subfolder: quexian_dataset

Download data

1. Download the data: defeat_dataset.zip
2. Unzip the downloaded data and move the 'defeat_dataset' folder into the project's main folder.
3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

Software

Set up the Python environment

1. Download and install Anaconda.
2. Once Anaconda is installed, open the Anaconda Prompt. On Windows, click Start, search for Anaconda Prompt, and open it.
3. Create a new conda environment with Python 3.8. You can name it whatever you like, for example yolov8. Enter the following command: conda create -n yolov8 python=3.8
4. Activate the created environment. If the name is yolov8, enter: conda activate yolov8

Download and install Visual Studio Code.

Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU: conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

Install some necessary libraries:
Install scikit-learn with the command: conda install -c anaconda scikit-learn=0.24.1
Install astropy with: conda install astropy=4.2.1
Install pandas using: conda install -c anaconda pandas=1.2.4
Install Matplotlib with: conda install -c conda-forge matplotlib=3.5.3
Install scipy by entering: conda install scipy=1.10.1

Repeatability

PyTorch does not guarantee fully reproducible results between PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even if the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible. However, when the model is updated on the GPU, the results of model training vary between machines.

Access information

Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
Data was derived from the following sources: https://tianchi.aliyun.com/dataset/140666

Data availability statement

The ten defect types used in this study come from the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition Rematch; the dataset download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official website provides 4,356 images, including single defect images, multiple defect images and no defect images. We selected only the single defect and multiple defect images, 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom and blotch. Each image contains one or more defects, and the resolution of the defect images is 2560×1920.

By investigating the literature, we found that most experiments were done with 10 types of defects, so we chose three more types of defects that differ clearly from these ten types and have more samples, which makes them suitable for the experiments. The three newly added defect types come from the preliminary dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition. The dataset can be downloaded from https://tianchi.aliyun.com/dataset/140666. There are 3,000 images in total, among which 109, 73 and 43 images are for the defects of bruise, camouflage and coating cracking respectively. Finally, the 10 types of defects from the rematch and the 3 types of defects selected from the preliminary round are fused into a new dataset, which is the dataset provided here.

In processing the dataset, we tried different division ratios, such as 8:2, 7:3 and 7:2:1. After testing, we found that the experimental results did not differ much between division ratios. Therefore, we divide the dataset according to the ratio 7:2:1: the training set accounts for 70%, the validation set for 20%, and the testing set for 10%. The random number seed is set to 0 so that the results are consistent every time the model is trained.

Finally, the mean Average Precision (mAP) metric was measured on the dataset three times. The results differed very little each time, but for accuracy we took the average of the highest and lowest results. The highest was 71.5% and the lowest was 71.1%, giving an average detection accuracy of 71.3% for the final experiment.

All data and images used in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.

The settings for other parameters are as follows. epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train

The defeat_dataset.zip file is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare. DOI: 10.6084/m9.figshare.27922929. The results_images.zip file contains the experimental results graphs. The images_1.zip and images_2.zip files contain all the images needed to generate the manuscript.tex manuscript.
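The 7:2:1 division with a fixed seed and the averaging of the highest and lowest mAP can be sketched as follows. This is an illustration under assumptions, not the authors' script: the function name `split_721`, the placeholder image ids, and the use of Python's `random` module are all hypothetical, and the description does not say which tool performed the split.

```python
import random


def split_721(items, seed=0):
    """Shuffle with a fixed seed and split into 70% / 20% / 10% parts."""
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, about 10%
    return train, val, test


# Placeholder ids standing in for image file names (assumption).
image_ids = [f"img_{i:04d}" for i in range(1000)]
train, val, test = split_721(image_ids)

# Average of the highest and lowest reported mAP values (in percent).
reported_map = (71.5 + 71.1) / 2
```

Calling `split_721` again with the same seed returns the same three lists, which is the property the fixed seed is meant to provide. As the repeatability notes point out, a fixed split seed does not by itself make GPU training results identical across machines.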
COCO Dataset 2017
gts.ai
json
Cite
GTS, COCO Dataset 2017 [Dataset]. https://gts.ai/dataset-download/coco-dataset-2017/
Explore at:
Available download formats: json
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
Malware Analysis Datasets: Top-1000 PE Imports
ieee-dataport.org
Updated Nov 8, 2019
Cite
Angelo Oliveira (2019). Malware Analysis Datasets: Top-1000 PE Imports [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-top-1000-pe-imports
Explore at:
Dataset updated
Nov 8, 2019
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
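One way to read "Top-1000 imported functions" is as a frequency-ranked vocabulary turned into one feature vector per sample. The sketch below illustrates that reading only. The description does not state how functions are ranked or encoded, so counting each function once per sample and using binary presence features are assumptions, and the function names `top_k_imports` and `to_feature_vector` are hypothetical.

```python
from collections import Counter


def top_k_imports(samples, k):
    """Return the k function names imported by the most samples.

    `samples` is a list of import-name lists, one per PE file
    (a hypothetical, simplified stand-in for parsed sandbox reports).
    """
    counts = Counter()
    for imports in samples:
        counts.update(set(imports))  # count each function once per sample
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def to_feature_vector(imports, vocabulary):
    """Binary presence vector over the chosen vocabulary (assumed encoding)."""
    present = set(imports)
    return [1 if name in present else 0 for name in vocabulary]


# Tiny made-up example with k=2 instead of 1000.
samples = [
    ["CreateFileA", "WriteFile", "CloseHandle"],
    ["CreateFileA", "RegOpenKeyExA", "CloseHandle"],
    ["CloseHandle", "VirtualAlloc"],
]
vocab = top_k_imports(samples, k=2)
vectors = [to_feature_vector(s, vocab) for s in samples]
```

With `k=1000` the same two functions would produce a 1000-column table, one row per sample.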
SGA Pro (elevator storage)
catalog.data.gov
Updated Jun 5, 2025
Cite
Agricultural Research Service (2025). SGA Pro (elevator storage) [Dataset]. https://catalog.data.gov/dataset/sga-pro-elevator-storage-0a249
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
What is Stored Grain Advisor?
Stored Grain Advisor (SGA) is a decision support system for the management of insect pests of farm-stored wheat. SGA predicts the likelihood of insect infestation, and recommends preventative and remedial action. It also provides advice on how to sample and identify insect pests of stored wheat. Computer models of insect population growth allow SGA to predict future insect populations in the grain bin, as well as the breakdown of insecticides, the effects of fumigation, and cooling the wheat with aeration. The ability of Stored Grain Advisor to graphically show insect population trends makes it a powerful educational tool.

Requirements
Version 3.04 runs under Microsoft Windows 98, 2000, XP, and 32 bit Vista.

Instructions
Remove any previous versions of SGA using the uninstaller included with the program. Download SgaSetup.exe to your computer. Run SgaSetup.exe and follow the Installer's instructions. Delete SgaSetup.exe.

SGA Pro
SGA Pro was designed for use in commercial elevators (concrete silos, etc.) as part of the Areawide IPM Project for stored grain. This system takes a sampling-based approach to managing insect pests. Grain samples are taken with a vacuum probe and processed over an inclined sieve. SGA Pro analyzes the insect data, grain temperatures and moistures, and determines which bins need to be fumigated. (NOTE: available but unsupported.) This program runs under Microsoft Windows 98, 2000, XP, Vista, and Win7. Note: Win7 may require Windows Classic theme to display properly.

Resources in this dataset:
Resource Title: SGA Pro download page.
File Name: Web Page, url: https://www.ars.usda.gov/research/software/download/?softwareid=81&modecode=30-20-05-20
COKI Language Dataset
zenodo.org
application/gzip, csv
Updated Jun 16, 2022
Cite
James P. Diprose; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
Explore at:
Available download formats: application/gzip, csv
Unique identifier
https://doi.org/10.5281/zenodo.6636625
Dataset updated
Jun 16, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
James P. Diprose; Cameron Neylon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

Methodology
A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
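The Crossref-first, MAG-fallback text selection described above can be sketched in a few lines. This is an illustrative reading, not the project's code (which is in the linked repository): the record layout with `title` and `abstract` keys, the helper names, and the interpretation of "empty" as "no title and no abstract" are assumptions. The fastText call itself is left out so that the block runs without the model file; fastText labels carry a `__label__` prefix, which `parse_label` strips.

```python
def join_fields(record):
    """Concatenate title and abstract into one whitespace-normalised line."""
    parts = [record.get("title") or "", record.get("abstract") or ""]
    # Collapsing whitespace also removes newlines, which matters because
    # language identification is run on a single line of text.
    return " ".join(" ".join(parts).split())


def pick_text(crossref, mag):
    """Prefer Crossref text; fall back to MAG when Crossref yields nothing."""
    text = join_fields(crossref)
    return text if text else join_fields(mag)


def parse_label(label):
    """Turn a fastText-style label such as '__label__en' into 'en'."""
    prefix = "__label__"
    return label[len(prefix):] if label.startswith(prefix) else label


# Made-up records for illustration only.
crossref_record = {"title": "", "abstract": None}
mag_record = {"title": "Ein Titel", "abstract": "Text\nhier"}
chosen = pick_text(crossref_record, mag_record)
```

The string in `chosen` is what would then be passed to the language identification model, with the returned label cleaned up by `parse_label` before being stored next to the DOI and title.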

Query or Download
The data is publicly accessible in BigQuery in the following two tables:

coki-data-share.language.doi_language

coki-data-share.language.iso_language

When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.

Code
The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

License
COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

Attributions
This work contains information from:

Microsoft Academic Graph which is made available under the ODC Attribution Licence.

Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details.

References
[1] https://doi.org/10.5281/zenodo.6366695
[2] https://fasttext.cc/docs/en/language-identification.html
[3] https://modelpredict.com/language-identification-survey
ECOLANG Corpus - Dataset - B2FIND
b2find.eudat.eu
Updated Jan 10, 2025
Cite
(2025). ECOLANG Corpus - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/37ea2085-c62e-51f9-b84b-5f070b00b7dc
Explore at:
Dataset updated
Jan 10, 2025
Description
The ECOLANG Multimodal Corpus of adult-child and adult-adult conversation provides audiovisual recordings and annotation of multimodal communicative behaviours by English-speaking adults and children engaged in semi-naturalistic conversation.

Corpus
The corpus provides audiovisual recordings and annotation of multimodal behaviours (speech transcription, gesture, object manipulation, and eye gaze) by British and American English-speaking adults engaged in semi-naturalistic conversation with their child (N = 38, children 3-4 years old) or a familiar adult (N = 31). Speakers were asked to talk about objects (familiar or unfamiliar) to their interlocutors both when the objects were physically present and when they were absent. Thus, the corpus characterises the use of multimodal signals in social interaction and their modulations depending upon the age of the interlocutor (child or adult); whether the interlocutor is learning new concepts/words (unfamiliar or familiar objects); and whether they can see and manipulate the objects (present or absent).

Application
The corpus provides ecologically-valid data about the distribution and cooccurrence of the multimodal signals for cognitive scientists and neuroscientists to address questions about real-world language learning and processing; and for computer scientists to develop more human-like artificial agents.

Data access requires permission.
To obtain permission to view or download the video data (either viewing in your browser or downloading to your computer), please download the user license at https://www.ucl.ac.uk/pals/sites/pals/files/eula_ecolang.pdf, fill in the form and return it to ecolang@ucl.ac.uk. User licenses are granted in batches every few weeks. To view the eaf annotation files, you will need to download and install the software ELAN, available for free for Mac, Windows and Linux.
BrowardCountyBuildingFootprints
hub.arcgis.com
data.pompanobeachfl.gov
Updated Dec 3, 2020
Cite
BCGISData (2020). BrowardCountyBuildingFootprints [Dataset]. https://hub.arcgis.com/datasets/35570179437d4ab49379fd30e94a8c28
Explore at:
Dataset updated
Dec 3, 2020
Dataset authored and provided by
BCGISData
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Polygons of the building footprints clipped to Broward County. This is a Microsoft product. The original dataset contains 125,192,184 computer-generated building footprints in all 50 US states. This data is freely available for download and use. The dataset was clipped to the Broward County developed boundary.
5 Eq School Dataset
universe.roboflow.com
zip
Updated Feb 16, 2024
Cite
Sample Model (2024). 5 Eq School Dataset [Dataset]. https://universe.roboflow.com/sample-model-lhqki/5-eq-school/model/2
Explore at:
Available download formats: zip
Dataset updated
Feb 16, 2024
Dataset authored and provided by
Sample Model
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Fans Shelf Pc Powerstrip Windows Bounding Boxes
Description
5 EQ School

## Overview
5 EQ School is a dataset for object detection tasks - it contains Fans Shelf Pc Powerstrip Windows annotations for 7,280 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Global Wetlands: Luderick Seagrass Dataset - Test Set Image Patches
zenodo.org
explore.openaire.eu
zip
Updated Mar 4, 2023
Cite
Scarlett Raine; Ross Marchant; Brano Kusy; Frederic Maire; Tobias Fischer (2023). Global Wetlands: Luderick Seagrass Dataset - Test Set Image Patches [Dataset]. http://doi.org/10.5281/zenodo.7659203
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7659203
Dataset updated
Mar 4, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Scarlett Raine; Ross Marchant; Brano Kusy; Frederic Maire; Tobias Fischer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a test dataset of image patches created from the 'novel-test' split of the Global Wetlands Luderick-Seagrass dataset. The original images were divided as a grid into 50 image patches. The image patches were manually labeled into 'Background', 'Fish' and 'Seagrass' sets. The images were otherwise unaltered.
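Dividing an image into a regular grid of patches comes down to computing pixel boxes. The sketch below shows one way to do that. The description gives only the total of 50 patches per image, so the 5 × 10 layout, the 1920 × 1080 image size and the function name `patch_boxes` are assumptions made purely for illustration.

```python
def patch_boxes(width, height, rows, cols):
    """Return (left, top, right, bottom) boxes covering the image as a grid.

    Boundaries use integer division so the patches tile the image exactly,
    even when the size is not divisible by the grid dimensions.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Assumed layout: 5 rows x 10 columns = 50 patches on a 1920x1080 frame.
boxes = patch_boxes(width=1920, height=1080, rows=5, cols=10)
```

Each box could then be passed to an image library's crop function. The boxes are listed row by row, starting from the top-left patch.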

We contribute this test dataset of underwater image patches to facilitate evaluation of coarse segmentation seagrass methods.

Original dataset description: "This dataset comprises of annotated footage of Girella tricuspidata in two estuary systems in South East Queensland, Australia. This data is suitable for a range of classification and object detection research in unconstrained underwater environments."

Original dataset citation: Ditria, Ellen M; Connolly, Rod M; Jinks, Eric L; Lopez-Marcano, Sebastian (2021): Annotated video footage for automated identification and counting of fish in unconstrained marine environments. PANGAEA, https://doi.org/10.1594/PANGAEA.926930.

The original dataset is available at:
https://github.com/globalwetlands/luderick-seagrass
https://download.pangaea.de/dataset/926930/files/Fish_automated_identification_and_counting.zip
https://globalwetlands.blob.core.windows.net/globalwetlands-public/datasets/luderick-seagrass/luderick-seagrass.zip
Malawi Spectrum AIM 2025 - Dataset - The Document Management System
dms.hiv.health.gov.mw
Updated Jun 13, 2025
Cite
(2025). Malawi Spectrum AIM 2025 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aims-2025
Explore at:
Dataset updated
Jun 13, 2025
Area covered
Malawi
Description
To view results of the Malawi 2025 National AIM, download the "Malawi_2025_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum.

If Spectrum has not been installed on your machine, download the latest version of the Spectrum app from the Avenir Health website at https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded from the internet, double click on the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum, you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you.

After installing Spectrum, check your computer to make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com.

Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, then you are ready to use EPP. If it does not start, then you need to tell Windows where to find Java. To do that, select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer. You need to indicate the location of the java.exe file. To find it, select the C: drive, then click Program Files (x86), then Java, then click the folder for the most recent release of Java, then click bin, and, finally, click java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location.

Upon successful installation of the latest Spectrum version, open the "Malawi_2025_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum app.
Malawi Spectrum AIM 2022 - Dataset - The Document Management System
dms.hiv.health.gov.mw
Cite
dms.hiv.health.gov.mw, Malawi Spectrum AIM 2022 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aim-2022
Explore at:
Area covered
Malawi
Description
To view the results of the Malawi 2022 National AIM, download the "Malawi_2022_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum. If Spectrum has not been installed on your machine, download the latest version of the Spectrum app from the Avenir Health website at https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded, double-click the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum, you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you.
After installing Spectrum, make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com.
Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, you are ready to use EPP. If it does not start, you need to tell Windows where to find Java. To do that, select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer, where you need to indicate the location of the java.exe file.
To find it, select the C: drive, then click Program Files (x86), then Java, then the folder for the most recent release of Java, then bin, and, finally, java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location. Upon successful installation of the latest Spectrum file, open the "Malawi_2022_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum app.
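The java.exe lookup described above (C: drive, Program Files (x86), Java, most recent release, bin) could be sketched in Python roughly as follows. This is only an illustration: the helper name `find_java_exe` is hypothetical, and picking the "most recent release" by comparing the numbers in each folder name is an assumption, not something Spectrum itself is known to do.

```python
import re
from pathlib import Path


def find_java_exe(java_root):
    """Return the java.exe path in the newest-looking release folder, or None."""
    root = Path(java_root)
    if not root.is_dir():
        return None

    def version_key(folder):
        # Assumption: folder names carry the version, e.g. "jre1.8.0_301"
        # becomes (1, 8, 0, 301) so that folders compare numerically.
        return tuple(int(n) for n in re.findall(r"\d+", folder.name))

    # Keep only release folders that really contain bin/java.exe.
    candidates = [d for d in root.iterdir() if (d / "bin" / "java.exe").is_file()]
    if not candidates:
        return None
    return max(candidates, key=version_key) / "bin" / "java.exe"


if __name__ == "__main__":
    # The folder named in the description; prints None if it does not exist.
    print(find_java_exe(r"C:\Program Files (x86)\Java"))
```

The returned path is the one you would point to with the Select java.exe button.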
Malware Analysis Datasets: Raw PE as Image
ieee-dataport.org
Updated Nov 7, 2019
Cite
Angelo Oliveira (2019). Malware Analysis Datasets: Raw PE as Image [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-raw-pe-image
Dataset updated
Nov 7, 2019
Authors
Angelo Oliveira
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is part of my PhD research on malware detection and classification using deep learning. It contains static analysis data: the raw PE byte stream rescaled to a 32 x 32 greyscale image using the nearest-neighbor interpolation algorithm and then flattened to a 1024-byte vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
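A rough Python sketch of the rescaling step described above is given here. The description does not say how the byte stream is arranged into a 2D image before interpolation, so the square, zero-padded row-by-row layout below is an assumption, and the function name `pe_bytes_to_vector` is hypothetical.

```python
import math


def pe_bytes_to_vector(data, side=32):
    """Map a raw byte stream to a side*side vector via nearest-neighbour sampling."""
    if not data:
        raise ValueError("empty byte stream")
    # Assumption: lay the bytes out row by row in a square image,
    # padding the tail with zero bytes.
    width = math.isqrt(len(data) - 1) + 1  # ceil(sqrt(len(data)))
    padded = bytes(data) + bytes(width * width - len(data))

    out = bytearray()
    for row in range(side):
        src_row = row * width // side  # nearest source row (floor mapping)
        for col in range(side):
            src_col = col * width // side  # nearest source column
            out.append(padded[src_row * width + src_col])
    return bytes(out)  # flattened: one byte per pixel, side*side bytes in total
```

With `side=32` the result is always 1024 bytes, each a greyscale value from 0 to 255, whatever the size of the input file.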
A Representative User-centric GitHub Developers Dataset for Malicious...
figshare.com
png
Updated Aug 22, 2022
Cite
Yushan Liu (2022). A Representative User-centric GitHub Developers Dataset for Malicious Account Detection [Dataset]. http://doi.org/10.6084/m9.figshare.20325195.v5
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.20325195.v5
Dataset updated
Aug 22, 2022
Dataset provided by
figshare
Authors
Yushan Liu
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Using GitHub APIs, we construct an unbiased dataset of over 10 million GitHub users. The data was collected between Jul. 20 and Aug. 27, 2018, covering 10,000 users. Each data entry is stored in JSON format and represents one GitHub user. It contains the descriptive information from the user’s profile page, information about their commit activities, and their created or forked public repositories.

Please cite the following paper when using the dataset: Qingyuan Gong, Jiayun Zhang, Yang Chen, Qi Li, Yu Xiao, Xin Wang, Pan Hui. Detecting Malicious Accounts in Online Developer Communities Using Deep Learning. Proc. of the 28th ACM International Conference on Information and Knowledge Management (CIKM'19), Beijing, China, Nov. 2019.

Download the file and decompress it with the following steps: (1) On macOS/Linux systems: (i) open the terminal and switch to the directory where you downloaded the file (e.g. "cd Downloads/"); (ii) run the command "unzip Github_dataset.zip". (2) On Windows systems: use decompression software (e.g. WinZip) to decompress the file "Github_dataset.zip".
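As a platform-independent alternative to the steps above, the archive could also be unpacked with Python's standard `zipfile` module. This is a minimal sketch: the helper name `extract_archive` and the target folder `Github_dataset` are assumptions, and only the archive name "Github_dataset.zip" comes from the description.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unpack zip_path into dest_dir and return the names of the archive members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Run from the directory that holds the download, e.g. Downloads/.
    if Path("Github_dataset.zip").is_file():
        names = extract_archive("Github_dataset.zip", "Github_dataset")
        print(f"extracted {len(names)} entries")
```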

Microsoft Stock Data 2025

kaggle.com

Updated Feb 4, 2025

Cite

Umer Haddii (2025). Microsoft Stock Data 2025 [Dataset]. https://www.kaggle.com/datasets/umerhaddii/microsoft-stock-data-2025/code

Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Feb 4, 2025

Dataset provided by

Kaggle http://kaggle.com/

Authors

Umer Haddii

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Microsoft is an American company that develops and distributes software and services such as: a search engine (Bing), cloud solutions and the computer operating system Windows.

Market cap

Market capitalization of Microsoft (MSFT)

Market cap: $3.085 Trillion USD

As of February 2025 Microsoft has a market cap of $3.085 Trillion USD. This makes Microsoft the world's 2nd most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.

Revenue

Revenue for Microsoft (MSFT)
Revenue in 2024 (TTM): $254.19 Billion USD

According to Microsoft's latest financial reports, the company's current revenue (TTM) is $254.19 Billion USD. In 2023 the company made a revenue of $227.58 Billion USD, an increase over its 2022 revenue of $204.09 Billion USD. Revenue is the total amount of income that a company generates from the sale of goods or services. Unlike earnings, no expenses are subtracted.

Earnings

Earnings for Microsoft (MSFT)
Earnings in 2024 (TTM): $110.77 Billion USD

According to Microsoft's latest financial reports, the company's current earnings (TTM) are $110.77 Billion USD. In 2023 the company made earnings of $101.21 Billion USD, an increase over its 2022 earnings of $82.58 Billion USD. The earnings displayed on this page are the earnings before interest and taxes, or simply EBIT.

End of Day market cap according to different sources

On Feb 2nd, 2025 the market cap of Microsoft was reported to be:

$3.085 Trillion USD by Nasdaq
$3.085 Trillion USD by CompaniesMarketCap
$3.085 Trillion USD by Yahoo Finance

Content

Geography: USA

Time period: March 1986- February 2025

Unit of analysis: Microsoft Stock Data 2025

Variables

Variable	Description

date	date
open	The price at market open.
high	The highest price for that day.
low	The lowest price for that day.
close	The price at market close, adjusted for splits.
adj_close	The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
volume	The number of shares traded on that day.
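
The variables listed above map naturally onto one record per trading day. The sketch below shows one way to read them with Python's standard `csv` module. It assumes the data is a CSV file whose header row uses exactly these variable names, which the description does not state. The `DailyBar` and `read_bars` names are hypothetical, and the sample row contains made-up placeholder values, not real Microsoft prices.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DailyBar:
    date: str  # kept as text because the date format is not documented
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: int


def read_bars(handle):
    """Parse rows with the documented columns into DailyBar records."""
    return [
        DailyBar(
            date=row["date"],
            open=float(row["open"]),
            high=float(row["high"]),
            low=float(row["low"]),
            close=float(row["close"]),
            adj_close=float(row["adj_close"]),
            volume=int(float(row["volume"])),  # tolerate values such as "1000.0"
        )
        for row in csv.DictReader(handle)
    ]


if __name__ == "__main__":
    # Placeholder row for illustration only; not taken from the dataset.
    sample = (
        "date,open,high,low,close,adj_close,volume\n"
        "2000-01-03,10.0,12.5,9.5,11.0,10.8,1000\n"
    )
    for bar in read_bars(io.StringIO(sample)):
        print(bar.date, "daily range:", bar.high - bar.low)
```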

Acknowledgements

This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.

Malawi Spectrum AIM 2024 - Dataset - The Document Management System
dms.hiv.health.gov.mw
Cite
dms.hiv.health.gov.mw, Malawi Spectrum AIM 2024 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aim-2024-file
Area covered
Malawi
Description
To view the results of the Malawi 2024 National AIM, download the "Malawi_2024_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum. If Spectrum has not been installed on your machine, download the latest version of the Spectrum app from the Avenir Health website at https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded, double-click the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum, you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you.
After installing Spectrum, make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com.
Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, you are ready to use EPP. If it does not start, you need to tell Windows where to find Java. To do that, select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer, where you need to indicate the location of the java.exe file.
To find it, select the C: drive, then click Program Files (x86), then Java, then the folder for the most recent release of Java, then bin, and, finally, java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location. Upon successful installation of the latest Spectrum file, open the "Malawi_2024_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum

Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset

10 scholarly articles cite this dataset (View in Google Scholar)
