17 datasets found
  1. Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset

    • figshare.com
    application/x-rar
    Updated May 30, 2023
    Cite
    Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van (2023). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6635642.v1
    Explore at:
    application/x-rar (available download format)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains 8970 malware and 1000 benign binary files. The malware files are divided into 5 types: Locker (300), Mediyes (1450), Winwebsec (4400), Zbot (2100), Zeroaccess (690). All malware files were collected from https://virusshare.com/ and malicia-project.com. Benign executable files were taken from the installation folders of legitimate software from different categories, which can be downloaded from https://download.cnet.com/windows/. All files were verified with VirusTotal (https://www.virustotal.com) to make sure each file belongs to its type. Note: This dataset includes malware, so it can harm your computer.

  2. The files on your computer

    • kaggle.com
    Updated Jan 15, 2017
    Cite
    cogs (2017). The files on your computer [Dataset]. https://www.kaggle.com/cogitoe/crab/metadata
    Dataset updated
    Jan 15, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    cogs
    Description

    Dataset: The files on your computer.

    Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.

    e.g. (Win)    C:\> crab C:\some\path\MyProject
    or (Mac)    $ crab /some/path/MyProject
    

    You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension

    SELECT extension, count(*) 
    FROM files 
    GROUP BY extension;
    

    e.g. List the 5 biggest directories

    SELECT parentpath, sum(bytes)/1e9 as GB 
    FROM files 
    GROUP BY parentpath 
    ORDER BY sum(bytes) DESC LIMIT 5;
    

    Crab provides a virtual table, fileslines, which exposes file contents to SQL

    e.g. Count TODO and FIXME entries in any .c files, recursively

    SELECT fullpath, count(*) FROM fileslines 
    WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
      and (data like '%TODO%' or data like '%FIXME%')
    GROUP BY fullpath;
    

    There are also functions to run programs or shell commands on any subset of files, or on lines within files, e.g. (Mac) unzip all the .zip files, recursively

    SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/') 
    FROM files 
    WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';
    

    (Here -n tells unzip not to overwrite anything, and -d specifies target directory)

    There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file

    SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data) 
    FROM fileslines 
    WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
    ORDER BY data;
    

    In place of the interactive prompt you can run queries in batch mode. E.g. here is a one-liner that returns the full path of all the files in the current directory

    C:\> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"
    

    Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.

    Crab is free for personal use and $5/month for commercial use.

    See more details here (Mac): http://etia.co.uk/ or here (Win): http://etia.co.uk/win/about/

    An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.

    To scan your own files and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): http://etia.co.uk/download/ or this page (Win): http://etia.co.uk/win/download/

    Content

    FILES TABLE

    The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'

    COLUMNS
     fileid (int) primary key -- files table row number, a unique id for each item
     name (text)        -- item name e.g. 'Hei.ttf'
     bytes (int)        -- item size in bytes e.g. 7502752
     depth (int)        -- how far scan recursed to find the item, starts at 0
     accessed (text)      -- datetime item was accessed
     modified (text)      -- datetime item was modified
     basename (text)      -- item name without path or extension, e.g. 'Hei'
     extension (text)     -- item extension including the dot, e.g. '.ttf'
     type (text)        -- item type, 'f' for file or 'd' for directory
     mode (text)        -- further type info and permissions, e.g. 'drwxr-xr-x'
     parentpath (text)     -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
     fullpath (text) unique  -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'
    
    PATHS
    1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
    2) Directory paths all have a '/' on the end.
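
The FILES table can also be explored outside the Crab shell. The following is a minimal sketch using Python's standard sqlite3 module. It builds a tiny in-memory stand-in with a subset of the columns listed above (the sample rows and the helper names `count_by_extension` and `bytes_by_directory` are hypothetical), then runs the two example queries shown earlier. The same queries should work against the uploaded example database, assuming its files table follows the schema described here.

```python
import sqlite3

# Hypothetical stand-in for the example database: an in-memory table
# holding a subset of the FILES columns described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        fileid INTEGER PRIMARY KEY,
        name TEXT,
        bytes INTEGER,
        extension TEXT,
        type TEXT,
        parentpath TEXT,
        fullpath TEXT UNIQUE
    )
    """
)

# Made-up sample rows: (name, bytes, extension, type, parentpath).
sample = [
    ("a.c", 120, ".c", "f", "/src/"),
    ("b.c", 80, ".c", "f", "/src/"),
    ("notes.txt", 40, ".txt", "f", "/docs/"),
]
conn.executemany(
    "INSERT INTO files (name, bytes, extension, type, parentpath, fullpath) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    # fullpath is parentpath concatenated with name, as in the schema notes.
    [(n, b, e, t, p, p + n) for (n, b, e, t, p) in sample],
)


def count_by_extension(connection):
    """Count files by extension (ordered for a stable result)."""
    query = (
        "SELECT extension, count(*) FROM files "
        "GROUP BY extension ORDER BY extension"
    )
    return connection.execute(query).fetchall()


def bytes_by_directory(connection, limit=5):
    """Total bytes per parent directory, largest first."""
    query = (
        "SELECT parentpath, sum(bytes) FROM files "
        "GROUP BY parentpath ORDER BY sum(bytes) DESC LIMIT ?"
    )
    return connection.execute(query, (limit,)).fetchall()


print(count_by_extension(conn))
print(bytes_by_directory(conn))
```

The fileslines virtual table and the exec/writeln functions are only available inside the Crab shell, so they are not covered by this sketch.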
    

    FILESLINES TABLE

    The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.

    This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.

    COLUMNS
     linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
     data (text)    -- data content of the files, one entry for each line
    

    FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.

    Example Gutenberg data

    An example SQLite database (Mac data), database.sqlite, has been uploaded for you to play with. It includes an example files table...

  3. Aluminum alloy industrial materials defect

    • figshare.com
    zip
    Updated Dec 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
    Explore at:
    zip (available download format)
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    figshare
    Authors
    Ying Han; Yugang Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset used in this study comes from the preliminary competition dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We filtered the dataset, removing images that do not meet the requirements of our experiment. All data have been split for training and testing. All images are 2560×1920 pixels. Before training, all defects need to be labeled using labelimg and saved as json files. Then, all json files are converted to txt files. Finally, the organized defect dataset is used for detection and classification.

    Description of the data and file structure

    This is a project based on the YOLOv8 enhanced algorithm for aluminum defect classification and detection tasks. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions allow users to run the code in this repository on a Windows system with a CUDA GPU.

    Files and variables

    File: defeat_dataset.zip

    Setup

    Please follow the steps below to set up the project.

    Download Project Repository
    Download the project repository defeat_dataset.zip from the following location. Unzip and navigate to the project folder; it should contain a subfolder: quexian_dataset

    Download data
    1. Download the data: defeat_dataset.zip
    2. Unzip the downloaded data and move the 'defeat_dataset' folder into the project's main folder.
    3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
    4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

    Software

    Set up the Python environment
    1. Download and install Anaconda.
    2. Once Anaconda is installed, open the Anaconda Prompt. For Windows, click Start, search for Anaconda Prompt, and open it.
    3. Create a new conda environment with Python 3.8. You can name it whatever you like, for example yolov8. Enter the following command:
         conda create -n yolov8 python=3.8
    4. Activate the created environment. If the name is yolov8, enter:
         conda activate yolov8

    Download and install Visual Studio Code.

    Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU:
         conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

    Install some necessary libraries:
         conda install -c anaconda scikit-learn=0.24.1
         conda install astropy=4.2.1
         conda install -c anaconda pandas=1.2.4
         conda install -c conda-forge matplotlib=3.5.3
         conda install scipy=1.10.1

    Repeatability

    For PyTorch, it is well known that there is no guarantee of fully reproducible results between PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even if the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible. However, when the model is updated on the GPU, the results of model training vary between machines.

    Access information

    Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
    Data was derived from the following sources: https://tianchi.aliyun.com/dataset/140666

    Data availability statement

    The ten defect datasets used in this study come from the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition Rematch, and the dataset download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official website provides 4,356 images, including single defect images, multiple defect images and no defect images. We selected only single defect images and multiple defect images, 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom and blotch. Each image contains one or more defects, and the resolution of all defect images is 2560×1920.

    By investigating the literature, we found that most experiments were done with 10 types of defects, so we chose three additional defect types that differ more from these ten and have more samples, which makes them suitable for the experiments. The three newly added datasets come from the preliminary dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition. The dataset can be downloaded from https://tianchi.aliyun.com/dataset/140666. There are 3,000 images in total, among which 109, 73 and 43 images are for the defects of bruise, camouflage and coating cracking respectively. Finally, the 10 types of defects from the rematch and the 3 types of defects selected from the preliminary round are fused into a new dataset, which is the one examined here.

    In processing the dataset, we tried different division ratios, such as 8:2, 7:3, 7:2:1, etc. After testing, we found that the experimental results did not differ much between division ratios. Therefore, we divide the dataset according to the ratio 7:2:1: the training set accounts for 70%, the validation set for 20%, and the testing set for 10%. The random number seed is set to 0 to ensure that the results obtained are consistent every time the model is trained.

    Finally, the mean Average Precision (mAP) metric was measured on the dataset a total of three times. The results differed very little between runs, but for the accuracy of the experimental results we took the average of the highest and lowest results. The highest was 71.5% and the lowest was 71.1%, resulting in an average detection accuracy of 71.3% for the final experiment.

    All data and images utilized in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.

    The settings for other parameters are as follows:
         epochs: 200
         patience: 50
         batch: 16
         imgsz: 640
         pretrained: true
         optimizer: SGD
         close_mosaic: 10
         iou: 0.7
         momentum: 0.937
         weight_decay: 0.0005
         box: 7.5
         cls: 0.5
         dfl: 1.5
         pose: 12.0
         kobj: 1.0
         save_dir: runs/train

    The defeat_dataset.zip is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare. DOI: 10.6084/m9.figshare.27922929. The results_images.zip in the system contains the experimental results graphs. The images_1.zip and images_2.zip in the system contain all the images needed to generate the manuscript.tex manuscript.
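
The 7:2:1 division with a fixed random seed of 0 can be illustrated with a short sketch. This is not the authors' code: the function name `split_dataset`, the file names, and the choice of Python's `random` module for shuffling are assumptions made for illustration only.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items with a fixed seed and split them train/val/test.

    Hypothetical helper: ratios are integer weights (7:2:1 by default),
    and any remainder from integer division goes to the test split.
    """
    # Sort first so the result does not depend on the input order.
    ordered = sorted(items)
    rng = random.Random(seed)  # local generator, seed 0 as in the description
    rng.shuffle(ordered)

    total = sum(ratios)
    n = len(ordered)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total

    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


# Made-up file names, purely for demonstration.
names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))
```

Fixing the seed makes the split itself repeatable; as the repeatability note above says, it does not by itself guarantee identical training results across machines or PyTorch versions.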

  4. COCO Dataset 2017

    • gts.ai
    json
    + more versions
    Cite
    GTS, COCO Dataset 2017 [Dataset]. https://gts.ai/dataset-download/coco-dataset-2017/
    Explore at:
    json (available download format)
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.

  5. Malware Analysis Datasets: Top-1000 PE Imports

    • ieee-dataport.org
    Updated Nov 8, 2019
    Cite
    Angelo Oliveira (2019). Malware Analysis Datasets: Top-1000 PE Imports [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-top-1000-pe-imports
    Explore at:
    Dataset updated
    Nov 8, 2019
    Authors
    Angelo Oliveira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
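
As a rough illustration of how a "top-N imported functions" list could be derived from sandbox reports, here is a minimal sketch. It is not the dataset author's pipeline. It assumes each report is a dictionary in which `report["static"]["pe_imports"]` is a list of entries with a `dll` name and an `imports` list of `{"name": ...}` items, and that a function is counted once per sample; the helper name `top_imports` and the sample data are hypothetical.

```python
from collections import Counter


def top_imports(reports, n=1000):
    """Return the n most frequently imported function names.

    Assumed report layout (illustrative, see lead-in):
    report["static"]["pe_imports"] -> [{"dll": ..., "imports": [{"name": ...}]}]
    """
    counts = Counter()
    for report in reports:
        names = set()  # count each function at most once per sample
        for entry in report.get("static", {}).get("pe_imports", []):
            for imported in entry.get("imports", []):
                name = imported.get("name")
                if name:
                    names.add(name)
        counts.update(names)
    # Sort by descending count, then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Made-up reports for demonstration only.
reports = [
    {"static": {"pe_imports": [
        {"dll": "KERNEL32.dll",
         "imports": [{"name": "CreateFileA"}, {"name": "ReadFile"}]},
    ]}},
    {"static": {"pe_imports": [
        {"dll": "KERNEL32.dll", "imports": [{"name": "CreateFileA"}]},
    ]}},
    {},  # a report without static analysis data is simply skipped
]
print(top_imports(reports, n=2))
```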

  6. SGA Pro (elevator storage)

    • catalog.data.gov
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). SGA Pro (elevator storage) [Dataset]. https://catalog.data.gov/dataset/sga-pro-elevator-storage-0a249
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    What is Stored Grain Advisor?
    Stored Grain Advisor (SGA) is a decision support system for the management of insect pests of farm-stored wheat. SGA predicts the likelihood of insect infestation, and recommends preventative and remedial action. It also provides advice on how to sample and identify insect pests of stored wheat. Computer models of insect population growth allow SGA to predict future insect populations in the grain bin, as well as the breakdown of insecticides, the effects of fumigation, and cooling the wheat with aeration. The ability of Stored Grain Advisor to graphically show insect population trends makes it a powerful educational tool.

    Requirements
    Version 3.04 runs under Microsoft Windows 98, 2000, XP, and 32 bit Vista.

    Instructions
    Remove any previous versions of SGA using the uninstaller included with the program. Download SgaSetup.exe to your computer. Run SgaSetup.exe and follow the Installer's instructions. Delete SgaSetup.exe.

    SGA Pro
    SGA Pro was designed for use in commercial elevators (concrete silos, etc.) as part of the Areawide IPM Project for stored grain. The system takes a sampling-based approach to managing insect pests: grain samples are taken with a vacuum probe and processed over an inclined sieve. SGA Pro analyzes the insect data, grain temperatures and moistures, and determines which bins need to be fumigated. (NOTE: available but unsupported.) This program runs under Microsoft Windows 98, 2000, XP, Vista, and Win7. Note: Win7 may require Windows Classic theme to display properly.

    Resources in this dataset:
    Resource Title: SGA Pro download page. File Name: Web Page, url: https://www.ars.usda.gov/research/software/download/?softwareid=81&modecode=30-20-05-20

  7. COKI Language Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jun 16, 2022
    Cite
    James P. Diprose; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
    Explore at:
    application/gzip, csv (available download formats)
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    James P. Diprose; Cameron Neylon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

    Methodology
    A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
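
The Crossref-first, MAG-fallback step described above can be sketched as follows. This is an illustrative reading of the methodology, not the project's actual script (see the linked repository for that). The helper names `first_non_empty`, `build_text` and `predict_language` are hypothetical, the fallback is assumed to apply per field, and the predictor is passed in as a function so the sketch runs without the model file. With the fastText Python bindings, `fasttext.load_model("lid.176.bin").predict` could be supplied as `predict`.

```python
import re


def first_non_empty(*values):
    """Return the first value that is not None, empty, or whitespace."""
    for value in values:
        if value and value.strip():
            return value
    return ""


def build_text(crossref_title, crossref_abstract, mag_title, mag_abstract):
    """Concatenate title and abstract, preferring Crossref over MAG.

    Assumption: the fallback is applied per field (title, abstract).
    """
    title = first_non_empty(crossref_title, mag_title)
    abstract = first_non_empty(crossref_abstract, mag_abstract)
    # fastText predicts one line at a time, so collapse newlines and
    # repeated whitespace into single spaces.
    return re.sub(r"\s+", " ", f"{title} {abstract}").strip()


def predict_language(text, predict):
    """Return (language code, probability) from a fastText-style predictor.

    `predict` is expected to return (labels, probabilities) with labels
    such as "__label__en".
    """
    labels, probabilities = predict(text)
    return labels[0].replace("__label__", ""), float(probabilities[0])


# Stand-in predictor so the sketch runs without the lid.176.bin model.
def fake_predict(text):
    return ("__label__fr",), [0.97]


text = build_text("", None, "Titre de l'article", "Résumé\nde l'article")
print(text)
print(predict_language(text, fake_predict))
```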

    Query or Download
    The data is publicly accessible in BigQuery in the following two tables:

    When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

    See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.

    Code
    The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

    License
    COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

    Attributions
    This work contains information from:

    References
    [1] https://doi.org/10.5281/zenodo.6366695
    [2] https://fasttext.cc/docs/en/language-identification.html
    [3] https://modelpredict.com/language-identification-survey

  8. ECOLANG Corpus - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jan 10, 2025
    Cite
    (2025). ECOLANG Corpus - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/37ea2085-c62e-51f9-b84b-5f070b00b7dc
    Explore at:
    Dataset updated
    Jan 10, 2025
    Description

    The ECOLANG Multimodal Corpus of adult-child and adult-adult conversation provides audiovisual recordings and annotation of multimodal communicative behaviours by English-speaking adults and children engaged in semi-naturalistic conversation.

    Corpus
    The corpus provides audiovisual recordings and annotation of multimodal behaviours (speech transcription, gesture, object manipulation, and eye gaze) by British and American English-speaking adults engaged in semi-naturalistic conversation with their child (N = 38, children 3-4 years old) or a familiar adult (N = 31). Speakers were asked to talk about objects (familiar or unfamiliar) to their interlocutors both when the objects were physically present and when they were absent. Thus, the corpus characterises the use of multimodal signals in social interaction and their modulations depending upon the age of the interlocutor (child or adult), whether the interlocutor is learning new concepts/words (unfamiliar or familiar objects), and whether they can see and manipulate the objects (present or absent).

    Application
    The corpus provides ecologically-valid data about the distribution and cooccurrence of the multimodal signals for cognitive scientists and neuroscientists to address questions about real-world language learning and processing, and for computer scientists to develop more human-like artificial agents.

    Data access requires permission.
    To obtain permission to view or download the video data (either viewing in your browser or downloading to your computer), please download the user license at https://www.ucl.ac.uk/pals/sites/pals/files/eula_ecolang.pdf, fill in the form and return it to ecolang@ucl.ac.uk. User licenses are granted in batches every few weeks. To view the eaf annotation files, you will need to download and install the software ELAN, available for free for Mac, Windows and Linux.

  9. BrowardCountyBuildingFootprints

    • hub.arcgis.com
    • data.pompanobeachfl.gov
    • +1more
    Updated Dec 3, 2020
    Cite
    BCGISData (2020). BrowardCountyBuildingFootprints [Dataset]. https://hub.arcgis.com/datasets/35570179437d4ab49379fd30e94a8c28
    Explore at:
    Dataset updated
    Dec 3, 2020
    Dataset authored and provided by
    BCGISData
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Description

    Polygons of building footprints clipped to Broward County. This is a Microsoft product. The original dataset contains 125,192,184 computer-generated building footprints in all 50 US states. This data is freely available for download and use. The dataset was clipped to the Broward County developed boundary.

  10. 5 Eq School Dataset

    • universe.roboflow.com
    zip
    Updated Feb 16, 2024
    Cite
    Sample Model (2024). 5 Eq School Dataset [Dataset]. https://universe.roboflow.com/sample-model-lhqki/5-eq-school/model/2
    Explore at:
    zip (available download format)
    Dataset updated
    Feb 16, 2024
    Dataset authored and provided by
    Sample Model
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Fans Shelf Pc Powerstrip Windows Bounding Boxes
    Description

    5 EQ School

    ## Overview
    
    5 EQ School is a dataset for object detection tasks. It contains Fans, Shelf, Pc, Powerstrip and Windows annotations for 7,280 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
    
  11. Global Wetlands: Luderick Seagrass Dataset - Test Set Image Patches

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Mar 4, 2023
    Cite
    Scarlett Raine; Ross Marchant; Brano Kusy; Frederic Maire; Tobias Fischer (2023). Global Wetlands: Luderick Seagrass Dataset - Test Set Image Patches [Dataset]. http://doi.org/10.5281/zenodo.7659203
    Explore at:
    zip (available download format)
    Dataset updated
    Mar 4, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Scarlett Raine; Ross Marchant; Brano Kusy; Frederic Maire; Tobias Fischer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is a test dataset of image patches created from the 'novel-test' split of the Global Wetlands Luderick-Seagrass dataset. The original images were divided as a grid into 50 image patches. The image patches were manually labeled into 'Background', 'Fish' and 'Seagrass' sets. The images were otherwise unaltered.
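
Dividing an image into a grid of patches can be sketched as below. The description only states that each image yields 50 patches; the 5-row by 10-column layout, the 1920×1080 image size, and the helper name `grid_patches` are assumptions chosen for illustration.

```python
def grid_patches(width, height, rows, cols):
    """Return (left, top, right, bottom) boxes tiling an image as a grid.

    Integer division on the cell edges keeps the boxes contiguous and
    makes them cover the whole image even when the size is not an exact
    multiple of the grid dimensions.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical layout: 5 rows x 10 columns = 50 patches per image.
boxes = grid_patches(1920, 1080, rows=5, cols=10)
print(len(boxes), boxes[0], boxes[-1])
```

Each box uses the (left, upper, right, lower) order, so it could be passed directly to an image library's crop function, for example Pillow's `Image.crop`.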

    We contribute this test dataset of underwater image patches to facilitate evaluation of coarse segmentation seagrass methods.

    Original dataset description: "This dataset comprises of annotated footage of Girella tricuspidata in two estuary systems in South East Queensland, Australia. This data is suitable for a range of classification and object detection research in unconstrained underwater environments."

    Original dataset citation: Ditria, Ellen M; Connolly, Rod M; Jinks, Eric L; Lopez-Marcano, Sebastian (2021): Annotated video footage for automated identification and counting of fish in unconstrained marine environments. PANGAEA, https://doi.org/10.1594/PANGAEA.926930.

    The original dataset is available at:
    https://github.com/globalwetlands/luderick-seagrass
    https://download.pangaea.de/dataset/926930/files/Fish_automated_identification_and_counting.zip
    https://globalwetlands.blob.core.windows.net/globalwetlands-public/datasets/luderick-seagrass/luderick-seagrass.zip

  12. Malawi Spectrum AIM 2025 - Dataset - The Document Management System

    • dms.hiv.health.gov.mw
    Updated Jun 13, 2025
    Cite
    (2025). Malawi Spectrum AIM 2025 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aims-2025
    Explore at:
    Dataset updated
    Jun 13, 2025
    Area covered
    Malawi
    Description

    To view results of the Malawi 2025 National AIM, download the "Malawi_2025_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum.

    If Spectrum has not been installed on your machine, download the latest version of the Spectrum app from the Avenir Health website at https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded, double click on the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you.

    After installing Spectrum, check your computer to make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com.

    Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, then you are ready to use EPP. If it does not start, then you need to tell Windows where to find Java. To do that, select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer. You need to indicate the location of the java.exe file. To find it, select the C: drive, then click Program Files (x86), then Java, then click the folder for the most recent release of Java, then click bin, and, finally, click java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location.

    Upon successful installation of the latest Spectrum file, open the "Malawi_2025_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum app.
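
The Java version check can also be done from a script rather than through the start menu. The sketch below only parses a version banner of the kind printed by `java -version`; the helper name `java_major_version` is hypothetical and the sample banners are illustrative. It relies on the convention that Java 8 and earlier report themselves as 1.x (for example 1.8.0_301), while later releases report the major version first.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output.

    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first = int(match.group(1))
    # Legacy scheme: "1.8.0_301" means Java 8.
    if first == 1 and match.group(2):
        return int(match.group(2))
    return first


# Illustrative banners, not captured from a real machine.
print(java_major_version('java version "1.8.0_301"'))
print(java_major_version('openjdk version "17.0.2" 2022-01-18'))
```

To check a real installation, the banner could be captured with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, since Java prints its version information to standard error.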

  13. Malawi Spectrum AIM 2022 - Dataset - The Document Management System

    • dms.hiv.health.gov.mw
    Cite
    dms.hiv.health.gov.mw, Malawi Spectrum AIM 2022 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aim-2022
    Area covered
    Malawi
    Description

    To view results of the Malawi 2022 National AIM, download the "Malawi_2022_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum. If the HIV Spectrum has not been installed on your machine, download the latest version of the Spectrum app from Avenir Health website on the following link https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded from the internet, double click on the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you. After installing Spectrum, check your computer to make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com. Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, then you are ready to use EPP. If it does not start, then you need to tell Windows where to find Java. To do that select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer. You need to indicate the location of the java.exe file. 
    To find it, select the C: drive, then click Program Files (x86), then Java, then click the folder for the most recent release of Java, then click bin, and, finally, click java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location. Upon successful installation of the latest Spectrum file, open the "Malawi_2022_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum app.
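
    The manual search for java.exe described above can also be scripted. The following is only a minimal sketch: it assumes Java sits under the Program Files (x86) folder named in the steps, and the function name find_java_executables is hypothetical, not part of Spectrum or Java. It lists every java.exe found one level below that folder, with the most recently modified release folder first, so that the path can be pasted into the Select java.exe dialog.

```python
from pathlib import Path
from typing import List

# Assumed default location, taken from the steps above; adjust if Java lives elsewhere.
DEFAULT_JAVA_ROOT = Path(r"C:\Program Files (x86)\Java")


def find_java_executables(java_root: Path = DEFAULT_JAVA_ROOT) -> List[Path]:
    """Return every <release>/bin/java.exe under java_root, newest release folder first.

    "Newest" is judged by the folder's modification time, which is a heuristic
    and not a real version comparison.
    """
    if not java_root.is_dir():
        return []
    hits = [p for p in java_root.glob("*/bin/java.exe") if p.is_file()]
    return sorted(hits, key=lambda p: p.parent.parent.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_java_executables():
        print(path)
```

    If the list is empty, Java is probably installed elsewhere (for example under Program Files) or not installed at all.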

  14. Malware Analysis Datasets: Raw PE as Image

    • ieee-dataport.org
    Updated Nov 7, 2019
    Angelo Oliveira (2019). Malware Analysis Datasets: Raw PE as Image [Dataset]. https://ieee-dataport.org/open-access/malware-analysis-datasets-raw-pe-image
    Dataset updated
    Nov 7, 2019
    Authors
    Angelo Oliveira
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: the raw PE byte stream rescaled to a 32 x 32 greyscale image using the Nearest Neighbor Interpolation algorithm and then flattened to a 1024-byte vector. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
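
    The rescaling step described above can be illustrated with a short sketch. The description does not say how the byte stream is arranged in two dimensions before rescaling, so this example makes its own assumption: the bytes are zero-padded into the smallest square that holds them, and each of the 32 x 32 output pixels takes the value of its nearest source pixel. The function name pe_bytes_to_vector is hypothetical, and this is not necessarily the author's exact procedure.

```python
import math


def pe_bytes_to_vector(data: bytes, size: int = 32) -> bytes:
    """Rescale a raw byte stream to a size x size greyscale image and flatten it.

    Assumption (not from the dataset description): the bytes are laid out
    row by row in a zero-padded square before nearest-neighbor sampling.
    """
    if not data:
        raise ValueError("empty input")
    side = math.isqrt(len(data) - 1) + 1           # ceil(sqrt(len(data)))
    padded = data.ljust(side * side, b"\x00")      # pad to a full square
    out = bytearray()
    for row in range(size):
        src_row = row * side // size               # nearest source row
        for col in range(size):
            src_col = col * side // size           # nearest source column
            out.append(padded[src_row * side + src_col])
    return bytes(out)                              # size * size bytes, one per pixel


if __name__ == "__main__":
    vector = pe_bytes_to_vector(bytes(range(256)) * 16)
    print(len(vector))
```

    Each byte of the result is one greyscale pixel in the range 0 to 255, so the vector can be fed directly to a classifier or reshaped back to 32 x 32 for inspection.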

  15. A Representative User-centric GitHub Developers Dataset for Malicious...

    • figshare.com
    png
    Updated Aug 22, 2022
    Yushan Liu (2022). A Representative User-centric GitHub Developers Dataset for Malicious Account Detection [Dataset]. http://doi.org/10.6084/m9.figshare.20325195.v5
    Available download formats: png
    Dataset updated
    Aug 22, 2022
    Dataset provided by
    figshare
    Authors
    Yushan Liu
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Using GitHub APIs, we construct an unbiased dataset of over 10 million GitHub users. The data was collected between Jul. 20 and Aug. 27, 2018, covering 10,000 users. Each data entry is stored in JSON format and represents one GitHub user. It contains the descriptive information from the user's profile page, information on their commit activities, and their created/forked public repositories.

    Please cite the following paper when using the dataset: Qingyuan Gong, Jiayun Zhang, Yang Chen, Qi Li, Yu Xiao, Xin Wang, Pan Hui. Detecting Malicious Accounts in Online Developer Communities Using Deep Learning. Proc. of the 28th ACM International Conference on Information and Knowledge Management (CIKM'19), Beijing, China, Nov. 2019.

    Download the file and decompress it with the following steps: (1) On macOS/Linux systems: (i) open the terminal and switch to the directory where you downloaded the file (e.g. "cd Downloads/"); (ii) run the command "unzip Github_dataset.zip". (2) On Windows systems: use decompression software (e.g. WinZip) to decompress the file "Github_dataset.zip".
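
    As an alternative to the unzip step, the archive can be read directly from Python without extracting it first. This is only a sketch under assumptions: the internal layout of "Github_dataset.zip" is not documented here, so the code assumes each archive member ending in .json holds exactly one JSON object (one user). The function name iter_users is hypothetical.

```python
import json
import zipfile
from typing import Iterator


def iter_users(zip_path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per .json member of the archive.

    Assumption: every .json member contains a single JSON document.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                with archive.open(name) as handle:
                    yield json.load(handle)
```

    For example, sum(1 for _ in iter_users("Github_dataset.zip")) would count the entries once the file has been downloaded to the current directory.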

  16. Microsoft Stock Data 2025

    • kaggle.com
    Updated Feb 4, 2025
    Umer Haddii (2025). Microsoft Stock Data 2025 [Dataset]. https://www.kaggle.com/datasets/umerhaddii/microsoft-stock-data-2025/code
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Umer Haddii
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Microsoft is an American company that develops and distributes software and services such as: a search engine (Bing), cloud solutions and the computer operating system Windows.

    Market cap

    Market capitalization of Microsoft (MSFT)
    
    Market cap: $3.085 Trillion USD
    

    As of February 2025 Microsoft has a market cap of $3.085 Trillion USD. This makes Microsoft the world's 2nd most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.

    Revenue

    Revenue for Microsoft (MSFT)
    Revenue in 2024 (TTM): $254.19 Billion USD
    

    According to Microsoft's latest financial reports, the company's current revenue (TTM) is $254.19 Billion USD. In 2023 the company made a revenue of $227.58 Billion USD, an increase over its 2022 revenue of $204.09 Billion USD. The revenue is the total amount of income that a company generates by the sale of goods or services. Unlike with the earnings, no expenses are subtracted.

    Earnings

    Earnings for Microsoft (MSFT)
    Earnings in 2024 (TTM): $110.77 Billion USD
    
    

    According to Microsoft's latest financial reports, the company's current earnings (TTM) are $110.77 Billion USD. In 2023 the company made earnings of $101.21 Billion USD, an increase over its 2022 earnings of $82.58 Billion USD. The earnings displayed on this page are the earnings before interest and taxes, or simply EBIT.
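
    The year-over-year increases mentioned above can be expressed as percentages with a one-line formula. The sketch below uses only the 2022 and 2023 revenue and earnings figures quoted in the description (in billions of USD); the function name yoy_growth_pct is hypothetical.

```python
def yoy_growth_pct(current: float, previous: float) -> float:
    """Year-over-year change of `current` relative to `previous`, in percent."""
    return (current - previous) / previous * 100.0


# Figures quoted in the description, in billions of USD.
revenue = {2022: 204.09, 2023: 227.58}
earnings = {2022: 82.58, 2023: 101.21}

if __name__ == "__main__":
    print(f"Revenue growth 2022 to 2023:  {yoy_growth_pct(revenue[2023], revenue[2022]):.2f}%")
    print(f"Earnings growth 2022 to 2023: {yoy_growth_pct(earnings[2023], earnings[2022]):.2f}%")
```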

    End of Day market cap according to different sources

    On Feb 2nd, 2025 the market cap of Microsoft was reported to be:

    • $3.085 Trillion USD by Nasdaq

    • $3.085 Trillion USD by CompaniesMarketCap

    • $3.085 Trillion USD by Yahoo Finance

    Content

    Geography: USA

    Time period: March 1986 - February 2025

    Unit of analysis: Microsoft Stock Data 2025

    Variables

    Variable     Description
    date         date
    open         The price at market open.
    high         The highest price for that day.
    low          The lowest price for that day.
    close        The price at market close, adjusted for splits.
    adj_close    The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
    volume       The number of shares traded on that day.
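
    As a hedged illustration of how these variables might be used, the sketch below computes day-over-day percentage changes in adj_close. It assumes the data is a CSV file whose header matches the variable names above and whose rows are sorted by ascending date; neither detail is stated in the description. The sample rows are made up for illustration and are not real MSFT quotes.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def daily_returns(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Pair each date with the percent change in adj_close from the previous row."""
    out: List[Tuple[str, float]] = []
    prev = None
    for row in rows:
        price = float(row["adj_close"])
        if prev is not None:
            out.append((row["date"], (price / prev - 1.0) * 100.0))
        prev = price
    return out


# Made-up sample rows, for illustration only (not real MSFT quotes).
SAMPLE = """date,open,high,low,close,adj_close,volume
2025-01-02,100.0,102.0,99.0,101.0,100.0,1000
2025-01-03,101.0,106.0,100.0,105.0,105.0,1200
2025-01-06,105.0,105.5,101.0,102.0,102.9,900
"""

if __name__ == "__main__":
    for date, pct in daily_returns(csv.DictReader(io.StringIO(SAMPLE))):
        print(f"{date}: {pct:+.2f}%")
```

    Using adj_close rather than close keeps the returns comparable across dividend dates, since close is adjusted for splits only.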

    Acknowledgements

    This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.


  17. Malawi Spectrum AIM 2024 - Dataset - The Document Management System

    • dms.hiv.health.gov.mw
    dms.hiv.health.gov.mw, Malawi Spectrum AIM 2024 - Dataset - The Document Management System [Dataset]. https://dms.hiv.health.gov.mw/dataset/malawi-spectrum-aim-2024-file
    Area covered
    Malawi
    Description

    To view results of the Malawi 2024 National AIM, download the "Malawi_2024_National_HIV_estimates_Spectrum_AIM_model" digest file from our website and open it within Spectrum. If the HIV Spectrum has not been installed on your machine, download the latest version of the Spectrum app from Avenir Health website on the following link https://avenirhealth.org/Download/Spectrum/SpecInstall.EXE and save it on your hard disk. Spectrum will run on any computer running Windows Vista, 7, 8, 10 or 11. The application requires about 500MB of hard disk space. Once Spectrum is downloaded from the internet, double click on the file named “SpecInstallAIM2021.exe”. This will start the installation program. Follow the instructions on the screen to complete the installation. If you have trouble installing Spectrum you may not have permission to install programs on your computer. In that case, contact your IT office to install Spectrum for you. After installing Spectrum, check your computer to make sure you have Java version 8 installed on your system. The easiest way to determine which version of Java you have is to click on the Windows start menu, select ‘All Apps’, click on Java and select ‘About Java’. If you do not have version 8 (or do not have Java), please download or update the software at www.java.com. Next, make sure that Windows can find Java on your computer. To do this, start Spectrum and open your country file. Select Modules from the Spectrum menu and click the AIM icon to display the AIM menu. Select Incidence and Configuration (EPP). If EPP starts after a few seconds, then you are ready to use EPP. If it does not start, then you need to tell Windows where to find Java. To do that select File and Options. Click the box next to Use custom java.exe to add a check mark. Then click the button Select java.exe. This will open Windows Explorer. You need to indicate the location of the java.exe file. 
    To find it, select the C: drive, then click Program Files (x86), then Java, then click the folder for the most recent release of Java, then click bin, and, finally, click java.exe. This location will be saved so that Spectrum will always be able to find Java. If you update your version of Java, you will need to repeat this process to ensure Spectrum has the latest Java location. Upon successful installation of the latest Spectrum file, open the "Malawi_2024_National_HIV_estimates_Spectrum_AIM_model" digest file from within the Spectrum


Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van (2023). Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6635642.v1

Malware Detection PE-Based Analysis Using Deep Learning Algorithm Dataset

10 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: application/x-rar
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Anh Pham Tuan; An Tran Hung Phuong; Nguyen Vu Thanh; Toan Nguyen Van
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The dataset contains 8970 malware and 1000 benign binary files. The malware files are divided into 5 types: Locker (300), Mediyes (1450), Winwebsec (4400), Zbot (2100), Zeroaccess (690). All malware files were collected from https://virusshare.com/ and malicia-project.com. Benign executable files were taken from the installation folders of legitimate software applications from different categories, which can be downloaded from https://download.cnet.com/windows/. All files were verified with VirusTotal (https://www.virustotal.com) to make sure each file belongs to its type. Note: This dataset includes malware, so it can harm your computer.
