2 datasets found
  1. Primus-FineWeb

    • huggingface.co
    Updated Aug 9, 2025
    Citation: Trend Cybertron (Trend Micro) (2025). Primus-FineWeb [Dataset]. https://huggingface.co/datasets/trend-cybertron/Primus-FineWeb
    Dataset updated: Aug 9, 2025
    Dataset provided by: Trend Micro (http://trendmicro.com/)
    Authors: Trend Cybertron (Trend Micro)
    License: ODC-BY (https://choosealicense.com/licenses/odc-by/)

    Description

    ⭐ Please download the dataset from here.

    PRIMUS: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

    🤗 Primus-FineWeb

    The Primus-FineWeb dataset is constructed by filtering cybersecurity-related text from FineWeb, a refined version of Common Crawl. We began by leveraging Primus-Seed, a high-quality dataset of manually curated cybersecurity text, as positive samples. We then sampled ten times the amount of data from FineWeb as negative samples… See the full description on the dataset page: https://huggingface.co/datasets/trend-cybertron/Primus-FineWeb.

  2. Primus-FineWeb

    • huggingface.co
    Updated Feb 19, 2025
    Citation: Trend Micro (AI Lab) (2025). Primus-FineWeb [Dataset]. https://huggingface.co/datasets/trendmicro-ailab/Primus-FineWeb
    Dataset updated: Feb 19, 2025
    Dataset provided by: Trend Micro (http://trendmicro.com/)
    Authors: Trend Micro (AI Lab)
    License: ODC-BY (https://choosealicense.com/licenses/odc-by/)

    Description

    PRIMUS: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

    🤗 Primus-FineWeb

    The Primus-FineWeb dataset is constructed by filtering cybersecurity-related text from FineWeb, a refined version of Common Crawl. We began by leveraging Primus-Seed, a high-quality dataset of manually curated cybersecurity text, as positive samples. We then sampled ten times the amount of data from FineWeb as negative samples and trained a binary cybersecurity… See the full description on the dataset page: https://huggingface.co/datasets/trendmicro-ailab/Primus-FineWeb.


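The construction procedure described for Primus-FineWeb can be sketched as follows. This is a minimal, hypothetical Python illustration under stated assumptions, not Trend Micro's pipeline. The manually curated seed texts are labeled positive, and ten times as many FineWeb documents are sampled as negatives (the 1:10 ratio stated in the description). A scorer is then used to keep cybersecurity text. The description is cut off after "trained a binary cybersecurity…", so the authors' actual model and any decision threshold are not given here. `ToyNaiveBayes` is a swapped-in bag-of-words Naive Bayes used only for illustration. `build_training_set`, `filter_corpus`, `neg_ratio`, and `threshold` are likewise hypothetical names.

```python
import math
import random
from collections import Counter


def build_training_set(seed_docs, web_docs, neg_ratio=10, rng_seed=0):
    """Label every seed document 1 and sample neg_ratio times as many web documents, labeled 0."""
    n_neg = neg_ratio * len(seed_docs)
    if n_neg > len(web_docs):
        raise ValueError("not enough web documents to sample negatives from")
    rng = random.Random(rng_seed)
    negatives = rng.sample(web_docs, n_neg)  # sampled without replacement
    examples = [(doc, 1) for doc in seed_docs] + [(doc, 0) for doc in negatives]
    rng.shuffle(examples)
    return examples


def tokenize(text):
    # Deliberately naive whitespace tokenizer for the sketch.
    return text.lower().split()


class ToyNaiveBayes:
    """Hypothetical stand-in scorer: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, examples):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, text, c):
        # log P(c) + sum over tokens of log P(token | c), with Laplace smoothing.
        n_docs = self.doc_counts[0] + self.doc_counts[1]
        score = math.log(self.doc_counts[c] / n_docs)
        for tok in tokenize(text):
            score += math.log(
                (self.word_counts[c][tok] + 1) / (self.totals[c] + self.vocab_size)
            )
        return score

    def prob_positive(self, text):
        l0, l1 = self._log_joint(text, 0), self._log_joint(text, 1)
        m = max(l0, l1)  # subtract the max before exponentiating for numerical stability
        e0, e1 = math.exp(l0 - m), math.exp(l1 - m)
        return e1 / (e0 + e1)


def filter_corpus(model, docs, threshold=0.5):
    """Keep documents whose estimated probability of being cybersecurity text meets the threshold."""
    return [d for d in docs if model.prob_positive(d) >= threshold]
```

Treating randomly sampled web pages as negatives assumes that cybersecurity text is rare in the sample. Under that assumption, the few security pages that land among the negatives act as tolerated label noise. Because the training data is skewed 1:10 toward negatives, the class prior pulls scores down, and a real filter would likely need its threshold tuned on held-out data rather than fixed at 0.5.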