2 datasets found
  1. The CommonCrawl Corpus

    • marketplace.sshopencloud.eu
    Updated Apr 24, 2020
    Cite
    (2020). The CommonCrawl Corpus [Dataset]. https://marketplace.sshopencloud.eu/dataset/93FNrL
    Description

    The Common Crawl corpus contains petabytes of data collected over 8 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.

  2. Common Crawl Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Oct 8, 2014
    Cite
    Common Crawl Dataset [Dataset]. https://paperswithcode.com/dataset/common-crawl
    Description

    The Common Crawl corpus contains petabytes of data collected over 12 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.


The CommonCrawl Corpus

138 scholarly articles cite this dataset (per Google Scholar).