2 datasets found

The CommonCrawl Corpus
marketplace.sshopencloud.eu
Updated Apr 24, 2020
(2020). The CommonCrawl Corpus [Dataset]. https://marketplace.sshopencloud.eu/dataset/93FNrL
Description
The Common Crawl corpus contains petabytes of data collected over 8 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.
Common Crawl Dataset
paperswithcode.com
opendatalab.com
Updated Oct 8, 2014
Common Crawl Dataset [Dataset]. https://paperswithcode.com/dataset/common-crawl
Description
The Common Crawl corpus contains petabytes of data collected over 12 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.

The CommonCrawl Corpus

138 scholarly articles cite this dataset.
