2 datasets found

The CommonCrawl Corpus
marketplace.sshopencloud.eu
Updated Apr 24, 2020
Cite
(2020). The CommonCrawl Corpus [Dataset]. https://marketplace.sshopencloud.eu/dataset/93FNrL
Description
The Common Crawl corpus contains petabytes of data collected over 8 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.
Common Crawl
opendatalab.com
Updated Jan 1, 2019
Cite
Institut national de recherche en informatique et en automatique (2019). Common Crawl [Dataset]. https://opendatalab.com/OpenDataLab/Common_Crawl
Available download formats: zip
Dataset provided by
Sorbonne University
Institut national de recherche en informatique et en automatique
License
https://commoncrawl.org/terms-of-use/
Description
The Common Crawl corpus contains petabytes of data collected over 12 years of web crawling. The corpus contains raw web page data, metadata extracts and text extracts. Common Crawl data is stored on Amazon Web Services’ Public Data Sets and on multiple academic cloud platforms across the world.
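Both descriptions name three kinds of data: raw web page data, metadata extracts and text extracts. These are commonly distributed as WARC, WAT and WET files respectively, and each crawl publishes a gzipped listing of its file paths. The sketch below only builds those listing URLs and does no downloading. It assumes the `crawl-data/<crawl ID>/<type>.paths.gz` layout under `https://data.commoncrawl.org` (HTTPS) and `s3://commoncrawl` (the AWS bucket); check the current Common Crawl documentation before relying on it. The helper name `path_listing_urls` and the example crawl ID are hypothetical.

```python
# Sketch only: map the three data kinds from the description to the file
# formats they are commonly distributed as, and build the per-crawl
# path-listing URLs. The URL layout below is an assumption based on the
# publicly documented Common Crawl bucket structure.

FORMATS = {
    "raw web page data": "warc",
    "metadata extracts": "wat",
    "text extracts": "wet",
}

HTTPS_BASE = "https://data.commoncrawl.org"  # assumed HTTPS endpoint
S3_BASE = "s3://commoncrawl"                 # assumed AWS Public Data Sets bucket


def path_listing_urls(crawl_id: str, base: str = HTTPS_BASE) -> dict[str, str]:
    """Return the gzipped path-listing URL for each data kind (hypothetical helper)."""
    return {
        kind: f"{base}/crawl-data/{crawl_id}/{fmt}.paths.gz"
        for kind, fmt in FORMATS.items()
    }


if __name__ == "__main__":
    # "CC-MAIN-2019-04" is used purely as an illustrative crawl ID.
    for kind, url in path_listing_urls("CC-MAIN-2019-04").items():
        print(f"{kind:20} {url}")
```

Passing `base=S3_BASE` yields the same paths under the S3 bucket, which is convenient when reading from within AWS.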
The CommonCrawl Corpus (marketplace.sshopencloud.eu): 133 scholarly articles cite this dataset (per Google Scholar).