Common Crawl Dataset

From GM-RKB

A Common Crawl Dataset is a web crawl snapshot created by the Common Crawl Foundation.



References

2014

   [ARC]  s3://aws-publicdatasets/common-crawl/crawl-001/ - Crawl #1 (2008/2009)
   [ARC]  s3://aws-publicdatasets/common-crawl/crawl-002/ - Crawl #2 (2009/2010)
   [ARC]  s3://aws-publicdatasets/common-crawl/parse-output/ - Crawl #3 (2012)
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-20/ - Summer 2013
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2013-48/ - Winter 2013
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-10/ - March 2014
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-15/ - April 2014
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-23/ - July 2014
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-35/ - August 2014
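Each entry in the listing above follows the same pattern: an archive format in brackets, an S3 location, and a human-readable label. As a minimal sketch, the following Python splits such lines into their parts. The names `CrawlEntry`, `LINE_RE`, and `parse_entry` are hypothetical and introduced here only for illustration. The code does no network access and does not check whether the 2014 paths still resolve.

```python
import re
from typing import NamedTuple, Optional


class CrawlEntry(NamedTuple):
    """One line of the listing (hypothetical record type for this sketch)."""
    archive_format: str  # "ARC" or "WARC", as written in the brackets
    bucket: str          # S3 bucket name, e.g. "aws-publicdatasets"
    prefix: str          # key prefix inside the bucket
    label: str           # free-text label after the " - " separator


# Assumed line shape: "[FORMAT] s3://bucket/prefix - label"
LINE_RE = re.compile(
    r"^\s*\[(ARC|WARC)\]\s+s3://([^/\s]+)/(\S+)\s+-\s+(.+?)\s*$"
)


def parse_entry(line: str) -> Optional[CrawlEntry]:
    """Return a CrawlEntry for a listing line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return CrawlEntry(*match.groups())


# Two lines copied from the listing above.
LISTING = """\
   [ARC]  s3://aws-publicdatasets/common-crawl/crawl-001/ - Crawl #1 (2008/2009)
   [WARC] s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-35/ - August 2014
"""

entries = [e for e in map(parse_entry, LISTING.splitlines()) if e is not None]

for entry in entries:
    print(entry.archive_format, entry.bucket, entry.prefix, entry.label)
```

Splitting the bucket from the key prefix is a design choice made because most S3 clients take the two as separate arguments. Lines that do not match the assumed shape, such as headings, are skipped rather than raising an error.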