Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file collects the tweets posted before and after the Jacathon Aragón Open Data event. The tweets were gathered by tracking a set of hashtags related to the Jacathon, specifically the terms jacathon, maddata, medialab, jacaton, datathon, aragonopendata, hackathon, hackaton, opendata, and jaca.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Web archive derivatives of the Literary Authors from Europe and Eurasia Web Archive collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.
The ivy-12172-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.
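As a rough illustration of the local workflow described above, the sketch below unpacks a Parquet archive such as ivy-12172-parquet.tar.gz and reads its part files into a single Pandas DataFrame. The function names are hypothetical, and the directory layout inside the archive is an assumption: the code simply searches for every file ending in .parquet. Reading Parquet with pandas also assumes that pyarrow or fastparquet is installed.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return that directory.

    Only use this on archives from a source you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest


def find_parquet_files(root):
    """Return every *.parquet file under root, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob("*.parquet") if p.is_file())


def load_derivative(root):
    """Read all Parquet part files under root into one pandas DataFrame."""
    # Imported lazily so the helpers above work without pandas installed.
    import pandas as pd

    files = find_parquet_files(root)
    if not files:
        raise FileNotFoundError(f"no .parquet files found under {root}")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

A typical call might be `load_derivative(extract_archive("ivy-12172-parquet.tar.gz", "derivatives"))`. Because each derivative has a different schema, it may be better to point `load_derivative` at the subdirectory of one derivative at a time, once the actual layout of the archive is known.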
Domains
.webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)
Produces a DataFrame with the following columns, as implied by the alias and count() in the query above: url, count.
Web Pages
.webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))
Produces a DataFrame with the following columns, as named in the select above: crawl_date, url, mime_type_web_server, mime_type_tika, content.
Web Graph
.webgraph()
Produces a DataFrame with the following columns:
Image Links
.imageLinks()
Produces a DataFrame with the following columns:
The ivy-12172-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.