2 datasets found
  1. Tweets at the 2014 Jacathon by Aragon Open Data

    • data.europa.eu
    json
    Updated Oct 14, 2020
    Cite
    Gobierno de Aragón (2020). Tweets at the 2014 Jacathon by Aragon Open Data [Dataset]. https://data.europa.eu/data/datasets/https-opendata-aragon-es-datos-catalogo-dataset-tweets-en-el-jacathon-2014-de-aragon-open-data
    Available download formats: json
    Dataset updated
    Oct 14, 2020
    Dataset authored and provided by
    Gobierno de Aragón
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file collects the tweets generated before and after the Jacathon Aragón Open Data. The tweets were collected by tracking a set of hashtags related to the Jacathon, specifically the terms jacathon, maddata, medialab, jacaton, datathon, aragonopendata, hackathon, hackaton, opendata, jaca.
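
    The term-based collection described above can be illustrated with a minimal sketch. The record layout (objects with a `text` field), the sample tweets, and the function name `matches_tracked_terms` are assumptions made for illustration; the actual JSON structure of the dataset is not documented in this listing.

```python
import re

# Terms listed in the dataset description.
TRACKED_TERMS = {
    "jacathon", "maddata", "medialab", "jacaton", "datathon",
    "aragonopendata", "hackathon", "hackaton", "opendata", "jaca",
}

# A hashtag is '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def matches_tracked_terms(text):
    """Return the tracked terms that appear as hashtags in `text`."""
    hashtags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    return hashtags & TRACKED_TERMS


# Hypothetical sample records; the real dataset's fields may differ.
sample_tweets = [
    {"text": "Ready for the #Jacathon in #Jaca!"},
    {"text": "Nothing related here #weather"},
]

# Keep only tweets that carry at least one tracked hashtag.
selected = [t for t in sample_tweets if matches_tracked_terms(t["text"])]
```

    Matching is case-insensitive here, which is a design choice of the sketch rather than a documented property of the original collection process.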

  2. Literary Authors from Europe and Eurasia Web Archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 31, 2020
    Cite
    Nick Ruest; Anna Rakityanskaya; Thomas Keenan; Robert Davis; Anna Arays; Samantha Abrams (2020). Literary Authors from Europe and Eurasia Web Archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3632728
    Available download formats: application/gzip
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Nick Ruest; Anna Rakityanskaya; Thomas Keenan; Robert Davis; Anna Arays; Samantha Abrams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Eurasia, Europe
    Description

    Web archive derivatives of the Literary Authors from Europe and Eurasia Web Archive collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The ivy-12172-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count
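
    As a rough illustration of what the domain-count derivative contains, the sketch below performs the same kind of aggregation (group by host, count, sort descending) in plain Python. It is not the toolkit's implementation: the function name `count_domains` and the sample URLs are hypothetical, and whether the toolkit normalizes host names (for example by stripping a `www.` prefix) is not covered here.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(urls):
    """Count URLs per host name, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # None when the URL has no host
        if host:
            counts[host] += 1
    # most_common() returns (domain, count) pairs sorted by count, descending
    return counts.most_common()


# Hypothetical URLs, used only to show the shape of the result.
urls = [
    "http://example.org/a",
    "http://example.org/b",
    "https://example.com/",
]
rows = count_domains(urls)
```

    Each element of `rows` is a `(domain, count)` pair, mirroring the two columns listed above.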

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor
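
    One way to work with rows of this shape is to count how often each destination is linked to. The sketch below assumes the rows have already been loaded as dictionaries keyed by the four column names; the sample values, the date format, and the function name `in_link_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical rows with the four columns listed above.
edges = [
    {"crawl_date": "20200101", "src": "http://a.example/",
     "dest": "http://b.example/", "anchor": "B"},
    {"crawl_date": "20200101", "src": "http://a.example/",
     "dest": "http://c.example/", "anchor": "C"},
    {"crawl_date": "20200102", "src": "http://b.example/",
     "dest": "http://c.example/", "anchor": "C"},
]


def in_link_counts(rows):
    """Count how many rows point at each destination URL."""
    return Counter(row["dest"] for row in rows)


counts = in_link_counts(edges)
```

    Grouping on `src` instead of `dest` would give out-link counts by the same approach.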

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The ivy-12172-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.