27 datasets found
  1. MOESM1 of Wikipedia traffic data and electoral prediction: towards...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Taha Yasseri; Jonathan Bright (2023). MOESM1 of Wikipedia traffic data and electoral prediction: towards theoretically informed models [Dataset]. http://doi.org/10.6084/m9.figshare.c.3698467_D1.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Taha Yasseri; Jonathan Bright
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Party List: A table containing countries, names of the parties (English and local), election dates, party abbreviations, election vote share, change in the vote share from the previous election, number of news mentions, and the link to the Wikipedia page. (csv)

  2. Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links...

    • academictorrents.com
    bittorrent
    Updated Mar 4, 2017
    + more versions
    Cite
    Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum (2017). Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia (Extended Dataset) [Dataset]. https://academictorrents.com/details/689af6f153e097538ad7b8fd4ea3e87ce8f6bc42
    Explore at:
    Available download formats: bittorrent
    Dataset updated
    Mar 4, 2017
    Dataset authored and provided by
    Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. We use a method for automatically gathering massive amounts of naturally-occurring cross-document reference data to create the Wikilinks dataset, comprising 40 million mentions over 3 million entities. Our method is based on finding hyperlinks to Wikipedia from a web crawl and using anchor text as mentions. In addition to providing large-scale labeled data without human effort, we are able to include many styles of text beyond newswire and many entity types beyond people. ### Introduction The Wikipedia links (WikiLinks) data consists of web pages that satisfy the following two constraints: a. conta

  3. Predicted relative errors.

    • figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Predicted relative errors. [Dataset]. https://figshare.com/articles/dataset/Predicted_relative_errors_/14894022/1
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Anna Tovo; Samuele Stivanello; Amos Maritan; Samir Suweis; Stefano Favaro; Marco Formentin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Upscaling results for the number of species of the four analysed datasets from local samples covering a fraction p* = 5% of the corresponding global dataset. For each human activity, we display the number of species (users, hashtags, words) and individuals (sent mails, posts, occurrences) at the global scale together with the average fitted RSA distribution parameters at the sampled scale and the relative percentage error (mean and standard deviation among 100 trials) between the true number of species and the one predicted by our framework. See S1 Fig in S1 Appendix for the corresponding fitting curves and predicted global RSA patterns.

  4. Data from: WikiHist.html: English Wikipedia's Full Revision History in HTML...

    • zenodo.org
    application/gzip, zip
    Updated Jun 8, 2020
    Cite
    Blagoj Mitrevski; Tiziano Piccardi; Robert West (2020). WikiHist.html: English Wikipedia's Full Revision History in HTML Format [Dataset]. http://doi.org/10.5281/zenodo.3605388
    Explore at:
    Available download formats: application/gzip, zip
    Dataset updated
    Jun 8, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Blagoj Mitrevski; Tiziano Piccardi; Robert West
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Introduction

    Wikipedia is written in the wikitext markup language. When serving content, the MediaWiki software that powers Wikipedia parses wikitext to HTML, thereby inserting additional content by expanding macros (templates and modules). Hence, researchers who intend to analyze Wikipedia as seen by its readers should work with HTML, rather than wikitext. Since Wikipedia’s revision history is made publicly available by the Wikimedia Foundation exclusively in wikitext format, researchers have had to produce HTML themselves, typically by using Wikipedia’s REST API for ad-hoc wikitext-to-HTML parsing. This approach, however, (1) does not scale to very large amounts of data and (2) does not correctly expand macros in historical article revisions.

    We have solved these problems by developing a parallelized architecture for parsing massive amounts of wikitext using local instances of MediaWiki, enhanced with the capacity of correct historical macro expansion. By deploying our system, we produce and hereby release WikiHist.html, English Wikipedia’s full revision history in HTML format. It comprises the HTML content of 580M revisions of 5.8M articles generated from the full English Wikipedia history spanning 18 years from 1 January 2001 to 1 March 2019. Boilerplate content such as page headers, footers, and navigation sidebars is not included in the HTML.

    For more details, please refer to the description below and to the dataset paper:
    Blagoj Mitrevski, Tiziano Piccardi, and Robert West: WikiHist.html: English Wikipedia’s Full Revision History in HTML Format. In Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020.
    https://arxiv.org/abs/2001.10256

    When using the dataset, please cite the above paper.

    Dataset summary

    The dataset consists of three parts:

    1. English Wikipedia’s full revision history parsed to HTML,
    2. a table of the creation times of all Wikipedia pages (page_creation_times.json.gz),
    3. a table that allows for resolving redirects for any point in time (redirect_history.json.gz).

    Part 1 is our main contribution, while parts 2 and 3 contain complementary information that can aid researchers in their analyses.

    Getting the data

    Parts 2 and 3 are hosted in this Zenodo repository. Part 1 is 7 TB in size -- too large for Zenodo -- and is therefore hosted externally on the Internet Archive. For downloading part 1, you have multiple options:

    Dataset details

    Part 1: HTML revision history
    The data is split into 558 directories, named enwiki-20190301-pages-meta-history$1.xml-p$2p$3, where $1 ranges from 1 to 27, and p$2p$3 indicates that the directory contains revisions for pages with ids between $2 and $3. (This naming scheme directly mirrors that of the wikitext revision history from which WikiHist.html was derived.) Each directory contains a collection of gzip-compressed JSON files, each containing 1,000 HTML article revisions. Each row in the gzipped JSON files represents one article revision. Rows are sorted by page id, and revisions of the same page are sorted by revision id. We include all revision information from the original wikitext dump, the only difference being that we replace the revision’s wikitext content with its parsed HTML version (and that we store the data in JSON rather than XML):

    • id: id of this revision
    • parentid: id of revision modified by this revision
    • timestamp: time when revision was made
    • cont_username: username of contributor
    • cont_id: id of contributor
    • cont_ip: IP address of contributor
    • comment: comment made by contributor
    • model: content model (usually "wikitext")
    • format: content format (usually "text/x-wiki")
    • sha1: SHA-1 hash
    • title: page title
    • ns: namespace (always 0)
    • page_id: page id
    • redirect_title: if page is redirect, title of target page
    • html: revision content in HTML format
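
    A minimal reading sketch (not the authors' official tooling), assuming each gzip-compressed file stores one JSON object per line with the fields listed above; the directory path is a local placeholder:

    import gzip
    import json
    from pathlib import Path

    # Hypothetical local copy of one of the 558 directories described above.
    dump_dir = Path("enwiki-20190301-pages-meta-history1.xml-p1p1234")

    for part in sorted(dump_dir.glob("*.gz")):
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                rev = json.loads(line)            # one revision per row, as documented above
                print(rev["page_id"], rev["id"], rev["timestamp"], len(rev["html"]))
                break                             # remove both breaks to stream everything
        break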

    Part 2: Page creation times (page_creation_times.json.gz)

    This JSON file specifies the creation time of each English Wikipedia page. It can, e.g., be used to determine if a wiki link was blue or red at a specific time in the past. Format:

    • page_id: page id
    • title: page title
    • ns: namespace (0 for articles)
    • timestamp: time when page was created

    Part 3: Redirect history (redirect_history.json.gz)

    This JSON file specifies all revisions corresponding to redirects, as well as the target page to which the respective page redirected at the time of the revision. This information is useful for reconstructing Wikipedia's link network at any time in the past. Format:

    • page_id: page id of redirect source
    • title: page title of redirect source
    • ns: namespace (0 for articles)
    • revision_id: revision id of redirect source
    • timestamp: time at which redirect became active
    • redirect: page title of redirect target (in 1st item of array; 2nd item can be ignored)
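
    A minimal sketch of how this file could be used to resolve a redirect at a given moment, assuming one JSON object per line with the fields above and ISO 8601 timestamps (an illustration, not the authors' code):

    import gzip
    import json
    from collections import defaultdict

    history = defaultdict(list)                       # source title -> [(timestamp, target), ...]
    with gzip.open("redirect_history.json.gz", "rt", encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            history[rec["title"]].append((rec["timestamp"], rec["redirect"][0]))

    def resolve(title, when):
        """Return the redirect target active at ISO timestamp `when`, or None."""
        target = None
        for ts, tgt in sorted(history.get(title, [])):
            if ts <= when:                            # ISO timestamps sort lexicographically
                target = tgt
            else:
                break
        return target

    print(resolve("UK", "2015-06-01T00:00:00Z"))      # hypothetical lookup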

    The repository also contains two additional files, metadata.zip and mysql_database.zip. These two files are not part of WikiHist.html per se, and most users will not need to download them manually. The file metadata.zip is required by the download script (and will be fetched by the script automatically), and mysql_database.zip is required by the code used to produce WikiHist.html. The code that uses these files is hosted at GitHub, but the files are too big for GitHub and are therefore hosted here.

    WikiHist.html was produced by parsing the 1 March 2019 dump of https://dumps.wikimedia.org/enwiki/20190301 from wikitext to HTML. That old dump is not available anymore on Wikimedia's servers, so we make a copy available at https://archive.org/details/enwiki-20190301-original-full-history-dump_dlab .

  5. WikiMed and PubMedDS: Two large-scale datasets for medical concept...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 4, 2021
    Cite
    Shikhar Vashishth; Denis Newman-Griffis; Rishabh Joshi; Ritam Dutt; Carolyn P Rosé (2021). WikiMed and PubMedDS: Two large-scale datasets for medical concept extraction and normalization research [Dataset]. http://doi.org/10.5281/zenodo.5753476
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 4, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shikhar Vashishth; Denis Newman-Griffis; Rishabh Joshi; Ritam Dutt; Carolyn P Rosé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Two large-scale, automatically-created datasets of medical concept mentions, linked to the Unified Medical Language System (UMLS).

    WikiMed

    Derived from Wikipedia data. Mappings of Wikipedia page identifiers to UMLS Concept Unique Identifiers (CUIs) were extracted by crosswalking Wikipedia, Wikidata, Freebase, and the NCBI Taxonomy to reach existing mappings to UMLS CUIs. This created a 1:1 mapping of approximately 60,500 Wikipedia pages to UMLS CUIs. Links to these pages were then extracted as mentions of the corresponding UMLS CUIs.

    WikiMed contains:

    • 393,618 Wikipedia page texts
    • 1,067,083 mentions of medical concepts
    • 57,739 unique UMLS CUIs

    Manual evaluation of 100 random samples of WikiMed found 91% accuracy in the automatic annotations at the level of UMLS CUIs, and 95% accuracy in terms of semantic type.

    PubMedDS

    Derived from biomedical literature abstracts from PubMed. Mentions were automatically identified using distant supervision based on Medical Subject Heading (MeSH) headers assigned to the papers in PubMed, and recognition of medical concept mentions using the high-performance scispaCy model. MeSH header codes are included as well as their mappings to UMLS CUIs.

    PubMedDS contains:

    • 13,197,430 abstract texts
    • 57,943,354 medical concept mentions
    • 44,881 unique UMLS CUIs

    Comparison with existing manually-annotated datasets (NCBI Disease Corpus, BioCDR, and MedMentions) found 75-90% precision in automatic annotations. Please note this dataset is not a comprehensive annotation of medical concept mentions in these abstracts (only mentions located through distant supervision from MeSH headers were included), but is intended as data for concept normalization research.

    Due to its size, PubMedDS is distributed as 30 individual files of approximately 1.5 million mentions each.

    Data format

    Both datasets use JSON format with one document per line. Each document has the following structure:

    {
      "_id": "A unique identifier of each document",
      "text": "Contains text over which mentions are ",
      "title": "Title of Wikipedia/PubMed Article",
      "split": "[Not in PubMedDS] Dataset split: 

  6. WIT Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Jun 14, 2023
    + more versions
    Cite
    Krishna Srinivasan; Karthik Raman; Jiecao Chen; Michael Bendersky; Marc Najork (2023). WIT Dataset [Dataset]. https://paperswithcode.com/dataset/wit
    Explore at:
    Dataset updated
    Jun 14, 2023
    Authors
    Krishna Srinivasan; Karthik Raman; Jiecao Chen; Michael Bendersky; Marc Najork
    Description

    Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. WIT is composed of a curated set of 37.6 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.

    Key Advantages

    A few unique advantages of WIT:

    • The largest multimodal dataset (at the time of writing) by the number of image-text examples.
    • Massively multilingual (the first of its kind), with coverage for over 100 languages.
    • A diverse collection of concepts and real-world entities.
    • Challenging real-world test sets.

  7. Single Ground Based AIS Receiver Vessel Tracking Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 19, 2021
    Cite
    Vodas M. (2021). Single Ground Based AIS Receiver Vessel Tracking Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3754480
    Explore at:
    Dataset updated
    Apr 19, 2021
    Dataset provided by
    Zissis D.
    Vodas M.
    Tserpes K.
    Spiliopoulos G.
    Kontopoulos I.
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Nowadays, a multitude of tracking systems produce massive amounts of maritime data on a daily basis. The most commonly used is the Automatic Identification System (AIS), a collaborative, self-reporting system that allows vessels to broadcast their identification information, characteristics and destination, along with other information originating from on-board devices and sensors, such as location, speed and heading. AIS messages are broadcast periodically and can be received by other vessels equipped with AIS transceivers, as well as by ground-based or satellite-based sensors.

    Since the International Maritime Organisation (IMO) made it obligatory for vessels above 300 gross tonnage to carry AIS transponders, large datasets have gradually become available and are now being considered a valid source of maritime intelligence [4]. There is now a growing body of literature on methods of exploiting AIS data for safety and optimisation of seafaring, namely traffic analysis, anomaly detection, route extraction and prediction, collision detection, path planning, weather routing, etc. [5].

    As the amount of available AIS data grows to massive scales, researchers are realising that computational techniques must contend with the difficulties of acquiring, storing, and processing the data. Traditional information systems are incapable of dealing with such firehoses of spatiotemporal data, where they are required to ingest thousands of data units per second while delivering sub-second query response times.

    Processing streaming data exhibits characteristics similar to other big data challenges, such as handling high data volumes and complex data types. While for many applications big data batch processing techniques are sufficient, for applications such as navigation timeliness is a top priority; making the right decision to steer a vessel away from danger is only useful if the decision is made in due time. The true challenge lies in the fact that, to satisfy real-time application needs, high-velocity data of unbounded size must be processed with memory that is finite and constrained relative to the data size. Research on data streams is gaining attention as a subset of the more generic Big Data research field.

    Research on such topics requires an uncompressed, uncleaned dataset similar to what would be collected in real-world conditions. This dataset contains all decoded messages collected within a 24h period (starting from 29/02/2020 10PM UTC) from a single receiver located near the port of Piraeus (Greece). All vessel identifiers such as IMO and MMSI have been anonymised, and no down-sampling, filtering or cleaning has been applied.

    The schema of the dataset is provided below:

    · t: the time at which the message was received (UTC)

    · shipid: the anonymized id of the ship

    · lon: the longitude of the current ship position

    · lat: the latitude of the current ship position

    · heading: the direction in which the ship's bow is pointing (see: https://en.wikipedia.org/wiki/Course_(navigation))

    · course: the direction in which the ship moves (see: https://en.wikipedia.org/wiki/Course_(navigation))

    · speed: the speed of the ship (measured in knots)

    · shiptype: AIS reported ship-type

    · destination: AIS reported destination
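
    A minimal loading sketch with pandas, assuming the CSV uses the column names from the schema above (the file name is a placeholder):

    import pandas as pd

    df = pd.read_csv("piraeus_ais_20200229.csv", parse_dates=["t"])

    print(df["shipid"].nunique(), "distinct vessels in the 24 h window")
    print(df.groupby("shiptype")["speed"].describe())   # speed is in knots, as noted above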

  8. ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 9, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘List of Top Data Breaches (2004 - 2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-list-of-top-data-breaches-2004-2021-e7ac/latest
    Explore at:
    Dataset updated
    Sep 9, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    This is a dataset containing all the major data breaches in the world from 2004 to 2021

    As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.

    This data contains 5 columns:

    1. Entity: The name of the company, organization or institute
    2. Year: In what year did the data breach take place
    3. Records: How many records were compromised (can include information like email, passwords etc.)
    4. Organization type: Which sector does the organization belong to
    5. Method: Was it hacked? Were the files lost? Was it an inside job?

    Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches

    Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning

    --- Original source retains full ownership of the source dataset ---
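
    As a quick illustration of working with the five columns described above, a minimal pandas sketch (assuming the Kaggle CSV is saved locally under a placeholder name):

    import pandas as pd

    df = pd.read_csv("data_breaches.csv")
    df["Records"] = pd.to_numeric(df["Records"], errors="coerce")   # guard against non-numeric entries

    print(df.groupby("Method")["Records"].sum().sort_values(ascending=False).head())
    print(df["Year"].value_counts().sort_index())                   # breaches listed per year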

  9. Bitcoin Blockchain Historical Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Bitcoin Blockchain Historical Data [Dataset]. https://www.kaggle.com/bigquery/bitcoin-blockchain
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.

    Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.

    Content

    In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset. It’s updated every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
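
    A minimal sketch of such a query with the BigQuery Python client library (the SQL and column names are illustrative; adjust them to the table schema you actually find):

    from google.cloud import bigquery

    client = bigquery.Client()          # uses the credentials available in the Kernel/environment

    query = """
        SELECT DATE(block_timestamp) AS day, COUNT(*) AS tx_count
        FROM `bigquery-public-data.crypto_bitcoin.transactions`
        WHERE block_timestamp >= TIMESTAMP '2018-01-01'
        GROUP BY day
        ORDER BY day
        LIMIT 10
    """
    df = client.query(query).to_dataframe()   # to_dataframe() needs pandas installed
    print(df)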

    Method & Acknowledgements

    Allen Day (Google Cloud Developer Advocate) and Colin Bookman (Google Cloud Customer Engineer) retrieve data from the Bitcoin network using a custom client, available on GitHub, that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".


    Inspiration

    • How many bitcoins are sent each day?
    • How many addresses receive bitcoin each day?
    • Compare transaction volume to historical prices by joining with other available data sources

  10. Replication dataset and calculations for PIIE WP 16-8, Large Depreciations:...

    • piie.com
    Updated May 16, 2016
    Cite
    José De Gregorio (2016). Replication dataset and calculations for PIIE WP 16-8, Large Depreciations: Recent Experience in Historical Perspective, by José De Gregorio. (2016). [Dataset]. https://www.piie.com/publications/working-papers/large-depreciations-recent-experience-historical-perspective
    Explore at:
    Dataset updated
    May 16, 2016
    Dataset provided by
    Peterson Institute for International Economics (http://www.piie.com/)
    Authors
    José De Gregorio
    Description

    This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Large Depreciations: Recent Experience in Historical Perspective, PIIE Working Paper 16-8. If you use the data, please cite as: De Gregorio, José. (2016). Large Depreciations: Recent Experience in Historical Perspective. PIIE Working Paper 16-8. Peterson Institute for International Economics.

  11. BLM CO Closed to Fluid Mineral Leasing

    • catalog.data.gov
    • gbp-blm-egis.hub.arcgis.com
    Updated Nov 20, 2024
    Cite
    Bureau of Land Management (2024). BLM CO Closed to Fluid Mineral Leasing [Dataset]. https://catalog.data.gov/dataset/blm-co-closed-to-fluid-mineral-leasing
    Explore at:
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    Bureau of Land Management (http://www.blm.gov/)
    Description

    These land parcels managed by BLM have been designated as "closed to fluid mineral leasing" per the individual field office plan for the area. This data was compiled for the Big Game Corridor Planning effort, but may be used as a statewide representation for areas closed to leasing.

  12. Data from: Big Tom

    • wikipedia.tr-tr.nina.az
    Updated Mar 26, 2025
    Cite
    www.wikipedia.tr-tr.nina.az (2025). Big Tom [Dataset]. https://www.wikipedia.tr-tr.nina.az/Big_Tom.html
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Vikipedi (//www.wikipedia.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article is an orphan, as no other article links to it. Please try to link to this page from related articles.

  13. A Large-Scale AIS Dataset from Finnish Waters

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 25, 2024
    Cite
    Debayan Bhattacharya; Ikram Ul Haq; Carlos Pichardo Vicencio; Sebastien Lafond (2024). A Large-Scale AIS Dataset from Finnish Waters [Dataset]. http://doi.org/10.5281/zenodo.8112336
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Debayan Bhattacharya; Ikram Ul Haq; Carlos Pichardo Vicencio; Sebastien Lafond
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proposed AIS dataset encompasses a substantial temporal span of 20 months, spanning from April 2021 to December 2022. This extensive coverage period empowers analysts to examine long-term trends and variations in vessel activities. Moreover, it facilitates researchers in comprehending the potential influence of external factors, including weather patterns, seasonal variations, and economic conditions, on vessel traffic and behavior within the Finnish waters.

    This dataset encompasses an extensive array of data pertaining to vessel movements and activities encompassing seas, rivers, and lakes. Anticipated to be comprehensive in nature, the dataset encompasses a diverse range of ship types, such as cargo ships, tankers, fishing vessels, passenger ships, and various other categories.

    The AIS dataset exhibits a prominent attribute in the form of its exceptional granularity, with a total of 2 293 129 345 data points. Such granular information helps analysts comprehend vessel dynamics and operations within the Finnish waters. It enables the identification of patterns and anomalies in vessel behavior and facilitates an assessment of the potential environmental implications associated with maritime activities.

    Please cite the following publication when using the dataset:

    TBD

    The publication is available at: TBD

    A preprint version of the publication is available at TBD

    csv file structure

    YYYY-MM-DD-location.csv

    This file contains the received AIS position reports. The structure of the logged parameters is the following: [timestamp, timestampExternal, mmsi, lon, lat, sog, cog, navStat, rot, posAcc, raim, heading]

    timestamp I believe this is the UTC second when the report was generated by the electronic position fixing system (EPFS) (0-59, or 60 if time stamp is not available, which should also be the default value, or 61 if positioning system is in manual input mode, or 62 if electronic position fixing system operates in estimated (dead reckoning) mode, or 63 if the positioning system is inoperative).

    timestampExternal The timestamp associated with the MQTT message received from www.digitraffic.fi. It is assumed this timestamp is the Epoch time corresponding to when the AIS message was received by digitraffic.fi.

    mmsi MMSI number. The Maritime Mobile Service Identity (MMSI) is a unique 9-digit number that is assigned to a Digital Selective Calling (DSC) radio or an AIS unit. Check https://en.wikipedia.org/wiki/Maritime_Mobile_Service_Identity

    lon Longitude, Longitude in 1/10 000 min (+/-180 deg, East = positive (as per 2's complement), West = negative (as per 2's complement). 181= (6791AC0h) = not available = default)

    lat Latitude, Latitude in 1/10 000 min (+/-90 deg, North = positive (as per 2's complement), South = negative (as per 2's complement). 91deg (3412140h) = not available = default)

    sog Speed over ground in 1/10 knot steps (0-102.2 knots) 1 023 = not available, 1 022 = 102.2 knots or higher

    cog Course over ground in 1/10 deg (0-3599). 3600 (E10h) = not available = default. 3 601-4 095 should not be used

    navStat Navigational status, 0 = under way using engine, 1 = at anchor, 2 = not under command, 3 = restricted maneuverability, 4 = constrained by her draught, 5 = moored, 6 = aground, 7 = engaged in fishing, 8 = under way sailing, 9 = reserved for future amendment of navigational status for ships carrying DG, HS, or MP, or IMO hazard or pollutant category C, high speed craft (HSC), 10 = reserved for future amendment of navigational status for ships carrying dangerous goods (DG), harmful substances (HS) or marine pollutants (MP), or IMO hazard or pollutant category A, wing in ground (WIG); 11 = power-driven vessel towing astern (regional use); 12 = power-driven vessel pushing ahead or towing alongside (regional use); 13 = reserved for future use, 14 = AIS-SART (active), MOB-AIS, EPIRB-AIS 15 = undefined = default (also used by AIS-SART, MOB-AIS and EPIRB-AIS under test)

    rot ROTAIS Rate of turn

    • 0 to +126 = turning right at up to 708 deg per min or higher
    • 0 to -126 = turning left at up to 708 deg per min or higher
    • Values between 0 and 708 deg per min coded by ROTAIS = 4.733 SQRT(ROTsensor) degrees per min where ROTsensor is the Rate of Turn as input by an external Rate of Turn Indicator (TI). ROTAIS is rounded to the nearest integer value.
    • +127 = turning right at more than 5 deg per 30 s (No TI available)
    • -127 = turning left at more than 5 deg per 30 s (No TI available)
    • -128 (80 hex) indicates no turn information available (default).

    ROT data should not be derived from COG information.

    posAcc Position accuracy, The position accuracy (PA) flag should be determined in accordance with the table below:

    • 1 = high (<= 10 m)
    • 0 = low (> 10 m)
    • 0 = default

    See https://www.navcen.uscg.gov/?pageName=AISMessagesA#RAIM

    raim RAIM-flag Receiver autonomous integrity monitoring (RAIM) flag of electronic position fixing device; 0 = RAIM not in use = default; 1 = RAIM in use. See Table https://www.navcen.uscg.gov/?pageName=AISMessagesA#RAIM

    Check https://en.wikipedia.org/wiki/Receiver_autonomous_integrity_monitoring

    heading True heading, Degrees (0-359) (511 indicates not available = default)
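
    A minimal decoding sketch with pandas, assuming a local daily file named as above with a header row matching the parameter list, and that the scale factors and sentinel values follow the field notes above:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("2021-04-01-location.csv")

    df["sog_knots"] = df["sog"].where(df["sog"] != 1023) / 10      # 1023 = not available
    df["cog_deg"] = df["cog"].where(df["cog"] != 3600) / 10        # 3600 = not available
    df["heading_deg"] = df["heading"].where(df["heading"] != 511)  # 511 = not available

    # Invert ROTAIS = 4.733 * sqrt(ROTsensor); +/-127 and -128 are flag values, not rates.
    rot = df["rot"].where(df["rot"].abs() <= 126)
    df["rot_deg_per_min"] = np.sign(rot) * (rot.abs() / 4.733) ** 2

    print(df[["mmsi", "lat", "lon", "sog_knots", "cog_deg", "rot_deg_per_min"]].head())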

    YYYY-MM-DD-metadata.csv

    This file contains the received AIS metadata: the ship static and voyage related data. The structure of the logged parameters is the following: [timestamp, destination, mmsi, callSign, imo, shipType, draught, eta, posType, pointA, pointB, pointC, pointD, name]

    timestamp The timestamp associated with the MQTT message received from www.digitraffic.fi. It is assumed this timestamp is the Epoch time corresponding to when the AIS message was received by digitraffic.fi.

    destination Maximum 20 characters using 6-bit ASCII; @@@@@@@@@@@@@@@@@@@@ = not available For SAR aircraft, the use of this field may be decided by the responsible administration

    mmsi MMSI number. The Maritime Mobile Service Identity (MMSI) is a unique 9-digit number that is assigned to a Digital Selective Calling (DSC) radio or an AIS unit. Check https://en.wikipedia.org/wiki/Maritime_Mobile_Service_Identity

    callSign 7 x 6-bit ASCII characters, @@@@@@@ = not available = default. Craft associated with a parent vessel should use “A” followed by the last 6 digits of the MMSI of the parent vessel. Examples of these craft include towed vessels, rescue boats, tenders, lifeboats and liferafts.

    imo 0 = not available = default – Not applicable to SAR aircraft

    • 0000000001-0000999999 not used
    • 0001000000-0009999999 = valid IMO number;
    • 0010000000-1073741823 = official flag state number.

    Check: https://en.wikipedia.org/wiki/IMO_number

    shipType

    • 0 = not available or no ship = default
    • 1-99 = as defined below
    • 100-199 = reserved, for regional use
    • 200-255 = reserved, for future use Not applicable to SAR aircraft

    Check https://www.navcen.uscg.gov/pdf/AIS/AISGuide.pdf and https://www.navcen.uscg.gov/?pageName=AISMessagesAStatic

    draught In 1/10 m, 255 = draught 25.5 m or greater, 0 = not available = default; in accordance with IMO Resolution A.851 Not applicable to SAR aircraft, should be set to 0

    eta Estimated time of arrival; MMDDHHMM UTC

    • Bits 19-16: month; 1-12; 0 = not available = default
    • Bits 15-11: day; 1-31; 0 = not available = default
    • Bits 10-6: hour; 0-23; 24 = not available = default
    • Bits 5-0: minute; 0-59; 60 = not available = default

    For SAR aircraft, the use of this field may be decided by the responsible administration
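
    A small decoding sketch for this bit layout, assuming eta is logged as the raw packed integer (the helper name is hypothetical):

    def decode_eta(eta: int):
        """Return (month, day, hour, minute); 0, 24 and 60 mark 'not available'."""
        month = (eta >> 16) & 0x0F    # bits 19-16
        day = (eta >> 11) & 0x1F      # bits 15-11
        hour = (eta >> 6) & 0x1F      # bits 10-6
        minute = eta & 0x3F           # bits 5-0
        return month, day, hour, minute

    print(decode_eta(0b0111_01111_10100_111100))   # -> (7, 15, 20, 60): July 15, 20:xx, minute unavailable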

    posType Type of electronic position fixing device

    • 0 = undefined (default)
    • 1 = GPS
    • 2 = GLONASS
    • 3 = combined

  14. Novel data streams used for capturing public reaction to Zika epidemic...

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Nicola Luigi Bragazzi; Cristiano Alicino; Cecilia Trucchi; Chiara Paganino; Ilaria Barberis; Mariano Martini; Laura Sticchi; Eugen Trinka; Francesco Brigo; Filippo Ansaldi; Giancarlo Icardi; Andrea Orsi (2023). Novel data streams used for capturing public reaction to Zika epidemic outbreak. [Dataset]. http://doi.org/10.1371/journal.pone.0185263.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicola Luigi Bragazzi; Cristiano Alicino; Cecilia Trucchi; Chiara Paganino; Ilaria Barberis; Mariano Martini; Laura Sticchi; Eugen Trinka; Francesco Brigo; Filippo Ansaldi; Giancarlo Icardi; Andrea Orsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Novel data streams used for capturing public reaction to Zika epidemic outbreak.

  15. Large Case List

    • data.wu.ac.at
    • data.gov.uk
    • +1more
    Updated Feb 10, 2016
    Cite
    Her Majesty's Revenue and Customs (2016). Large Case List [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/MWQxMTBhOTgtZWQ5Yy00OTFhLWJjNmItYWIxZjZlZmNiMGIx
    Explore at:
    Dataset updated
    Feb 10, 2016
    Dataset provided by
    HM Revenue & Customs
    Description

    Cases within HMRC Local Compliance that have yielded in excess of £1m (Indirect Tax) or £5m (Direct Tax). Updated: monthly.

  16. Replication dataset and calculations for PIIE PB 22-9, The online gig...

    • piie.com
    Updated Jul 15, 2022
    Cite
    Lee G. Branstetter (2022). Replication dataset and calculations for PIIE PB 22-9, The online gig economy’s impact is not as big as many thought by Lee Branstetter (2022). [Dataset]. https://www.piie.com/publications/policy-briefs/2022/online-gig-economys-impact-not-big-many-thought
    Explore at:
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Peterson Institute for International Economics (http://www.piie.com/)
    Authors
    Lee G. Branstetter
    Description

    This data package includes the underlying data files to replicate the calculations and charts presented in The online gig economy’s impact is not as big as many thought, PIIE Policy Brief 22-9.

    If you use the data, please cite as: Branstetter, Lee (2022). The online gig economy’s impact is not as big as many thought, PIIE Policy Brief 22-9. Peterson Institute for International Economics.

  17. Data from: A dataset of publication records for Nobel laureates

    • dataverse.harvard.edu
    Updated Dec 4, 2018
    Cite
    Jichao Li; Yian Yin; Santo Fortunato; Wang Dashun (2018). A dataset of publication records for Nobel laureates [Dataset]. http://doi.org/10.7910/DVN/6NJ5RN
    Explore at:
    Croissant (a format for machine-learning datasets; learn more about this at mlcommons.org/croissant)
    Dataset updated
    Dec 4, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Jichao Li; Yian Yin; Santo Fortunato; Wang Dashun
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We constructed the publication records for almost all Nobel laureates in physics, chemistry, and physiology or medicine from 1900 to 2016 (545 out of 590, 92.4%). We first collected information manually from Nobel Prize official websites, the laureates' university websites, and Wikipedia. We then matched it algorithmically with big data, tracing publication records from the MAG database.

  18. Replication dataset and calculations for PIIE WP 19-12, Aggregate Effects of...

    • piie.com
    Updated Jul 15, 2019
    Cite
    Jérémie Cohen-Setton; Egor Gornostay; Colombe Ladreit de Lacharrière (2019). Replication dataset and calculations for PIIE WP 19-12, Aggregate Effects of Budget Stimulus: Evidence from the Large Fiscal Expansions Database. by Jérémie Cohen-Setton, Egor Gornostay, and Colombe Ladreit de Lacharrière. (2019). [Dataset]. https://www.piie.com/publications/working-papers/aggregate-effects-budget-stimulus-evidence-large-fiscal-expansions
    Explore at:
    Dataset updated
    Jul 15, 2019
    Dataset provided by
    Peterson Institute for International Economics (http://www.piie.com/)
    Authors
    Jérémie Cohen-Setton; Egor Gornostay; Colombe Ladreit de Lacharrière
    Description

    This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Aggregate Effects of Budget Stimulus: Evidence from the Large Fiscal Expansions Database. PIIE Working Paper 19-12.

    If you use the data, please cite as: Cohen-Setton, Jeremie, Egor Gornostay, and Colombe Ladreit de Lacharrière (2019). Aggregate Effects of Budget Stimulus: Evidence from the Large Fiscal Expansions Database. PIIE Working Paper 19-12. Peterson Institute for International Economics.

  19. Replication dataset and calculations for PIIE PB 17-29, United States Is...

    • piie.com
    Updated Nov 2, 2017
    Cite
    Simeon Djankov (2017). Replication dataset and calculations for PIIE PB 17-29, United States Is Outlier in Tax Trends in Advanced and Large Emerging Economies, by Simeon Djankov. (2017). [Dataset]. https://www.piie.com/publications/policy-briefs/united-states-outlier-tax-trends-advanced-and-large-emerging-economies
    Explore at:
    Dataset updated
    Nov 2, 2017
    Dataset provided by
    Peterson Institute for International Economics (http://www.piie.com/)
    Authors
    Simeon Djankov
    Area covered
    United States
    Description

    This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in United States Is Outlier in Tax Trends in Advanced and Large Emerging Economies, PIIE Policy Brief 17-29. If you use the data, please cite as: Djankov, Simeon. (2017). United States Is Outlier in Tax Trends in Advanced and Large Emerging Economies. PIIE Policy Brief 17-29. Peterson Institute for International Economics.

  20. Zika and Zika virus related Google Trends search queries at global level,...

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Nicola Luigi Bragazzi; Cristiano Alicino; Cecilia Trucchi; Chiara Paganino; Ilaria Barberis; Mariano Martini; Laura Sticchi; Eugen Trinka; Francesco Brigo; Filippo Ansaldi; Giancarlo Icardi; Andrea Orsi (2023). Zika and Zika virus related Google Trends search queries at global level, November 2015 –October 2016. [Dataset]. http://doi.org/10.1371/journal.pone.0185263.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicola Luigi Bragazzi; Cristiano Alicino; Cecilia Trucchi; Chiara Paganino; Ilaria Barberis; Mariano Martini; Laura Sticchi; Eugen Trinka; Francesco Brigo; Filippo Ansaldi; Giancarlo Icardi; Andrea Orsi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Zika and Zika virus related Google Trends search queries at global level, November 2015 –October 2016.
