100+ datasets found

Data from: Quotebank: A Corpus of Quotations from a Decade of News
zenodo.org
bz2
Updated Jun 18, 2023
Cite
Timoté Vaucher; Andreas Spitz; Michele Catasta; Robert West (2023). Quotebank: A Corpus of Quotations from a Decade of News [Dataset]. http://doi.org/10.5281/zenodo.4277311
Available download formats: bz2
Unique identifier
https://doi.org/10.5281/zenodo.4277311
Dataset updated
Jun 18, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Timoté Vaucher; Andreas Spitz; Michele Catasta; Robert West
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

Quotebank is a dataset of 235 million unique, speaker-attributed quotations that were extracted from 196 million English news articles (127 million containing quotations) crawled from over 377 thousand web domains (15 thousand root domains) between September 2008 and April 2020. The quotations were extracted and attributed using Quobert, a distantly and minimally supervised end-to-end, language-agnostic framework for quotation attribution.

For further details, please refer to the description below and to the original paper:

Timoté Vaucher, Andreas Spitz, Michele Catasta, and Robert West
"Quotebank: A Corpus of Quotations from a Decade of News"
Proceedings of the 14th International ACM Conference on Web Search and Data Mining (WSDM), 2021.
https://doi.org/10.1145/3437963.3441760

When using the dataset, please cite the above paper. Note that the numbers above differ from those reported in the paper, because the updated data in this repository was computed from an expanded set of input news articles.

Dataset summary

The dataset consists of two versions:

Quotation-centric version (quotes-YYYY.json.bz2)
An aggregated set of unique quotations with the most likely speaker. Each unique quotation occurs only once in this version of the data, and the probabilities of the candidate speakers to which the quotation can be attributed are aggregated over all occurrences of the quotation. This version of the data is a minimal, but complete, list of attributed quotations aimed at users who only require quotation-speaker attributions and not the individual contexts of these quotations in the original articles.

Article-centric version (quotebank-YYYY.json.bz2)
A complete set of all individual quotation mentions with associated speaker as well as the article context in which they are mentioned. This larger version contains one entry per article in the news data. Each entry contains all speakers that appear in the news article as well as the (attributed) quotations, alongside a context window surrounding the quotations.

Both versions are split into 13 files (one per year) for ease of downloading and handling.
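Because each version is split into one file per year (2008 through 2020), a reader typically iterates over the yearly files. The following is a minimal sketch that assumes each file stores one JSON object per line (JSON Lines) and uses the `quotes-YYYY.json.bz2` / `quotebank-YYYY.json.bz2` names given above; the directory layout and the function names `yearly_paths` and `iter_records` are hypothetical.

```python
import bz2
import json
from pathlib import Path
from typing import Iterator

# Years covered by the corpus (September 2008 to April 2020), one file per year.
YEARS = range(2008, 2021)

def yearly_paths(data_dir: str, version: str = "quotes") -> list[Path]:
    """Build the expected per-year file paths.

    version="quotes" for the quotation-centric files (quotes-YYYY.json.bz2),
    version="quotebank" for the article-centric files (quotebank-YYYY.json.bz2).
    """
    return [Path(data_dir) / f"{version}-{year}.json.bz2" for year in YEARS]

def iter_records(path: Path) -> Iterator[dict]:
    """Stream records from one bzip2-compressed file without loading it fully.

    Assumption: one JSON object per line (JSON Lines).
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For instance, `next(iter_records(yearly_paths("data")[-1]))` would return the first record of the 2020 quotation-centric file, assuming that file exists under `data/`. Streaming line by line keeps memory use flat, which matters for the larger article-centric files.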

Dataset details

The following formatting applies to both versions of the dataset:

All data is made available in JSON format that has been compressed using bzip2.

The data is split per year (i.e., there is one file for each calendar year).

The offsets of quotations, contexts, and speaker annotations are given in units of Penn TreeBank Tokenizer tokens.

Offsets are zero-based and are computed from the start of the article.

When pairs of offsets are provided, the end offset is non-inclusive (e.g. in Python you can call tokens[start:end] without having to do end+1).
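The offset conventions above mean that a `[start, end)` pair can be used directly as a Python slice. The sketch below uses a hand-written token list standing in for Penn TreeBank tokenizer output (no tokenizer is actually run here); the token list and the helper `span_text` are hypothetical.

```python
# Hypothetical PTB-style tokens for a short article; in Quotebank, offsets are
# counted in such tokens, zero-based, from the start of the article.
tokens = ["``", "We", "will", "win", ",", "''", "said", "Jane", "Doe", "."]

def span_text(tokens: list[str], start: int, end: int) -> str:
    """Return the tokens covered by a [start, end) offset pair.

    The end offset is non-inclusive, so no end + 1 is needed.
    """
    if not 0 <= start <= end <= len(tokens):
        raise ValueError(f"offsets ({start}, {end}) out of range")
    return " ".join(tokens[start:end])

# A speaker mention stored as the offset pair [7, 9] covers "Jane Doe".
print(span_text(tokens, 7, 9))  # -> Jane Doe
```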

The Spinn3r data from which Quotebank was extracted was collected over the course of more than a decade. During this time, the client-side code used for collecting the data changed several times, and various character-encoding issues led to different representations of the original text at different times. We thus divide the 12 years spanned by the Spinn3r corpus into five phases (Phases A through E). A detailed description is available on GitHub; the key takeaways are that (1) text was lowercased in Phases A, B, and C, whereas the original capitalization was maintained in Phases D and E, and that (2) non-ASCII characters are properly represented only in Phase E.
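Because capitalization and non-ASCII characters are only reliable in some phases, analyses that depend on them may want to filter records by their `phase` field. A minimal sketch that encodes only the two takeaways stated above; the names `PHASE_PROPERTIES` and `keep_record` are hypothetical.

```python
# Properties of each Spinn3r phase, as summarized in the description:
# (original capitalization kept?, non-ASCII characters reliable?)
PHASE_PROPERTIES = {
    "A": (False, False),
    "B": (False, False),
    "C": (False, False),
    "D": (True, False),
    "E": (True, True),
}

def keep_record(record: dict, need_case: bool = False, need_unicode: bool = False) -> bool:
    """Return True if the record's phase satisfies the requested text properties."""
    cased, unicode_ok = PHASE_PROPERTIES[record["phase"]]
    return (cased or not need_case) and (unicode_ok or not need_unicode)

records = [{"phase": "B"}, {"phase": "D"}, {"phase": "E"}]
print([r["phase"] for r in records if keep_record(r, need_case=True)])  # -> ['D', 'E']
```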

Version 1: Quotation-centric data

In this version of the dataset, the quotations are aggregated across all their occurrences in the news article data, and assigned a probability for each speaker candidate. We consider two quotations to be equivalent and suitable for aggregation if they are identical after lower-casing and removing punctuation.
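The equivalence rule above (identical after lowercasing and removing punctuation) can be sketched as a normalization key used for grouping. This is an illustration only: exactly which characters count as punctuation in the Quotebank pipeline is not stated here, so the sketch assumes any Unicode punctuation category and additionally collapses whitespace; `normalize_quote` and `group_equivalent` are hypothetical names.

```python
import unicodedata
from collections import defaultdict

def normalize_quote(text: str) -> str:
    """Aggregation key: lowercase, drop punctuation, collapse whitespace.

    Assumption: "punctuation" means any Unicode category starting with "P".
    """
    no_punct = "".join(
        ch for ch in text.lower() if not unicodedata.category(ch).startswith("P")
    )
    return " ".join(no_punct.split())

def group_equivalent(quotes: list[str]) -> dict[str, list[str]]:
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for q in quotes:
        groups[normalize_quote(q)].append(q)
    return dict(groups)

forms = ["We will win!", "we will win", "We will, win.", "We might win"]
groups = group_equivalent(forms)
```

Here the first three forms share the key `"we will win"`, while `"We might win"` stays separate.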

Quotation-centric data
|-- quoteID: Primary key of the quotation (format: "YYYY-MM-DD-{increasing int:06d}")
|-- quotation: Text of the longest encountered original form of the quotation
|-- date: Earliest occurrence date of any version of the quotation
|-- phase: Corresponding phase of the data in which the quotation first occurred (A-E)
|-- probas: Array representing the probabilities of each speaker having uttered the quotation. The probabilities across different occurrences of the same quotation are summed for each distinct candidate speaker and then normalized
|   |-- proba: Probability for a given speaker
|   |-- speaker: Most frequent surface form for a given speaker in the articles where the quotation occurred
|-- speaker: Selected most likely speaker. This matches the first speaker entry in `probas`
|-- qids: Wikidata IDs of all aliases that match the selected speaker
|-- numOccurrences: Number of times this quotation occurs in the articles
|-- urls: List of links to the original articles containing the quotation
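The `probas` field is described as per-occurrence speaker probabilities that are summed for each distinct candidate and then normalized, with `speaker` matching the first (most likely) entry. A minimal sketch of that aggregation, assuming each occurrence is given as a list of `(speaker, probability)` pairs; the function name `aggregate_probas` and the speaker names are hypothetical, and this is not the Quobert code itself.

```python
from collections import defaultdict

def aggregate_probas(occurrences: list[list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Sum each candidate's probability over occurrences, then normalize.

    Returns (speaker, probability) pairs sorted from most to least likely, so
    the first entry plays the role of the selected `speaker`.
    """
    totals = defaultdict(float)
    for occurrence in occurrences:
        for speaker, proba in occurrence:
            totals[speaker] += proba
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(speaker, total / grand_total) for speaker, total in ranked]

# Two occurrences of the same quotation with different candidate speakers.
occurrences = [
    [("Jane Doe", 0.75), ("Richard Miles", 0.25)],
    [("Jane Doe", 0.5), ("John Roe", 0.5)],
]
probas = aggregate_probas(occurrences)
top_speaker = probas[0][0]
```

The summed totals are 1.25, 0.5, and 0.25, so after normalization Jane Doe receives 0.625 and becomes the selected speaker.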

Note that for some speakers there can be more than one Wikidata ID in the `qids` field. To access Wikidata information about those speakers it is necessary to disambiguate them, i.e., select one of the listed Wikidata IDs that most likely corresponds to the respective speaker. Speaker disambiguation can be done using scripts available in the quotebank-toolkit repository. Additionally, the repository contains useful scripts for cleaning and enriching Quotebank.

Version 2: Article-centric data

In this data set, individual quotations are not aggregated. For each article, one JSON entry contains all speakers that appear in the news article, the (attributed) quotations, and the text within a context window surrounding each of the quotations.

Article-centric data
|-- articleID: Primary key
|-- articleLength: Length of the article in PTB tokens
|-- date: Publication date of the article
|-- phase: Corresponding phase in which the article appeared (A-E)
|-- title: Title of the article
|-- url: Link to the original article
|-- names: List of all extracted speakers that occur in the article
|   |-- name: Surface form of the first occurrence of each speaker in the article
|   |-- ids: List of Wikidata IDs that have `name` as a possible alias
|   |-- offsets: List of start/end offset pairs, signifying positions at which the speaker occurs in the article (full and partial mentions of the speaker)
|-- quotations: List of all the quotations that appear in the article
|   |-- quoteID: Foreign key of the quotation (from the quotation-centric dataset)
|   |-- quotation: Text of the quotation as it occurs in this article
|   |-- quotationOffset: Index where the quotation starts in the article
|   |-- leftContext: Text in the left context window of the quotation (used for the attribution)
|   |-- rightContext: Text in the right context window (used for the attribution)
|   |-- globalProbas: Array representing the probabilities of each speaker having uttered the quote *at the aggregated level*. Same as `probas` for a given `quoteID`
|   |-- globalTopSpeaker: Most probable speaker *at the aggregated level*. Same as `speaker` for a given `quoteID`
|   |-- localProbas: Array representing the probabilities of each speaker having said the quote *given this article context*
|   |   |-- proba: Probability for a given speaker
|   |   |-- speaker: Name of the speaker as it first occurs in this article
|   |-- localTopSpeaker: Selected speaker. Same name as the first entry in `localProbas`
|   |-- numOccurrences: Number of times this quotation occurs in any article
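Because each quotation in the article-centric data carries both an aggregated (`globalTopSpeaker`) and an article-level (`localTopSpeaker`) attribution, one can, for example, list the mentions where the two disagree. A minimal sketch over one already-parsed article record, using field names from the schema above; the helper `disagreements` and the sample record are hypothetical, and joining the context windows with single spaces is an assumption made for display only.

```python
def disagreements(article: dict) -> list[dict]:
    """Return quotation mentions whose local and global top speakers differ."""
    results = []
    for q in article.get("quotations", []):
        if q["localTopSpeaker"] != q["globalTopSpeaker"]:
            results.append(
                {
                    "quoteID": q["quoteID"],
                    "local": q["localTopSpeaker"],
                    "global": q["globalTopSpeaker"],
                    # Display-only reconstruction of the attribution window.
                    "context": " ".join(
                        [q["leftContext"], q["quotation"], q["rightContext"]]
                    ),
                }
            )
    return results

article = {
    "articleID": "hypothetical-article-1",
    "quotations": [
        {
            "quoteID": "2019-05-01-000001",
            "quotation": "we will win",
            "leftContext": "At the rally,",
            "rightContext": "said Jane Doe.",
            "globalTopSpeaker": "John Roe",
            "localTopSpeaker": "Jane Doe",
        },
    ],
}
print(disagreements(article)[0]["context"])  # -> At the rally, we will win said Jane Doe.
```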

Code repository

The Quobert code used to extract and attribute the quotations in this dataset is available and maintained in a GitHub repository.
EvoBib: A Bibliographic Database and Quote Collection for Historical...
data.niaid.nih.gov
zenodo.org
Updated Aug 22, 2024
Cite
Johann-Mattis List (2024). EvoBib: A Bibliographic Database and Quote Collection for Historical Linguistics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1181952
Dataset updated
Aug 22, 2024
Dataset authored and provided by
Johann-Mattis List
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database offers 4564 references dealing with computer-assisted language comparison in a broad sense. In addition, the database offers 8364 distinct quotes collected from 5063 references. The majority of the references in the quote database overlap with those in the bibliographic database. The quotes are organized by keywords and can be browsed with a full-text and a keyword search.

The data (references and quotes) underlying each new release are provided here; the data can be browsed at https://evobib.digling.org/.

If you use the database, I would appreciate it if you could cite it in your research:

List, Johann-Mattis (2024): EvoBib: A bibliographic database and quote collection [Database, Version 1.8.0]. Passau: Chair for Multilingual Computational Linguistics. URL: https://evobib.digling.org/
Goodreads Quotes Dataset
kaggle.com
Updated Oct 3, 2023
Cite
Dakidarts (2023). Goodreads Quotes Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/6605524
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6605524
Dataset updated
Oct 3, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Dakidarts
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description

Explore a diverse and inspiring collection of quotes from the Goodreads website with our Goodreads Quotes Dataset. This dataset features a wide range of motivational, thought-provoking, and insightful quotes from various authors, thinkers, and personalities.

Dataset Details:

Format: JSON (JavaScript Object Notation), CSV (Comma-Separated Values)
Columns:
quote: The text of the quote.
author: The author of the quote.
tags: A list of tags or categories associated with the quote (e.g., ["inspiration", "motivation", "life"]).
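With the `quote`, `author`, and `tags` columns described above, tag popularity (one of the listed use cases) can be counted in a few lines. A minimal sketch, assuming the JSON export is an array of objects with exactly those keys and `tags` stored as a list; the file name `quotes.json` and the function `top_tags` are hypothetical, so the example runs on an in-memory sample instead.

```python
import json
from collections import Counter

def top_tags(records: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Count tag occurrences across all quotes and return the n most common."""
    counts = Counter(tag for record in records for tag in record.get("tags", []))
    return counts.most_common(n)

# In practice (hypothetical file name):
#   with open("quotes.json", encoding="utf-8") as fh:
#       records = json.load(fh)
sample = json.loads(
    '[{"quote": "q1", "author": "A", "tags": ["life", "inspiration"]},'
    ' {"quote": "q2", "author": "B", "tags": ["life"]},'
    ' {"quote": "q3", "author": "A", "tags": ["motivation", "life"]}]'
)
print(top_tags(sample, 2))  # -> [('life', 3), ('inspiration', 1)]
```

`Counter.most_common` orders tied counts by first appearance, which is why `inspiration` precedes `motivation` here.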

Data Preprocessing:

The dataset has been scraped from the Goodreads website, ensuring the collection of accurate and attributed quotes. The authors' names and tags have been cleaned to remove any unnecessary characters or formatting. Tags are provided as a list of keywords for easy categorization.

Use Cases:

Natural Language Processing (NLP) tasks such as sentiment analysis, text generation, and language modeling.
Content creation for websites, social media, and inspirational content.
Analyzing trends in quotes, authors, and popular tags.
Exploring the wisdom shared by authors and thinkers throughout history.

Acknowledgments:

The dataset was collected and curated by DWS Studio for educational and research purposes. We acknowledge Goodreads for hosting the quotes and providing valuable literary content.

License:

This dataset is provided under the terms of the Apache License 2.0, ensuring that it can be used for research, analysis, and educational purposes while respecting the rights and attribution requirements of the original authors.

Disclaimer:

This dataset and its contents are intended for educational and research purposes only. Users are responsible for complying with the terms of service and policies of websites when using this data.

Start your journey with our Goodreads Quotes Dataset and let the power of words inspire your projects and analyses. If you have any questions or feedback, please feel free to contact us.
english_quotes
huggingface.co
opendatalab.com
Updated Dec 19, 2021
Cite
Abir ELTAIEF (2021). english_quotes [Dataset]. http://doi.org/10.57967/hf/1053
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57967/hf/1053
Dataset updated
Dec 19, 2021
Authors
Abir ELTAIEF
Description
Dataset Card for English quotes

I-Dataset Summary

english_quotes is a dataset of all the quotes retrieved from Goodreads quotes. This dataset can be used for multi-label text classification and text generation. The content of each quote is in English and concerns the domain of datasets for NLP and beyond.

II-Supported Tasks and Leaderboards

Multi-label text classification : The dataset can be used to train a model for text-classification, which consists of… See the full description on the dataset page: https://huggingface.co/datasets/Abirate/english_quotes.
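For the multi-label classification task mentioned above, each quote's tags are typically turned into a multi-hot target vector. A minimal sketch in plain Python, assuming each record has a `tags` list; loading the data from the Hugging Face Hub is left out, and `build_label_index` / `multi_hot` are hypothetical helpers.

```python
def build_label_index(tag_lists: list[list[str]]) -> dict[str, int]:
    """Assign a column index to every distinct tag, in sorted order."""
    labels = sorted({tag for tags in tag_lists for tag in tags})
    return {tag: i for i, tag in enumerate(labels)}

def multi_hot(tags: list[str], index: dict[str, int]) -> list[int]:
    """Encode a tag list as a 0/1 vector; tags missing from the index are ignored."""
    vector = [0] * len(index)
    for tag in tags:
        if tag in index:
            vector[index[tag]] = 1
    return vector

tag_lists = [["love", "life"], ["humor"], ["life", "humor", "truth"]]
index = build_label_index(tag_lists)
targets = [multi_hot(tags, index) for tags in tag_lists]
```

Each row of `targets` can then serve as the label vector for a classifier with one sigmoid output per tag.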
Quotes From Goodread
opendatabay.com
Updated Jun 20, 2025
Cite
Datasimple (2025). Quotes From Goodread [Dataset]. https://www.opendatabay.com/data/ai-ml/f0d86cd4-fc04-46ae-8ef0-860d44d0a3bc
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
The data has been scraped from goodreads.com. Quotes from 10 categories, including death, inspiration, wisdom, and love, have been scraped. Each category has 3,000 quotes, along with the authors who wrote them. Other tags attached to each quote are also included in the dataset.

The quotes from the 10 categories have been combined and shuffled. The total number of quotes is 30,000.

License

CC0

Original Data Source: Quotes From Goodread
Dataset: Extracted Quotes from Community Reports Relevant to the Development...
catalog.data.gov
data.nist.gov
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Dataset: Extracted Quotes from Community Reports Relevant to the Development of a NIST-MML Materials Data Strategy [Dataset]. https://catalog.data.gov/dataset/dataset-extracted-quotes-from-community-reports-relevant-to-the-development-of-a-nist-mml-
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology: http://www.nist.gov/
Description
The dataset consists of 375 extracted quotes from 31 community reports relevant to the development of a materials data strategy for the NIST Materials Measurement Laboratory (MML). The dataset is used in the NIST internal report "A Materials Data Strategy." In the past decade, numerous public and private sector documents have highlighted the need for materials data to facilitate advanced technologies in myriad industrial and economic sectors. These documents have been analyzed to identify prevalent gaps in the establishment of an interconnected materials data infrastructure akin to that envisioned in the federal agency-wide Materials Genome Initiative. The internal report uses a uniform schematic format to portray these gaps, illustrate progress in addressing the gaps, and propose an MML roadmap of action items to further address the gaps.
Consumer price inflation consumption segment indices and price quotes
ons.gov.uk
cy.ons.gov.uk
csv
Updated Jun 18, 2025
Cite
Office for National Statistics (2025). Consumer price inflation consumption segment indices and price quotes [Dataset]. https://www.ons.gov.uk/economy/inflationandpriceindices/datasets/consumerpriceindicescpiandretailpricesindexrpiitemindicesandpricequotes
Available download formats: csv
Dataset updated
Jun 18, 2025
Dataset provided by
Office for National Statistics: http://www.ons.gov.uk/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Price quote data (for locally collected data only) and consumption segment indices that underpin consumer price inflation statistics, giving users access to the detailed data that are used in the construction of the UK’s inflation figures. The data are being made available for research purposes only and are not an accredited official statistic. From October 2024, private school fees and part-time education classes have been included in the consumption segment indices file. For more information on the introduction of consumption segments, please see the Consumer Prices Indices Technical Manual, 2019. Note that this dataset was previously called the consumer price inflation item indices and price quotes dataset.
AlgoSeek Equity Trade and Quote Data US coverage - nanosecond timestamps...
datarade.ai
Updated Feb 3, 2021
Cite
AlgoSeek (2021). AlgoSeek Equity Trade and Quote Data US coverage - nanosecond timestamps since 2016 [Dataset]. https://datarade.ai/data-products/algoseek-equity-trade-and-quote-data-algoseek
Dataset updated
Feb 3, 2021
Dataset authored and provided by
AlgoSeek
Area covered
United States
Description
algoseek Trade and Quote (TAQ) data contain all trades and top-of-book intraday quotes for all listed stocks, ETNs, ETFs, ADRs, and funds from 15+ US exchanges and marketplaces. TAQ data files are organized into a single-format feed in which events are ordered by the time received, with nanosecond timestamps starting from 2016 and millisecond timestamps before that. The entire trading session, including early and late hours, runs from 04:00 to 20:00 EST.
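Because timestamps have nanosecond resolution from 2016 onward and millisecond resolution before, combining years usually means converting to a common unit first. A minimal sketch, assuming timestamps are integer offsets since midnight in exchange local time (the actual field layout of the files is not described here); `to_nanoseconds` and `in_session` are hypothetical names, the 04:00 to 20:00 window comes from the description above, and treating 20:00 as exclusive is an assumption.

```python
NS_PER_MS = 1_000_000
NS_PER_HOUR = 3_600 * 1_000_000_000

# Session window from the description: 04:00 to 20:00 exchange time.
SESSION_START_NS = 4 * NS_PER_HOUR
SESSION_END_NS = 20 * NS_PER_HOUR

def to_nanoseconds(timestamp: int, year: int) -> int:
    """Convert a time-of-day timestamp to nanoseconds.

    Assumption: values are milliseconds before 2016 and nanoseconds from 2016 on.
    """
    return timestamp * NS_PER_MS if year < 2016 else timestamp

def in_session(timestamp: int, year: int) -> bool:
    """True if the event falls inside the 04:00-20:00 trading session."""
    ns = to_nanoseconds(timestamp, year)
    return SESSION_START_NS <= ns < SESSION_END_NS
```

For example, 09:30 is `34_200_000` in milliseconds (2015 data) and `34_200_000_000_000` in nanoseconds (2016 data); both fall inside the session.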
Willingness to share driving data for personalized insurance quotes U.S....
statista.com
Updated May 19, 2025
Cite
Statista (2025). Willingness to share driving data for personalized insurance quotes U.S. 2017, by age [Dataset]. https://www.statista.com/statistics/719039/willingness-to-share-driving-data-for-personalized-insurance-quotes-usa-by-age/
Dataset updated
May 19, 2025
Dataset authored and provided by
Statista: http://statista.com/
Time period covered
Feb 2017
Area covered
United States
Description
This statistic shows the willingness to share recent driving data for personalized insurance quotes in the United States in 2017, by generation. Millennials were the most likely to share their recent driving data with 93 percent of those respondents saying that they would be willing to do that.
Quotes Dataset
opendatabay.com
Updated Jun 24, 2025
Cite
Datasimple (2025). Quotes Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/5998ae79-6192-483f-b5a5-8075cf335b18
Dataset updated
Jun 24, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
Context: The data was created to build a Content Based Recommendation System using Text Data.

Ideas: Build a content-based recommendation engine based on user preferences. Preprocess the data using NLP methods. Analyze the textual dataset.
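As a starting point for the content-based recommendation idea above, quotes can be compared by TF-IDF vectors and cosine similarity. A minimal, dependency-free sketch; the tokenization (lowercased alphabetic words), the smoothed IDF formula, the sample quotes, and all function names are assumptions for illustration, not part of the dataset.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors using smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in tokenized]

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def recommend(docs: list[str], liked: int, k: int = 2) -> list[int]:
    """Return indices of the k quotes most similar to the liked quote."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[liked], v), i) for i, v in enumerate(vectors) if i != liked]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

quotes = [
    "Love is patient, love is kind.",
    "Where there is love there is life.",
    "The only way to do great work is to love what you do.",
    "Stock prices fell sharply today.",
]
print(recommend(quotes, liked=0, k=1))  # -> [1]
```

User preference can be approximated by averaging the vectors of several liked quotes; the single-quote version keeps the sketch short.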

License

CC0

Original Data Source: Quotes Dataset
33 Mindfulness Quotes Reference Table
7chakracolors.com
Updated 2024
Cite
Mindfulness Wisdom Collection (2024). 33 Mindfulness Quotes Reference Table [Dataset]. https://www.7chakracolors.com/blog/33-powerful-mindfulness-quotes/
Dataset updated
2024
Dataset provided by
7 Chakra Colors
Authors
Mindfulness Wisdom Collection
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comprehensive reference table of 33 powerful mindfulness quotes organized by category and author, featuring wisdom from spiritual teachers like Thich Nhat Hanh, Buddha, Jon Kabat-Zinn, and others. Each quote is categorized by practical application including Present Moment Mastery, Inner Peace & Self-Compassion, Understanding & Managing Emotions, Awakening & Awareness, Life Philosophy & Wisdom, and Inner Wisdom & Intuition.
Stoic quotes
kaggle.com
Updated Feb 19, 2025
Cite
Tejas Nisar (2025). Stoic quotes [Dataset]. https://www.kaggle.com/datasets/tejasnisar/stoic-quotes
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 19, 2025
Dataset provided by
Kaggle
Authors
Tejas Nisar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
📚 About This Dataset: Stoic Wisdom — Quotes from Prominent Stoic Philosophers

🌟 Overview

This dataset is a comprehensive collection of Stoic quotes sourced from some of the most prominent Stoic philosophers and writers. It includes timeless wisdom from the likes of Marcus Aurelius, Seneca, Epictetus, Zeno of Citium, Musonius Rufus, and others. These quotes cover a wide range of Stoic themes such as resilience, discipline, mindfulness, control, and virtue—offering practical guidance for living a fulfilling and rational life.

The data was scraped from Goodreads using Python.

📄 Dataset Details
• Quote: The Stoic quote text.
• Author: The philosopher or writer who authored the quote.
• Book: The source/book where the quote is found (if available).
• Tags: The main themes or topics related to the quote (e.g., “attitude,” “pain,” “stoicism,” “freedom”).

🔍 Potential Use Cases
• Sentiment Analysis: Analyze Stoic sentiments related to topics like death, fate, and personal control.
• Topic Modeling: Identify core Stoic themes using unsupervised NLP techniques.
• Philosophical Comparisons: Compare Stoic quotes to other schools of thought (e.g., Epicureanism, Buddhism).
• Machine Learning: Build classifiers to predict the author or theme of a quote based on textual features.
• Personal Development Tools: Power applications that deliver daily Stoic reflections and insights.
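For the "predict the author" use case listed above, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing is one simple baseline. A sketch assuming `(quote, author)` pairs have already been read from the dataset's Quote and Author fields; the class name is hypothetical, and the training sentences and "Author A"/"Author B" labels below are placeholders, not rows from the dataset.

```python
import math
import re
from collections import Counter, defaultdict

def words(text: str) -> list[str]:
    """Lowercased alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesAuthor:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, pairs: list[tuple[str, str]]) -> "NaiveBayesAuthor":
        self.doc_counts = Counter(author for _, author in pairs)
        self.word_counts = defaultdict(Counter)
        for quote, author in pairs:
            self.word_counts[author].update(words(quote))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(pairs)
        return self

    def predict(self, quote: str) -> str:
        best_author, best_score = None, -math.inf
        for author, n_docs in self.doc_counts.items():
            counts = self.word_counts[author]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for w in words(quote):
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_author, best_score = author, score
        return best_author

pairs = [
    ("control what you can and accept the rest", "Author A"),
    ("accept what you cannot control", "Author A"),
    ("time is short so act with virtue", "Author B"),
    ("virtue is the only good worth time", "Author B"),
]
model = NaiveBayesAuthor().fit(pairs)
print(model.predict("accept control"))  # -> Author A
```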

📝 Acknowledgments
• The quotes were sourced from Goodreads.
• The dataset is intended for educational and research purposes, respecting the content’s original attribution.
AlgoSeek Futures Trade and Quote data US coverage - historic data till 2010
datarade.ai
Updated Jan 15, 2010
Cite
AlgoSeek (2010). AlgoSeek Futures Trade and Quote data US coverage - historic data till 2010 [Dataset]. https://datarade.ai/data-products/algoseek-futures-trade-and-quote-data-algoseek
Dataset updated
Jan 15, 2010
Dataset authored and provided by
AlgoSeek
Area covered
United States of America
Description
algoseek Futures Trade and Quote data include trades and quotes with condition codes (including Aggressor Side). Both processed TAQ and unprocessed raw files are available. The processed TAQ dataset has millisecond timestamps. The data is from CME, CBOT, NYMEX, and COMEX and goes back as far as January 2010.
Mexico Avg Daily Quote Salaries: Expanded
ceicdata.com
Updated Jan 15, 2025
Cite
CEICdata.com (2025). Mexico Avg Daily Quote Salaries: Expanded [Dataset]. https://www.ceicdata.com/en/indicator/mexico/data/avg-daily-quote-salaries-expanded
Dataset updated
Jan 15, 2025
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 1, 2018 - Feb 1, 2019
Area covered
Mexico
Description
Mexico Avg Daily Quote Salaries: Expanded data was reported at 373.600 MXN in Feb 2019. This records an increase from the previous number of 372.275 MXN for Jan 2019. Mexico Avg Daily Quote Salaries: Expanded data is updated monthly, with a median of 241.011 MXN over Jan 2000 to Feb 2019 (230 observations). The data reached an all-time high of 373.600 MXN in Feb 2019 and a record low of 129.283 MXN in Feb 2000. Mexico Avg Daily Quote Salaries: Expanded data remains active in CEIC and is reported by the Secretary of Labor and Social Security. The data is categorized under Global Database’s Mexico – Table MX.G042: Average Daily Quote Salaries: Expanded.
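The figures in this description can be cross-checked with a little arithmetic: a monthly series from January 2000 to February 2019 inclusive has 230 observations, and the latest reading is a month-on-month increase of roughly 0.36 percent. A short sketch; the helper name `months_inclusive` is hypothetical.

```python
def months_inclusive(start_year: int, start_month: int, end_year: int, end_month: int) -> int:
    """Number of monthly observations from start to end, both inclusive."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

n_obs = months_inclusive(2000, 1, 2019, 2)
jan_2019, feb_2019 = 372.275, 373.600  # MXN, values from the description
mom_change_pct = (feb_2019 - jan_2019) / jan_2019 * 100
print(n_obs, round(mom_change_pct, 2))  # -> 230 0.36
```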
Crypto Quotes: Real-Time & Historical CEX/DEX Data | Crypto Data | Bid Price...
dataproducts.coinapi.io
Updated Oct 10, 2018
Cite
CoinAPI (2018). Crypto Quotes: Real-Time & Historical CEX/DEX Data | Crypto Data | Bid Price | Ask Price [Dataset]. https://dataproducts.coinapi.io/products/coinapi-crypto-quotes-data-real-time-historical-quotes-coinapi
Dataset updated
Oct 10, 2018
Dataset provided by
Coinapi Ltd
Authors
CoinAPI
Area covered
Western Sahara, Kenya, Paraguay, Comoros, Niue, Djibouti, Slovakia, Benin, France, Saint Pierre and Miquelon
Description
CoinAPI offers digital asset data with crypto quotes from both CEX and DEX sources. Access real-time and historical market information including bid prices, ask prices, trading volumes, and precise timestamps. Our complete crypto data enables informed decisions through accurate market insights.
Mexico Avg Daily Quote Salaries: Expanded: Social Services
ceicdata.com
Cite
CEICdata.com, Mexico Avg Daily Quote Salaries: Expanded: Social Services [Dataset]. https://www.ceicdata.com/en/mexico/average-daily-quote-salaries-expanded/avg-daily-quote-salaries-expanded-social-services
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 1, 2018 - Feb 1, 2019
Area covered
Mexico
Description
Mexico Avg Daily Quote Salaries: Expanded: Social Services data was reported at 502.300 MXN in Feb 2019. This records an increase from the previous number of 502.200 MXN for Jan 2019. Mexico Avg Daily Quote Salaries: Expanded: Social Services data is updated monthly, with a median of 312.028 MXN over Jan 2000 to Feb 2019 (230 observations). The data reached an all-time high of 502.300 MXN in Feb 2019 and a record low of 155.822 MXN in Jan 2000. Mexico Avg Daily Quote Salaries: Expanded: Social Services data remains active in CEIC and is reported by the Secretary of Labor and Social Security. The data is categorized under Global Database’s Mexico – Table MX.G042: Average Daily Quote Salaries: Expanded.
$AAPL Option Chains - Q1 2016 to Q1 2023
kaggle.com
Updated Apr 7, 2023
Cite
Kyle Graupe (2023). $AAPL Option Chains - Q1 2016 to Q1 2023 [Dataset]. https://www.kaggle.com/datasets/kylegraupe/aapl-options-data-2016-2020
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 7, 2023
Dataset provided by
Kaggle
Authors
Kyle Graupe
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

This dataset combines end-of-day quotes for Apple ($AAPL) options from January 2016 to March 2023. Each row represents the information associated with one contract's strike price and a given expiration date.

Quote dates are given both as Unix timestamps and in "YYYY-MM-DD HH:MM" format. Quotes are taken daily at 4:00 pm EST, which corresponds to the end-of-day market close.

REMEMBER: Apple stock split on August 28, 2020. This will be reflected in the data. Keep this in mind!
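If the strikes in the file are not already split-adjusted, quotes from before and after the 2020 split sit on different price scales. Apple's August 2020 split was 4-for-1, so one way to compare them is to divide pre-split strikes by 4. A minimal sketch; the cutoff date (assumed here to be 31 August 2020, the first split-adjusted trading day) and the function name `adjust_strike` are assumptions, so check them against the actual rows before relying on this.

```python
from datetime import date

SPLIT_RATIO = 4                      # Apple's August 2020 split was 4-for-1
SPLIT_EFFECTIVE = date(2020, 8, 31)  # assumed first split-adjusted trading day

def adjust_strike(strike: float, quote_date: date) -> float:
    """Express a strike on the post-split scale so that all years are comparable."""
    return strike / SPLIT_RATIO if quote_date < SPLIT_EFFECTIVE else strike

print(adjust_strike(500.0, date(2020, 8, 28)))  # -> 125.0
print(adjust_strike(125.0, date(2020, 9, 1)))   # -> 125.0
```

The same factor applies to the underlying price; option prices per contract need the corresponding adjustment as well.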

What is an option chain?

An option chain is the listing of all available option contracts for an underlying asset. It has two sections: calls and puts. A call option is a contract that gives you the right, but not the obligation, to buy the underlying asset at a particular price (the strike) on or before the option's expiration date; a put option gives you the right, but not the obligation, to sell it. This dataset contains the entire option chain (all available option contracts for all expirations) for each business day between Q1 2016 and Q1 2023.
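The call definition above, together with its mirror image for puts (the right, but not the obligation, to sell), translates directly into the payoff at exercise (the intrinsic value), where S is the underlying price and K the strike. A short sketch; it ignores the premium paid and any early-exercise considerations specific to American options, and the function names are hypothetical.

```python
def call_payoff(spot: float, strike: float) -> float:
    """Value at exercise of the right to buy at `strike`: max(S - K, 0)."""
    return max(spot - strike, 0.0)

def put_payoff(spot: float, strike: float) -> float:
    """Value at exercise of the right to sell at `strike`: max(K - S, 0)."""
    return max(strike - spot, 0.0)

# One strike from a chain, evaluated at a few underlying prices.
for spot in (120.0, 130.0, 140.0):
    print(spot, call_payoff(spot, 130.0), put_payoff(spot, 130.0))
```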

This dataset contains data for American options, which can be exercised on or before the expiration date, unlike European options, which can only be exercised on the expiration date.

I am also continuously working on the associated notebook to give a basic idea of how to load and explore the data. Stay tuned!

Similar Datasets:
- $TSLA Option Chains
- $SPY Option Chains
- $NVDA Option Chains
- $QQQ Option Chains
Historical Futures Trade and Quote Data (Europe, China, USA & Canada...
datarade.ai
Updated May 1, 2021
Cite
Olsen Data (2021). Historical Futures Trade and Quote Data (Europe, China, USA & Canada covered)⎢Olsen Data [Dataset]. https://datarade.ai/data-products/historical-futures-trade-and-quote-data-olsen-data
Dataset updated
May 1, 2021
Dataset provided by
Olsen Ltd.
Authors
Olsen Data
Area covered
United Kingdom, China, Canada, Japan, United States
Description
Futures data can be ordered as full month ranges. To control costs, it is possible to order Nearest to Expiry (NTE) data, with overlap between the expiring future and the next future in the month of expiry, or with overlap over more than one month if needed. Of course, you can also select all active expiries if required.

The data is available at tick level with millisecond resolution as well as at regular intervals of 1 Min, 5 Min and so on.

Data is priced separately for Trades (Tx) and Quotes (Qt).

Tick level Tx data consists of a millisecond timestamp and trade price Tx with an option to include the Volume field. Tick level Qt data consists of millisecond timestamp and quote Qt with a flag to indicate whether it is a Bid or an Ask and optionally the Qt size field can be added.

Regular interval data is usually supplied as one of these sets:
- CloseTx
- CloseBid, CloseAsk
- OpenTx, HighTx, LowTx, CloseTx
- OpenBid, HighBid, LowBid, CloseBid
- OpenAsk, HighAsk, LowAsk, CloseAsk
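The tick-level quote format described above (a millisecond timestamp, a quote, and a Bid/Ask flag) can be reduced to regular-interval fields such as CloseBid and CloseAsk by keeping the last quote of each side in every interval. A minimal sketch, assuming ticks arrive sorted by time as `(timestamp_ms, price, side)` tuples with side `"B"` or `"A"`; this tuple layout and the function name `close_bid_ask` are hypothetical, not the vendor's delivery format.

```python
def close_bid_ask(
    ticks: list[tuple[int, float, str]], interval_ms: int = 60_000
) -> dict[int, dict[str, float]]:
    """Map each interval start (ms) to its CloseBid/CloseAsk from time-sorted ticks.

    The close of a side is the last quote of that side within the interval;
    a side with no quotes in an interval is simply absent.
    """
    bars: dict[int, dict[str, float]] = {}
    for ts_ms, price, side in ticks:
        start = ts_ms - ts_ms % interval_ms  # floor to the interval boundary
        field = "CloseBid" if side == "B" else "CloseAsk"
        bars.setdefault(start, {})[field] = price  # later ticks overwrite earlier ones
    return bars

ticks = [
    (1_000, 1.1000, "B"),
    (2_500, 1.1002, "A"),
    (59_999, 1.1001, "B"),
    (60_000, 1.1003, "A"),
]
bars = close_bid_ask(ticks)
```

With 1-minute intervals, the first bar closes at bid 1.1001 and ask 1.1002, and the second bar has only an ask quote.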

Additional Fields: IntervalTxVolume, CloseBidSize, CloseAskSize and some others are available if required.

Timestamps are by default in GMT but data can be in any Time Zone requested.

Pricing depends on frequency and number of fields.

Hundreds of papers in finance and economics have been written using our data since 1986, and several reputable banks and hedge funds use our data for backtesting and risk management.
EvoBib: A Bibliographic Database and Quote Collection for Historical...
zenodo.org
bin, csv
Updated Feb 21, 2022
Cite
Johann-Mattis List (2022). EvoBib: A Bibliographic Database and Quote Collection for Historical Linguistics [Dataset]. http://doi.org/10.5281/zenodo.3699172
Explore at:
bin, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.3699172
Dataset updated
Feb 21, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Johann-Mattis List; Johann-Mattis List
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This database offers 3524 references dealing with computer-assisted language comparison in a broad sense. In addition, the database offers 5298 distinct quotes collected from 2835 references. Most of the references in the quote database also appear in the bibliographic database. The quotes are organized by keywords and can be browsed with a full-text and a keyword search.
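A local copy of the quote data could be searched in the same two ways, full text and keyword. The sketch below assumes each quote has already been loaded into a dict with hypothetical fields `quote` (the text) and `keywords` (a list of strings); the actual column names in the EvoBib CSV release may differ, and `search_quotes` is not part of EvoBib.

```python
from typing import Iterable, Optional


def search_quotes(quotes: Iterable[dict], text: Optional[str] = None,
                  keyword: Optional[str] = None) -> list[dict]:
    """Return quotes matching a case-insensitive substring (full-text search)
    and/or carrying the given keyword; a criterion left as None is ignored."""
    results = []
    for q in quotes:
        if text is not None and text.lower() not in q["quote"].lower():
            continue
        if keyword is not None and keyword.lower() not in {k.lower() for k in q["keywords"]}:
            continue
        results.append(q)
    return results
```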

The data (references and quotes) underlying each new release are provided here; the data can be browsed at https://digling.org/evobib/.

If you use the database, I would appreciate it if you could cite it in your research as follows:

> List, Johann-Mattis (2020): EvoBib: A bibliographical database and quote collection for historical linguistics. Version 1.1.0. Jena: Max Planck Institute for the Science of Human History. URL: https://digling.org/evobib/ DOI: 10.5281/zenodo.3699172
Quotation data from an embedded study at an airport organization about the...
data.4tu.nl
zip
Updated Jun 26, 2025
Aniek Toet (2025). Quotation data from an embedded study at an airport organization about the transformation to a multimodal transport hub [Dataset]. http://doi.org/10.4121/42897af5-71dd-4896-b083-d5862ef0f7d1.v1
Explore at:
zip (available download formats)
Unique identifier
https://doi.org/10.4121/42897af5-71dd-4896-b083-d5862ef0f7d1.v1
Dataset updated
Jun 26, 2025
Dataset provided by
4TU.ResearchData
Authors
Aniek Toet
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Time period covered
2021 - 2023
Description
Dataset from an embedded study at an airport organization about the transformation towards a multimodal transport hub. The data is the result of (in)formal conversations, diary notes, meeting notes and observations. This is a condensed dataset, representing quotes and notes that were deemed relevant to the study's topic. The data was collected over 16 months, and the dataset includes (anonymous) quotations and paraphrases, as well as condensed meaning units (the researchers' interpretation).

Timoté Vaucher; Andreas Spitz; Michele Catasta; Robert West (2023). Quotebank: A Corpus of Quotations from a Decade of News [Dataset]. http://doi.org/10.5281/zenodo.4277311

Data from: Quotebank: A Corpus of Quotations from a Decade of News

Explore at:

2 scholarly articles cite this dataset

bz2 (available download formats)

Unique identifier

https://doi.org/10.5281/zenodo.4277311

Dataset updated

Jun 18, 2023

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Timoté Vaucher; Andreas Spitz; Michele Catasta; Robert West

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Introduction

Quotebank is a dataset of 235 million unique, speaker-attributed quotations that were extracted from 196 million English news articles (127 million containing quotations) crawled from over 377 thousand web domains (15 thousand root domains) between September 2008 and April 2020. The quotations were extracted and attributed using Quobert, a distantly and minimally supervised end-to-end, language-agnostic framework for quotation attribution.

For further details, please refer to the description below and to the original paper:

Timoté Vaucher, Andreas Spitz, Michele Catasta, and Robert West
"Quotebank: A Corpus of Quotations from a Decade of News"
Proceedings of the 14th International ACM Conference on Web Search and Data Mining (WSDM), 2021.
https://doi.org/10.1145/3437963.3441760

When using the dataset, please cite the above paper. Note that the numbers above differ from those listed in the paper, as the updated data in this repository has been computed from an expanded set of input news articles.

Dataset summary

The dataset consists of two versions:

Quotation-centric version (quotes-YYYY.json.bz2)
An aggregated set of unique quotations with the most likely speaker. Each unique quotation occurs only once in this version of the data, and the probabilities of the candidate speakers to which the quotation can be attributed are aggregated over all occurrences of the quotation. This version of the data is a minimal but complete list of attributed quotations, aimed at users who only require quotation-speaker attributions and not the individual contexts of these quotations in the original articles.
Article-centric version (quotebank-YYYY.json.bz2)
A complete set of all individual quotation mentions with associated speaker as well as the article context in which they are mentioned. This larger version contains one entry per article in the news data. Each entry contains all speakers that appear in the news article as well as the (attributed) quotations, alongside a context window surrounding the quotations.

Both versions are split into 13 files (one per year) for ease of downloading and handling.
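Because the files are large, streaming them record by record is usually preferable to loading a whole year into memory. The sketch below assumes each file is JSON Lines (one JSON object per line) inside the bzip2 archive and that files follow the naming patterns above; `iter_records` and `iter_years` are hypothetical helper names.

```python
import bz2
import json
from pathlib import Path
from typing import Iterator


def iter_records(path) -> Iterator[dict]:
    """Stream records from a bzip2-compressed JSON Lines file."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)


def iter_years(directory, prefix="quotes", years=range(2008, 2021)) -> Iterator[dict]:
    """Chain the per-year files, e.g. quotes-2008.json.bz2 ... quotes-2020.json.bz2.

    Use prefix="quotebank" for the article-centric files. Missing years are skipped.
    """
    for year in years:
        path = Path(directory) / f"{prefix}-{year}.json.bz2"
        if path.exists():
            yield from iter_records(path)
```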

Dataset details

The following formatting applies to both versions of the dataset:

All data is made available in JSON format that has been compressed using bzip2.
The data is split per year (i.e., there is one file for each calendar year).
The offsets of quotations, contexts, and speaker annotations are given in units of Penn TreeBank Tokenizer tokens.
Offsets are zero-based and are computed from the start of the article.
When pairs of offsets are provided, the end offset is non-inclusive (e.g. in Python you can call tokens[start:end] without having to do end+1).
The Spinn3r data from which Quotebank was extracted had been collected over the course of more than a decade. During this time, the client-side code used for collecting the data changed several times, and various character-encoding-related issues led to different representations of the original text at different times. We thus divide the 12 years spanned by the Spinn3r corpus into five phases (Phases A through E). A detailed description is available on GitHub; the key takeaways are that (1) text was lowercased in Phases A, B, and C, whereas the original capitalization was maintained in Phases D and E, and that (2) non-ASCII characters are properly represented only in Phase E.
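The offset and phase conventions above translate into a few small helpers. This is a sketch with hypothetical function names: `overlaps` applies the zero-based, end-exclusive rule to two spans, `mentions_in_window` uses it to find speaker mentions near a quotation, and the phase checks encode the two takeaways about capitalization and non-ASCII text.

```python
def overlaps(a, b):
    """True if half-open token spans a = [start, end) and b = [start, end) share a token."""
    return a[0] < b[1] and b[0] < a[1]


def mentions_in_window(offsets, window):
    """Return the (start, end) speaker-mention offsets that overlap a token window."""
    return [span for span in offsets if overlaps(span, window)]


def capitalization_preserved(phase):
    """Original capitalization survives only in Phases D and E; A-C were lowercased."""
    return phase in ("D", "E")


def non_ascii_reliable(phase):
    """Non-ASCII characters are properly represented only in Phase E."""
    return phase == "E"


# Zero-based, end-exclusive offsets slice directly, with no end + 1 adjustment.
tokens = ["``", "We", "will", "win", "''", ",", "she", "said", "."]
quote_tokens = tokens[1:4]  # ['We', 'will', 'win']
```

With end-exclusive spans, adjacent spans such as (0, 2) and (2, 4) do not overlap, which is why the comparison uses strict `<`.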

Version 1: Quotation-centric data

In this version of the dataset, the quotations are aggregated across all their occurrences in the news article data, and assigned a probability for each speaker candidate. We consider two quotations to be equivalent and suitable for aggregation if they are identical after lower-casing and removing punctuation.
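The equivalence rule can be expressed as a normalization key: two quotations are aggregated when their keys match. This is a sketch, not the Quobert implementation; it assumes "punctuation" means ASCII `string.punctuation` (so typographic quotes and other Unicode punctuation are left in place), and collapsing repeated whitespace after punctuation removal is an additional assumption of this sketch.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def aggregation_key(quotation: str) -> str:
    """Lowercase, remove ASCII punctuation, and collapse runs of whitespace."""
    return " ".join(quotation.lower().translate(_STRIP_PUNCT).split())
```

Grouping records by `aggregation_key(r["quotation"])` would then collect the occurrences that this rule treats as the same quotation.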

Quotation-centric data
 |-- quoteID: Primary key of the quotation (format: "YYYY-MM-DD-{increasing int:06d}")
 |-- quotation: Text of the longest encountered original form of the quotation
 |-- date: Earliest occurrence date of any version of the quotation
 |-- phase: Corresponding phase of the data in which the quotation first occurred (A-E)
 |-- probas: Array representing the probabilities of each speaker having uttered the quotation.
   The probabilities across different occurrences of the same quotation are summed for
   each distinct candidate speaker and then normalized
   |-- proba: Probability for a given speaker
   |-- speaker: Most frequent surface form for a given speaker in the articles where the quotation occurred
 |-- speaker: Selected most likely speaker. This matches the first speaker entry in `probas`
 |-- qids: Wikidata IDs of all aliases that match the selected speaker
 |-- numOccurrences: Number of times this quotation occurs in the articles
 |-- urls: List of links to the original articles containing the quotation
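The `probas` aggregation described above (sum per distinct candidate speaker across occurrences, then normalize) can be sketched as follows. Assumptions: each occurrence is a list of `{"speaker": ..., "proba": ...}` dicts, and speaker strings are used directly as grouping keys, so this does not reproduce the dataset's mapping of each distinct speaker to its most frequent surface form. `float()` is applied in case probabilities are serialized as strings.

```python
from collections import defaultdict


def aggregate_probas(occurrences):
    """Combine per-occurrence speaker probabilities into one distribution.

    Probabilities are summed per speaker, normalized to sum to 1, and sorted
    in descending order, so the first entry is the selected speaker.
    """
    totals = defaultdict(float)
    for candidates in occurrences:
        for c in candidates:
            totals[c["speaker"]] += float(c["proba"])
    norm = sum(totals.values())
    if norm == 0:
        return []
    probas = [{"speaker": s, "proba": p / norm} for s, p in totals.items()]
    probas.sort(key=lambda c: c["proba"], reverse=True)
    return probas
```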

Note that for some speakers there can be more than one Wikidata ID in the `qids` field. To access Wikidata information about those speakers it is necessary to disambiguate them, i.e., select one of the listed Wikidata IDs that most likely corresponds to the respective speaker. Speaker disambiguation can be done using scripts available in the quotebank-toolkit repository. Additionally, the repository contains useful scripts for cleaning and enriching Quotebank.
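Before running any disambiguation, it can help to know how many records need it. The following sketch only triages records by the number of entries in `qids`; it does not disambiguate anything and does not use the quotebank-toolkit scripts. The function name and group labels are hypothetical.

```python
def partition_by_qids(records):
    """Split quotation records by how many Wikidata IDs their speaker matches.

    'unique': exactly one ID; 'ambiguous': several IDs (needs disambiguation);
    'unlinked': no IDs (missing or empty `qids`).
    """
    groups = {"unique": [], "ambiguous": [], "unlinked": []}
    for r in records:
        n = len(r.get("qids") or [])
        key = "unique" if n == 1 else "ambiguous" if n > 1 else "unlinked"
        groups[key].append(r)
    return groups
```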

Version 2: Article-centric data

In this data set, individual quotations are not aggregated. For each article, one JSON entry contains all speakers that appear in the news article, the (attributed) quotations, and the text within a context window surrounding each of the quotations.

Article-centric data
 |-- articleID: Primary key
 |-- articleLength: Length of the article in PTB tokens
 |-- date: Publication date of the article
 |-- phase: Corresponding phase in which the article appeared (A-E)
 |-- title: Title of the article
 |-- url: Link to the original article
 |-- names: List of all extracted speakers that occur in the article
   |-- name: Surface form of the first occurrence of each speaker in the article
   |-- ids: List of Wikidata IDs that have `name` as a possible alias
   |-- offsets: List of pairs of start/end offset, signifying positions at which the speaker occurs in the article (full and partial mention of the speaker)
 |-- quotations: List of all the quotations that appear in the article
   |-- quoteID: Foreign key of the quotation (from the quotation-centric dataset)
   |-- quotation: Text of the quotation as it occurs in this article
   |-- quotationOffset: Index where the quotation starts in the article
   |-- leftContext: Text in the left context window of the quotation (used for the attribution)
   |-- rightContext: Text in the right context window (used for the attribution)
   |-- globalProbas: Array representing the probabilities of each speaker having uttered the quote *at the aggregated level*. Same as `probas` for a given `quoteID`
   |-- globalTopSpeaker: Most probable speaker *at the aggregated level*. Same as `speaker` for a given `quoteID` 
   |-- localProbas: Array representing the probabilities of each speaker having said the quote *given this article context*.
      |-- proba: Probability for a given speaker
      |-- speaker: Name of the speaker as it first occurs in this article
   |-- localTopSpeaker: Selected speaker. Same name as the first entry in `localProbas`
   |-- numOccurrences: Number of times this quotation occurs in any article
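Since `quoteID` links the two versions, an article-centric record can be joined back to the quotation-centric data, and local and global attributions can be compared. The sketch below uses hypothetical helper names and assumes `quotes_by_id` is a dict built from quotation-centric records, e.g. `{r["quoteID"]: r for r in records}` (memory-heavy for a full year). Note that `localTopSpeaker` uses the name as it first occurs in the article while `globalTopSpeaker` uses the most frequent surface form, so a string mismatch may reflect different spellings of the same speaker.

```python
def speaker_disagreements(article):
    """(quoteID, localTopSpeaker, globalTopSpeaker) for each quotation in one
    article-centric record where the two selected speakers differ."""
    return [
        (q["quoteID"], q.get("localTopSpeaker"), q.get("globalTopSpeaker"))
        for q in article.get("quotations", [])
        if q.get("localTopSpeaker") != q.get("globalTopSpeaker")
    ]


def join_quotes(article, quotes_by_id):
    """Pair each quotation mention with its quotation-centric record via the
    quoteID foreign key; the record is None if the quoteID is not present."""
    return [(q, quotes_by_id.get(q["quoteID"])) for q in article.get("quotations", [])]
```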

Code repository

The Quobert code used to extract and attribute the quotations in this dataset is available and maintained in a GitHub repository.

Data from: Quotebank: A Corpus of Quotations from a Decade of News

EvoBib: A Bibliographic Database and Quote Collection for Historical...

Goodreads Quotes Dataset

english_quotes

Quotes From Goodread

Dataset: Extracted Quotes from Community Reports Relevant to the Development...

Consumer price inflation consumption segment indices and price quotes

AlgoSeek Equity Trade and Quote Data US coverage - nanosecond timestamps...

Willingness to share driving data for personalized insurance quotes U.S....

Quotes Dataset

33 Mindfulness Quotes Reference Table

Stoic quotes

AlgoSeek Futures Trade and Quote data US coverage - historic data till 2010

Mexico Avg Daily Quote Salaries: Expanded

Crypto Quotes: Real-Time & Historical CEX/DEX Data | Crypto Data | Bid Price...

Mexico Avg Daily Quote Salaries: Expanded: Social Services

$AAPL Option Chains - Q1 2016 to Q1 2023

Historical Futures Trade and Quote Data (Europe, China, USA & Canada...

EvoBib: A Bibliographic Database and Quote Collection for Historical...

Quotation data from an embedded study at an airport organization about the...
