47 datasets found
  1. Revenue

    • huggingface.co
    Updated May 5, 2024
    Cite
    InDyne (2024). Revenue [Dataset]. https://huggingface.co/datasets/InDyne/Revenue
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 5, 2024
    Dataset authored and provided by
    InDyne
    Description

    InDyne/Revenue dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. earnings-calls-qa

    • huggingface.co
    Updated Dec 1, 2022
    Cite
    Lamini (2022). earnings-calls-qa [Dataset]. https://huggingface.co/datasets/lamini/earnings-calls-qa
    Explore at:
    Croissant
    Dataset updated
    Dec 1, 2022
    Dataset authored and provided by
    Lamini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lamini Earning Calls QA Dataset

      Description
    

    This dataset contains transcripts of earnings calls for various companies, along with questions and answers related to the companies' financial performance and other relevant topics.

      Format
    

    The transcripts, questions, and answers are stored as JSON Lines files, with each JSON object in a file containing the transcript of an earnings call for a single company.

      Data Pipeline Code
    

    The entire data pipeline… See the full description on the dataset page: https://huggingface.co/datasets/lamini/earnings-calls-qa.
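
A minimal sketch of reading a JSON Lines file of the kind the Format note describes, with one JSON object per line. The field names (`ticker`, `transcript`, `question`, `answer`) and the sample values are hypothetical placeholders; the listing does not state the dataset's actual schema.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed JSON object per non-blank line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample standing in for a real file on disk.
# The keys below are assumptions made for illustration only.
sample = io.StringIO(
    '{"ticker": "AAA", "transcript": "Good morning.", "question": "q1", "answer": "a1"}\n'
    '{"ticker": "BBB", "transcript": "Welcome.", "question": "q2", "answer": "a2"}\n'
)

records = list(read_jsonl(sample))
print(len(records), "records loaded; first ticker:", records[0]["ticker"])
```

Reading line by line keeps memory use flat for large files, which is the usual reason to choose JSON Lines over a single JSON array.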

  3. finRAG

    • huggingface.co
    Updated May 7, 2024
    Cite
    Parsee.ai (2024). finRAG [Dataset]. https://huggingface.co/datasets/parsee-ai/finRAG
    Explore at:
    Croissant
    Dataset updated
    May 7, 2024
    Dataset provided by
    Parsee.ai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    finRAG Datasets

    This is the official Hugging Face repo of the finRAG datasets published by parsee.ai. More detailed information about the 3 datasets and the methodology can be found in the sub-directories for the individual datasets. We wanted to investigate how good the current state-of-the-art (M)LLMs are at solving the relatively simple problem of extracting revenue figures from publicly available financial reports. To test this, we created 3 different datasets, all based on the same… See the full description on the dataset page: https://huggingface.co/datasets/parsee-ai/finRAG.

  4. cqadupstack-gaming-top-20-gen-queries

    • huggingface.co
    Updated Apr 1, 2023
    + more versions
    Cite
    INCOME (2023). cqadupstack-gaming-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/cqadupstack-gaming-top-20-gen-queries
    Explore at:
    Croissant
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/cqadupstack-gaming-top-20-gen-queries.
    
  5. adult-census-income

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated Feb 1, 2001
    + more versions
    Cite
    scikit-learn (2001). adult-census-income [Dataset]. https://huggingface.co/datasets/scikit-learn/adult-census-income
    Explore at:
    Croissant
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    scikit-learn
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Adult Census Income Dataset

    The following was retrieved from UCI machine learning repository. This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over $50K a year. Description of fnlwgt (final weight)… See the full description on the dataset page: https://huggingface.co/datasets/scikit-learn/adult-census-income.
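
The extraction condition quoted above, ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)), can be sketched as a simple record filter. The variable names come from the description; representing each record as a dictionary and the sample values are assumptions made for illustration, not the actual census file layout.

```python
def is_reasonably_clean(record):
    """Apply the four extraction conditions quoted in the dataset description."""
    return (
        record["AAGE"] > 16
        and record["AGI"] > 100
        and record["AFNLWGT"] > 1
        and record["HRSWK"] > 0
    )


# Hypothetical records; only the first satisfies all four conditions.
rows = [
    {"AAGE": 39, "AGI": 40000, "AFNLWGT": 77516, "HRSWK": 40},
    {"AAGE": 15, "AGI": 5000, "AFNLWGT": 1200, "HRSWK": 10},  # fails AAGE > 16
    {"AAGE": 52, "AGI": 60000, "AFNLWGT": 1000, "HRSWK": 0},  # fails HRSWK > 0
]

clean = [row for row in rows if is_reasonably_clean(row)]
print(len(clean), "of", len(rows), "records kept")
```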

  6. revenue-estimate-stocks

    • huggingface.co
    Cite
    chuyin0321, revenue-estimate-stocks [Dataset]. https://huggingface.co/datasets/chuyin0321/revenue-estimate-stocks
    Explore at:
    Croissant
    Dataset authored and provided by
    chuyin0321
    Description

    Dataset Card for "revenue-estimate-stocks"

    More Information needed

  7. earnings22_baseline_5_gram

    • huggingface.co
    Updated Jul 17, 2023
    + more versions
    Cite
    Anton Lozhkov (2023). earnings22_baseline_5_gram [Dataset]. https://huggingface.co/datasets/anton-l/earnings22_baseline_5_gram
    Dataset updated
    Jul 17, 2023
    Authors
    Anton Lozhkov
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Earnings 22 dataset (also referred to as earnings22) is a 119-hour corpus of English-language earnings calls collected from global companies. The primary purpose is to serve as a benchmark for industrial and academic automatic speech recognition (ASR) models on real-world accented speech.

  8. earnings22

    • huggingface.co
    Updated Mar 21, 2024
    Cite
    Whisper Distillation (2024). earnings22 [Dataset]. https://huggingface.co/datasets/distil-whisper/earnings22
    Explore at:
    Croissant
    Dataset updated
    Mar 21, 2024
    Dataset authored and provided by
    Whisper Distillation
    Description

    Dataset Card for Earnings 22

      Dataset Summary
    

    Earnings-22 provides a free-to-use benchmark of real-world, accented audio to bridge academic and industrial research. This dataset contains 125 files totalling roughly 119 hours of English language earnings calls from global countries. This dataset provides the full audios, transcripts, and accompanying metadata such as ticker symbol, headquarters country, and our defined "Language Region".

      Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/distil-whisper/earnings22.
    
  9. earnings_call

    • huggingface.co
    • dataverse.nl
    Cite
    John Henning, earnings_call [Dataset]. http://doi.org/10.34894/TJE0D0
    Explore at:
    Croissant
    Authors
    John Henning
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    The dataset reports a collection of earnings call transcripts, the related stock prices, and the sector index. In terms of volume, there is a total of 188 transcripts, 11970 stock prices, and 1196 sector index values. All of these data originated in the period 2016-2020 and are related to the NASDAQ stock market. The data collection was made possible by Yahoo Finance and Thomson Reuters Eikon: Yahoo Finance enabled the search for stock values, and Thomson Reuters Eikon provided the earnings call transcripts. Lastly, the dataset can be used as a benchmark for the evaluation of several NLP techniques to understand their potential for financial applications. It is also possible to expand the dataset by extending the period in which the data originated, following a similar procedure.

  10. trec-news-top-20-gen-queries

    • huggingface.co
    Updated Mar 13, 2023
    + more versions
    Cite
    INCOME (2023). trec-news-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/trec-news-top-20-gen-queries
    Explore at:
    Croissant
    Dataset updated
    Mar 13, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/trec-news-top-20-gen-queries.
    
  11. CUADRevenueProfitSharingLegalBenchClassification

    • huggingface.co
    Updated May 11, 2025
    Cite
    Massive Text Embedding Benchmark (2025). CUADRevenueProfitSharingLegalBenchClassification [Dataset]. https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification
    Dataset updated
    May 11, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CUADRevenueProfitSharingLegalBenchClassification An MTEB dataset Massive Text Embedding Benchmark

    This task was constructed from the CUAD dataset. It consists of determining whether the clause requires a party to share revenue or profit with the counterparty for any technology, goods, or services.

    Task category: t2c
    Domains: Legal, Written
    Reference: https://huggingface.co/datasets/nguha/legalbench

      How to evaluate on this task
    

    You can evaluate an embedding… See the full description on the dataset page: https://huggingface.co/datasets/mteb/CUADRevenueProfitSharingLegalBenchClassification.

  12. income

    • huggingface.co
    Updated Oct 10, 2024
    Cite
    Rahul Sivaram (2024). income [Dataset]. https://huggingface.co/datasets/rahulisivaram5/income
    Explore at:
    Croissant
    Dataset updated
    Oct 10, 2024
    Authors
    Rahul Sivaram
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    rahulisivaram5/income dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. BusinessData

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Mita D (2025). BusinessData [Dataset]. https://huggingface.co/datasets/mitadhamdhere13/BusinessData
    Dataset updated
    Jun 1, 2025
    Authors
    Mita D
    Description

    language: en

    Generate a clean Excel dataset with the following columns: Date (from 01-01-2023 to 31-12-2025), Region (North, South, East, West), Branch (Branch A to Branch E), Business Type (B2B & B2C), Partner ID (should be unique), Client ID (should be unique), Total Investment, Total Revenue, Revenue generated by B2B, Revenue generated by B2C, Revenue generated by Partner, Partner share of 40% from total revenue, Admin Expenses, Employee & HR Expenses, Marketing Expense, Technology… See the full description on the dataset page: https://huggingface.co/datasets/mitadhamdhere13/BusinessData.

  14. scidocs-top-20-gen-queries

    • huggingface.co
    Updated Apr 1, 2023
    + more versions
    Cite
    INCOME (2023). scidocs-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/scidocs-top-20-gen-queries
    Explore at:
    Croissant
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/scidocs-top-20-gen-queries.
    
  15. Stocks-Quarterly-Earnings

    • huggingface.co
    Updated Aug 22, 2024
    + more versions
    Cite
    Papers With Backtest (2024). Stocks-Quarterly-Earnings [Dataset]. https://huggingface.co/datasets/paperswithbacktest/Stocks-Quarterly-Earnings
    Dataset updated
    Aug 22, 2024
    Dataset authored and provided by
    Papers With Backtest
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Information

    This dataset includes quarterly earnings reports for various US stocks.

      Instruments Included
    

    7000+ US Stocks

      Dataset Columns
    

    symbol: The stock ticker or financial instrument identifier associated with the data.
    date: The end date of the fiscal period for which the financial data is reported.
    reported_date: The actual date on which the company reported its earnings or financial results.
    reported_eps: The earnings per share (EPS) that the… See the full description on the dataset page: https://huggingface.co/datasets/paperswithbacktest/Stocks-Quarterly-Earnings.
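
As a rough illustration of how the four columns named above might be modelled, the sketch below defines a small record type and derives the gap between the fiscal period end and the reporting date. The Python types, the `reporting_lag_days` helper, and the sample values are assumptions; the listing truncates the column descriptions and does not specify types or any further columns.

```python
import datetime
from dataclasses import dataclass


@dataclass
class EarningsRow:
    """One quarterly earnings record, limited to the four columns named in the listing."""
    symbol: str                    # stock ticker or instrument identifier
    date: datetime.date            # end date of the fiscal period
    reported_date: datetime.date   # date the company actually reported
    reported_eps: float            # reported earnings per share (type assumed)

    def reporting_lag_days(self) -> int:
        """Days between the fiscal period end and the reporting date (hypothetical helper)."""
        return (self.reported_date - self.date).days


# Hypothetical sample row; "XYZ" and the figures are placeholders, not real data.
row = EarningsRow("XYZ", datetime.date(2024, 3, 31), datetime.date(2024, 5, 2), 1.25)
print(row.symbol, "reported", row.reporting_lag_days(), "days after period end")
```

Keeping `date` and `reported_date` separate matters for backtests, since a result only becomes known on the reporting date.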

  16. climate-fever-top-20-gen-queries

    • huggingface.co
    Updated Mar 6, 2023
    + more versions
    Cite
    INCOME (2023). climate-fever-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/climate-fever-top-20-gen-queries
    Explore at:
    Croissant
    Dataset updated
    Mar 6, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/climate-fever-top-20-gen-queries.
    
  17. earnings-raw

    • huggingface.co
    Updated May 1, 2024
    Cite
    Lamini (2024). earnings-raw [Dataset]. https://huggingface.co/datasets/lamini/earnings-raw
    Explore at:
    Croissant
    Dataset updated
    May 1, 2024
    Dataset provided by
    PowerML, Inc.
    Authors
    Lamini
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    lamini/earnings-raw dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. adult

    • huggingface.co
    Updated Nov 2, 2023
    Cite
    Mattia (2023). adult [Dataset]. https://huggingface.co/datasets/mstz/adult
    Explore at:
    Croissant
    Dataset updated
    Nov 2, 2023
    Authors
    Mattia
    License

    https://choosealicense.com/licenses/cc/

    Description

    Adult

    The Adult dataset from the UCI ML repository. A census dataset covering the personal characteristics of individuals and their income threshold.

      Configurations and tasks
    

    Configuration | Task | Description
    encoding | | Encoding dictionary showing original values of encoded features.
    income | Binary classification | Classify the person's income as over or under the threshold.
    income-no race | Binary classification | As income, but the race feature is removed.
    race | Multiclass… See the full description on the dataset page: https://huggingface.co/datasets/mstz/adult.

  19. cr-y2024-summer-556-profit-points-17k

    • huggingface.co
    Updated Aug 3, 2024
    Cite
    Semyon Volkov (2024). cr-y2024-summer-556-profit-points-17k [Dataset]. https://huggingface.co/datasets/7wolf/cr-y2024-summer-556-profit-points-17k
    Explore at:
    Croissant
    Dataset updated
    Aug 3, 2024
    Authors
    Semyon Volkov
    Description

    7wolf/cr-y2024-summer-556-profit-points-17k dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. arguana-top-20-gen-queries

    • huggingface.co
    Updated Mar 8, 2023
    Cite
    INCOME (2023). arguana-top-20-gen-queries [Dataset]. https://huggingface.co/datasets/income/arguana-top-20-gen-queries
    Explore at:
    Croissant
    Dataset updated
    Mar 8, 2023
    Dataset authored and provided by
    INCOME
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    NFCorpus: 20 generated queries (BEIR Benchmark)

    This HF dataset contains the top-20 synthetic queries generated for each passage in the above BEIR benchmark dataset.

    DocT5query model used: BeIR/query-gen-msmarco-t5-base-v1
    id (str): unique document id in NFCorpus in the BEIR benchmark (corpus.jsonl).
    Questions generated: 20
    Code used for generation: evaluate_anserini_docT5query_parallel.py

    Below contains the old dataset card for the BEIR benchmark.

      Dataset Card for BEIR… See the full description on the dataset page: https://huggingface.co/datasets/income/arguana-top-20-gen-queries.
    