100+ datasets found
  1. NLP Mental Health Conversations

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). NLP Mental Health Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/nlp-mental-health-conversations
    Available download formats: zip (1,552,188 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    NLP Mental Health Conversations

    Stimulating AI-Driven Mental Health Guidance

    By Huggingface Hub [source]

    About this dataset

    This dataset contains conversations between users and experienced psychologists on mental health topics. The data has been carefully collected and anonymized, and it can be used to develop Natural Language Processing (NLP) models that provide mental health advice and guidance. It covers a variety of user questions, which can help train NLP models to respond to queries with appropriate advice. Whether you are an AI developer building the next wave of mental health applications or a therapist looking for insight into how technology helps people connect, this dataset supports efforts to understand human relationships through Artificial Intelligence.

    How to use the dataset

    This guide will provide you with the necessary knowledge to effectively use this dataset for Natural Language Processing (NLP)-based applications.

    • Download the dataset: To begin, download the zip archive from Kaggle onto your system. Once downloaded, unzip it and extract the .csv file into a directory of your choice.

    • Familiarize yourself with the columns: Before working with the data, get to know its structure. The dataset contains two columns, Context and Response, which pair each user's message with a psychologist's reply on a mental health topic, for training NLP models that provide mental health advice and guidance.

    • Analyze data entries: If you wish, take time to examine what each entry contains. This step is optional for most of what follows, but it can help you untangle problems that come up later. Examining the questions users ask and the answers experts provide gives an overall picture of the kinds of conversations in the data, which can guide later work on NLP models for AI-driven mental health guidance.

    • Cleanse information not relevant to your NLP application goals: Keep only the entries that serve your intended application in a clean copy of the dataset. Consider removing duplicated verbatim entries and any other unneeded content, and check that what remains fits the purpose of your end goal before proceeding.
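
    The download and cleaning steps above can be sketched with Python's standard csv module. This is a minimal sketch that assumes the extracted file is a UTF-8 CSV with a header row containing the Context and Response columns; the function name load_pairs is hypothetical, and treating "duplicated verbatim entries" as exact duplicate (Context, Response) pairs is one possible reading of the cleaning advice, not the dataset author's procedure.

```python
import csv
from typing import Iterable, List, Tuple


def load_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read Context/Response rows, dropping blank sides and exact duplicate pairs.

    `lines` can be an open file handle or any iterable of CSV lines
    whose first line is the header.
    """
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for row in csv.DictReader(lines):
        context = (row.get("Context") or "").strip()
        response = (row.get("Response") or "").strip()
        if not context or not response:
            continue  # skip rows missing either the user message or the reply
        pair = (context, response)
        if pair in seen:
            continue  # skip verbatim duplicates
        seen.add(pair)
        pairs.append(pair)
    return pairs
```

    To use it on the extracted file, open it with `open(path, newline="", encoding="utf-8")`, where `path` is whatever the unzipped CSV is called, and pass the handle to `load_pairs`.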

    Research Ideas

    • Creating sentence-matching algorithms for natural language processing to accurately match given questions with appropriate advice and guidance.
    • Analyzing the psychological conversations to gain insights into topics such as stress, anxiety, and depression.
    • Developing personalized natural language processing models that give users appropriate advice based on their queries and their individual mental health state.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication](https://creativec...

  2. NLP for German News Articles

    • kaggle.com
    zip
    Updated Oct 1, 2022
    Cite
    Aman Chauhan (2022). NLP for German News Articles [Dataset]. https://www.kaggle.com/datasets/whenamancodes/nlp-for-10k-german-news-articles
    Available download formats: zip (128,989,980 bytes)
    Dataset updated
    Oct 1, 2022
    Authors
    Aman Chauhan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    :::: Ten Thousand German News Articles Dataset ::::

    A dataset for topic extraction from 10k German news articles and for NLP on the German language. English text classification datasets are common. Examples are the big AG News, the class-rich 20 Newsgroups and the large-scale DBpedia ontology datasets for topic classification, and the commonly used IMDb and Yelp datasets for sentiment analysis. Non-English datasets, especially German datasets, are less common. There is a collection of sentiment analysis datasets assembled by the Interest Group on German Sentiment Analysis, and, to my knowledge, MLDoc contains German documents for classification. Due to grammatical differences between English and German, a classifier might be effective on an English dataset but less effective on a German one. German is more highly inflected than English, and long compound words are much more common. One would need to evaluate a classifier on multiple German datasets to get a sense of its effectiveness.

    :::: What It Contains ::::

    The 10kGNAD dataset is intended to address part of this problem as the first German topic classification dataset. It consists of 10,273 German-language news articles from an Austrian online newspaper, categorized into nine topics. These articles are a previously unused part of the One Million Posts Corpus, in which each article has a topic path, for example Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise. The 10kGNAD uses the second part of the topic path, here Wirtschaft, as the class label. Each article's title and text are concatenated into one text, and author names are removed so that a classifier cannot simply key on authors who write frequently in one class. I created and used this dataset in my thesis to train and evaluate four text classifiers on German. By publishing it, I hope to support the advancement of tools and models for the German language. The dataset can also serve as a benchmark for German topic classification.
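
    The labelling rule described above (the second segment of the topic path becomes the class, and title and text are merged into one string) can be sketched in a few lines. The helper names are hypothetical, and joining title and text with a single space is an assumption; the published 10kGNAD files may use a different separator.

```python
from typing import Dict


def label_from_topic_path(path: str) -> str:
    """Return the second segment of a One Million Posts topic path as the class label."""
    parts = [segment for segment in path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"topic path has no second level: {path!r}")
    return parts[1]


def build_example(title: str, body: str, topic_path: str) -> Dict[str, str]:
    """Concatenate title and body (space-separated, an assumption) and attach the label."""
    return {
        "text": f"{title} {body}".strip(),
        "label": label_from_topic_path(topic_path),
    }
```

    For the example path in the description, `label_from_topic_path("Newsroom/Wirtschaft/Wirtschaftpolitik/Finanzmaerkte/Griechenlandkrise")` returns `"Wirtschaft"`.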

    Citations:

    @InProceedings{Schabus2017,
      author = {Dietmar Schabus and Marcin Skowron and Martin Trapp},
      title = {One Million Posts: A Data Set of German Online Discussions},
      booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
      pages = {1241--1244},
      year = {2017},
      address = {Tokyo, Japan},
      doi = {10.1145/3077136.3080711},
      month = aug
    }

    @InProceedings{Schabus2018,
      author = {Dietmar Schabus and Marcin Skowron},
      title = {Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website},
      booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)},
      pages = {1602--1605},
      year = {2018},
      address = {Miyazaki, Japan},
      month = may,
      abstract = {This paper describes an approach and our experiences from the development, deployment and usability testing of a Natural Language Processing (NLP) and Information Retrieval system that supports the moderation of user comments on a large newspaper website. We highlight some of the differences between industry-oriented and academic research settings and their influence on the decisions made in the data collection and annotation processes, selection of document representation and machine learning methods. We report on classification results, where the problems to solve and the data to work with come from a commercial enterprise. In this context typical for NLP research, we discuss relevant industrial aspects. We believe that the challenges faced as well as the solutions proposed for addressing them can provide insights to others working in a similar setting.},
      url = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/8885.html}
    }

  3. Financial-NER-NLP

    • huggingface.co
    Cite
    Joseph G Flowers, Financial-NER-NLP [Dataset]. https://huggingface.co/datasets/Josephgflowers/Financial-NER-NLP
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors
    Joseph G Flowers
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Financial-NER-NLP

    Dataset Summary

    The Financial-NER-NLP Dataset is a derivative of the FiNER-139 dataset, which consists of 1.1 million sentences annotated with 139 XBRL tags. This new dataset transforms the original structured data into natural-language prompts suitable for training language models. The dataset is designed to enhance models’ abilities in tasks such as named entity recognition (NER), summarization, and information extraction in the financial domain. The… See the full description on the dataset page: https://huggingface.co/datasets/Josephgflowers/Financial-NER-NLP.
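
    Turning token-level NER annotations into a prompt usually starts by grouping tagged tokens into entity spans. The sketch below assumes the source annotations use IOB-style tags (B-, I-, O) over tokens; the helper names, the prompt wording and the tag names in the tests are hypothetical and do not reproduce this dataset's actual template.

```python
from typing import Dict, List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens tagged B-X / I-X / O into (entity_text, label) spans."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts; an I- tag with a different label is treated as a start.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            # An "O" tag closes any open entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


def to_prompt(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Hypothetical prompt/answer template built from the extracted spans."""
    sentence = " ".join(tokens)
    answer = "; ".join(f"{lab}: {text}" for text, lab in iob_to_spans(tokens, tags))
    return {
        "prompt": f"Extract the XBRL entities from: {sentence}",
        "answer": answer or "None",
    }
```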

  4. Healthcare Natural Language Processing (NLP) Market Insights – Trends &...

    • futuremarketinsights.com
    html, pdf
    Updated Apr 4, 2025
    Cite
    Sabyasachi Ghosh (2025). Healthcare Natural Language Processing (NLP) Market Insights – Trends & Growth Forecast 2025 to 2035 [Dataset]. https://www.futuremarketinsights.com/reports/healthcare-natural-language-processing-market
    Available download formats: html, pdf
    Dataset updated
    Apr 4, 2025
    Authors
    Sabyasachi Ghosh
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2025 - 2035
    Area covered
    Worldwide
    Description

    The market is expected to reach USD 4,873.4 million in 2025 and grow to USD 24,446.1 million by 2035, a compound annual growth rate (CAGR) of 17.5% over this period. The rise of telehealth, the growth of AI medical chatbots, and the use of NLP in electronic health records (EHRs) are shaping the industry's future. Stricter rules on value-based care and the adoption of cloud-based NLP options also push market growth.

    Metric                       Value
    Market Size (2025E)          USD 4,873.4 Million
    Market Value (2035F)         USD 24,446.1 Million
    CAGR (2025 to 2035)          17.5%
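
    The CAGR in this table follows from the two market-size figures via CAGR = (end / start)^(1 / years) - 1. A minimal sketch of that check (the function name is illustrative):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# USD 4,873.4 million (2025) to USD 24,446.1 million (2035)
print(f"{cagr(4873.4, 24446.1, 2035 - 2025):.1%}")  # 17.5%
```

    The same helper can be applied to the other market reports in this list, since each quotes a start value, an end value and a period.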

    Country-wise Insights

    Country                      CAGR (2025 to 2035)
    USA                          17.8%
    UK                           17.2%
    European Union (EU)          17.5%
    Japan                        17.6%
    South Korea                  17.9%

    Competitive Outlook

    Company Name                               Estimated Market Share (%)
    Microsoft (Nuance Communications)          18-22%
    IBM Watson Health                          14-18%
    Amazon Web Services (AWS) HealthLake       12-16%
    Google Cloud Healthcare API                10-14%
    3M Health Information Systems              6-10%
    Other Companies (combined)                 30-40%

  5. bioinstruct

    • huggingface.co
    Updated Jul 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UMass BioNLP Lab (2024). bioinstruct [Dataset]. https://huggingface.co/datasets/bio-nlp-umass/bioinstruct
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 21, 2024
    Dataset authored and provided by
    UMass BioNLP Lab
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for BioInstruct

    GitHub repo: https://github.com/bio-nlp/BioInstruct

      Dataset Summary
    

    BioInstruct is a dataset of 25k instructions and demonstrations generated by OpenAI's GPT-4 engine in July 2023. This instruction data can be used to instruction-tune language models (e.g. Llama) so that they follow biomedical instructions better. Improvements of Llama on nine common biomedical tasks are shown in the results section. Taking… See the full description on the dataset page: https://huggingface.co/datasets/bio-nlp-umass/bioinstruct.

  6. High-Quality Financial News Dataset for NLP Tasks

    • kaggle.com
    zip
    Updated Nov 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sayel Abualigah (2025). High-Quality Financial News Dataset for NLP Tasks [Dataset]. https://www.kaggle.com/datasets/sayelabualigah/high-quality-financial-news-dataset-for-nlp-tasks
    Available download formats: zip (1,566,953 bytes)
    Dataset updated
    Nov 21, 2025
    Authors
    Sayel Abualigah
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    High-Quality Financial News Dataset

    Description

    This repository contains a meticulously scraped dataset from various financial websites. The data extraction process ensures high-quality and accurate text, including content from both the websites and their embedded PDFs.

    Dataset Features

    • Date: The date of the announcement.
    • Subject: The subject of the financial news.
    • Content: The full content of the announcement, including text from the website and PDFs.

    Additional Processed Fields

    We applied the Mixtral 8x7B model to generate the following additional fields:

    • ParaphrasedSubject: A paraphrased version of the original subject.
    • CompactedSummary: A concise summary limited to 1.5 lines.
    • DetailedSummary: A detailed summary of the content.
    • Impact: The impact of the announcement, summarized in 2 lines.

    Methodology

    The prompt used to generate the additional fields was highly effective, thanks to extensive discussions and collaboration with the Mistral AI team. This ensures that the dataset provides valuable insights and is ready for further analysis and model training.

    Usage

    This dataset can be used for various applications, including but not limited to:

    • Financial news analysis
    • Abstractive/Extractive Summarization tasks
    • Machine learning model training
    • Natural language processing tasks

  7. feedbackQA

    • huggingface.co
    Updated Aug 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    McGill NLP Group (2022). feedbackQA [Dataset]. https://huggingface.co/datasets/McGill-NLP/feedbackQA
    Dataset updated
    Aug 27, 2022
    Dataset authored and provided by
    McGill NLP Group
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    FeedbackQA is a retrieval-based QA dataset that contains interactive feedback from users. It has two parts: the first part is a conventional retrieval-based QA (RQA) dataset, while this repo contains the second part, which holds feedback (ratings and natural-language explanations) for QA pairs.

  8. NLP Research Papers Dataset

    • kaggle.com
    zip
    Updated May 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Subham Surana (2024). NLP Research Papers Dataset [Dataset]. https://www.kaggle.com/datasets/subhamjain/natural-language-processing-research-papers
    Available download formats: zip (1,074,694 bytes)
    Dataset updated
    May 1, 2024
    Authors
    Subham Surana
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The dataset appears to be a collection of NLP research papers, with the full text available in the "article" column, abstract summaries in the "abstract" column, and information about different sections in the "section_names" column. Researchers and practitioners in the field of natural language processing can use this dataset for various tasks, including text summarization, document classification, and analysis of research paper structures.

    Data Fields

    Here's a short description of the Natural Language Processing Research Papers dataset:

    1. Article: This column likely contains the full text or content of the research papers related to Natural Language Processing (NLP). Each entry in this column represents the entire body of a specific research article.
    2. Abstract: This column is likely to contain the abstracts of the NLP research papers. The abstract provides a concise summary of the paper, highlighting its key objectives, methods, and findings.
    3. Section Names: This column probably contains information about the section headings within each research paper. It could include the names or titles of different sections such as Introduction, Methodology, Results, Conclusion, etc. This information can be useful for structuring and organizing the content of the research papers.

    File Description

    Content overview: The dataset is valuable for researchers, students, and practitioners in the field of Natural Language Processing. File format: CSV.
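
    For the summarization use case mentioned above, the "article" and "abstract" columns form natural (source, target) pairs. This is a minimal sketch using the standard csv module; the helper names and the 200-character minimum are hypothetical, and the newline delimiter in split_sections is an assumption because the description does not say how "section_names" is encoded.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def summarization_pairs(lines: Iterable[str], min_article_chars: int = 200) -> Iterator[Tuple[str, str]]:
    """Yield (article, abstract) pairs, skipping rows with a short article or no abstract."""
    for row in csv.DictReader(lines):
        article = (row.get("article") or "").strip()
        abstract = (row.get("abstract") or "").strip()
        if len(article) >= min_article_chars and abstract:
            yield article, abstract


def split_sections(section_names: str) -> List[str]:
    """Split a section_names value into headings, assuming one heading per line."""
    return [name.strip() for name in section_names.splitlines() if name.strip()]
```

    Passing an open file handle (opened with `newline=""`) to `summarization_pairs` streams the pairs without loading the whole CSV into memory.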

  9. Natural Language Processing (NLP) in Healthcare and Life Sciences Market

    • rootsanalysis.com
    Updated Apr 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roots Analysis (2025). Natural Language Processing (NLP) in Healthcare and Life Sciences Market [Dataset]. https://www.rootsanalysis.com/reports/nlp-in-healthcare-and-life-sciences-market.html
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.html

    Description

    The natural language processing (NLP) in healthcare and life sciences market is estimated to grow from USD 3.99 bn in 2025 to USD 20.04 bn by 2035, at a CAGR of 17.5%.

  10. Natural Language Processing (NLP): Global Market Analysis and Insights

    • bccresearch.com
    html, pdf, xlsx
    Updated Jul 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BCC Research (2023). Natural Language Processing (NLP): Global Market Analysis and Insights [Dataset]. https://www.bccresearch.com/market-research/information-technology/natural-language-processing-market.html
    Available download formats: xlsx, pdf, html
    Dataset updated
    Jul 6, 2023
    Dataset authored and provided by
    BCC Research
    License

    https://www.bccresearch.com/aboutus/terms-conditions

    Description

    According to a BCC Research market report, the global natural language processing market should reach $92.7 billion by 2028, up from $29.1 billion in 2023, at a compound annual growth rate (CAGR) of 26.1%.

  11. Natural Language Processing (NLP) Market By Component (Solution, Services),...

    • zionmarketresearch.com
    pdf
    Updated Nov 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zion Market Research (2025). Natural Language Processing (NLP) Market By Component (Solution, Services), By Deployment (Cloud, On-Premises), By Enterprise Size (Large Enterprises, Small & Medium Enterprises), By Type (Statistical NLP, Rule Based NLP, Hybrid NLP), By Application (Sentiment Analysis, Data Extraction, Risk And Threat Detection, Automatic Summarization, Content Management, Language Scoring, Others (Portfolio Monitoring, HR & Recruiting, And Branding & Advertising)), By End-use (BFSI, IT & Telecommunication, Healthcare, Education, Media & Entertainment, Retail & E-commerce, Others), and By Region: Global and Regional Industry Overview, Market Intelligence, Comprehensive Analysis, Historical Data, and Forecasts 2025 - 2034 [Dataset]. https://www.zionmarketresearch.com/report/natural-language-processing-market
    Available download formats: pdf
    Dataset updated
    Nov 14, 2025
    Dataset authored and provided by
    Zion Market Research
    License

    https://www.zionmarketresearch.com/privacy-policy

    Time period covered
    2022 - 2030
    Area covered
    Global
    Description

    The global natural language processing (NLP) market, valued at USD 25.90 billion in 2024, is expected to surpass USD 206.32 billion by 2034, growing at a CAGR of 23.06%.

  12. Growth of the NLP market worldwide 2021-2031

    • statista.com
    Cite
    Statista, Growth of the NLP market worldwide 2021-2031 [Dataset]. https://www.statista.com/forecasts/1449874/world-nlp-market-size-growth
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, the market size change in the 'Natural Language Processing' segment of the artificial intelligence market worldwide was modeled to amount to ***** percent. Between 2021 and 2024, the market size change dropped by ***** percentage points. The market size change is forecast to decline by ***** percentage points from 2024 to 2031, fluctuating as it trends downward. Further information about the methodology, more market segments, and metrics can be found on the dedicated Market Insights page on Natural Language Processing.

  13. Natural Language Processing Solution Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 3, 2025
    Cite
    Data Insights Market (2025). Natural Language Processing Solution Report [Dataset]. https://www.datainsightsmarket.com/reports/natural-language-processing-solution-1943950
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Natural Language Processing (NLP) solutions market is experiencing robust growth, driven by the increasing adoption of AI-powered applications across various sectors. The market's expansion is fueled by the rising volume of unstructured data, the need for efficient data analysis and automation, and the growing demand for personalized customer experiences. Technological advancements, such as deep learning and improved algorithms, are enhancing NLP capabilities, enabling more accurate language understanding and generation. Key applications include chatbots, virtual assistants, sentiment analysis, machine translation, and text summarization. While market size data is not explicitly provided, based on the presence of major players like IBM, Google, and Microsoft, and considering the rapid growth of AI, we can estimate the 2025 market size to be around $15 billion. Assuming a conservative CAGR (Compound Annual Growth Rate) of 20% (a reasonable estimate given the current market dynamics), the market is projected to reach approximately $40 billion by 2033. The market is segmented across various industries, including healthcare, finance, retail, and customer service. Healthcare's adoption of NLP for medical record analysis and patient engagement is a significant growth driver. Financial institutions leverage NLP for fraud detection, risk management, and regulatory compliance. Retail businesses utilize NLP for personalized marketing and customer service automation. While there are restraining factors such as data privacy concerns and the need for high-quality training data, the overall market outlook remains positive. The competitive landscape is characterized by both large technology companies and specialized NLP solution providers, fostering innovation and competition. This leads to continuous improvement in accuracy, efficiency, and the affordability of NLP solutions, further accelerating market growth. 
The forecast period of 2025-2033 offers substantial opportunities for businesses to capitalize on this rapidly evolving technology.
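
    The projection above compounds an estimated 2025 base forward at an assumed CAGR to 2033. A minimal sketch of that arithmetic (function names are illustrative) shows that eight years of 20% growth on $15 billion gives about $64.5 billion, while a $40 billion endpoint corresponds to an annual rate of roughly 13%.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound a starting value forward: value * (1 + rate) ** years."""
    return value * (1.0 + rate) ** years


def implied_rate(start: float, end: float, years: int) -> float:
    """Annual rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


years = 2033 - 2025  # eight compounding periods
print(round(project(15.0, 0.20, years), 1))        # 64.5
print(f"{implied_rate(15.0, 40.0, years):.1%}")    # 13.0%
```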

  14. B5text dataset - Textual data for 5 class sentiment classification of...

    • figshare.com
    txt
    Updated Jun 30, 2021
    Cite
    Mahmud Hasan (2021). B5text dataset - Textual data for 5 class sentiment classification of manufacturing parts. [Dataset]. http://doi.org/10.6084/m9.figshare.14887932.v4
    Available download formats: txt
    Dataset updated
    Jun 30, 2021
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Mahmud Hasan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Processed and lemmatised manufacturing text data relevant to five classes of parts (bearing, collet, sprocket, bolt, spring), web-scraped from web-based platforms such as McMaster-Carr and TraceParts.

  15. Portuguese Language Datasets | 300K Translations | Natural Language...

    • datarade.ai
    .json, .xml
    Updated Jul 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxford Languages (2025). Portuguese Language Datasets | 300K Translations | Natural Language Processing (NLP) Data | Dictionary Display | Translation | EU & LATAM Coverage [Dataset]. https://datarade.ai/data-products/portuguese-language-datasets-140k-words-300k-translations-oxford-languages
    Available download formats: .json, .xml
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Oxford Languages (https://lexico.com/es)
    Area covered
    Macao, Mozambique, Cabo Verde, Brazil, Timor-Leste, Portugal, Angola, Guinea-Bissau, Sao Tome and Principe
    Description

    Comprehensive Portuguese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Perfect for powering dictionary platforms, NLP, AI models, and translation systems.

    Our Portuguese language datasets are carefully compiled and annotated by language and linguistic experts. The following Portuguese datasets are available for licensing:

    1. Portuguese Monolingual Dictionary Data
    2. Portuguese Bilingual Dictionary Data

    Key Features (approximate numbers):

    1. Portuguese Monolingual Dictionary Data

    Our Portuguese monolingual data covers both EU and LATAM varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

    • Words: 143,600
    • Senses: 285,500
    • Example sentences: 69,300
    • Format: XML format
    • Delivery: Email (link-based file sharing)

    2. Portuguese Bilingual Dictionary Data

    The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translated words spanning both EU and LATAM Portuguese varieties.

    • Translations: 300,000
    • Senses: 158,000
    • Example translations: 117,800
    • Format: XML and JSON format
    • Delivery: Email (link-based file sharing) and REST API
    • Update frequency: annually

    Use Cases:

    We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD).

    If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

    Pricing:

    Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

    Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

    About the sample:

    The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only.

    If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.

  16. ov-kit-files

    • huggingface.co
    Updated Apr 24, 2024
    Cite
    PE-NLP (2024). ov-kit-files [Dataset]. https://huggingface.co/datasets/pe-nlp/ov-kit-files
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 24, 2024
    Dataset authored and provided by
    PE-NLP
    Description

    The pe-nlp/ov-kit-files dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  17. Natural Language Processing Technology Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    Archive Market Research (2025). Natural Language Processing Technology Report [Dataset]. https://www.archivemarketresearch.com/reports/natural-language-processing-technology-58326
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 15, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Natural Language Processing (NLP) technology market is experiencing robust growth, projected to reach $2271.9 million in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 2.4% from 2019 to 2033. This growth is fueled by several key drivers. The increasing adoption of AI-powered solutions across diverse industries, including healthcare, finance, and customer service, is significantly boosting demand for NLP capabilities. Advancements in deep learning and machine learning algorithms are leading to more accurate and efficient NLP systems, further fueling market expansion. The growing availability of large, high-quality datasets for training NLP models is also a significant factor. Furthermore, the rising need for automated customer service and improved data analysis is driving the integration of NLP technologies into various business processes, generating significant market opportunities. The market is segmented into Natural Language Understanding (NLU) and Natural Language Generation (NLG), with applications spanning text retrieval, machine translation, and information extraction. Major players such as Google, Amazon Web Services, IBM, and Microsoft are actively investing in research and development, leading to continuous innovation and enhancing the market's overall competitiveness. While the market exhibits considerable growth potential, certain challenges remain. The complexity of natural language and the inherent ambiguity in human communication pose significant technical hurdles. Data privacy concerns and the ethical implications of using NLP technologies require careful consideration. Furthermore, the high cost of developing and implementing advanced NLP solutions can limit adoption, particularly among smaller businesses. Despite these challenges, the long-term outlook for the NLP market remains positive, driven by continuous technological advancements and the increasing reliance on data-driven decision-making across industries. 
The market's segmentation by application and region provides valuable insights for strategic planning and investment decisions. North America currently holds a significant market share, but the Asia-Pacific region is expected to demonstrate substantial growth in the coming years.
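    The headline figures rest on the standard compound-growth relation V_end = V_start × (1 + r)^n. The report does not state a 2033 value, so the sketch below is illustrative only: it assumes the 2.4% CAGR applies uniformly to the 2025 base of $2271.9 million over the eight years to 2033 (the function name project_value is hypothetical).

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


# Assumption: the 2.4% CAGR is applied to the 2025 figure for 8 years (2025 -> 2033).
base_2025 = 2271.9  # USD million, as quoted in the report description
implied_2033 = project_value(base_2025, 0.024, 2033 - 2025)
print(f"Implied 2033 market size: ${implied_2033:,.1f} million")
```

    With these inputs the script prints an implied value of about $2,746.6 million, a compounding factor of roughly 1.21 over eight years.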

  18. Data from: HoVer: A Dataset for Many-Hop Fact Extraction And Claim...

    • hover-nlp.github.io
    • hotpotqa.github.io
    • +1 more
    json
    Updated Oct 13, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    University of North Carolina at Chapel Hill (2020). HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification [Dataset]. https://hover-nlp.github.io/
    Explore at:
    Available download formats: json
    Dataset updated
    Oct 13, 2020
    Dataset authored and provided by
    University of North Carolina at Chapel Hill
    Description

    HotpotQA is a question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems built based on Wikipedia.
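    To show what supporting-fact supervision looks like in practice, the sketch below assumes the JSON layout of the public HotpotQA release, where supporting_facts is a list of [title, sentence_index] pairs and context is a list of [title, sentences] paragraphs. The toy record and the helper supporting_sentences are hypothetical and not taken from the dataset; HoVer's own files may use a different schema.

```python
from typing import Any


def supporting_sentences(example: dict[str, Any]) -> list[str]:
    """Resolve HotpotQA-style [title, sentence_index] pairs to sentence text."""
    # Map each paragraph title to its list of sentences.
    paragraphs = {title: sentences for title, sentences in example["context"]}
    resolved = []
    for title, sent_idx in example["supporting_facts"]:
        sentences = paragraphs.get(title, [])
        # Skip pointers that fall outside the paragraph (defensive check).
        if 0 <= sent_idx < len(sentences):
            resolved.append(sentences[sent_idx])
    return resolved


# Hypothetical two-hop toy record, for illustration only.
toy = {
    "question": "In which country is the city where Author X was born?",
    "answer": "France",
    "supporting_facts": [["Author X", 0], ["Lyon", 1]],
    "context": [
        ["Author X", ["Author X was born in Lyon.", "She wrote novels."]],
        ["Lyon", ["Lyon is a large city.", "Lyon is located in France."]],
    ],
}
print(supporting_sentences(toy))
```

    The two resolved sentences are exactly the evidence chain a multi-hop model is supervised to find: one hop to the birthplace, a second hop to its country.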

  19. Natural Language Processing (NLP) Market Analysis by Technology, Type,...

    • futuremarketinsights.com
    html, pdf
    Updated Mar 26, 2025
    Sudip Saha (2025). Natural Language Processing (NLP) Market Analysis by Technology, Type, Service, Deployment Model, Application, Vertical, and Region Through 2035 [Dataset]. https://www.futuremarketinsights.com/reports/natural-language-processing-nlp-market
    Explore at:
    Available download formats: html, pdf
    Dataset updated
    Mar 26, 2025
    Authors
    Sudip Saha
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2025 - 2035
    Area covered
    Worldwide
    Description

    The Natural Language Processing (NLP) market will grow exponentially between 2025 and 2035, fueled by the growing adoption of AI-driven conversational systems, machine learning-enabled text analytics, and improvements in speech recognition technology. The industry is projected to reach USD 26.01 billion in 2025 and expand to USD 213.54 billion by 2035, reflecting a compound annual growth rate (CAGR) of 23.4% during the forecast period.
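    The quoted growth rate can be checked by inverting the compound-growth relation V_end = V_start × (1 + r)^n, which gives r = (V_end / V_start)^(1/n) - 1. A minimal sketch using only the two endpoint figures quoted above (the helper name cagr is illustrative):

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoint figures quoted in the report description (USD billion).
rate = cagr(26.01, 213.54, 2035 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

    The script prints an implied CAGR of 23.4%, consistent with the rate stated for the 2025 to 2035 forecast period.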

    Contract & Deals Analysis - Natural Language Processing Market

    Company | Contract value (USD million)
    Google Cloud | approx. 80-90
    Microsoft | approx. 70-80
    IBM Watson | approx. 60-70
    OpenAI | approx. 90-100
    Nuance Communications | approx. 50-60

    Country-Wise Analysis

    Country | CAGR (2025 to 2035)
    USA | 12.5%
    UK | 12.1%
    European Union (EU) | 12.3%
    Japan | 11.9%
    South Korea | 12.7%

    Competitive Outlook

    Company | Estimated market share (%)
    Google AI (Alphabet) | 20-25
    Microsoft Corporation | 15-20
    IBM Watson | 12-16
    Amazon Web Services (AWS) | 10-14
    OpenAI | 6-10
    Other companies (combined) | 20-30

  20. Trojan Detection Software Challenge -...

    • catalog.data.gov
    • nist.gov
    • +2 more
    Updated Sep 30, 2023
    + more versions
    National Institute of Standards and Technology (2023). Trojan Detection Software Challenge - nlp-sentiment-classification-apr2021-test [Dataset]. https://catalog.data.gov/dataset/trojan-detection-software-challenge-round-6-test-dataset
    Explore at:
    Dataset updated
    Sep 30, 2023
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Round 6 Test Dataset. This is the test data used to construct and evaluate trojan detection software solutions. This data, generated at NIST, consists of natural language processing (NLP) AI models trained to perform text sentiment classification on English text. A known percentage of these trained AI models have been poisoned with a known trigger which induces incorrect behavior. This data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 480 sentiment classification AI models using a small set of model architectures. The models were trained on text data drawn from product reviews. Half (50%) of the models have been poisoned with an embedded trigger which causes misclassification of the input when the trigger is present.
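    The description does not say how detectors are scored. As an assumption, a detector that outputs a poisoning probability for each model could be evaluated with ROC-AUC against the clean/poisoned labels. The sketch below computes ROC-AUC directly from its pairwise (Mann-Whitney) definition; the function name roc_auc and the detector scores are made up for illustration.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC-AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen poisoned model (label 1) scores higher than a randomly chosen clean
    model (label 0), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one poisoned and one clean model")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector outputs for six models (1 = poisoned, 0 = clean).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.35, 0.1]
print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
```

    For the six toy models it prints ROC-AUC: 0.889 (8 of 9 poisoned/clean pairs ranked correctly). The pairwise loop is O(P × N), which is cheap for a few hundred models such as the 480 in this round.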

The Devastator (2023). NLP Mental Health Conversations [Dataset]. https://www.kaggle.com/datasets/thedevastator/nlp-mental-health-conversations

NLP Mental Health Conversations

Stimulating AI-Driven Mental Health Guidance

Explore at:
Available download formats: zip (1552188 bytes)
Dataset updated
Nov 24, 2023
Authors
The Devastator
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

By Hugging Face Hub [source]

About this dataset

This dataset contains conversations between users and experienced psychologists on mental health topics. Carefully collected and anonymized, the data can be used to develop Natural Language Processing (NLP) models that provide mental health advice and guidance. It consists of a variety of user questions paired with responses, which can help train NLP models to give users appropriate advice in response to their queries. Whether you are an AI developer building mental health applications or a therapist looking for insight into how technology helps people connect, this dataset supports research into human relationships through Artificial Intelligence.

How to use the dataset

This guide will provide you with the necessary knowledge to effectively use this dataset for Natural Language Processing (NLP)-based applications.

  • Download the dataset: Download it from Kaggle onto your system. Once downloaded, unzip the archive and extract the .csv file into a directory of your choice.

  • Familiarize yourself with the columns: Before working with the data, review its structure. The dataset contains two columns, Context and Response, which pair user questions on mental health topics with psychologists' answers for training NLP models that provide mental health advice and guidance.

  • Analyze the data entries (optional): If you wish, take time to examine what each entry contains before moving on. Reviewing the questions asked by users and the answers provided by experts gives an overall picture of the conversations in the dataset, which can help you anticipate later challenges and guide further work on NLP models for AI-driven mental health guidance.

  • Remove information not relevant to your application: Keep only the entries that serve your NLP application's goals in a clean copy of the dataset. Consider removing verbatim duplicate entries and any other unneeded content, and check that the remaining content fits the intended purpose of your end application before proceeding.
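The download, column check, and cleaning steps above can be sketched with Python's standard csv module. This is a minimal sketch under stated assumptions: the file name train.csv is a placeholder for whatever .csv the zip contains, and "cleaning" is taken to mean dropping rows with an empty Context or Response and removing verbatim duplicate pairs. The helper names load_pairs and clean_pairs are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO

EXPECTED_COLUMNS = {"Context", "Response"}


def load_pairs(handle: TextIO) -> list[dict[str, str]]:
    """Read Context/Response rows from an open CSV file handle."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{"Context": row["Context"], "Response": row["Response"]} for row in reader]


def clean_pairs(rows: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    """Drop rows with an empty field and verbatim duplicate pairs, keeping order."""
    seen: set[tuple[str, str]] = set()
    cleaned = []
    for row in rows:
        # DictReader yields None for fields missing from a short row.
        context = (row["Context"] or "").strip()
        response = (row["Response"] or "").strip()
        if not context or not response:
            continue  # an incomplete pair is not useful for training
        key = (context, response)
        if key in seen:
            continue  # verbatim duplicate of an earlier pair
        seen.add(key)
        cleaned.append({"Context": context, "Response": response})
    return cleaned


# Usage with a real file (placeholder name):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     pairs = clean_pairs(load_pairs(f))

# Self-contained demo on an in-memory CSV with made-up rows.
sample = io.StringIO(
    "Context,Response\n"
    "I feel anxious at work.,Try noting when the anxiety starts.\n"
    "I feel anxious at work.,Try noting when the anxiety starts.\n"
    "I can't sleep.,\n"
)
pairs = clean_pairs(load_pairs(sample))
print(len(pairs), pairs[0]["Context"])
```

The demo keeps a single pair: the second row is a verbatim duplicate and the third has an empty Response. Checking the column names up front makes a wrong or renamed file fail loudly instead of silently producing empty pairs.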

Research Ideas

  • Creating sentence-matching algorithms for natural language processing to accurately match given questions with appropriate advice and guidance.
  • Analyzing the psychological conversations to gain insights into topics such as stress, anxiety, and depression.
  • Developing personalized natural language processing models that provide users with appropriate advice based on their queries and their individual mental health state.
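The first research idea, matching questions with appropriate advice, can be prototyped with a simple retrieval baseline: represent each Context as a bag of words, find the stored Context most similar to a new query by cosine similarity, and return its Response. This is an illustrative sketch, not a method proposed by the dataset authors; the example pairs and function names are hypothetical, and a practical system would likely use learned sentence embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words token counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_response(query: str, pairs: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the Response whose Context is most similar to the query, with its score."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(context)), response) for context, response in pairs]
    score, response = max(scored, key=lambda item: item[0])
    return response, score


# Hypothetical Context/Response pairs for illustration only.
pairs = [
    ("I feel anxious before every exam.", "Response about exam anxiety."),
    ("I have trouble sleeping at night.", "Response about sleep."),
]
print(best_response("Why can't I sleep at night?", pairs))
```

Without stemming, "sleep" and "sleeping" count as different tokens; the match in the example comes from the shared words "i", "at", and "night".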

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication (https://creativec...)
