24 datasets found
  1. Bitext-insurance-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 24, 2024
    Cite
    Bitext (2024). Bitext-insurance-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-insurance-llm-chatbot-training-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 24, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Insurance Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [insurance] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-insurance-llm-chatbot-training-dataset.

  2. Open Source Data Labeling Tool Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Open Source Data Labeling Tool Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/open-source-data-labeling-tool-market
    Explore at:
    csv, pdf, pptx (available download formats)
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Open Source Data Labeling Tool Market Outlook



    The open source data labeling tool market size was valued at USD 0.5 billion in 2023 and is projected to reach USD 2.5 billion by 2032, growing at a CAGR of 19% during the forecast period. This robust growth can be attributed to the increasing adoption of artificial intelligence (AI) and machine learning (ML) across various industries, which necessitates large volumes of accurately labeled data to train these algorithms effectively.
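The growth rate quoted above can be checked against the two market-size figures. The following is a minimal sketch, assuming annual compounding over the nine years from 2023 to 2032; the function name `cagr` is illustrative and not taken from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.19 means 19%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report summary: USD 0.5 billion (2023)
# to USD 2.5 billion (2032). Annual compounding is an assumption.
implied = cagr(0.5, 2.5, 2032 - 2023)
print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption the implied rate comes out slightly above 19%, which is consistent with the rounded figure in the summary.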



    One of the primary growth factors driving the market is the surging demand for AI and ML applications, which are rapidly being integrated into a variety of business processes. As companies strive to improve their operational efficiency, customer experience, and decision-making capabilities, the need for high-quality labeled data has become paramount. Open source data labeling tools offer a cost-effective and customizable solution for businesses, thus fueling market growth. Additionally, the development of advanced technologies such as natural language processing (NLP) and computer vision has further spurred the demand for robust data labeling tools.



    Another significant growth factor is the growing focus on data privacy and security, which has led many organizations to adopt on-premises data labeling tools. While cloud-based solutions offer scalability and ease of use, on-premises tools provide enhanced control over sensitive data, making them an attractive option for industries with stringent regulatory requirements, such as healthcare and BFSI (Banking, Financial Services, and Insurance). The availability of open source alternatives allows businesses to customize and optimize these tools to meet their specific needs, thereby driving market expansion.



    The increasing support from governments and regulatory bodies for AI and ML initiatives is also contributing to market growth. Governments worldwide are investing in AI research and development, recognizing its potential to drive economic growth and innovation. This support includes funding for AI projects, creating AI-friendly policies, and fostering collaborations between public and private sectors. These initiatives are expected to propel the adoption of data labeling tools, including open source options, as they play a crucial role in the development and deployment of AI and ML systems.



    Regionally, North America is expected to dominate the open source data labeling tool market due to the high concentration of technology companies and early adoption of AI and ML technologies. The presence of leading AI research institutions and a robust startup ecosystem further solidify the region's market position. However, Asia Pacific is anticipated to witness the fastest growth during the forecast period, driven by increasing investments in AI and ML, a burgeoning technology sector, and supportive government policies. Europe, Latin America, and the Middle East & Africa regions are also expected to experience substantial growth, albeit at a slower pace compared to North America and Asia Pacific.



    Component Analysis



    The open source data labeling tool market can be segmented by component into software and services. The software segment is expected to hold the largest market share, driven by the increasing adoption of AI and ML applications across various industries. Open source data labeling software provides a cost-effective solution for businesses, allowing them to customize and optimize the tools to meet their specific needs. The availability of a wide range of open source data labeling software options, such as LabelImg, CVAT, and Labelbox, has made it easier for organizations to find the right tool for their requirements. Additionally, the continuous development and improvement of these tools by the open source community ensure that they remain up-to-date with the latest advancements in AI and ML technologies.



    The services segment, on the other hand, is expected to witness significant growth during the forecast period. As more companies adopt open source data labeling tools, the demand for related services, such as consulting, implementation, and training, is increasing. These services help organizations effectively deploy and utilize data labeling tools, ensuring that they achieve the desired results. Furthermore, the growing complexity of AI and ML projects necessitates specialized expertise, driving the demand for professional services. Companies offering open source data labeling tools are increasingly providing a range of value-added services to help their clients maximize the benefits of their solutions.



  3. Data Collection and Labelling Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 19, 2025
    Cite
    Archive Market Research (2025). Data Collection and Labelling Report [Dataset]. https://www.archivemarketresearch.com/reports/data-collection-and-labelling-562772
    Explore at:
    doc, pdf, ppt (available download formats)
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data collection and labeling market is experiencing robust growth, driven by the escalating demand for high-quality training data to fuel the advancements in artificial intelligence (AI) and machine learning (ML). This market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an impressive $70 billion by 2033. This significant expansion is fueled by several key factors. The increasing adoption of AI across diverse sectors, including IT, automotive, BFSI (Banking, Financial Services, and Insurance), healthcare, and retail and e-commerce, is a primary driver. Furthermore, the growing complexity of AI models necessitates larger and more diverse datasets, thereby increasing the demand for professional data labeling services. The emergence of innovative data annotation tools and techniques further contributes to market growth. However, challenges remain, including the high cost of data collection and labeling, data privacy concerns, and the need for skilled professionals capable of handling diverse data types. The market segmentation highlights the significant contributions from various sectors. The IT sector leads in adoption, followed closely by the automotive and BFSI sectors. Healthcare and retail/e-commerce are also exhibiting rapid growth due to the increasing reliance on AI-powered solutions for improved diagnostics, personalized medicine, and enhanced customer experiences. Geographically, North America currently holds a substantial market share, followed by Europe and Asia Pacific. However, the Asia Pacific region is poised for the fastest growth due to its large and rapidly developing digital economy and increasing government initiatives promoting AI adoption. Key players like Reality AI, Scale AI, and Labelbox are shaping the market landscape through continuous innovation and strategic acquisitions. 
The market's future trajectory will be significantly influenced by advancements in automation technologies, improvements in data annotation methodologies, and the growing awareness of the importance of high-quality data for successful AI deployments.
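The three headline figures in this summary ($15 billion in 2025, a 25% CAGR, and $70 billion by 2033) can be cross-checked with the standard compound-growth formula. The following is a minimal sketch, assuming annual compounding over the eight years from 2025 to 2033; the report does not state its compounding convention, and the function names are illustrative.

```python
def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


def implied_rate(start_value: float, end_value: float, years: int) -> float:
    """Annual rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


years = 2033 - 2025  # assumed forecast horizon

# What a 25% CAGR from $15 billion would give in 2033 (in $ billion).
projected_2033 = project(15.0, 0.25, years)

# What annual rate the stated $70 billion endpoint would imply.
rate_for_70 = implied_rate(15.0, 70.0, years)

print(f"25% CAGR from $15B over {years} years: ${projected_2033:.1f}B")
print(f"Rate implied by $15B -> $70B: {rate_for_70:.1%}")
```

Under these assumptions, a 25% rate projects to roughly $89 billion, while the $70 billion endpoint corresponds to a rate of roughly 21%, so the quoted figures are best read as approximate.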

  4. Manual Data Annotation Tools Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Cite
    Market Research Forecast (2025). Manual Data Annotation Tools Report [Dataset]. https://www.marketresearchforecast.com/reports/manual-data-annotation-tools-33619
    Explore at:
    pdf, doc, ppt (available download formats)
    Dataset updated
    Mar 14, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The manual data annotation tools market, valued at $949.7 million in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 13.6% from 2025 to 2033. This surge is driven by the escalating demand for high-quality training data across diverse sectors. The increasing adoption of artificial intelligence (AI) and machine learning (ML) models necessitates large volumes of meticulously annotated data for optimal performance. Industries like IT & Telecom, BFSI (Banking, Financial Services, and Insurance), Healthcare, and Automotive are leading the charge, investing significantly in data annotation to improve their AI-powered applications, from fraud detection and medical image analysis to autonomous vehicle development and personalized customer experiences. The market is segmented by data type (image, video, text, audio) and application sector, reflecting the diverse needs of various industries. The rise of cloud-based annotation platforms is streamlining workflows and enhancing accessibility, while the increasing complexity of AI models is pushing the demand for more sophisticated and specialized annotation techniques. The competitive landscape is characterized by a mix of established players and emerging startups. Companies like Appen, Amazon Web Services, Google, and IBM are leveraging their extensive resources and technological capabilities to dominate the market. However, smaller, specialized companies are also making significant strides, catering to niche needs and offering innovative solutions. Geographic expansion is another key trend, with North America currently holding a substantial market share due to its advanced technology adoption and significant investments in AI research. However, Asia-Pacific, especially India and China, is witnessing rapid growth fueled by expanding digitalization and increasing government initiatives promoting AI development. 
Despite the rapid growth, challenges remain, including the high cost and time-consuming nature of manual annotation, alongside concerns around data privacy and security. The market's future trajectory will depend on technological advancements, evolving industry needs, and how effectively these challenges are addressed.

  5. ai training dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 10, 2025
    Cite
    Data Insights Market (2025). ai training dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1502524
    Explore at:
    doc, pdf, ppt (available download formats)
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    CA
    Variables measured
    Market Size
    Description

    The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the need for high-quality, labeled data to train sophisticated AI models capable of handling complex tasks. Applications span various industries, including IT, automotive, healthcare, BFSI (Banking, Financial Services, and Insurance), and retail & e-commerce. The demand for diverse data types—text, image/video, and audio—further fuels market expansion. While precise market sizing is unavailable, considering the rapid growth of AI and the significant investment in data annotation services, a reasonable estimate places the 2025 market value at approximately $15 billion, with a compound annual growth rate (CAGR) of 25% projected through 2033. This growth reflects a rising awareness of the pivotal role high-quality datasets play in achieving accurate and reliable AI outcomes. Key restraining factors include the high cost of data acquisition and annotation, along with concerns around data privacy and security. However, these challenges are being addressed through advancements in automation and the emergence of innovative data synthesis techniques. The competitive landscape is characterized by a mix of established technology giants like Google, Amazon, and Microsoft, alongside specialized data annotation companies like Appen and Lionbridge. The market is expected to see continued consolidation as larger players acquire smaller firms to expand their data offerings and strengthen their market position. Regional variations exist, with North America and Europe currently dominating the market share, although regions like Asia-Pacific are projected to experience significant growth due to increasing AI adoption and investments.

  6. Data Collection And Labeling Global Market Report 2025

    • thebusinessresearchcompany.com
    pdf,excel,csv,ppt
    Updated Jan 8, 2025
    Cite
    The Business Research Company (2025). Data Collection And Labeling Global Market Report 2025 [Dataset]. https://www.thebusinessresearchcompany.com/report/data-collection-and-labeling-global-market-report
    Explore at:
    pdf, excel, csv, ppt (available download formats)
    Dataset updated
    Jan 8, 2025
    Dataset authored and provided by
    The Business Research Company
    License

    https://www.thebusinessresearchcompany.com/privacy-policy

    Description

    The global data collection and labeling market is expected to reach $12.08 billion by 2029, growing at 28.4%, with a surge in autonomous vehicles fueling growth in the market.

  7. table1_Off-Label Use of Antineoplastic Drugs to Treat Malignancies: Evidence...

    • frontiersin.figshare.com
    docx
    Updated Jun 6, 2023
    Cite
    Guoxu Wei; Min Wu; He Zhu; Sheng Han; Jing Chen; Chenchen Zhai; Luwen Shi (2023). table1_Off-Label Use of Antineoplastic Drugs to Treat Malignancies: Evidence From China Based on a Nationwide Medical Insurance Data Analysis.docx [Dataset]. http://doi.org/10.3389/fphar.2021.616453.s001
    Explore at:
    docx (available download formats)
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Guoxu Wei; Min Wu; He Zhu; Sheng Han; Jing Chen; Chenchen Zhai; Luwen Shi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: Cancer is a leading cause of morbidity and mortality worldwide. Off-label (OL) use of antineoplastic drugs to treat malignancies is prevalent. In this study, we quantified and characterized OL use of antineoplastic drugs to treat malignancies in China. Methods: This was a retrospective study using nationwide data collected from 2008 to 2010. Use of antineoplastic drugs was considered OL if they were used for indications not reflected in the package insert published by the National Medical Products Administration at the time of prescription. Descriptive analysis and Spearman rank correlation were used to evaluate the frequency and pattern of OL drug use. Results: In total, 51,382 patients with malignancies, 24 categories of antineoplastic drugs, and 77 types of malignancies treated with OL drugs were included in this study. Twenty commonly used antineoplastic drugs (ICD encoded as L01) were used OL in 10–61% of cases, and four commonly used endocrine therapy antineoplastic drugs (ICD encoded as L02) were used OL in 10–19% of cases. There was a significant negative correlation between the disease constituent ratio and the average OL use rate of antineoplastic drugs for various malignancies. In contrast, there was a significant positive correlation between the average OL use rate of antineoplastic drugs and the number of malignancies treated with OL drugs. Conclusion: This study provided information regarding OL use of antineoplastic drugs for treatment of malignancies, and showed that OL use was prevalent. In addition, uncommon malignancies were more likely to be treated with OL antineoplastic drugs. Furthermore, more commonly used antineoplastic drugs were more likely to be used OL.

  8. Data from: Can plan recommendations improve the coverage decisions of...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Mar 10, 2017
    Cite
    Andrew J. Barnes; Yaniv Hanoch; Thomas Rice (2017). Can plan recommendations improve the coverage decisions of vulnerable populations in health insurance marketplaces? [Dataset]. http://doi.org/10.5061/dryad.vq2s1
    Explore at:
    zip (available download formats)
    Dataset updated
    Mar 10, 2017
    Dataset provided by
    Dryad
    Authors
    Andrew J. Barnes; Yaniv Hanoch; Thomas Rice
    Time period covered
    2017
    Description

    plan_reccomendations_PONE_D_15_42499R1_no_demographics: This is a Stata data file (version 12). The data are labeled. At the request of Dryad and to limit the potential for indirect identification of study participants, demographic variables on age, race/ethnicity, income, marital status, and education are omitted from this data set. However, the estimates of interest for this paper are consistent across model specifications that do and do not include these demographic variables.

  9. Data from: Diagnostic labelling.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Susann Hueber; Thomas Kuehlein; Roman Gerlach; Martin Tauscher; Angela Schedlbauer (2023). Diagnostic labelling. [Dataset]. http://doi.org/10.1371/journal.pone.0188521.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Susann Hueber; Thomas Kuehlein; Roman Gerlach; Martin Tauscher; Angela Schedlbauer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Proportion of each specific RTI diagnosis relative to the sum of all RTI diagnoses per practice. Mean values over all practices are shown in column 2. Mean values for high prescribers and low prescribers are shown separately in columns 3 and 4, respectively. Results of inferential statistical analyses are shown in column 5.

  10. Health Reform Monitoring Survey, United States, Third Quarter 2018 -...

    • search.gesis.org
    Cite
    Inter-University Consortium for Political and Social Research, Health Reform Monitoring Survey, United States, Third Quarter 2018 - Archival Version [Dataset]. http://doi.org/10.3886/ICPSR37487
    Dataset provided by
    Inter-University Consortium for Political and Social Research
    GESIS search
    License

    https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de738519

    Area covered
    United States
    Description

    Abstract (en): In January 2013, the Urban Institute launched the Health Reform Monitoring Survey (HRMS), a survey of the nonelderly population, to explore the value of cutting-edge, Internet-based survey methods to monitor the Affordable Care Act (ACA) before data from federal government surveys are available. Topics covered by the 16th round of the survey (third quarter 2018) include self-reported health status, health insurance coverage, access to and use of health care, out-of-pocket health care costs, health care affordability, work experience, awareness of Medicaid work requirements, experiences with health care and social service providers, and health plan choice. Additional information collected by the survey includes age, gender, sexual orientation, marital status, education, race, Hispanic origin, United States citizenship, housing type, home ownership, internet access, income, employment status, and employer size. This study was conducted to provide information on health insurance coverage, access to and use of health care, health care affordability, and self-reported health status, as well as timely data on important implementation issues under the Affordable Care Act (ACA). The Health Reform Monitoring Survey (HRMS) provides data on health insurance coverage, access to and use of health care, health care affordability, and self-reported health status. Beginning in the second quarter of 2013, each round of the HRMS also contains topical questions focusing on timely ACA policy issues. In the first quarter of 2015, the HRMS shifted from a quarterly fielding schedule to a semiannual schedule. The variables include original survey questions, household demographic profile data, and constructed variables which can be used to link panel members who participated in multiple rounds. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. 
ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: Created variable labels and/or value labels; Created online analysis version with question text; Performed recodes and/or calculated derived variables; Checked for undocumented or out-of-range codes. Response Rates: The HRMS response rate is roughly five percent each round. Datasets: DS0: Study-Level Files; DS1: Public-Use Data; DS2: Restricted-Use Data. Household population aged 18-64. Smallest Geographic Unit: Census region. For each HRMS round a stratified random sample of adults ages 18-64 is drawn from the KnowledgePanel, a probability-based, nationally representative Internet panel maintained by Ipsos. The approximately 55,000 adults in the panel include households with and without Internet access. Panel members are recruited from an address-based sample frame derived from the United States Postal Service Delivery Sequence File, which covers 97 percent of United States households. The HRMS sample includes a random sample of approximately 9,500 nonelderly adults per quarter, including oversamples of adults with family incomes at or below 138 percent of the federal poverty line. Additional funders have supported oversamples of adults from individual states or subgroups of interest. However, the data file only includes data for adults in the general national sample and the income oversample. Web-based survey.

  11. Data from: An Efficient, Amine-Specific, and Cost-Effective Method for TMT...

    • figshare.com
    xlsx
    Updated Apr 26, 2024
    Cite
    Yan Cai; Chenchen Chang; Qin Yang; Rijing Liao (2024). An Efficient, Amine-Specific, and Cost-Effective Method for TMT 6/11-plex Labeling Improves the Proteome Coverage, Quantitative Accuracy and Precision [Dataset]. http://doi.org/10.1021/acs.jproteome.4c00129.s002
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    ACS Publications
    Authors
    Yan Cai; Chenchen Chang; Qin Yang; Rijing Liao
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Tandem mass tags (TMT) are widely used in proteomics to simultaneously quantify multiple samples in a single experiment. The tags can be easily added to the primary amines of peptides/proteins through chemical reactions. In addition to amines, TMT reagents also partially react with the hydroxyl groups of serine, threonine, and tyrosine residues under alkaline conditions, which significantly compromises the analytical sensitivity and precision. Under alkaline conditions, reducing the TMT molar excess can partially mitigate overlabeling of histidine-free peptides, but has a limited effect on peptides containing histidine and hydroxyl groups. Here, we present a method under acidic conditions to suppress overlabeling while efficiently labeling amines, using only one-fifth of the TMT amount recommended by the manufacturer. In a deep-scale analysis of a yeast/human two-proteome sample, we systematically evaluated our method against the manufacturer’s method and a previously reported TMT-reduced method. Our method reduced overlabeled peptides by 9-fold and 6-fold, respectively, resulting in the substantial enhancement in peptide/protein identification rates. More importantly, the quantitative accuracy and precision were improved as overlabeling was reduced, endowing our method with greater statistical power to detect 42% and 12% more statistically significant yeast proteins compared to the standard and TMT-reduced methods, respectively. Mass spectrometric data have been deposited in the ProteomeXchange Consortium via the iProX partner repository with the data set identifier PXD047052.

  12. Ultra-fast label-free quantification and comprehensive proteome coverage...

    • ebi.ac.uk
    Updated Jan 2, 2024
    Cite
    Ulises H Guzman (2024). Ultra-fast label-free quantification and comprehensive proteome coverage with narrow-window data-independent acquisition. Fractionation [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD046372
    Dataset updated
    Jan 2, 2024
    Authors
    Ulises H Guzman
    Variables measured
    Proteomics
    Description

    Mass spectrometry (MS)-based proteomics aims to characterize comprehensive proteomes in a fast and reproducible manner. Here, we present an ultra-fast scanning data-independent acquisition (DIA) strategy consisting of 2-Th precursor isolation windows, dissolving the differences between data-dependent and data-independent methods. This is achieved by pairing a Quadrupole Orbitrap mass spectrometer with the asymmetric track lossless (Astral) analyzer that provides >200 Hz MS/MS scanning speed, high resolving power and sensitivity, as well as low ppm-mass accuracy. Narrow-window DIA enables profiling of up to 100 full yeast proteomes per day, or ~10,000 human proteins in half an hour. Moreover, multi-shot acquisition of fractionated samples allows comprehensive coverage of human proteomes in ~3h, showing comparable depth to next-generation RNA sequencing and with 10x higher throughput compared to current state-of-the-art MS. High quantitative precision and accuracy are demonstrated with high peptide coverage in a 3-species proteome mixture, quantifying 14,000+ proteins in a single run in half an hour.

  13. Global Employer Dataset (Wikidata)

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Global Employer Dataset (Wikidata) [Dataset]. https://www.opendatabay.com/data/ai-ml/e31ecab8-d78b-4108-89df-7ea2d5d3e09e
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    E-commerce & Online Transactions
    Description

    This dataset provides a curated and labeled subset of employer entries derived from Wikidata, with the goal of improving the quality and usability of employer data. While Wikidata is an invaluable open resource, direct use often necessitates cleaning. This dataset addresses that need by offering metadata, statistics, and labels to help users identify and utilise valid employer information. An employer is generally defined here as a company or entity that provides employment paying wages or a salary. The dataset specifically screens out entries that do not represent true employers, such as individuals or plurals. It is particularly useful for tasks involving data cleaning, entity recognition, and understanding employment nomenclature.

    Columns

    • item_id: The unique Wikidata item identifier (QCode without the 'Q' prefix).
    • employer_count: The number of Wikidata entries associated with this specific employer reference.
    • employer: The text label of the employer's name, sourced from Kensho's English labels.
    • description: The accompanying description of the Wikidata employer entry, also from Kensho.
    • in_google_news: A binary indicator (0 for no, 1 for yes) showing if the occupation exists within the GoogleNews embedding.
    • language_detected: A three-digit language code, identified using FastText language detection.
    • source: Indicates the origin of the information, such as Wikidata or Wikipedia.
    • label: A binary label (0 for invalid employer, 1 for valid employer) indicating the data's quality.
    • labeled_by: Specifies the method used for labeling, including human, classifier_gnew, classifier_bert, or cleanlab.
    • label_error_reason: Provides the specific reason if a label is deemed an error, such as 'domain' or 'plural'.

    Distribution

    This dataset is provided as a single CSV file, named employers.wikidata.all.labeled.csv. Its current version is 1.0, with a file size of approximately 5.98 MB. The dataset contains a substantial number of entries, with item_id having 60656 values, employer having 60456 values, and description having 60640 values.
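
    As a rough illustration of how the columns described above might be used, the following minimal sketch keeps only rows labeled as valid employers and restores the 'Q' prefix that item_id omits. The sample rows and the function name valid_employers are hypothetical, and the sample includes only a subset of the listed columns; the real file is employers.wikidata.all.labeled.csv.

```python
import csv
import io

# Hypothetical sample rows (invented for illustration) using a subset of
# the column names listed in the dataset description.
SAMPLE_CSV = """item_id,employer,label,label_error_reason
312,Apple Inc.,1,
5,humans,0,plural
"""


def valid_employers(csv_text):
    """Return (QID, employer) pairs for rows whose label is 1 (valid employer)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        if row["label"] == "1":
            # item_id is stored without the 'Q' prefix, so restore it here.
            result.append(("Q" + row["item_id"], row["employer"]))
    return result


print(valid_employers(SAMPLE_CSV))
```

    To read the actual file, the same function could be given the file's text contents; that step is left out here because the file's exact encoding and quoting are not stated in the description.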

    Usage

    This dataset is ideal for various applications, including:

    • Detecting new trends in employers, occupations, and employment terminology.
    • Automatic error correction of employer entries.
    • Converting plural forms of entities to singular forms.
    • Training Named Entity Recognition (NER) models to identify employer names.
    • Building Question/Answer models that can understand and respond to queries about employers.
    • Improving the accuracy of FastText language detection models.
    • Assessing FastText accuracy with limited data.

    Coverage

    The dataset's coverage is global, drawing data from a Wikidata dump dated 2 February 2020. It includes employer entries from various linguistic contexts, as indicated by the language_detected column, showcasing multilingual employer names and descriptions. The content primarily focuses on entities and organisations that meet the definition of an employer, rather than specific demographic groups.

    License

    CC BY-SA

    Who Can Use It

    This dataset is suitable for:

    • Data scientists and machine learning engineers working on natural language processing tasks.
    • Researchers interested in data quality, entity resolution, and knowledge graph analysis.
    • Developers building applications that require accurate employer information.
    • Anyone needing to clean and validate employer data for various analytical or operational purposes.

    Dataset Name Suggestions

    • Wikidata Labeled Employers
    • ML-Ready Wikidata Employer Data
    • Cleaned Wikidata Employer References
    • Global Employer Dataset (Wikidata)
    • Validated Employer Entities

    Attributes

    Original Data Source: ML-You-Can-Use Wikidata Employers labeled

  14. Data from: Systematic Comparison of Label-Free, Metabolic Labeling, and...

    • figshare.com
    • acs.figshare.com
    application/cdfv2
    Updated May 30, 2023
    Cite
    Zhou Li; Rachel M. Adams; Karuna Chourey; Gregory B. Hurst; Robert L. Hettich; Chongle Pan (2023). Systematic Comparison of Label-Free, Metabolic Labeling, and Isobaric Chemical Labeling for Quantitative Proteomics on LTQ Orbitrap Velos [Dataset]. http://doi.org/10.1021/pr200748h.s003
    Available download formats: application/cdfv2
    Dataset updated
    May 30, 2023
    Dataset provided by
    ACS Publications
    Authors
    Zhou Li; Rachel M. Adams; Karuna Chourey; Gregory B. Hurst; Robert L. Hettich; Chongle Pan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A variety of quantitative proteomics methods have been developed, including label-free, metabolic labeling, and isobaric chemical labeling using iTRAQ or TMT. Here, these methods were compared in terms of the depth of proteome coverage, quantification accuracy, precision, and reproducibility using a high-performance hybrid mass spectrometer, LTQ Orbitrap Velos. Our results show that (1) the spectral counting method provides the deepest proteome coverage for identification, but its quantification performance is worse than labeling-based approaches, especially the quantification reproducibility; (2) metabolic labeling and isobaric chemical labeling are capable of accurate, precise, and reproducible quantification and provide deep proteome coverage for quantification; isobaric chemical labeling surpasses metabolic labeling in terms of quantification precision and reproducibility; and (3) iTRAQ and TMT perform similarly in all aspects compared in the current study using a CID-HCD dual scan configuration. On the basis of the unique advantages of each method, we provide guidance for selection of the appropriate method for a quantitative proteomics study.

  15. Uninsured Population Census Data 5-year estimates for release years...

    • data.pa.gov
    Updated Aug 21, 2020
    Cite
    Pennsylvania Department of Human Services (DHS) (2020). Uninsured Population Census Data 5-year estimates for release years 2017-Current County Human Services and Insurance [Dataset]. https://data.pa.gov/Health/Uninsured-Population-Census-Data-5-year-estimates-/neqb-cw4e
    Available download formats: application/rssxml, application/rdfxml, csv, xml, tsv, application/geo+json, kmz, kml
    Dataset updated
    Aug 21, 2020
    Dataset provided by
    Pennsylvania Department of Human Services https://www.pa.gov/agencies/dhs.html
    Authors
    Pennsylvania Department of Human Services (DHS)
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The American Community Survey (ACS) helps local officials, community leaders, and businesses understand the changes taking place in their communities. It is the premier source for detailed population and housing information about our nation. This dataset provides estimates by county for Health Insurance Coverage and is summarized from summary table S2701: SELECTED CHARACTERISTICS OF HEALTH INSURANCE COVERAGE IN THE UNITED STATES. The 5-year estimates are used to provide detail on every county in Pennsylvania and include breakouts by Age, Gender, Race, Ethnicity, Household Income, and the Ratio of Income to Poverty.

    A blank cell within the dataset indicates that either no sample observations or too few sample observations were available to compute the statistic for that area.

    Margin of error (MOE). Some ACS products provide an MOE instead of confidence intervals. An MOE is the difference between an estimate and its upper or lower confidence bounds. Confidence bounds can be created by adding the margin of error to the estimate (for the upper bound) and subtracting the margin of error from the estimate (for the lower bound). All published ACS margins of error are based on a 90-percent confidence level.
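
    The relationship between an estimate, its MOE, and the confidence bounds described above can be sketched as follows. This is a minimal illustration only; the function name confidence_bounds and the example numbers are hypothetical and do not come from the dataset.

```python
def confidence_bounds(estimate, margin_of_error):
    """Return (lower, upper) bounds: estimate minus and plus the margin of error."""
    lower = estimate - margin_of_error
    upper = estimate + margin_of_error
    return lower, upper


# Hypothetical county estimate of 1,200 uninsured people with an MOE of 150.
lower, upper = confidence_bounds(1200, 150)
print(lower, upper)
```

    Because published ACS margins of error are based on a 90-percent confidence level, bounds computed this way describe a 90-percent interval.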

    While an ACS 1-year estimate includes information collected over a 12-month period, an ACS 5-year estimate includes data collected over a 60-month period. In the case of ACS 1-year estimates, the period is the calendar year (e.g., the 2015 ACS covers the period from January 2015 through December 2015).

    In the case of ACS multiyear estimates, the period is 5 calendar years (e.g., the 2011–2015 ACS estimates cover the period from January 2011 through December 2015). Therefore, ACS estimates based on data collected from 2011–2015 should not be labeled “2013,” even though that is the midpoint of the 5-year period.

    Multiyear estimates should be labeled to indicate clearly the full period of time (e.g., “The child poverty rate in 2011–2015 was X percent.”). They do not describe any specific day, month, or year within that time period.

  16. Data from: Gridded 20-Year Parameterization of a Stochastic Weather...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jul 11, 2025
    Cite
    Agricultural Research Service (2025). Gridded 20-Year Parameterization of a Stochastic Weather Generator (CLIGEN) to Fill Gaps in Coverage South of the 40th Parallel [Dataset]. https://catalog.data.gov/dataset/gridded-20-year-parameterization-of-a-stochastic-weather-generator-cligen-to-fill-gaps-in--b16d1
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    CLImate GENerator (CLIGEN) is a stochastic weather generator that produces daily and sub-daily timeseries of weather variables. This gridded CLIGEN parameterization complements existing coverage for South America and Africa by adding new coverage for Central America, the Caribbean, the Middle East, South Asia, Southeast Asia, Australia, New Zealand, and various islands. This parameterization used the methodology and trained machine learning models discussed in a dataset article by Fullhart et al. (2022), https://doi.org/10.1080/20964471.2022.2136610. The primary dataset for South America and Africa may also be found in Ag Data Commons at https://doi.org/10.15482/USDA.ADC/1524754.

    The data are formatted as CLIGEN .par files, which are the only required input for CLIGEN. The files are contained in the "Grid Files" download with n=37105 files. The files are labeled according to grid point lat/lon coordinates (WGS84) in decimal degrees. The labeling convention uses 'N' and 'E' (north, east) to represent coordinates with a positive sign and 'S' and 'W' (south, west) to represent coordinates with a negative sign.

    Resources in this dataset:

    • Resource Title: Grid Files. File Name: Grid Files.zip. Resource Description: CLIGEN input files (.par)
    • Resource Title: Summary Table. File Name: SummaryTable.docx. Resource Description: Summary table that lists CLIGEN parameters and basic dataset characteristics of the gridded parameterization.
    • Resource Title: Map Layer. File Name: Map Layer.kmz. Resource Description: Map layer showing point locations of the CLIGEN grid.
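
    The sign-to-letter convention described above can be sketched as below. The description does not give the exact file name pattern, so this hypothetical helper (hemisphere_label) only returns the two labeled coordinate parts and does not attempt to build a file name; treating zero as positive is also an assumption.

```python
def hemisphere_label(lat, lon):
    """Map signed decimal-degree coordinates to hemisphere-labeled strings.

    Positive values get 'N'/'E', negative values get 'S'/'W'.
    Zero is treated as positive here, which is an assumption.
    """
    lat_part = f"{abs(lat)}{'N' if lat >= 0 else 'S'}"
    lon_part = f"{abs(lon)}{'E' if lon >= 0 else 'W'}"
    return lat_part, lon_part


# Hypothetical grid point in the southern and eastern hemispheres.
print(hemisphere_label(-12.5, 130.75))
```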

  17. Mawson Escarpment Geology GIS Dataset

    • data.aad.gov.au
    • researchdata.edu.au
    Updated Mar 5, 2019
    Cite
    THOST, DOUG; BAIN, JOHN (2019). Mawson Escarpment Geology GIS Dataset [Dataset]. http://doi.org/10.26179/5c7deb18226f9
    Dataset updated
    Mar 5, 2019
    Dataset provided by
    Australian Antarctic Division https://www.antarctica.gov.au/
    Australian Antarctic Data Centre
    Authors
    THOST, DOUG; BAIN, JOHN
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 10, 1998 - Jun 30, 1998
    Description

    There are several ArcInfo coverages described by this metadata record - FRAME, GEOL, MAPGRID, SITES, STRLINE and STRUC (in that order). Each coverage is described below. The data is also provided as shapefiles and ArcInfo interchange files. The data was used for the Mawson Escarpment Geology map published in 1998. This map is available from a URL provided in this metadata record.

    FRAME:

    The coverage FRAME contains arcs and polygons (with polygon labels) and forms the limits of the data sets or map coverage of the MAWSON ESCARPMENT area of the AUSTRALIAN ANTARCTIC TERRITORY.

    The purpose of this dataset is to serve as a cookie cutter for future data that may be acquired and require clipping to the map/data area.

    GEOL:

    The coverage GEOL is historical geological data covering the MAWSON ESCARPMENT area.

    The data were captured in ARC/INFO format and combined with geological outcrops that were accurately digitised over a March 1989 Landsat Thematic Mapper image at a scale of 1:100000. It is not recommended that this data be used beyond this scale.

    The coverage contains arcs (lines) and polygons (polygon labels). These objects are attributed as fully as possible in their .aat file for arcs and .pat file for polygon labels, and conform with the Geoscience Australia Geoscience Data Dictionary Version 98.04.

    The purpose of the dataset is to become part of a greater geological database of the Australian Antarctic Territory.

    (1998-04-10 - 1998-06-30)

    MAPGRID:

    MAPGRID is a graticule that was generated as a 5 minute by 5 minute grid mainly to allow for good location/registration of source materials for digitising and adding some locational anno.mapgrat

    This coverage's other function was to be used for a proof plot.

    (1998-04-22 - 1998-06-30)

    SITES:

    The purpose of this dataset is to provide the approximate locations of historic sample sites in the MAWSON ESCARPMENT region of the AUSTRALIAN ANTARCTIC TERRITORY, for future expansion or more accurate positioning when improved records of location are found.

    (1998-05-11 - 1998-06-30)

    STRLINE:

    This coverage, named STRLINE, contains the structural lines for the geology.

    The purpose of the dataset is to hold the linear structural features in their own coverage, containing only structure that does not form polygon boundaries.

    (1998-05-28 - 1998-06-30)

    STRUC:

    This coverage, called STRUC for structural measurements, is a point coverage. It describes mesoscopic structures at a site or outcrop.

    The purpose of the dataset is to provide all the known structural point data in one coverage.

    (1998-05-28 - 1998-06-30)

  18. Event Graph of BPI Challenge 2016

    • data.4tu.nl
    zip
    Updated Apr 22, 2021
    Cite
    Dirk Fahland; Stefan Esser (2021). Event Graph of BPI Challenge 2016 [Dataset]. http://doi.org/10.4121/14164220.v1
    Available download formats: zip
    Dataset updated
    Apr 22, 2021
    Dataset provided by
    4TU.ResearchData
    Authors
    Dirk Fahland; Stefan Esser
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Business process event data modeled as labeled property graphs

    Data Format
    -----------

    The dataset comprises one labeled property graph in two different file formats.

    #1) Neo4j .dump format

    A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/

    /bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

    The .dump was created with Neo4j v3.5.

    #2) .graphml format

    A .zip file containing a .graphml file of the entire graph


    Data Schema
    -----------

    The graph is a labeled property graph over business process event data. Each graph uses the following concepts:

    :Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

    :Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

    :Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

    :Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

    :CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

    :DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

    :HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

    :OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

    :REL relationship - placeholder for any structural relationship between two :Entity nodes
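
    The concepts above can be illustrated with a small in-memory sketch that derives :DF ("directly-followed by") relationships from events and their :CORR correlations. The event IDs, activities, and entity IDs are hypothetical, and ordering events by timestamp within each entity is an assumption based on the schema description, not the authors' actual construction procedure.

```python
from collections import defaultdict

# Hypothetical events: (event_id, Activity, timestamp, correlated entity IDs).
EVENTS = [
    ("e1", "Open page", 1, ["customer1", "session1"]),
    ("e2", "Send message", 2, ["customer1"]),
    ("e3", "Close page", 3, ["customer1", "session1"]),
]


def derive_df(events):
    """Return (entity, from_event, to_event) triples, one per :DF relationship."""
    by_entity = defaultdict(list)
    for event_id, _activity, timestamp, entities in events:
        for entity in entities:  # each pair corresponds to one :CORR relationship
            by_entity[entity].append((timestamp, event_id))

    df_edges = []
    for entity, correlated in sorted(by_entity.items()):
        ordered = sorted(correlated)  # order this entity's events by timestamp
        # Link each event to the next event correlated to the same entity.
        for (_, src), (_, dst) in zip(ordered, ordered[1:]):
            df_edges.append((entity, src, dst))
    return df_edges


for edge in derive_df(EVENTS):
    print(edge)
```

    In this sketch, event e1 is directly followed by e2 for the customer entity but by e3 for the session entity, which shows why an event can take part in several :DF relationships when it is correlated to multiple entities.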

    The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552


    Data Contents
    -------------

    neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)

    An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
    Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
    UWV (Employee Insurance Agency) is an autonomous administrative authority (ZBO) and is commissioned by the Ministry of Social Affairs and Employment (SZW) to implement employee insurances and provide labour market and data services in the Netherlands. The Dutch employee insurances are provided for via laws such as the WW (Unemployment Insurance Act), the WIA (Work and Income according to Labour Capacity Act, which contains the IVA (Full Invalidity Benefit Regulations) and the WGA (Return to Work (Partially Disabled) Regulations)), the Wajong (Disablement Assistance Act for Handicapped Young Persons), the WAO (Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits Act. The data in this collection pertains to customer contacts over a period of 8 months, and UWV is looking for insights into their customers' journeys. Data has been collected from several different sources, namely: 1) Clickdata from the site www.werk.nl collected from visitors that were not logged in, 2) Clickdata from the customer specific part of the site www.werk.nl (a link is made with the customer that logged in), 3) Werkmap Message data, showing when customers contacted the UWV through a digital channel, 4) Call data from the callcenter, showing when customers contacted the call center by phone, and 5) Complaint data showing when customers complained. All data is accompanied by data fields with anonymized information about the customer as well as data about the site visited or the contents of the call and/or complaint. The texts in the dataset are provided in both Dutch and English where applicable. URLs are included based on the structure of the site during the period in which the data was collected. UWV is interested in insights on how their channels are being used, when and why customers move from one contact channel to the next, and whether there are clear customer profiles to be identified in the behavioral data. Furthermore, recommendations are sought on how to serve customers without the need to change the contact channel.
    The data contains the following entities and their events

    - Customer - customer of a Dutch public agency for handling unemployment benefits
    - Office_U - user or worker involved in an activity handling a customer interaction
    - Office_W - user or worker involved in an activity handling a customer interaction
    - Complaint - a complaint document handed in by a customer
    - ComplaintDossier - a collection of complaints by the same customer
    - Session - browser-session identifier of a user browsing the website of the agency
    - IP - IP address of a user browsing the website of the agency


    Data Size
    ---------

    BPIC16, nodes: 8109680, relationships: 86833139

  19. Data from: Novel Insights into Quantitative Proteomics from an Innovative...

    • acs.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Nicolas Sénécaut; Gelio Alves; Hendrik Weisser; Laurent Lignières; Samuel Terrier; Lilian Yang-Crosson; Pierre Poulain; Gaëlle Lelandais; Yi-Kuo Yu; Jean-Michel Camadro (2023). Novel Insights into Quantitative Proteomics from an Innovative Bottom-Up Simple Light Isotope Metabolic (bSLIM) Labeling Data Processing Strategy [Dataset]. http://doi.org/10.1021/acs.jproteome.0c00478.s004
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Nicolas Sénécaut; Gelio Alves; Hendrik Weisser; Laurent Lignières; Samuel Terrier; Lilian Yang-Crosson; Pierre Poulain; Gaëlle Lelandais; Yi-Kuo Yu; Jean-Michel Camadro
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Simple light isotope metabolic labeling (SLIM labeling) is an innovative method to quantify variations in the proteome based on an original in vivo labeling strategy. Heterotrophic cells grown in U-[12C] as the sole source of carbon synthesize U-[12C]-amino acids, which are incorporated into proteins, giving rise to U-[12C]-proteins. This results in a large increase in the intensity of the monoisotope ion of peptides and proteins, thus allowing higher identification scores and protein sequence coverage in mass spectrometry experiments. This method, initially developed for signal processing and quantification of the incorporation rate of 12C into peptides, was based on a multistep process that was difficult to implement for many laboratories. To overcome these limitations, we developed a new theoretical background to analyze bottom-up proteomics data using SLIM-labeling (bSLIM) and established simple procedures based on open-source software, using dedicated OpenMS modules, and embedded R scripts to process the bSLIM experimental data. These new tools allow computation of both the 12C abundance in peptides to follow the kinetics of protein labeling and the molar fraction of unlabeled and 12C-labeled peptides in multiplexing experiments to determine the relative abundance of proteins extracted under different biological conditions. They also make it possible to consider incomplete 12C labeling, such as that observed in cells with nutritional requirements for nonlabeled amino acids. These tools were validated on an experimental dataset produced using various yeast strains of Saccharomyces cerevisiae and growth conditions. The workflows are built on the implementation of appropriate calculation modules in a KNIME working environment. These new integrated tools provide a convenient framework for the wider use of the SLIM-labeling strategy.

  20. Trends in the use of antimuscarinics and alpha-adrenergic blockers in women...

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Yu-Hua Lin; Wei-Yi Huang; Chi-Chih Chang; Yu-Fen Chen; Ling-Ying Wu; Hong-Chiang Chang; Kuo-How Huang (2023). Trends in the use of antimuscarinics and alpha-adrenergic blockers in women with lower urinary tract symptoms in Taiwan: A nationwide, population-based study, 2007-2012 [Dataset]. http://doi.org/10.1371/journal.pone.0220615
    Available download formats: tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yu-Hua Lin; Wei-Yi Huang; Chi-Chih Chang; Yu-Fen Chen; Ling-Ying Wu; Hong-Chiang Chang; Kuo-How Huang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Taiwan
    Description

    Background: We aim to examine the trend in the use of antimuscarinics and off-label alpha-adrenergic blockers for treatment of lower urinary tract symptoms (LUTS) in a Taiwanese Women Cohort between 2007 and 2012.

    Methods: This population-based National Health Insurance Research Database (NHIRD) was used to examine the trends in the use of antimuscarinics or off-label alpha-adrenergic blockers in Taiwan. A sample of 1,000,000 individuals was randomly drawn from the whole population of 23 million individuals who were registered in the NHI in 2005. From 2007 through 2012, women aged over 18 years whose claim record contained prescriptions of either of the two drugs for treatment of any of the LUTS-related diagnoses were identified and analyzed. The annual usage of the two drug classes was calculated by defined daily dose (DDD).

    Results: From 2007–2012, there was a 0.80-fold (69676.8 to 125104.3) increase in DDD of antimuscarinics in our cohort. The overall healthcare seeking prevalence of LUTS was 7.33% in 2007 and 12.38% in 2012, in a rising trend. The prevalence of antimuscarinics-treated LUTS in our cohort increased from 2.53 in 2007 to 3.41 per 1000 women in 2012. The prevalence of LUTS treated by antimuscarinics increased especially for those older than 60 years during the study period.

    Conclusions: This 6-year observational study provided the epidemiologic information of clinically significant LUTS of Asian female population. Moreover, there was a rising trend in the use of antimuscarinics and off-label alpha-adrenergic blockers in the population-based cohort.
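
    The reported change in DDD can be checked with a short calculation. This sketch assumes that "0.80 fold increase" means the relative increase (end minus start, divided by start); the function name relative_increase is hypothetical, while the two DDD values are taken from the abstract.

```python
def relative_increase(start, end):
    """Relative increase from start to end, expressed as a fraction of start."""
    return (end - start) / start


DDD_2007 = 69676.8   # DDD of antimuscarinics in 2007, from the abstract
DDD_2012 = 125104.3  # DDD of antimuscarinics in 2012, from the abstract

increase = relative_increase(DDD_2007, DDD_2012)
print(round(increase, 2))
```

    Under this reading the increase rounds to 0.80, i.e. DDD in 2012 was roughly 1.8 times its 2007 value.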
