10 datasets found
  1. data-augmentation-ner-results

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 30, 2023
    Cite
    Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch (2023). data-augmentation-ner-results [Dataset]. http://doi.org/10.5281/zenodo.6956508
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model evaluation results produced in the context of evaluating data augmentation for Named Entity Recognition over the German legal domain.

    Detailed information can be found on the GitHub page.

  2. Romanian Named Entity Recognition in the Legal domain (LegalNERo)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 26, 2022
    Cite
    Onuț, Andrei (2022). Romanian Named Entity Recognition in the Legal domain (LegalNERo) [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4772094
    Dataset updated
    Aug 26, 2022
    Dataset provided by
    Ianov, Alexandru
    Coneschi, Vlad Silviu
    Mitrofan, Maria
    Gasan, Carol Luca
    Păiș, Vasile
    Ghiță, Corvin
    Onuț, Andrei
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time and legal resources mentioned in legal documents. Additionally, it offers GEONAMES codes for the named entities annotated as locations (where a link could be established).

    The LegalNERo corpus is available in different formats: span-based, token-based and RDF. The Linguistic Linked Open Data (LLOD) version is provided in RDF-Turtle format.

    CONLLUP files conform to the CoNLL-U Plus format (https://universaldependencies.org/ext-format.html). Part-of-speech tagging was performed using UDPIPE. Named entity annotations are placed in the column "RELATE:NE" (the 11th column), as defined in the "global.columns" metadata field. Similarly, GEONAMES references are in the column "RELATE:GEONAMES" (the 12th and last column). Automatic processing was performed through the RELATE platform (https://relate.racai.ro).
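
As a rough illustration of the column layout described above, the Python sketch below reads the "global.columns" header and looks up the "RELATE:NE" and "RELATE:GEONAMES" values by name rather than by position. The sample rows, the B-/O-style tags, the GeoNames identifier, and the helper names `read_conllup` and `make_row` are assumptions made for illustration; they are not taken from the corpus files.

```python
from typing import Dict, Iterable, Iterator, List

Token = Dict[str, str]


def read_conllup(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each sentence as a list of {column name: value} dicts."""
    columns: List[str] = []
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("# global.columns"):
            # Header form: "# global.columns = ID FORM ... RELATE:NE RELATE:GEONAMES"
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment or metadata lines
        elif not line.strip():
            if sentence:  # a blank line ends the current sentence
                yield sentence
                sentence = []
        else:
            sentence.append(dict(zip(columns, line.split("\t"))))
    if sentence:
        yield sentence


def make_row(idx: int, form: str, ne: str, geonames: str) -> str:
    """Build one hypothetical 12-column row; unused columns are '_'."""
    return "\t".join([str(idx), form] + ["_"] * 8 + [ne, geonames])


# Hypothetical sample, not taken from the corpus.
SAMPLE = [
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC "
    "RELATE:NE RELATE:GEONAMES",
    make_row(1, "Guvernul", "B-ORG", "_"),
    make_row(2, "din", "O", "_"),
    make_row(3, "București", "B-LOC", "683506"),
]

sentences = list(read_conllup(SAMPLE))
entities = [
    (tok["FORM"], tok["RELATE:NE"], tok["RELATE:GEONAMES"])
    for tok in sentences[0]
    if tok["RELATE:NE"] != "O"
]
print(entities)
```

Reading the column names from the header keeps the lookup valid even if a file declares its columns in a different order.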

    ANN files conform to BRAT format (https://brat.nlplab.org/).
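
For readers unfamiliar with BRAT standoff files, here is a minimal Python sketch that parses the text-bound ("T") lines of an .ann file and checks the character offsets against the raw text. The sample text, the labels ORG and LOC, and the names `Entity` and `parse_ann` are hypothetical; the label names actually used in the LegalNERo .ann files may differ.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    spans: List[Tuple[int, int]]  # one (start, end) pair per fragment
    text: str


def parse_ann(content: str) -> List[Entity]:
    """Parse text-bound annotations; other line types are skipped."""
    entities: List[Entity] = []
    for line in content.splitlines():
        if not line.startswith("T"):
            continue  # relations, events, attributes, notes, blank lines
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate fragments with ';'
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, text))
    return entities


# Hypothetical raw text and annotation file contents.
RAW_TEXT = "Guvernul din București"
ANN = "\n".join([
    "T1\tORG 0 8\tGuvernul",
    "T2\tLOC 13 22\tBucurești",
    "#1\tAnnotatorNotes T2\tillustrative note line, ignored by the parser",
])

parsed = parse_ann(ANN)
for ent in parsed:
    start, end = ent.spans[0]
    print(ent.ann_id, ent.label, RAW_TEXT[start:end])
```

Each annotation stores offsets into the corresponding file in the text folder, so slicing the raw text with those offsets should reproduce the annotated string.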

    The archive contains:

    • ann_LEGAL_PER_LOC_ORG_TIME_overlap: Folder of .ann files containing annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations of organization and time entities inside legal references were allowed.

    • ann_LEGAL_PER_LOC_ORG_TIME: Folder of .ann files containing annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated.

    • ann_PER_LOC_ORG_TIME: Folder of .ann files containing annotations of persons, locations, organizations and time. There are no overlapping annotations.

    • conllup_LEGAL_PER_LOC_ORG_TIME: Folder of .conllup files containing annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

    • conllup_PER_LOC_ORG_TIME: Folder of .conllup files containing annotations of persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

    • rdf: Folder containing the corpus in RDF-Turtle format. All the annotations are available here in both span and token format.

    • text: Folder containing the raw texts.

    NER System

    A NER model generated using the LegalNERo corpus can be used online in the RELATE platform: https://relate.racai.ro/index.php?path=ner/demo

    This system was described in: Păiș, Vasile and Mitrofan, Maria and Gasan, Carol Luca and Coneschi, Vlad and Ianov, Alexandru. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 9--18, nov 2021

    LICENSING

    This work is provided under the license CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International). The license can be viewed online here: https://creativecommons.org/licenses/by-nc-nd/4.0/ and the full text here: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode .

    CONTACT

    Research Institute for Artificial Intelligence "Mihai Draganescu", Romanian Academy Web: http://www.racai.ro Contact emails: vasile@racai.ro , maria@racai.ro

  3. Data from: Named Entity Recognition for Legal Documents

    • gts.ai
    json
    Updated Nov 30, 2023
    Cite
    GTS (2023). Named Entity Recognition for Legal Documents [Dataset]. https://gts.ai/case-study/named-entity-recognition-for-legal-documents/
    Available download formats: json
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Our latest project involved applying Named Entity Recognition (NER) to legal documents.

  4. legal-ner

    • huggingface.co
    Cite
    daishen, legal-ner [Dataset]. https://huggingface.co/datasets/daishen/legal-ner
    Authors
    daishen
    Description

    daishen/legal-ner dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Named Entity Recognition

    • sdiinnovation-geoplatform.hub.arcgis.com
    • hub.arcgis.com
    Updated May 27, 2022
    Cite
    Esri (2022). Named Entity Recognition [Dataset]. https://sdiinnovation-geoplatform.hub.arcgis.com/content/97369a6f1200428ba060410d13dbb078
    Dataset updated
    May 27, 2022
    Dataset authored and provided by
    Esri (http://esri.com/)
    Description

    This deep learning model is used to identify or categorize entities in unstructured text. An entity may refer to a word or a sequence of words such as the name of “Organizations,” “Persons,” “Country,” or “Date” and “Time” in the text. This model detects entities from the given text and classifies them into pre-determined categories.

    Named entity recognition (NER) is useful when a high-level overview of a large quantity of text is required. NER surfaces the key information in a text by extracting its main entities. The extracted entities are categorized into predetermined classes and can help in drawing meaningful conclusions.
    

    Using the model

    Follow the guide to use the model. Before using this model, ensure that the supported deep learning libraries are installed. For more details, check the Deep Learning Libraries Installer for ArcGIS.
    

    Fine-tuning the model

    This model cannot be fine-tuned using ArcGIS tools.

    Input

    Text files on which named entity extraction will be performed.
    

    Output

    Tokens classified into the following predefined entity classes:

    • PERSON – People, including fictional
    • NORP – Nationalities or religious or political groups
    • FACILITY – Buildings, airports, highways, bridges, etc.
    • ORGANIZATION – Companies, agencies, institutions, etc.
    • GPE – Countries, cities, states
    • LOCATION – Non-GPE locations, mountain ranges, bodies of water
    • PRODUCT – Vehicles, weapons, foods, etc. (Not services)
    • EVENT – Named hurricanes, battles, wars, sports events, etc.
    • WORK OF ART – Titles of books, songs, etc.
    • LAW – Named documents made into laws
    • LANGUAGE – Any named language
    • DATE – Absolute or relative dates or periods
    • TIME – Times smaller than a day
    • PERCENT – Percentage (including “%”)
    • MONEY – Monetary values, including unit
    • QUANTITY – Measurements, as of weight or distance
    • ORDINAL – “first,” “second”
    • CARDINAL – Numerals that do not fall under another type
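
The class list above can be kept as a small lookup table when post-processing model output. The Python sketch below is only an illustration: the `(text, label)` pair format and the function `group_by_class` are assumptions, since the listing does not say how the ArcGIS tooling actually returns its predictions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Labels and descriptions copied from the class list in the listing.
ENTITY_CLASSES: Dict[str, str] = {
    "PERSON": "People, including fictional",
    "NORP": "Nationalities or religious or political groups",
    "FACILITY": "Buildings, airports, highways, bridges, etc.",
    "ORGANIZATION": "Companies, agencies, institutions, etc.",
    "GPE": "Countries, cities, states",
    "LOCATION": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Vehicles, weapons, foods, etc. (Not services)",
    "EVENT": "Named hurricanes, battles, wars, sports events, etc.",
    "WORK OF ART": "Titles of books, songs, etc.",
    "LAW": "Named documents made into laws",
    "LANGUAGE": "Any named language",
    "DATE": "Absolute or relative dates or periods",
    "TIME": "Times smaller than a day",
    "PERCENT": "Percentage (including '%')",
    "MONEY": "Monetary values, including unit",
    "QUANTITY": "Measurements, as of weight or distance",
    "ORDINAL": "'first', 'second'",
    "CARDINAL": "Numerals that do not fall under another type",
}


def group_by_class(predictions: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group hypothetical (text, label) predictions by entity class."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in predictions:
        if label not in ENTITY_CLASSES:
            raise ValueError(f"unknown entity class: {label}")
        grouped[label].append(text)
    return dict(grouped)


# Hypothetical predictions, for illustration only.
example = [("Esri", "ORGANIZATION"), ("Austria", "GPE"), ("Vienna", "GPE")]
grouped = group_by_class(example)
print(grouped)
```

Rejecting unknown labels early makes a mismatch between the expected class set and the actual model output visible instead of silently dropping entities.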

    Model architecture

    This model uses the XLM-RoBERTa architecture implemented in Hugging Face transformers using the TNER library.
    

    Accuracy metrics

    This model has an accuracy of 91.6 percent.
    

    Training data

    The model has been trained on the OntoNotes Release 5.0 dataset.

    Sample results

    Here are a few results from the model.

    Citations

    Weischedel, Ralph, et al. OntoNotes Release 5.0 LDC2013T19. Web Download. Philadelphia: Linguistic Data Consortium, 2013.

    Asahi Ushio and Jose Camacho-Collados. 2021. TNER: An all-round Python library for transformer based named entity recognition. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pages 53–62, Online. Association for Computational Linguistics.

  6. LeNER-Br

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 14, 2025
    Cite
    University of Brasilia (2025). LeNER-Br [Dataset]. https://opendatalab.com/OpenDataLab/LeNER-Br
    Available download formats: zip (24756792 bytes)
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    University of Brasilia
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LeNER-Br is a dataset for named entity recognition (NER) in Brazilian legal text.

  7. Austrian court decisions

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Mar 22, 2021
    Cite
    Christian Sageder; Artem Revenko; Julian Moreno Schneider; Anna Breit; Victor Mireles; Sotiris Karampatakis (2021). Austrian court decisions [Dataset]. http://doi.org/10.5281/zenodo.4625767
    Available download formats: zip
    Dataset updated
    Mar 22, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Sageder; Artem Revenko; Julian Moreno Schneider; Anna Breit; Victor Mireles; Sotiris Karampatakis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Austria
    Description

    A dataset of Austrian court decisions in the German language, prepared by Christian Sageder from Cybly in JSON-LD format compliant with the LynxDocument schema (https://lynx-project.eu/doc/lkg/), in the folder "original_json".

    Additionally, it includes named entity annotations (Per, Loc, Org, Misc) produced by the DFKI team with a BERT-based transformer trained on the WikiNER corpus, in N3 RDF notation compliant with NIF 2.1 (https://github.com/NLP2RDF/documentation/blob/f63715b951d03324390edbbd3e84babdf43bc60e/docs/index.rst), in the folder "ner_annotations_nif".

    It also includes a manually annotated sample covering 9 fine-grained named entity types, in the folder "manual/manually_annotated" (see the file names for the NE types), and further manually verified predictions by a classifier trained on the manually annotated sample, in the folder "manual/manually_verified". Manual work was done by all authors.

  8. ner-cat

    • huggingface.co
    Updated Mar 19, 2025
    Cite
    ner-cat [Dataset]. https://huggingface.co/datasets/Ugiat/ner-cat
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Ugiat Technologies
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    NERCat Dataset

      Dataset Summary
    

    The NERCat dataset is a manually annotated collection of Catalan-language television transcriptions, designed to improve Named Entity Recognition (NER) performance for the Catalan language. The dataset covers diverse domains such as politics, sports, and culture, and includes 9,242 sentences with 13,732 named entities annotated across eight categories: Person, Facility, Organization, Location, Product, Event, Date, and Law. The dataset was… See the full description on the dataset page: https://huggingface.co/datasets/Ugiat/ner-cat.

  9. Data from: Power in Text: Implementing Networks and Institutional Complexity...

    • dataverse.harvard.edu
    Updated Apr 23, 2020
    Cite
    Robert Shaffer (2020). Power in Text: Implementing Networks and Institutional Complexity in American Law [Dataset]. http://doi.org/10.7910/DVN/9ULNMU
    Dataset updated
    Apr 23, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Robert Shaffer
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Replication materials for "Power in Text: Implementing Networks and Institutional Complexity in American Law". Contains webscrapers, scraped text, fit NER models, network extraction code, and Bayesian modeling code/results. All data were originally collected in late 2018, so re-scraped data may differ. For details, see comments in individual scripts, as well as the included README file. If at all possible, maintain the original file structure of this repository for easier replication.

  10. Table1_Revealing the dynamic landscape of drug-drug interactions through...

    • figshare.com
    xlsx
    Updated Oct 3, 2023
    Cite
    Eugene Jeong; Bradley Malin; Scott D. Nelson; Yu Su; Lang Li; You Chen (2023). Table1_Revealing the dynamic landscape of drug-drug interactions through network analysis.xlsx [Dataset]. http://doi.org/10.3389/fphar.2023.1211491.s002
    Available download formats: xlsx
    Dataset updated
    Oct 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Eugene Jeong; Bradley Malin; Scott D. Nelson; Yu Su; Lang Li; You Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The landscape of drug-drug interactions (DDIs) has evolved significantly over the past 60 years, necessitating a retrospective analysis to identify research trends and under-explored areas. While methodologies like bibliometric analysis provide valuable quantitative perspectives on DDI research, they have not successfully delineated the complex interrelations between drugs. Understanding these intricate relationships is essential for deciphering the evolving architecture and progressive transformation of DDI research structures over time. We utilize network analysis to unearth the multifaceted relationships between drugs, offering a richer, more nuanced comprehension of shifts in research focus within the DDI landscape.

    Methods: This groundbreaking investigation employs natural language processing techniques, specifically Named Entity Recognition (NER) via ScispaCy, and the information extraction model, SciFive, to extract pharmacokinetic (PK) and pharmacodynamic (PD) DDI evidence from PubMed articles spanning January 1962 to July 2023. It reveals key trends and patterns through an innovative network analysis approach. Static network analysis is deployed to discern structural patterns in DDI research, while evolving network analysis is employed to monitor changes in the DDI research trend structures over time.

    Results: Our compelling results shed light on the scale-free characteristics of pharmacokinetic, pharmacodynamic, and their combined networks, exhibiting power law exponent values of 2.5, 2.82, and 2.46, respectively. In these networks, a select few drugs serve as central hubs, engaging in extensive interactions with a multitude of other drugs. Interestingly, the networks conform to a densification power law, illustrating that the number of DDIs grows exponentially as new drugs are added to the DDI network. Notably, we discovered that drugs connected in PK and PD networks predominantly belong to the same categories defined by the Anatomical Therapeutic Chemical (ATC) classification system, with fewer interactions observed between drugs from different categories.

    Discussion: The finding suggests that PK and PD DDIs between drugs from different ATC categories have not been studied as extensively as those between drugs within the same categories. By unearthing these hidden patterns, our study paves the way for a deeper understanding of the DDI landscape, providing valuable information for future DDI research, clinical practice, and drug development focus areas.
