22 datasets found
  1. Legal NER Dataset

    • kaggle.com
    zip
    Updated Apr 12, 2024
    Cite
    Pratik Pujari (2024). Legal NER Dataset [Dataset]. https://www.kaggle.com/datasets/pratikpujarichef/legal-ner-dataset
    Explore at:
    Available download formats: zip (6930691 bytes)
    Dataset updated
    Apr 12, 2024
    Authors
    Pratik Pujari
    Description

    Dataset

    This dataset was created by Pratik Pujari

    Contents

  2. InLegalNER

    • huggingface.co
    Updated Apr 17, 2024
    Cite
    OpenNyAI (2024). InLegalNER [Dataset]. https://huggingface.co/datasets/opennyaiorg/InLegalNER
    Explore at:
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    OpenNyAI
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset for training and evaluating Indian Legal Named Entity Recognition model.

      Paper details
    

    Named Entity Recognition in Indian court judgments Arxiv

      Label Scheme
    

    View label scheme (14 labels for 1 components)

    ENTITY         BELONGS TO
    LAWYER         PREAMBLE
    COURT          PREAMBLE, JUDGEMENT
    JUDGE          PREAMBLE, JUDGEMENT
    PETITIONER     PREAMBLE, JUDGEMENT
    RESPONDENT     PREAMBLE, JUDGEMENT
    CASE_NUMBER    JUDGEMENT
    GPE            JUDGEMENT
    DATE           JUDGEMENT
    ORG            JUDGEMENT
    STATUTE        JUDGEMENT

    … See the full description on the dataset page: https://huggingface.co/datasets/opennyaiorg/InLegalNER.

  3. LeNER-Br: Portuguese Legal NER

    • kaggle.com
    zip
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). LeNER-Br: Portuguese Legal NER [Dataset]. https://www.kaggle.com/datasets/thedevastator/lener-br-portuguese-legal-ner-dataset/discussion
    Explore at:
    Available download formats: zip (767366 bytes)
    Dataset updated
    Dec 2, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Brazil
    Description

    LeNER-Br: Portuguese Legal NER

    Labeled Portuguese Legal NER

    By lener_br (From Huggingface) [source]

    About this dataset

    LeNER-Br is a comprehensive dataset specifically created for named entity recognition (NER) in the Portuguese language, particularly within the domain of legal documents. This dataset consists of manually annotated texts extracted from legislation and legal cases. Each text has undergone meticulous tagging to identify various types of named entities, including persons, locations, time entities, organizations, legislation references, and legal case references.

    To curate this dataset, a total of 66 legal documents were collected from diverse Brazilian Courts encompassing both superior and state levels. Prominent courts such as the Supremo Tribunal Federal, Superior Tribunal de Justiça, Tribunal de Justiça de Minas Gerais, and Tribunal de Contas da União contributed to this collection. Additionally, four significant legislation documents like Lei Maria da Penha were also included to ensure a comprehensive representation. In total, 70 unique documents form part of this extensive dataset.

    The primary purpose of LeNER-Br is to facilitate the development and evaluation of NER models specifically tailored for Portuguese legal text analysis. The labeled data provided in this dataset enables researchers and data scientists to train their NER models effectively by leveraging insights from varied legal contexts present in Brazil's jurisdiction system.

    Each annotated instance has two columns. The tokens column holds the individual words or tokens of the original text. The ner_tags column holds the NER tag assigned to each token, which specifies its entity type, such as a person's name, an organization name, or another category relevant to legal text.

    Researchers may use LeNER-Br as a benchmark test set against which they can evaluate the performance of their own NER models designed for Portuguese legal documents. Moreover, the tokens column is repeated twice with additional tagged descriptions, including ner_tags, which contains the NER information assigned to each token.

    In conclusion, the LeNER-Br dataset is a valuable resource for advancing NER techniques within the Portuguese language, particularly within the legal domain. It provides a high-quality, manually annotated collection of legal texts chosen to represent Brazil's legislative landscape and the entities involved. The dataset serves as a foundation for training and evaluating NER models and supports advances in information extraction from Portuguese legal documents.

    How to use the dataset

    The LeNER-Br dataset is a valuable resource for researchers and practitioners working on named entity recognition (NER) in the context of Portuguese legal documents. This guide will provide you with an overview of the dataset and how to effectively utilize it for your NER tasks.

    Dataset Overview

    LeNER-Br is composed of 70 manually annotated legal documents written in Portuguese. These documents were collected from various Brazilian Courts, including superior and state levels such as the Supremo Tribunal Federal, Superior Tribunal de Justiça, Tribunal de Justiça de Minas Gerais, and Tribunal de Contas da União. The dataset also includes four legislation documents, such as Lei Maria da Penha.

    The dataset provides tags for different types of named entities commonly found in legal texts. These named entity types include persons, locations, time entities, organizations, legislations, and legal cases. Additionally, there are two main columns in the dataset that you should pay attention to:

    • tokens: This column contains the individual words or tokens present in the text of the legal documents.
    • ner_tags: This column contains the named entity recognition (NER) tags assigned to each token in the text. These tags indicate the type of named entity that each token represents.

    Utilizing the Dataset

    Here are some steps you can follow to make effective use of this dataset:

    • Data Exploration: Start by loading and exploring the data using your preferred programming language or data analysis tools like Python's pandas library.

      • Load train.csv file for training your NER models with manually annotated texts.
      • Utilize test.csv file as a test set for evaluating model performance.
      • Use validation.csv file for additional validation during model development.
    • Preprocessing:

      • Perform necessary preprocess...
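
    The loading step described above can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes each cell of the tokens and ner_tags columns is stored as a Python-style list literal, which the listing does not confirm, and it reads a made-up two-row sample instead of the real train.csv. The tag ids in the sample are hypothetical. It uses the standard csv module to stay dependency-free; pandas could be used for the same purpose.

```python
import ast
import csv
import io

# Hypothetical two-row stand-in for train.csv; the real file's cell encoding
# and tag ids may differ.
SAMPLE_CSV = '''tokens,ner_tags
"['Supremo', 'Tribunal', 'Federal']","[1, 2, 2]"
"['Lei', 'Maria', 'da', 'Penha']","[3, 4, 4, 4]"
'''


def load_split(handle):
    """Read one split and decode the list-valued columns."""
    rows = []
    for record in csv.DictReader(handle):
        # Assumption: cells are Python list literals such as "['a', 'b']".
        tokens = ast.literal_eval(record["tokens"])
        tags = ast.literal_eval(record["ner_tags"])
        if len(tokens) != len(tags):
            raise ValueError("tokens and ner_tags must align one-to-one")
        rows.append({"tokens": tokens, "ner_tags": tags})
    return rows


train = load_split(io.StringIO(SAMPLE_CSV))
print(len(train), train[0]["tokens"])
```

    With the real files, the same function could be called on open("train.csv", newline="", encoding="utf-8"), and likewise for test.csv and validation.csv, provided the cell-encoding assumption holds.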
  4. german-ler

    • huggingface.co
    • opendatalab.com
    Updated Nov 2, 2024
    Cite
    Elena Leitner (2024). german-ler [Dataset]. http://doi.org/10.57967/hf/0046
    Explore at:
    Dataset updated
    Nov 2, 2024
    Authors
    Elena Leitner
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for "German LER"

      Dataset Summary
    

    A dataset of Legal Documents from German federal court decisions for Named Entity Recognition. The dataset is human-annotated with 19 fine-grained entity classes. The dataset consists of approx. 67,000 sentences and contains 54,000 annotated entities. NER tags use the BIO tagging scheme. The dataset includes two different versions of annotations, one with a set of 19 fine-grained semantic classes (ner_tags) and another one… See the full description on the dataset page: https://huggingface.co/datasets/elenanereiss/german-ler.
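
    Since the summary mentions the BIO tagging scheme, here is a minimal sketch of how BIO tags can be grouped into entity spans. The function name bio_to_spans, the example sentence, and the labels PER and LOC are hypothetical and chosen for illustration only; the dataset's own class names are not listed in this summary.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, start, end, text) spans from BIO tags; end is exclusive."""
    spans = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        # "B-" opens a span; a stray "I-" with no open span is treated leniently
        # as an opening tag as well.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


# Hypothetical example sentence and labels.
tokens = ["Richter", "Hans", "Müller", "in", "Karlsruhe"]
tags = ["O", "B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

    Treating a stray I- tag as the start of a span is a design choice made here for robustness; a stricter reader could raise an error instead.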

  5. eg-legal-ner

    • huggingface.co
    Updated Oct 8, 2025
    Cite
    Ahmed Mardi (2025). eg-legal-ner [Dataset]. https://huggingface.co/datasets/fr3on/eg-legal-ner
    Explore at:
    Dataset updated
    Oct 8, 2025
    Authors
    Ahmed Mardi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Arabic Legal Dataset - Legal Named Entity Recognition

      Dataset Description
    

    Named entity recognition dataset for Arabic legal texts with specialized legal entity types and relationships. This dataset contains 1,046 examples of ner data derived from Egyptian legal texts, including criminal law, civil law, procedural law, and personal status law. The dataset is designed for training and evaluating Arabic legal AI models.

      Dataset Summary
    

    Language: Arabic (Egyptian… See the full description on the dataset page: https://huggingface.co/datasets/fr3on/eg-legal-ner.

  6. ner-data

    • kaggle.com
    zip
    Updated Mar 11, 2024
    Cite
    Shivam Kumar (2024). ner-data [Dataset]. https://www.kaggle.com/datasets/shivamk01/ner-data
    Explore at:
    Available download formats: zip (2020341 bytes)
    Dataset updated
    Mar 11, 2024
    Authors
    Shivam Kumar
    Description

    Dataset

    This dataset was created by Shivam Kumar

    Contents

  7. data-augmentation-ner-results

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 30, 2023
    Cite
    Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch (2023). data-augmentation-ner-results [Dataset]. http://doi.org/10.5281/zenodo.6956508
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Model evaluation results produced in the context of evaluating data augmentation for Named Entity Recognition over the German legal domain.

    Detailed information can be found on the Github page.

  8. Multilingual NER Dataset

    • kaggle.com
    zip
    Updated May 22, 2022
    Cite
    Konstantin Verner (2022). Multilingual NER Dataset [Dataset]. https://www.kaggle.com/datasets/constantinwerner/multilingual-ner-dataset
    Explore at:
    Available download formats: zip (837884 bytes)
    Dataset updated
    May 22, 2022
    Authors
    Konstantin Verner
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    It is a collection of NER datasets in five languages:

    • German (https://github.com/elenanereiss/Legal-Entity-Recognition)
    • Greek (https://github.com/nmpartzio/elNER)
    • Japanese (https://github.com/stockmarkteam/ner-wikipedia-dataset)
    • Russian (https://github.com/dialogue-evaluation/factRuEval-2016/)
    • Turkish (https://data.mendeley.com/datasets/cdcztymf4k/1)

    The annotation was adapted to OntoNotes standard and converted to IOB format. The main purpose of this dataset is evaluation of XLM models.

  9. Romanian Named Entity Recognition in the Legal domain (LegalNERo)

    • data.niaid.nih.gov
    Updated Aug 26, 2022
    Cite
    Păiș, Vasile; Mitrofan, Maria; Gasan, Carol Luca; Ianov, Alexandru; Ghiță, Corvin; Coneschi, Vlad Silviu; Onuț, Andrei (2022). Romanian Named Entity Recognition in the Legal domain (LegalNERo) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4772094
    Explore at:
    Dataset updated
    Aug 26, 2022
    Dataset provided by
    Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy
    Authors
    Păiș, Vasile; Mitrofan, Maria; Gasan, Carol Luca; Ianov, Alexandru; Ghiță, Corvin; Coneschi, Vlad Silviu; Onuț, Andrei
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time expressions, and legal resources mentioned in legal documents. Additionally, it offers GEONAMES codes for the named entities annotated as locations (where a link could be established).

    The LegalNERo corpus is available in different formats: span-based, token-based and RDF. The Linguistic Linked Open Data (LLOD) version is provided in RDF-Turtle format.

    CONLLUP files conform to the CoNLL-U Plus format https://universaldependencies.org/ext-format.html . Part-of-speech tagging was realized using UDPIPE. Named entity annotations are placed in the column "RELATE:NE" (the 11th column) as defined in the "global.columns" metadata field. Similarly GEONAMES references are in the column "RELATE:GEONAMES" (the 12th column, last). Automatic processing was performed through the RELATE platform (https://relate.racai.ro).
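
    As an illustration of the column layout described above, the following minimal sketch reads the "# global.columns" header of a CoNLL-U Plus file and pulls out one named column per token. The function name read_conllup_column and the sample rows are hypothetical, and the B-/I-/O values in the sample are an assumption; the description does not state how tags are encoded in the RELATE:NE column.

```python
def read_conllup_column(text, column):
    """Return sentences as lists of (FORM, value) pairs for the given column."""
    names = None
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("# global.columns ="):
            # Column names are whitespace-separated after the "=" sign.
            names = line.split("=", 1)[1].split()
        elif not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # other comment or metadata lines
        else:
            if names is None:
                raise ValueError("missing '# global.columns' header")
            fields = line.split("\t")
            current.append((fields[names.index("FORM")], fields[names.index(column)]))
    if current:
        sentences.append(current)
    return sentences


def row(idx, form, ne):
    """Build one hypothetical 12-column token line; unused fields are '_'."""
    return "\t".join([str(idx), form] + ["_"] * 8 + [ne, "_"])


SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC RELATE:NE RELATE:GEONAMES",
    "# sent_id = 1",
    row(1, "Guvernul", "B-ORG"),
    row(2, "României", "I-ORG"),
    row(3, "decide", "O"),
    "",
])

print(read_conllup_column(SAMPLE, "RELATE:NE"))
```

    Looking columns up by name through the header, rather than hard-coding the 11th position, keeps the reader working if a file declares its columns in a different order.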

    ANN files conform to BRAT format (https://brat.nlplab.org/).
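
    For readers unfamiliar with the BRAT standoff format, the sketch below parses text-bound annotation lines (those whose id starts with "T") into label, character offsets, and surface text. The function name parse_ann and the sample content are hypothetical; the ORG and LOC labels are taken from the archive's folder names, but the exact label strings inside the files are an assumption.

```python
def parse_ann(text):
    """Parse BRAT text-bound annotations (ids starting with 'T')."""
    entities = []
    for line in text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, notes, and other annotation kinds
        ann_id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate their fragments with ';'.
        fragments = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        entities.append(
            {"id": ann_id, "label": label, "fragments": fragments, "text": surface}
        )
    return entities


# Hypothetical .ann content for the text "Guvernul României".
SAMPLE_ANN = "\n".join([
    "T1\tORG 0 8\tGuvernul",
    "T2\tLOC 9 17\tRomâniei",
    "#1\tAnnotatorNotes T1\texample note",
])

for entity in parse_ann(SAMPLE_ANN):
    print(entity)
```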

    The archive contains:

    • ann_LEGAL_PER_LOC_ORG_TIME_overlap: Folder of .ann files containing annotations of legal resources mentioned, persons, locations, organizations, and time. Overlapping annotations of organizations and time entities inside legal references were allowed.

    • ann_LEGAL_PER_LOC_ORG_TIME: Folder of .ann files containing annotations of legal resources mentioned, persons, locations, organizations, and time. Overlapping annotations were not allowed, and only the longest named entities were annotated.

    • ann_PER_LOC_ORG_TIME: Folder of .ann files containing annotations of persons, locations, organizations, and time. There are no overlapping annotations.

    • conllup_LEGAL_PER_LOC_ORG_TIME: Folder of .conllup files containing annotations of legal resources mentioned, persons, locations, organizations, and time. Overlapping annotations were not allowed, and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

    • conllup_PER_LOC_ORG_TIME: Folder of .conllup files containing annotations of persons, locations, organizations, and time. Overlapping annotations were not allowed, and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

    • rdf: Folder containing the corpus in RDF-Turtle format. All the annotations are available here in both span and token format.

    • text: Folder containing the raw texts.

    NER System

    A NER model generated using the LegalNERo corpus can be used online in the RELATE platform: https://relate.racai.ro/index.php?path=ner/demo

    This system was described in: Păiș, Vasile and Mitrofan, Maria and Gasan, Carol Luca and Coneschi, Vlad and Ianov, Alexandru. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 9-18, Nov. 2021.

    LICENSING

    This work is provided under the license CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International). The license can be viewed online here: https://creativecommons.org/licenses/by-nc-nd/4.0/ and the full text here: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode .

    CONTACT

    Research Institute for Artificial Intelligence "Mihai Draganescu", Romanian Academy Web: http://www.racai.ro Contact emails: vasile@racai.ro , maria@racai.ro

  10. indian-legal-ner

    • huggingface.co
    Updated Nov 8, 2025
    Cite
    HF Tuner (2025). indian-legal-ner [Dataset]. https://huggingface.co/datasets/hf-tuner/indian-legal-ner
    Explore at:
    Dataset updated
    Nov 8, 2025
    Authors
    HF Tuner
    Area covered
    India
    Description

    hf-tuner/indian-legal-ner dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. NKP_Legal_Cases

    • kaggle.com
    zip
    Updated Mar 30, 2025
    Cite
    Arpan Sapkota (2025). NKP_Legal_Cases [Dataset]. https://www.kaggle.com/arpansapkota/nkp-legal-cases
    Explore at:
    Available download formats: zip (139812570 bytes)
    Dataset updated
    Mar 30, 2025
    Authors
    Arpan Sapkota
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The NKP_Legal_Cases dataset (Raw_Legal_Cases.json and Para_Legal_Cases.json) is a curated collection of Nepali legal case texts sourced from publicly available Nepali court documents. It was prepared for work on legal NLP, text summarization, and the evaluation of multilingual LLMs.

    The dataset aims to support research in:

    • Legal document summarization
    • Named entity recognition (NER)
    • Legal information retrieval
    • Document classification
    • Multilingual NLP for low-resource languages (specifically Nepali)

    This dataset addresses the significant gap in publicly accessible legal datasets for Nepali, enabling researchers, students, and practitioners to explore legal AI applications in low-resource contexts.

  12. Legal_NER

    • kaggle.com
    zip
    Updated Mar 28, 2024
    Cite
    Ishaan Bhattacharjee (2024). Legal_NER [Dataset]. https://www.kaggle.com/datasets/ishaaaaan/legal-ner
    Explore at:
    Available download formats: zip (2766869 bytes)
    Dataset updated
    Mar 28, 2024
    Authors
    Ishaan Bhattacharjee
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Ishaan Bhattacharjee

    Released under MIT

    Contents

  13. LegalLensNER

    • huggingface.co
    Updated Nov 11, 2024
    Cite
    Darrow (2024). LegalLensNER [Dataset]. https://huggingface.co/datasets/darrow-ai/LegalLensNER
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset authored and provided by
    Darrow
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Homepage: https://www.darrow.ai/
    Repository: https://github.com/darrow-labs/LegalLens
    Paper: https://arxiv.org/pdf/2402.04335.pdf
    Point of Contact: Dor Bernsohn, Gil Semo

      Overview
    

    LegalLensNER is a dedicated dataset designed for Named Entity Recognition (NER) in the legal domain, with a specific emphasis on detecting legal violations in unstructured texts.

      Data Fields
    

    id: (int) A unique identifier for each record. word: (str) The specific word or token in the… See the full description on the dataset page: https://huggingface.co/datasets/darrow-ai/LegalLensNER.

  14. legal-ner

    • huggingface.co
    Cite
    daishen, legal-ner [Dataset]. https://huggingface.co/datasets/daishen/legal-ner
    Explore at:
    Authors
    daishen
    Description

    daishen/legal-ner dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. uk_ner_contracts_spacy

    • huggingface.co
    Updated Dec 19, 2023
    Cite
    Law Insider (2023). uk_ner_contracts_spacy [Dataset]. https://huggingface.co/datasets/lawinsider/uk_ner_contracts_spacy
    Explore at:
    Dataset updated
    Dec 19, 2023
    Dataset authored and provided by
    Law Insider
    Description

    Dataset Description

    Legal Contracts Dataset for Training SpaCy NER Model. This repository contains a specially curated dataset consisting of legal contracts. It is designed for training a Named Entity Recognition (NER) model using SpaCy, with the aim of recognizing and classifying four types of entities in the text: Contract Type, Clause Title, Clause Number, and Definition Title. The dataset includes a broad variety of legal contracts, covering diverse domains such as… See the full description on the dataset page: https://huggingface.co/datasets/lawinsider/uk_ner_contracts_spacy.

  16. Replication Data for: Power in Text: Implementing Networks and Institutional...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Shaffer, Robert (2023). Replication Data for: Power in Text: Implementing Networks and Institutional Complexity in American Law [Dataset]. http://doi.org/10.7910/DVN/PYAZZE
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Shaffer, Robert
    Description

    Replication materials for "Power in Text: Implementing Networks and Institutional Complexity in American Law". Contains webscrapers, scraped text, fit NER models, network extraction code, and Bayesian modeling code/results. All data were originally collected in late 2018, so re-scraped data may differ. For details, see comments in individual scripts, as well as the included README file. If at all possible, maintain the original file structure of this repository for easier replication.

  17. greek_legal_ner

    • huggingface.co
    Updated May 31, 2013
    Cite
    Joel Niklaus (2013). greek_legal_ner [Dataset]. https://huggingface.co/datasets/joelniklaus/greek_legal_ner
    Explore at:
    Dataset updated
    May 31, 2013
    Authors
    Joel Niklaus
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Dataset Card for Greek Legal Named Entity Recognition

      Dataset Summary
    

    This dataset contains an annotated corpus for named entity recognition in Greek legislations. It is the first of its kind for the Greek language in such an extended form and one of the few that examines legal text in a full spectrum entity recognition.

      Supported Tasks and Leaderboards
    

    The dataset supports the task of named entity recognition.

      Languages
    

    The language in the dataset… See the full description on the dataset page: https://huggingface.co/datasets/joelniklaus/greek_legal_ner.

  18. HOME-Alcar: Aligned and Annotated Cartularies

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, pdf, zip
    Updated Jul 17, 2024
    Cite
    Dominique Stutzmann; Sergio Torres Aguilar; Paul Chaffenet (2024). HOME-Alcar: Aligned and Annotated Cartularies [Dataset]. http://doi.org/10.5281/zenodo.5600884
    Explore at:
    Available download formats: pdf, zip, json, bin
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dominique Stutzmann; Sergio Torres Aguilar; Paul Chaffenet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The HOME-Alcar (Aligned and Annotated Cartularies) corpus was produced as part of the European research project HOME History of Medieval Europe (https://www.heritageresearch-hub.eu/project/home/), led under the coordination of Institut de Recherche et d'Histoire des Textes (PI: D. Stutzmann), with the Universitat Politecnica de Valencia (PI: E. Vidal), the National Archives of the Czech Republic in Prague (PI: J. Kreckova), and Teklia SAS (PI: C. Kermorvant).
    The HOME-Alcar (Aligned and Annotated Cartularies) corpus is a resource created to train Handwritten Text Recognition (HTR) and Named Entity Recognition (NER), and presents a collection of
    (i) digital images of 17 medieval manuscripts;
    (ii) scholarly editions thereof;
    (iii) coordinates linking images and text at line level;
    (iv) annotations of Named Entities (place and person names).
    The 17 medieval manuscripts in this corpus are cartularies, i.e. books copying charters and legal acts, produced between the 12th and 14th centuries.

  19. The triall, of Lieut. Collonell John Lilburne, by an extraordinary or...

    • llds.ling-phil.ox.ac.uk
    Updated Jun 1, 2024
    Cite
    John Lilburne; Clement Walker (2024). The triall, of Lieut. Collonell John Lilburne, by an extraordinary or special commission, of oyear and terminer at the Guild-Hall of London, the 24, 25, 26. of Octob. 1649. Being as exactly pen'd and taken in short hand, as it was possible to be done in such a croud and noise, and transcribed with an indifferent and even hand, both in reference to the court, and the prisoner; that so matter of fact, as it was there declared, might truly come to publick view. In which is contained all the judges names, and the names of the grand inquest, and the names of the honest jury of life and death. Vnto which is annexed a necessary and essential appendix, very well worth the readers, carefull perusal; if he desire rightly to understand the whole body of the discourse, and know the worth of that ner'e enough to be prised, bulwork of English freedom, viz. to be tried by a jury of legal and good men of the neighbour-hood. / Published by Theodorus Verax. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A96856
    Explore at:
    Dataset updated
    Jun 1, 2024
    Authors
    John Lilburne; Clement Walker
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    (:unav)

  20. eg-legal-multi-task

    • huggingface.co
    Updated Oct 8, 2025
    Cite
    Ahmed Mardi (2025). eg-legal-multi-task [Dataset]. https://huggingface.co/datasets/fr3on/eg-legal-multi-task
    Explore at:
    Dataset updated
    Oct 8, 2025
    Authors
    Ahmed Mardi
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Arabic Legal Dataset - Multi-Task Legal Learning

      Dataset Description
    

    Multi-task learning dataset combining classification, QA, NER, and summarization tasks in unified format. This dataset contains 1,046 examples of multi_task data derived from Egyptian legal texts, including criminal law, civil law, procedural law, and personal status law. The dataset is designed for training and evaluating Arabic legal AI models.

      Dataset Summary
    

    Language: Arabic (Egyptian… See the full description on the dataset page: https://huggingface.co/datasets/fr3on/eg-legal-multi-task.

Legal NER Dataset

Explore at:
374 scholarly articles cite this dataset (View in Google Scholar)