22 datasets found

Legal NER Dataset
kaggle.com
zip
Updated Apr 12, 2024
Cite
Pratik Pujari (2024). Legal NER Dataset [Dataset]. https://www.kaggle.com/datasets/pratikpujarichef/legal-ner-dataset
Available download formats: zip (6930691 bytes)
Dataset updated
Apr 12, 2024
Authors
Pratik Pujari
Description
Dataset

This dataset was created by Pratik Pujari

InLegalNER
huggingface.co
Updated Apr 17, 2024
Cite
OpenNyAI (2024). InLegalNER [Dataset]. https://huggingface.co/datasets/opennyaiorg/InLegalNER
Dataset updated
Apr 17, 2024
Dataset authored and provided by
OpenNyAI
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset for training and evaluating an Indian legal named entity recognition model.

Paper details

Named Entity Recognition in Indian court judgments (arXiv)

Label Scheme

View label scheme (14 labels for 1 components)

ENTITY        BELONGS TO
LAWYER        PREAMBLE
COURT         PREAMBLE, JUDGEMENT
JUDGE         PREAMBLE, JUDGEMENT
PETITIONER    PREAMBLE, JUDGEMENT
RESPONDENT    PREAMBLE, JUDGEMENT
CASE_NUMBER   JUDGEMENT
GPE           JUDGEMENT
DATE          JUDGEMENT
ORG           JUDGEMENT
STATUTE       JUDGEMENT

… See the full description on the dataset page: https://huggingface.co/datasets/opennyaiorg/InLegalNER.
LeNER-Br: Portuguese Legal NER
kaggle.com
zip
Updated Dec 2, 2023
Cite
The Devastator (2023). LeNER-Br: Portuguese Legal NER [Dataset]. https://www.kaggle.com/datasets/thedevastator/lener-br-portuguese-legal-ner-dataset/discussion
Available download formats: zip (767366 bytes)
Dataset updated
Dec 2, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Brazil
Description
LeNER-Br: Portuguese Legal NER

Labeled Portuguese Legal NER

By lener_br (from Hugging Face)

About this dataset

LeNER-Br is a comprehensive dataset specifically created for named entity recognition (NER) in the Portuguese language, particularly within the domain of legal documents. This dataset consists of manually annotated texts extracted from legislation and legal cases. Each text has undergone meticulous tagging to identify various types of named entities, including persons, locations, time entities, organizations, legislation references, and legal case references.

To curate this dataset, a total of 66 legal documents were collected from diverse Brazilian Courts encompassing both superior and state levels. Prominent courts such as the Supremo Tribunal Federal, Superior Tribunal de Justiça, Tribunal de Justiça de Minas Gerais, and Tribunal de Contas da União contributed to this collection. Additionally, four significant legislation documents like Lei Maria da Penha were also included to ensure a comprehensive representation. In total, 70 unique documents form part of this extensive dataset.

The primary purpose of LeNER-Br is to facilitate the development and evaluation of NER models specifically tailored for Portuguese legal text analysis. The labeled data provided in this dataset enables researchers and data scientists to train their NER models effectively by leveraging insights from varied legal contexts present in Brazil's jurisdiction system.

Each instance of annotated text has two columns. The tokens column holds the individual words or tokens of the original text. The ner_tags column holds the NER tag assigned to each token, which specifies its entity type, such as a person or an organization.

Researchers may use LeNER-Br as a benchmark for evaluating the performance of their own NER models for Portuguese legal documents. Moreover, the tokens column is listed twice, with additional tagged descriptions including ner_tags, which contains the NER information assigned to each token.

In conclusion, the LeNER-Br dataset is a valuable resource for advancing NER techniques in Portuguese, particularly in the legal domain. It provides a manually annotated collection of legal texts chosen to represent Brazil's legislative landscape and the entities involved. The dataset serves as a foundation for training and evaluating NER models and supports information extraction from Portuguese legal documents.

How to use the dataset

The LeNER-Br dataset is a valuable resource for researchers and practitioners working on named entity recognition (NER) in the context of Portuguese legal documents. This guide will provide you with an overview of the dataset and how to effectively utilize it for your NER tasks.

Dataset Overview

LeNER-Br is composed of 70 manually annotated legal documents written in Portuguese. These documents were collected from various Brazilian Courts, including superior and state levels such as the Supremo Tribunal Federal, Superior Tribunal de Justiça, Tribunal de Justiça de Minas Gerais, and Tribunal de Contas da União. The dataset also includes four legislation documents, such as Lei Maria da Penha.

The dataset provides tags for different types of named entities commonly found in legal texts. These named entity types include persons, locations, time entities, organizations, legislations, and legal cases. Additionally, there are two main columns in the dataset that you should pay attention to:

tokens: This column contains individual words or tokens present in the text of the legal documents.

ner_tags: This column contains the named entity recognition (NER) tags assigned to each token in the text. These tags indicate the type of named entity that each token represents.

Utilizing the Dataset

Here are some steps you can follow to make effective use of this dataset:

Data Exploration: Start by loading and exploring the data using your preferred programming language or data analysis tools like Python's pandas library.

Load train.csv file for training your NER models with manually annotated texts.

Utilize test.csv file as a test set for evaluating model performance.

Use validation.csv file for additional validation during model development.

Preprocessing:

Perform necessary preprocess...
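
The data exploration step above can be sketched with the Python standard library (pandas, which the text mentions, would serve equally well). This is a minimal sketch under a loud assumption: that each cell of the tokens and ner_tags columns holds a Python-style list literal. The listing does not confirm the on-disk encoding, and the sample rows and integer tag values below are hypothetical.

```python
import ast
import csv
import io


def read_ner_rows(csv_text):
    """Parse CSV text with `tokens` and `ner_tags` columns into (tokens, tags) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: each cell is a Python-style list literal such as "[1, 2, 2]".
        tokens = ast.literal_eval(record["tokens"])
        tags = ast.literal_eval(record["ner_tags"])
        if len(tokens) != len(tags):
            raise ValueError("tokens and ner_tags differ in length")
        rows.append((tokens, tags))
    return rows


# Hypothetical two-row sample standing in for train.csv.
SAMPLE = '''tokens,ner_tags
"['Supremo', 'Tribunal', 'Federal']","[1, 2, 2]"
"['Lei', 'Maria', 'da', 'Penha']","[3, 4, 4, 4]"
'''

rows = read_ner_rows(SAMPLE)
print(len(rows), rows[0])
```

To read a real file, the same function could be fed the text of train.csv, test.csv, or validation.csv; if the cells turn out to use a different encoding, only the two `ast.literal_eval` lines need to change.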
german-ler
huggingface.co
opendatalab.com
Updated Nov 2, 2024
Cite
Elena Leitner (2024). german-ler [Dataset]. http://doi.org/10.57967/hf/0046
Unique identifier
https://doi.org/10.57967/hf/0046
Dataset updated
Nov 2, 2024
Authors
Elena Leitner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for "German LER"

Dataset Summary

A dataset of Legal Documents from German federal court decisions for Named Entity Recognition. The dataset is human-annotated with 19 fine-grained entity classes. The dataset consists of approx. 67,000 sentences and contains 54,000 annotated entities. NER tags use the BIO tagging scheme. The dataset includes two different versions of annotations, one with a set of 19 fine-grained semantic classes (ner_tags) and another one… See the full description on the dataset page: https://huggingface.co/datasets/elenanereiss/german-ler.
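
Since the summary says the tags follow the BIO scheme, a short sketch may help show how such tags group into entity spans. The decoder below is a generic illustration, not code from the dataset authors; the sentence and the PER and LOC labels are hypothetical placeholders rather than the dataset's actual class names.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag with the same label continues the open entity.
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity and treat this token as outside.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical example sentence and labels.
tokens = ["Richter", "Max", "Mustermann", "tagt", "in", "Karlsruhe", "."]
tags = ["O", "B-PER", "I-PER", "O", "O", "B-LOC", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

Dropping an I- tag that has no matching open entity is one design choice among several; some evaluation tools instead treat it as the start of a new entity.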
eg-legal-ner
huggingface.co
Updated Oct 8, 2025
Cite
Ahmed Mardi (2025). eg-legal-ner [Dataset]. https://huggingface.co/datasets/fr3on/eg-legal-ner
Dataset updated
Oct 8, 2025
Authors
Ahmed Mardi
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Arabic Legal Dataset - Legal Named Entity Recognition

Dataset Description

Named entity recognition dataset for Arabic legal texts with specialized legal entity types and relationships. This dataset contains 1,046 examples of NER data derived from Egyptian legal texts, including criminal law, civil law, procedural law, and personal status law. The dataset is designed for training and evaluating Arabic legal AI models.

Dataset Summary

Language: Arabic (Egyptian… See the full description on the dataset page: https://huggingface.co/datasets/fr3on/eg-legal-ner.
ner-data
kaggle.com
zip
Updated Mar 11, 2024
Cite
Shivam Kumar (2024). ner-data [Dataset]. https://www.kaggle.com/datasets/shivamk01/ner-data
Available download formats: zip (2020341 bytes)
Dataset updated
Mar 11, 2024
Authors
Shivam Kumar
Description
Dataset

This dataset was created by Shivam Kumar

data-augmentation-ner-results
zenodo.org
data.niaid.nih.gov
zip
Updated May 30, 2023
Cite
Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch (2023). data-augmentation-ner-results [Dataset]. http://doi.org/10.5281/zenodo.6956508
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6956508
Dataset updated
May 30, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Robin Erd; Leila Feddoul; Clara Lachenmaier; Marianne Jana Mauch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Model evaluation results produced in the context of evaluating data augmentation for Named Entity Recognition over the German legal domain.

Detailed information can be found on the GitHub page.
Multilingual NER Dataset
kaggle.com
zip
Updated May 22, 2022
Cite
Konstantin Verner (2022). Multilingual NER Dataset [Dataset]. https://www.kaggle.com/datasets/constantinwerner/multilingual-ner-dataset
Available download formats: zip (837884 bytes)
Dataset updated
May 22, 2022
Authors
Konstantin Verner
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
It is a collection of NER datasets in five languages:

German (https://github.com/elenanereiss/Legal-Entity-Recognition)
Greek (https://github.com/nmpartzio/elNER)
Japanese (https://github.com/stockmarkteam/ner-wikipedia-dataset)
Russian (https://github.com/dialogue-evaluation/factRuEval-2016/)
Turkish (https://data.mendeley.com/datasets/cdcztymf4k/1)

The annotation was adapted to OntoNotes standard and converted to IOB format. The main purpose of this dataset is evaluation of XLM models.
Romanian Named Entity Recognition in the Legal domain (LegalNERo)
data.niaid.nih.gov
Updated Aug 26, 2022
+ more versions
Cite
Păiș, Vasile; Mitrofan, Maria; Gasan, Carol Luca; Ianov, Alexandru; Ghiță, Corvin; Coneschi, Vlad Silviu; Onuț, Andrei (2022). Romanian Named Entity Recognition in the Legal domain (LegalNERo) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4772094
Dataset updated
Aug 26, 2022
Dataset provided by
Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy
Authors
Păiș, Vasile; Mitrofan, Maria; Gasan, Carol Luca; Ianov, Alexandru; Ghiță, Corvin; Coneschi, Vlad Silviu; Onuț, Andrei
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time and legal resources mentioned in legal documents. Additionally it offers GEONAMES codes for the named entities annotated as location (where a link could be established).

The LegalNERo corpus is available in different formats: span-based, token-based and RDF. The Linguistic Linked Open Data (LLOD) version is provided in RDF-Turtle format.

CONLLUP files conform to the CoNLL-U Plus format https://universaldependencies.org/ext-format.html . Part-of-speech tagging was realized using UDPIPE. Named entity annotations are placed in the column "RELATE:NE" (the 11th column) as defined in the "global.columns" metadata field. Similarly GEONAMES references are in the column "RELATE:GEONAMES" (the 12th column, last). Automatic processing was performed through the RELATE platform (https://relate.racai.ro).
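
As a rough illustration of the column layout described above, the sketch below reads the `global.columns` header and pulls out the word form and the "RELATE:NE" value for each token. It assumes that the first ten columns are the standard CoNLL-U ones and that fields are tab-separated; the sample rows and their tag values are hypothetical and not taken from the corpus.

```python
def read_conllup(text, wanted=("FORM", "RELATE:NE")):
    """Return sentences as lists of tuples holding the `wanted` columns."""
    columns = None
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("# global.columns"):
            # The header names every column, separated by whitespace.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment or metadata lines
        elif not line.strip():
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
        else:
            if columns is None:
                raise ValueError("missing '# global.columns' header")
            row = dict(zip(columns, line.split("\t")))
            current.append(tuple(row[name] for name in wanted))
    if current:
        sentences.append(current)
    return sentences


# Assumed column order: ten standard CoNLL-U columns plus the two RELATE ones.
COLUMNS = "ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC RELATE:NE RELATE:GEONAMES"


def make_row(index, form, ne_tag):
    # Hypothetical row: every column except ID, FORM and RELATE:NE is "_".
    return "\t".join([str(index), form] + ["_"] * 8 + [ne_tag, "_"])


SAMPLE = "\n".join([
    "# global.columns = " + COLUMNS,
    make_row(1, "Guvernul", "B-ORG"),
    make_row(2, "României", "I-ORG"),
    make_row(3, "adoptă", "O"),
    "",
])
sentences = read_conllup(SAMPLE)
print(sentences)
```

Looking columns up by name from the header, rather than by a fixed position, keeps the reader working for both the folders with and without the LEGAL layer as long as the header is present.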

ANN files conform to BRAT format (https://brat.nlplab.org/).
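
For the .ann folders, a minimal reader for BRAT text-bound annotations might look like the sketch below. It handles only the "T" lines of the standoff format (identifier, then type and character offsets, then the covered text, separated by tabs). The sample annotations are hypothetical, and the label names are guessed from the folder names rather than confirmed against the files.

```python
def parse_brat_entities(ann_text):
    """Extract text-bound ("T") annotations from BRAT standoff text."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, attributes, notes and blank lines
        ann_id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous entities list several "start end" pairs separated by ";".
        fragments = [tuple(int(n) for n in part.split()) for part in offsets.split(";")]
        entities.append(
            {"id": ann_id, "label": label, "fragments": fragments, "text": surface}
        )
    return entities


# Hypothetical annotations for the text "Guvernul României, în 2020".
SAMPLE_ANN = "\n".join([
    "T1\tORG 0 17\tGuvernul României",
    "T2\tTIME 22 26\t2020",
    "#1\tAnnotatorNotes T1\tan annotator comment",
])
entities = parse_brat_entities(SAMPLE_ANN)
print(entities)
```

In the folder that allows overlaps, two entries may cover intersecting offset ranges, so downstream code should not assume the fragments are disjoint.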

The archive contains:

ann_LEGAL_PER_LOC_ORG_TIME_overlap: Folder in which all the files are in .ann format and contain annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations of organizations and time entities inside legal references were allowed.

ann_LEGAL_PER_LOC_ORG_TIME: Folder in which all the files are in .ann format and contain annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated.

ann_PER_LOC_ORG_TIME: Folder in which all the files are in .ann format and contain annotations of persons, locations, organizations and time. There are no overlapping annotations.

conllup_LEGAL_PER_LOC_ORG_TIME: Folder in which all the files are in .conllup format and contain annotations of legal resources mentioned, persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

conllup_PER_LOC_ORG_TIME: Folder in which all the files are in .conllup format and contain annotations of persons, locations, organizations and time. Overlapping annotations were not allowed and only the longest named entities were annotated. The annotation of these files was enhanced with GEONAMES codes (where linking was possible).

rdf: Folder containing the corpus in RDF-Turtle format. All the annotations are available here in both span and token format.

text: Folder containing the raw texts.

NER System

A NER model generated using the LegalNERo corpus can be used online in the RELATE platform: https://relate.racai.ro/index.php?path=ner/demo

This system was described in: Păiș, Vasile and Mitrofan, Maria and Gasan, Carol Luca and Coneschi, Vlad and Ianov, Alexandru. Named Entity Recognition in the Romanian Legal Domain. In Proceedings of the Natural Legal Language Processing Workshop 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 9-18, Nov 2021.

LICENSING

This work is provided under the license CC BY-NC-ND 4.0 (Attribution-NonCommercial-NoDerivatives 4.0 International). The license can be viewed online here: https://creativecommons.org/licenses/by-nc-nd/4.0/ and the full text here: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode .

CONTACT

Research Institute for Artificial Intelligence "Mihai Draganescu", Romanian Academy Web: http://www.racai.ro Contact emails: vasile@racai.ro , maria@racai.ro
indian-legal-ner
huggingface.co
Updated Nov 8, 2025
Cite
HF Tuner (2025). indian-legal-ner [Dataset]. https://huggingface.co/datasets/hf-tuner/indian-legal-ner
Dataset updated
Nov 8, 2025
Authors
HF Tuner
Area covered
India
Description
hf-tuner/indian-legal-ner dataset hosted on Hugging Face and contributed by the HF Datasets community
NKP_Legal_Cases
kaggle.com
zip
Updated Mar 30, 2025
Cite
Arpan Sapkota (2025). NKP_Legal_Cases [Dataset]. https://www.kaggle.com/arpansapkota/nkp-legal-cases
Available download formats: zip (139812570 bytes)
Dataset updated
Mar 30, 2025
Authors
Arpan Sapkota
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The NKP_Legal_Cases dataset (Raw_Legal_Cases.json and Para_Legal_Cases.json) is a curated collection of Nepali legal case texts sourced from publicly available Nepali court documents. It was prepared for legal NLP, text summarization, and evaluation of multilingual LLMs.

The dataset aims to support research in:
- Legal document summarization
- Named entity recognition (NER)
- Legal information retrieval
- Document classification
- Multilingual NLP for low-resource languages (specifically Nepali)

This dataset addresses the significant gap in publicly accessible legal datasets for Nepali, enabling researchers, students, and practitioners to explore legal AI applications in low-resource contexts.
Legal_NER
kaggle.com
zip
Updated Mar 28, 2024
Cite
Ishaan Bhattacharjee (2024). Legal_NER [Dataset]. https://www.kaggle.com/datasets/ishaaaaan/legal-ner
Available download formats: zip (2766869 bytes)
Dataset updated
Mar 28, 2024
Authors
Ishaan Bhattacharjee
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Ishaan Bhattacharjee

Released under MIT

LegalLensNER
huggingface.co
Updated Nov 11, 2024
Cite
Darrow (2024). LegalLensNER [Dataset]. https://huggingface.co/datasets/darrow-ai/LegalLensNER
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 11, 2024
Dataset authored and provided by
Darrow
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Homepage: https://www.darrow.ai/
Repository: https://github.com/darrow-labs/LegalLens
Paper: https://arxiv.org/pdf/2402.04335.pdf
Point of Contact: Dor Bernsohn, Gil Semo

Overview

LegalLensNER is a dedicated dataset designed for Named Entity Recognition (NER) in the legal domain, with a specific emphasis on detecting legal violations in unstructured texts.

Data Fields

id: (int) A unique identifier for each record.
word: (str) The specific word or token in the… See the full description on the dataset page: https://huggingface.co/datasets/darrow-ai/LegalLensNER.
legal-ner
huggingface.co
+ more versions
Cite
daishen, legal-ner [Dataset]. https://huggingface.co/datasets/daishen/legal-ner
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
daishen
Description
daishen/legal-ner dataset hosted on Hugging Face and contributed by the HF Datasets community
uk_ner_contracts_spacy
huggingface.co
Updated Dec 19, 2023
Cite
Law Insider (2023). uk_ner_contracts_spacy [Dataset]. https://huggingface.co/datasets/lawinsider/uk_ner_contracts_spacy
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 19, 2023
Dataset authored and provided by
Law Insider
Description
Dataset Description

Legal Contracts Dataset for Training SpaCy NER Model

This repository contains a specially curated dataset consisting of legal contracts. It is designed for training a Named Entity Recognition (NER) model using SpaCy, with the aim of recognizing and classifying four types of entities in the text: Contract Type, Clause Title, Clause Number, and Definition Title. The dataset includes a broad variety of legal contracts, covering diverse domains such as… See the full description on the dataset page: https://huggingface.co/datasets/lawinsider/uk_ner_contracts_spacy.
Replication Data for: Power in Text: Implementing Networks and Institutional...
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
+ more versions
Cite
Shaffer, Robert (2023). Replication Data for: Power in Text: Implementing Networks and Institutional Complexity in American Law [Dataset]. http://doi.org/10.7910/DVN/PYAZZE
Unique identifier
https://doi.org/10.7910/DVN/PYAZZE
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Shaffer, Robert
Description
Replication materials for "Power in Text: Implementing Networks and Institutional Complexity in American Law". Contains webscrapers, scraped text, fit NER models, network extraction code, and Bayesian modeling code/results. All data were originally collected in late 2018, so re-scraped data may differ. For details, see comments in individual scripts, as well as the included README file. If at all possible, maintain the original file structure of this repository for easier replication.
greek_legal_ner
huggingface.co
Updated May 31, 2013
Cite
Joel Niklaus (2013). greek_legal_ner [Dataset]. https://huggingface.co/datasets/joelniklaus/greek_legal_ner
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 31, 2013
Authors
Joel Niklaus
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Greek Legal Named Entity Recognition

Dataset Summary

This dataset contains an annotated corpus for named entity recognition in Greek legislation. It is the first of its kind for the Greek language in such an extended form and one of the few that examine legal text with full-spectrum entity recognition.

Supported Tasks and Leaderboards

The dataset supports the task of named entity recognition.

Languages

The language in the dataset… See the full description on the dataset page: https://huggingface.co/datasets/joelniklaus/greek_legal_ner.
HOME-Alcar: Aligned and Annotated Cartularies
zenodo.org
data.niaid.nih.gov
bin, json, pdf, zip
Updated Jul 17, 2024
Cite
Dominique Stutzmann; Sergio Torres Aguilar; Paul Chaffenet (2024). HOME-Alcar: Aligned and Annotated Cartularies [Dataset]. http://doi.org/10.5281/zenodo.5600884
Available download formats: pdf, zip, json, bin
Unique identifier
https://doi.org/10.5281/zenodo.5600884
Dataset updated
Jul 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dominique Stutzmann; Sergio Torres Aguilar; Paul Chaffenet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The HOME-Alcar (Aligned and Annotated Cartularies) corpus was produced as part of the European research project HOME, History of Medieval Europe (https://www.heritageresearch-hub.eu/project/home/), led under the coordination of the Institut de Recherche et d'Histoire des Textes (PI: D. Stutzmann), with the Universitat Politecnica de Valencia (PI: E. Vidal), the National Archives of the Czech Republic in Prague (PI: J. Kreckova), and Teklia SAS (PI: C. Kermorvant).
The corpus is a resource created to train Handwritten Text Recognition (HTR) and Named Entity Recognition (NER) models, and presents a collection of:
(i) digital images of 17 medieval manuscripts;
(ii) scholarly editions thereof;
(iii) coordinates linking images and text at line level;
(iv) annotations of Named Entities (place and person names).
The 17 medieval manuscripts in this corpus are cartularies, i.e. books copying charters and legal acts, produced between the 12th and 14th centuries.
The triall, of Lieut. Collonell John Lilburne, by an extraordinary or...
llds.ling-phil.ox.ac.uk
Updated Jun 1, 2024
Cite
John Lilburne; Clement Walker (2024). The triall, of Lieut. Collonell John Lilburne, by an extraordinary or special commission, of oyear and terminer at the Guild-Hall of London, the 24, 25, 26. of Octob. 1649. Being as exactly pen'd and taken in short hand, as it was possible to be done in such a croud and noise, and transcribed with an indifferent and even hand, both in reference to the court, and the prisoner; that so matter of fact, as it was there declared, might truly come to publick view. In which is contained all the judges names, and the names of the grand inquest, and the names of the honest jury of life and death. Vnto which is annexed a necessary and essential appendix, very well worth the readers, carefull perusal; if he desire rightly to understand the whole body of the discourse, and know the worth of that ner'e enough to be prised, bulwork of English freedom, viz. to be tried by a jury of legal and good men of the neighbour-hood. / Published by Theodorus Verax. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A96856
Dataset updated
Jun 1, 2024
Authors
John Lilburne; Clement Walker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Not available.
eg-legal-multi-task
huggingface.co
Updated Oct 8, 2025
Cite
Ahmed Mardi (2025). eg-legal-multi-task [Dataset]. https://huggingface.co/datasets/fr3on/eg-legal-multi-task
Dataset updated
Oct 8, 2025
Authors
Ahmed Mardi
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Arabic Legal Dataset - Multi-Task Legal Learning

Dataset Description

Multi-task learning dataset combining classification, QA, NER, and summarization tasks in unified format. This dataset contains 1,046 examples of multi_task data derived from Egyptian legal texts, including criminal law, civil law, procedural law, and personal status law. The dataset is designed for training and evaluating Arabic legal AI models.

Dataset Summary

Language: Arabic (Egyptian… See the full description on the dataset page: https://huggingface.co/datasets/fr3on/eg-legal-multi-task.

Legal NER Dataset (Pratik Pujari, 2024): 374 scholarly articles cite this dataset (per Google Scholar).
