100+ datasets found

universal_ner
huggingface.co
Updated Sep 3, 2024
Cite
Universal NER (2024). universal_ner [Dataset]. https://huggingface.co/datasets/universalner/universal_ner
Dataset updated
Sep 3, 2024
Dataset authored and provided by
Universal NER
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Universal Named Entity Recognition (UNER) aims to fill a gap in multilingual NLP: high quality NER datasets in many languages with a shared tagset.

UNER is modeled after the Universal Dependencies project, in that it is intended to be a large community annotation effort with language-universal guidelines. Further, we use the same text corpora as Universal Dependencies.
kaggle-entity-annotated-corpus-ner-dataset
huggingface.co
Updated Jul 10, 2022
Cite
Rafael Arias Calles (2022). kaggle-entity-annotated-corpus-ner-dataset [Dataset]. https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset
Dataset updated
Jul 10, 2022
Authors
Rafael Arias Calles
License
ODbL: https://choosealicense.com/licenses/odbl/
Description
Date: 2022-07-10
Files: ner_dataset.csv
Source: Kaggle entity annotated corpus
Notes: The dataset only contains the tokens and NER tag labels. Labels are uppercase.

About Dataset

from Kaggle Datasets

Context

Annotated corpus for named entity recognition using the GMB (Groningen Meaning Bank) corpus for entity classification, with enhanced and popular features produced by natural language processing applied to the dataset. Tip: Use Pandas Dataframe to load dataset if using Python for… See the full description on the dataset page: https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset.
Multilingual named entity recognition for medieval charters. Datasets and...
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 16, 2023
Cite
Sergio Torres Aguilar (2023). Multilingual named entity recognition for medieval charters. Datasets and models [Dataset]. http://doi.org/10.5281/zenodo.6463699
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6463699
Dataset updated
Jan 16, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sergio Torres Aguilar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Annotated dataset for training named entity recognition models for medieval charters in Latin, French and Spanish.

The original raw texts for all charters were collected from four charter collections:

- HOME-ALCAR corpus : https://zenodo.org/record/5600884

- CBMA : http://www.cbma-project.eu

- Diplomata Belgica : https://www.diplomata-belgica.be

- CODEA corpus : https://corpuscodea.es/

We include (i) the annotated training datasets, (ii) the contextual and static embeddings trained on medieval multilingual texts, and (iii) the named entity recognition models trained using two architectures: Bi-LSTM-CRF + stacked embeddings and fine-tuning of BERT-based models (mBERT and RoBERTa).

The code, datasets, and notebooks used to train the models are available in our GitLab repository: https://gitlab.com/magistermilitum/ner_medieval_multilingual

Our best RoBERTa model is also available on Hugging Face: https://huggingface.co/magistermilitum/roberta-multilingual-medieval-ner
Pile-NER-type
huggingface.co
Updated Aug 9, 2023
Cite
Universal-NER (2023). Pile-NER-type [Dataset]. https://huggingface.co/datasets/Universal-NER/Pile-NER-type
Dataset updated
Aug 9, 2023
Authors
Universal-NER
Description
Intro

Pile-NER-type is a set of GPT-generated data for named entity recognition using the type-based data construction prompt. It was collected by prompting gpt-3.5-turbo-0301 and augmented by negative sampling. Check our project page for more information.

License

Attribution-NonCommercial 4.0 International
aeroBERT-NER
huggingface.co
Updated Apr 7, 2023
Cite
Archana Tikayat Ray (2023). aeroBERT-NER [Dataset]. http://doi.org/10.57967/hf/0470
Unique identifier
https://doi.org/10.57967/hf/0470
Dataset updated
Apr 7, 2023
Authors
Archana Tikayat Ray
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for aeroBERT-NER

Dataset Summary

This dataset contains sentences from the aerospace requirements domain. The sentences are tagged for five NER categories (SYS, VAL, ORG, DATETIME, and RES) using the BIO tagging scheme. There are 1432 sentences in total. The dataset was created to:
(1) Make available an open-source dataset for aerospace requirements, which are often proprietary
(2) Fine-tune language models for token identification… See the full description on the dataset page: https://huggingface.co/datasets/archanatikayatray/aeroBERT-NER.
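
The BIO scheme mentioned in this summary can be decoded into entity spans with a short helper. The following is a minimal sketch, not code from the dataset authors: it assumes tags of the form `B-SYS`, `I-SYS`, and `O`, and the example sentence and its tags are invented for illustration rather than taken from aeroBERT-NER.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span, closing any open one first.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical example sentence; not taken from the dataset.
tokens = ["The", "flight", "control", "system", "shall", "respond", "within", "50", "ms"]
tags = ["O", "B-SYS", "I-SYS", "I-SYS", "O", "O", "O", "B-VAL", "I-VAL"]
print(bio_to_spans(tokens, tags))
```

In this sketch a stray `I-` tag with no preceding `B-` tag is dropped rather than repaired; other conventions are possible.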
Weekly supervised Multilingual Data Set to train Named Entity Recognition...
zenodo.org
Updated Apr 16, 2025
Cite
Izidor Mlakar; Rigon Sallauka; Umut Arioz; Matej Rojc (2025). Weekly supervised Multilingual Data Set to train Named Entity Recognition for Symptom Extraction [Dataset]. http://doi.org/10.5281/zenodo.13918009
Unique identifier
https://doi.org/10.5281/zenodo.13918009
Dataset updated
Apr 16, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Izidor Mlakar; Rigon Sallauka; Umut Arioz; Matej Rojc
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets were generated using the Weakly Supervised NER pipeline (https://github.com/HUMADEX/Weekly-Supervised-NER-pipline) to train symptom extraction NER models.

Supported languages and the dataset location for each language:

English (base language): https://huggingface.co/HUMADEX/english_medical_ner
German: https://huggingface.co/HUMADEX/german_medical_ner
Italian: https://huggingface.co/HUMADEX/italian_medical_ner
Spanish: https://huggingface.co/HUMADEX/spanish_medical_ner
Greek: https://huggingface.co/HUMADEX/german_medical_ner
Slovenian: https://huggingface.co/HUMADEX/slovenian_medical_ner
Polish: https://huggingface.co/HUMADEX/polish_medical_ner
Portuguese: https://huggingface.co/HUMADEX/portugese_medical_ner

Dataset Building

- Data Integration and Preprocessing
- Data Cleaning
- Annotation with Stanza's i2b2 Clinical Model
- Translation into the targeted language
- Word Alignment
- Data Augmentation

Acknowledgement
This dataset was created as part of the joint research of the HUMADEX research group (https://www.linkedin.com/company/101563689/) and has received funding from the European Union Horizon Europe Research and Innovation Program project SMILE (grant number 101080923) and the Marie Skłodowska-Curie Actions (MSCA) Doctoral Networks project BosomShield (grant number 101073222). Responsibility for the information and views expressed herein lies entirely with the authors.

Authors:
dr. Izidor Mlakar, Rigona Sallauka, dr. Umut Arioz, dr. Matej Rojc

Please cite as:

Article title: Weakly-Supervised Multilingual Medical NER For Symptom Extraction For Low-Resource Languages
DOI: 10.20944/preprints202504.1356.v1
Website: https://www.preprints.org/manuscript/202504.1356/v1
InLegalNER
huggingface.co
Updated Apr 17, 2024
Cite
OpenNyAI (2024). InLegalNER [Dataset]. https://huggingface.co/datasets/opennyaiorg/InLegalNER
Dataset updated
Apr 17, 2024
Dataset authored and provided by
OpenNyAI
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset for training and evaluating an Indian legal named entity recognition model.

Paper details

Named Entity Recognition in Indian court judgments (arXiv)

Label Scheme

Label scheme (14 labels for 1 component)

ENTITY        BELONGS TO
LAWYER        PREAMBLE
COURT         PREAMBLE, JUDGEMENT
JUDGE         PREAMBLE, JUDGEMENT
PETITIONER    PREAMBLE, JUDGEMENT
RESPONDENT    PREAMBLE, JUDGEMENT
CASE_NUMBER   JUDGEMENT
GPE           JUDGEMENT
DATE          JUDGEMENT
ORG           JUDGEMENT
STATUTE       JUDGEMENT

… See the full description on the dataset page: https://huggingface.co/datasets/opennyaiorg/InLegalNER.
Annotated_NER_PDF_Resumes
huggingface.co
Updated Jul 22, 2024
Cite
MehyarMlaweh (2024). Annotated_NER_PDF_Resumes [Dataset]. https://huggingface.co/datasets/Mehyaar/Annotated_NER_PDF_Resumes
Dataset updated
Jul 22, 2024
Authors
MehyarMlaweh
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
IT Skills Named Entity Recognition (NER) Dataset

Description:

This dataset includes 5,029 curriculum vitae (CV) samples, each annotated with IT skills using Named Entity Recognition (NER). The skills are manually labeled and extracted from PDFs, and the data is provided in JSON format. This dataset is ideal for training and evaluating NER models, especially for extracting IT skills from CVs.

Highlights:

- 5,029 CV samples with annotated IT skills
- Manual annotations for… See the full description on the dataset page: https://huggingface.co/datasets/Mehyaar/Annotated_NER_PDF_Resumes.
Climate-Change-NER
huggingface.co
Updated Oct 11, 2024
Cite
IBM Research (2024). Climate-Change-NER [Dataset]. https://huggingface.co/datasets/ibm-research/Climate-Change-NER
Dataset updated
Oct 11, 2024
Dataset provided by
IBM (http://ibm.com/)
IBM Research
Authors
IBM Research
License
ODC-By: https://choosealicense.com/licenses/odc-by/
Description
Dataset Card for Climate Change NER

Climate Change NER is an English-language dataset containing 534 abstracts of climate-related papers, sourced from the Semantic Scholar Academic Graph "abstracts" dataset. The abstracts have been manually annotated by classifying climate-related tokens into a set of 13 categories.

Dataset Details

Dataset Description

We introduce a comprehensive dataset for developing and evaluating NLP models tailored towards… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/Climate-Change-NER.
PII-NER
huggingface.co
Updated Jul 20, 2024
Cite
Joseph G Flowers (2024). PII-NER [Dataset]. https://huggingface.co/datasets/Josephgflowers/PII-NER
Dataset updated
Jul 20, 2024
Authors
Joseph G Flowers
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for NER PII Extraction Dataset

Dataset Summary

This dataset is designed for training and evaluating Named Entity Recognition (NER) models focused on extracting Personally Identifiable Information (PII) from text. It includes a variety of entities such as names, addresses, phone numbers, email addresses, and identification numbers. The dataset is suitable for tasks that involve PII detection, compliance checks, and data anonymization.

Supported Tasks and Leaderboards

Named Entity… See the full description on the dataset page: https://huggingface.co/datasets/Josephgflowers/PII-NER.
bioleaflets-biomedical-ner
huggingface.co
Updated May 7, 2023
Cite
Ruslan Yermak (2023). bioleaflets-biomedical-ner [Dataset]. https://huggingface.co/datasets/ruslan/bioleaflets-biomedical-ner
Dataset updated
May 7, 2023
Authors
Ruslan Yermak
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for BioLeaflets Dataset

Dataset Summary

BioLeaflets is a biomedical dataset for Data2Text generation. It is a corpus of 1,336 package leaflets of medicines authorised in Europe, which were obtained by scraping the European Medicines Agency (EMA) website. Package leaflets are included in the packaging of medicinal products and contain information to help patients use the product safely and appropriately. This dataset comprises the large majority (∼ 90%) of… See the full description on the dataset page: https://huggingface.co/datasets/ruslan/bioleaflets-biomedical-ner.
The Chilean Waiting List Corpus
explore.openaire.eu
zenodo.org
Updated Jul 1, 2020
Cite
Pablo Báez; Fabián Villena; Matías Rojas; Felipe Bravo-Marquez; Jocelyn Dunstan (2020). The Chilean Waiting List Corpus [Dataset]. http://doi.org/10.5281/zenodo.3926704
Unique identifier
https://doi.org/10.5281/zenodo.3926704
Dataset updated
Jul 1, 2020
Authors
Pablo Báez; Fabián Villena; Matías Rojas; Felipe Bravo-Marquez; Jocelyn Dunstan
Area covered
Chile
Description
Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 9,000 referrals (medical and dental) was manually annotated with ten types of entities, six attributes, and pairs of relations with clinical relevance. A trained medical doctor or dentist annotated these referrals and then, together with three other researchers, consolidated each of the annotations. In the annotated corpus, more than 48% of entities are embedded in other entities or contain another entity.

We use this corpus to build Named Entity Recognition (NER) models. The best results were achieved using Multiple Single-entity architectures with clinical word embeddings stacked with character and Flair contextual embeddings (see https://aclanthology.org/2022.coling-1.184/). The entity type with the best performance is abbreviation, and the hardest to recognize is finding. NER models applied to this corpus can be used to derive statistics on diseases and pending procedures. This work constitutes the first annotated corpus using clinical narratives from Chile and one of the few in Spanish. The annotated corpus, clinical word embeddings, annotation guidelines, and neural models are freely released to the community.

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

We are releasing the dataset in 3 formats:

- cwlc.zip: the raw text files for each document along with its annotation file in Standoff format
- cwlc_conll-format: CoNLL format for training NER models

In addition, the dataset has been released on Hugging Face (https://huggingface.co/plncmm) to facilitate experiments with transformer-based architectures.
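
A CoNLL-style file such as the one mentioned in this description can be read into sentences with a few lines of Python. The sketch below rests on assumptions that are not confirmed by the source: one token per line, whitespace-separated columns with the token first and the tag last, and a blank line between sentences. The actual layout of the released files may differ, and the sample tokens and tag names are hypothetical.

```python
import io
from typing import List, TextIO, Tuple

Sentence = Tuple[List[str], List[str]]


def read_conll(handle: TextIO) -> List[Sentence]:
    """Read (tokens, tags) pairs from a CoNLL-style text stream."""
    sentences: List[Sentence] = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])   # assumed: token in the first column
        tags.append(parts[-1])    # assumed: tag in the last column
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Hypothetical sample; tag names are illustrative only.
sample = io.StringIO("dolor B-Finding\nabdominal I-Finding\n\ncontrol O\n")
parsed = read_conll(sample)
print(parsed)
```

A flat one-tag-per-token layout like this cannot express nested entities on its own, so how the release encodes nesting should be checked against the files themselves.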
azerbaijani-ner-dataset
huggingface.co
Updated Jun 13, 2024
Cite
LocalDoc (2024). azerbaijani-ner-dataset [Dataset]. http://doi.org/10.57967/hf/2484
Unique identifier
https://doi.org/10.57967/hf/2484
Dataset updated
Jun 13, 2024
Dataset authored and provided by
LocalDoc
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Azerbaijani Named Entity Recognition (NER) Dataset

This repository contains the dataset for training and evaluating Named Entity Recognition (NER) models in the Azerbaijani language. The dataset includes annotated text data with various named entities.

Dataset Description

The dataset includes the following entity types:

0: O: Outside any named entity
1: PERSON: Names of individuals
2: LOCATION: Geographical locations, both man-made and natural
3: ORGANISATION: Names of… See the full description on the dataset page: https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset.
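
The integer-to-label mapping listed here lends itself to a simple lookup table. The sketch below includes only the four ids visible in the truncated listing; the remaining ids are elided in the source and are deliberately not filled in, so unknown ids are surfaced rather than guessed. The sample id sequence is hypothetical.

```python
from typing import Dict, List

# Only the ids visible in the truncated listing; the rest are not reproduced here.
ID2LABEL: Dict[int, str] = {
    0: "O",
    1: "PERSON",
    2: "LOCATION",
    3: "ORGANISATION",
}


def decode(tag_ids: List[int]) -> List[str]:
    """Map integer tag ids to label names, flagging ids missing from the table."""
    return [ID2LABEL.get(i, f"UNKNOWN_{i}") for i in tag_ids]


# Hypothetical id sequence for illustration.
print(decode([1, 0, 2, 9]))
```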
ancora-ca-ner
huggingface.co
Updated Nov 1, 2021
Cite
Projecte Aina (2021). ancora-ca-ner [Dataset]. https://huggingface.co/datasets/projecte-aina/ancora-ca-ner
Dataset updated
Nov 1, 2021
Dataset authored and provided by
Projecte Aina
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for AnCora-Ca-NER

Dataset Summary

This is a dataset for Named Entity Recognition (NER) in Catalan. It adapts the AnCora corpus for machine learning and language model evaluation. The dataset was developed by BSC TeMU as part of Projecte AINA to enrich the Catalan Language Understanding Benchmark (CLUB).

Supported Tasks and Leaderboards

Named Entities Recognition, Language Model

Languages

The dataset is in Catalan… See the full description on the dataset page: https://huggingface.co/datasets/projecte-aina/ancora-ca-ner.
Funder-NER
huggingface.co
Updated Aug 25, 2023
Cite
ZBW Leibniz Information Center for Economics (2023). Funder-NER [Dataset]. http://doi.org/10.57967/hf/1011
Unique identifier
https://doi.org/10.57967/hf/1011
Dataset updated
Aug 25, 2023
Dataset authored and provided by
ZBW Leibniz Information Center for Economics
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for Dataset Named Entity Recognition of funders of scientific research

Dataset Summary

Training/test set for automatically identifying funder entities mentioned in scientific papers. This data set is generated from Open Access documents hosted at https://econstor.eu and manually curated/labeled.

Supported Tasks and Leaderboards

The dataset is for training and testing the automatic recognition of funders as they are acknowledged in scientific… See the full description on the dataset page: https://huggingface.co/datasets/ZBWatHF/Funder-NER.
Polyglot-NER
opendatalab.com
huggingface.co
zip
Updated Apr 7, 2023
Cite
Stony Brook University (2023). Polyglot-NER [Dataset]. https://opendatalab.com/OpenDataLab/Polyglot-NER
Available download formats: zip (3575536533 bytes)
Dataset updated
Apr 7, 2023
Dataset provided by
Stony Brook University
Description
Polyglot-NER builds massive multilingual annotators with minimal human expertise and intervention.
requirements-ner-id
huggingface.co
Updated Jul 12, 2023
Cite
Dekai Xiao (2023). requirements-ner-id [Dataset]. https://huggingface.co/datasets/dxiao/requirements-ner-id
Dataset updated
Jul 12, 2023
Authors
Dekai Xiao
Description
dxiao/requirements-ner-id dataset hosted on Hugging Face and contributed by the HF Datasets community
RaTE-NER
huggingface.co
Updated Jun 21, 2024
Cite
Weike Zhao (2024). RaTE-NER [Dataset]. https://huggingface.co/datasets/Angelakeke/RaTE-NER
Dataset updated
Jun 21, 2024
Authors
Weike Zhao
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Dataset Card for RaTE-NER Dataset

Dataset Summary

RaTE-NER dataset is a large-scale, radiological named entity recognition (NER) dataset, including 13,235 manually annotated sentences from 1,816 reports within the MIMIC-IV database, that spans 9 imaging modalities and 23 anatomical regions, ensuring comprehensive coverage. Additionally, we further enriched the dataset with 33,605 sentences from the 17,432 reports available on Radiopaedia, by… See the full description on the dataset page: https://huggingface.co/datasets/Angelakeke/RaTE-NER.
finer-139
huggingface.co
opendatalab.com
Updated May 9, 2022
Cite
AUEB NLP Group (2022). finer-139 [Dataset]. https://huggingface.co/datasets/nlpaueb/finer-139
Dataset updated
May 9, 2022
Authors
AUEB NLP Group
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
FiNER-139 is a named entity recognition dataset consisting of 10K annual and quarterly English reports (filings) of publicly traded companies downloaded from the U.S. Securities and Exchange Commission (SEC) annotated with 139 XBRL tags in the IOB2 format.
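
IOB2, the tagging format named in this description, requires every entity to start with a `B-` tag, so an `I-X` tag is only legal directly after `B-X` or `I-X`. The following is a minimal sketch of a sequence validator under that rule; the tag name `Revenues` is a hypothetical stand-in and is not claimed to be one of the 139 tags in the dataset.

```python
from typing import List


def is_valid_iob2(tags: List[str]) -> bool:
    """Return True if the tag sequence obeys the IOB2 constraints."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            # I-X must continue a span of the same label.
            if prev not in (f"B-{label}", f"I-{label}"):
                return False
        elif tag != "O" and not tag.startswith("B-"):
            # Anything other than O, B-*, or I-* is malformed.
            return False
        prev = tag
    return True


# Hypothetical tag sequences for illustration.
print(is_valid_iob2(["O", "B-Revenues", "I-Revenues", "O"]))
print(is_valid_iob2(["O", "I-Revenues"]))
```

A check like this can be run over a dataset before training to catch sequences that a span decoder would silently mishandle.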
grocery-ner-dataset
huggingface.co
Updated May 13, 2025
Cite
empathy.ai (2025). grocery-ner-dataset [Dataset]. https://huggingface.co/datasets/empathyai/grocery-ner-dataset
Dataset updated
May 13, 2025
Dataset provided by
empathy.ai
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Groceries Named Entity Recognition (NER) Dataset

A specialized dataset for identifying food and grocery items in natural language text using Named Entity Recognition (NER).

Entity Types

The dataset includes the following grocery categories:

- Fruits Vegetables: Fresh produce (e.g., apples, spinach)
- Lactose, Diary, Eggs, Cheese, Yoghurt: Dairy products and eggs
- Meat, Fish, Seafood: Protein sources
- Frozen, Prepared Meals: Ready-to-eat and frozen meals
- Baking, Cooking: Baking… See the full description on the dataset page: https://huggingface.co/datasets/empathyai/grocery-ner-dataset.
