42 datasets found

conll2003
tensorflow.org
opendatalab.com
+1 more
Updated Dec 22, 2022
Cite
(2022). conll2003 [Dataset]. https://www.tensorflow.org/datasets/catalog/conll2003
Dataset updated
Dec 22, 2022
Description
The shared task of CoNLL-2003 concerns language-independent named entity recognition and concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('conll2003', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
conll2003
huggingface.co
Cite
TNER, conll2003 [Dataset]. https://huggingface.co/datasets/tner/conll2003
Croissant
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset authored and provided by
TNER
License
https://choosealicense.com/licenses/other/
Description
CoNLL 2003 NER dataset
test3
huggingface.co
Updated Apr 23, 2024
+ more versions
Cite
Mehrab Hossain (2024). test3 [Dataset]. https://huggingface.co/datasets/mHossain/test3
Dataset updated
Apr 23, 2024
Authors
Mehrab Hossain
Description
The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. In the original files, the chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this dataset uses the IOB2 tagging scheme, whereas the original dataset uses IOB1. For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
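The four-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the loader used by any of the listed datasets: the function name `parse_conll2003` and the sample text are illustrative, and skipping `-DOCSTART-` marker lines is an assumption about the raw files that this listing does not state.

```python
from typing import List, Tuple

# One token row: (word, POS tag, chunk tag, NER tag)
Token = Tuple[str, ...]


def parse_conll2003(text: str) -> List[List[Token]]:
    """Split CoNLL-2003 style text into sentences of 4-column token rows."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("-DOCSTART-"):
            # Assumption: document-boundary markers carry no tokens.
            continue
        parts = line.split(" ")
        if len(parts) != 4:
            raise ValueError(f"expected 4 columns, got {len(parts)}: {line!r}")
        current.append(tuple(parts))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample in the original (IOB1-style) layout.
SAMPLE = """EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O

Peter NNP I-NP I-PER
Blackburn NNP I-NP I-PER
"""

sentences = parse_conll2003(SAMPLE)
for sentence in sentences:
    print([(word, ner) for word, _pos, _chunk, ner in sentence])
```

Raising on rows that do not have exactly four columns is a design choice for the sketch; a more tolerant reader might skip such rows instead.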
English Model (CoNLL-2003) for NameTag
live.european-language-grid.eu
Updated Apr 7, 2014
Cite
(2014). English Model (CoNLL-2003) for NameTag [Dataset]. https://live.european-language-grid.eu/catalogue/ld/18229
Dataset updated
Apr 7, 2014
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves F-measure 84.73 on CoNLL-2003 test data.
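The listing does not say how the F-measure above was computed. As a rough sketch, assuming the exact-match convention commonly used for CoNLL-2003 (a predicted entity counts only if both its span and its type match the gold annotation), precision, recall and F1 could be calculated as follows. The name `entity_f1` and the `(start, end, type)` span tuples are illustrative assumptions and are not part of NameTag.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (start token index, end token index, entity type)
Span = Tuple[int, int, str]


def entity_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span-and-type matching."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold_spans = [(0, 1, "ORG"), (2, 3, "MISC"), (5, 7, "PER")]
pred_spans = [(0, 1, "ORG"), (2, 3, "LOC"), (5, 7, "PER")]  # one type error
print(entity_f1(gold_spans, pred_spans))
```

In this toy case two of the three predictions match exactly, so precision, recall and F1 all equal 2/3.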
conll2003-mini
huggingface.co
Updated Aug 22, 1996
Cite
Luca Bandelli (1996). conll2003-mini [Dataset]. https://huggingface.co/datasets/bandoos/conll2003-mini
Croissant
Dataset updated
Aug 22, 1996
Authors
Luca Bandelli
Description
!! forked version producing at most 10 items per split !!

The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. In the original files, the chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this dataset uses the IOB2 tagging scheme, whereas the original dataset uses IOB1.

For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
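The difference between the two schemes mentioned above is that IOB1 uses B-TYPE only to separate adjacent phrases of the same type, while IOB2 marks the first word of every phrase with B-TYPE. A minimal sketch of the conversion for one sentence follows; the function name `iob1_to_iob2` is illustrative, and this is not necessarily the code used to produce the listed dataset.

```python
from typing import List, Optional


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Convert one sentence's IOB1 tags to IOB2."""
    converted: List[str] = []
    previous_type: Optional[str] = None  # entity type of the previous token, None after "O"
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            previous_type = None
            continue
        prefix, _, entity_type = tag.partition("-")
        # A phrase starts at an explicit B- tag, or wherever the
        # entity type differs from the previous token's type.
        if prefix == "B" or entity_type != previous_type:
            converted.append("B-" + entity_type)
        else:
            converted.append("I-" + entity_type)
        previous_type = entity_type
    return converted


print(iob1_to_iob2(["I-ORG", "O", "I-PER", "I-PER", "B-PER", "O"]))
# ['B-ORG', 'O', 'B-PER', 'I-PER', 'B-PER', 'O']
```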
CoNLL 2003 NER dataset
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). CoNLL 2003 NER dataset [Dataset]. https://service.tib.eu/ldmservice/dataset/conll-2003-ner-dataset
Dataset updated
Nov 25, 2024
Description
The CoNLL 2003 shared task dataset is focused on named entity recognition tasks.
CoNLL Dataset
paperswithcode.com
library.toponeai.link
Cite
CoNLL Dataset [Dataset]. https://paperswithcode.com/dataset/conll-1
Description
The CoNLL dataset is a widely used resource in the field of natural language processing (NLP). The term “CoNLL” stands for Conference on Natural Language Learning. The dataset originates from a series of shared tasks organized at that conference.
CoNLL-2003
kaggle.com
Updated Aug 15, 2021
Cite
Zhou He (2021). CoNLL-2003 [Dataset]. https://www.kaggle.com/henavajov/conll2003/activity
Croissant
Dataset updated
Aug 15, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Zhou He
Description
Dataset

This dataset was created by Zhou He

Contents
CoNLL 2003
kaggle.com
zip
Updated Mar 14, 2021
+ more versions
Cite
GONG ZEQUN (2021). CoNLL 2003 [Dataset]. https://www.kaggle.com/gongzequn/conll-2003
Available download formats: zip (941044 bytes)
Dataset updated
Mar 14, 2021
Authors
GONG ZEQUN
Description
Dataset

This dataset was created by GONG ZEQUN

Contents

It contains the following files:
CoNLL 2003 - Dataset - LDM
service.tib.eu
Updated Nov 25, 2024
Cite
(2024). CoNLL 2003 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/conll-2003
Dataset updated
Nov 25, 2024
Description
The CoNLL 2003 dataset contains 1393 articles with about 34K mentions, and the standard performance metric is mention-averaged accuracy.
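The metric named above is not defined in this listing. One plausible reading, sketched here as an assumption, is a micro-average over mentions: the share of all mentions whose predicted entity equals the gold entity, regardless of which article they come from. The function name `mention_accuracy` and the entity identifiers are hypothetical.

```python
from typing import Sequence


def mention_accuracy(gold_entities: Sequence[str], predicted_entities: Sequence[str]) -> float:
    """Fraction of mentions whose predicted entity matches the gold entity."""
    if len(gold_entities) != len(predicted_entities):
        raise ValueError("gold and predicted lists must be aligned one-to-one")
    if not gold_entities:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_entities, predicted_entities))
    return correct / len(gold_entities)


# Hypothetical entity identifiers for four mentions.
gold = ["E1", "E2", "E3", "E4"]
pred = ["E1", "E2", "E9", "E4"]
print(mention_accuracy(gold, pred))  # 0.75
```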
conll2003-generative
huggingface.co
Cite
Ana Areias, conll2003-generative [Dataset]. https://huggingface.co/datasets/areias/conll2003-generative
Croissant
Authors
Ana Areias
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for CoNLL-2003 with NER Workflow Enhancements

This dataset is a modified version of the CoNLL-2003 dataset, enhanced to support an LLM-based Named Entity Recognition (NER) workflow. Two new columns, sentence and entities, have been added. You can find the code used to generate this version together with the data files.

Named Entities

As in the original CoNLL-2003 task, this dataset focuses on four types of named entities:

Persons (PER) Locations (LOC)… See the full description on the dataset page: https://huggingface.co/datasets/areias/conll2003-generative.
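The exact schema of the added sentence and entities columns is not shown in this truncated description. As a hedged sketch of how such columns might be derived from a token list and IOB2 tags, the example below joins the tokens into a sentence and collects each tagged span as a text and type pair. The function name `to_generative_row` and the output layout are assumptions for illustration, not the dataset author's code.

```python
from typing import Dict, List, Optional


def to_generative_row(tokens: List[str], tags: List[str]) -> Dict[str, object]:
    """Build a hypothetical {'sentence', 'entities'} row from tokens and IOB2 tags."""
    entities: List[Dict[str, str]] = []
    span_tokens: List[str] = []
    span_type: Optional[str] = None
    # A trailing "O" sentinel flushes an entity that ends the sentence.
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != span_type
        )
        if tag == "O" or starts_new:
            if span_tokens:
                entities.append({"text": " ".join(span_tokens), "type": span_type})
            span_tokens, span_type = [], None
        if tag != "O":
            span_tokens.append(token)
            span_type = tag[2:]
    return {"sentence": " ".join(tokens), "entities": entities}


row = to_generative_row(
    ["Peter", "Blackburn", "visited", "New", "York"],
    ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"],
)
print(row)
```

Joining tokens with single spaces keeps the sketch simple; it does not restore the original spacing around punctuation.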
AIDA CoNLL-YAGO Dataset
paperswithcode.com
opendatalab.com
Cite
Johannes Hoffart; Mohamed Amir Yosef; Ilaria Bordino; Hagen Fürstenau; Manfred Pinkal; Marc Spaniol; Bilyana Taneva; Stefan Thater; Gerhard Weikum, AIDA CoNLL-YAGO Dataset [Dataset]. https://paperswithcode.com/dataset/aida-conll-yago
Authors
Johannes Hoffart; Mohamed Amir Yosef; Ilaria Bordino; Hagen Fürstenau; Manfred Pinkal; Marc Spaniol; Bilyana Taneva; Stefan Thater; Gerhard Weikum
Description
AIDA CoNLL-YAGO contains assignments of entities to the mentions of named entities annotated for the original CoNLL 2003 entity recognition task. The entities are identified by YAGO2 entity name, by Wikipedia URL, or by Freebase mid.
NameTag 3 Multilingual CoNLL Model
lindat.cz
live.european-language-grid.eu
Updated Aug 30, 2024
Cite
Jana Straková (2024). NameTag 3 Multilingual CoNLL Model [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-5678
Dataset updated
Aug 30, 2024
Authors
Jana Straková
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained jointly on several NE corpora: English CoNLL-2003, German CoNLL-2003, Dutch CoNLL-2002, Spanish CoNLL-2002, Ukrainian Lang-uk, and Czech CNEC 2.0, all harmonized to flat NEs with 4 labels PER, ORG, LOC, and MISC. NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#multilingual-conll.
FIN Dataset
paperswithcode.com
opendatalab.com
Updated Jun 18, 2021
Cite
Julio Cesar Salinas Alvarado; Karin Verspoor; Timothy Baldwin (2021). FIN Dataset [Dataset]. https://paperswithcode.com/dataset/fin
Dataset updated
Jun 18, 2021
Authors
Julio Cesar Salinas Alvarado; Karin Verspoor; Timothy Baldwin
Description
A dataset of financial agreements made public through U.S. Securities and Exchange Commission (SEC) filings. Eight documents (totalling 54,256 words) were randomly selected for manual annotation, based on the four NE types provided in the CoNLL-2003 dataset: LOCATION (LOC), ORGANISATION (ORG), PERSON (PER), and MISCELLANEOUS (MISC).
demo
huggingface.co
Updated Apr 16, 2023
Cite
Marco Montelatici (2023). demo [Dataset]. https://huggingface.co/datasets/marcolin/demo
Croissant
Dataset updated
Apr 16, 2023
Authors
Marco Montelatici
Description
The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. In the original files, the chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only when two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this dataset uses the IOB2 tagging scheme, whereas the original dataset uses IOB1. For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
Data from: Learning multilingual named entity recognition from Wikipedia
figshare.com
bz2
Updated May 30, 2023
Cite
Joel Nothman; Nicky Ringland; Will Radford; Tara Murphy; James R Curran (2023). Learning multilingual named entity recognition from Wikipedia [Dataset]. http://doi.org/10.6084/m9.figshare.5462500.v1
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.5462500.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Joel Nothman; Nicky Ringland; Will Radford; Tara Murphy; James R Curran
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the data associated with Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy and James R. Curran (2013), "Learning multilingual named entity recognition from Wikipedia", Artificial Intelligence 194 (DOI: 10.1016/j.artint.2012.03.006). A preprint is included here as wikiner-preprint.pdf.

This data was originally available at http://schwa.org/resources (which linked to http://schwa.org/projects/resources/wiki/Wikiner).

The .bz2 files are NER training corpora produced as reported in the Artificial Intelligence paper. wp2 and wp3 are differentiated by wp3 using a higher level of link inference. They use a pipe-delimited format that can be converted to CoNLL 2003 format with system2conll.pl.

nothman08types.tsv is a manual classification of articles first used in Joel Nothman, James R. Curran and Tara Murphy (2008), "Transforming Wikipedia into Named Entity Training Data", in Proceedings of the Australasian Language Technology Association Workshop 2008. http://aclanthology.coli.uni-saarland.de/pdf/U/U08/U08-1016.pdf

popular.tsv and random.tsv are manual article classifications developed for the Artificial Intelligence paper based on different strategies for sampling articles from Wikipedia in order to account for Wikipedia's biased distribution (see that paper). scheme.tsv maps these fine-grained labels to coarser annotations including CoNLL 2003-style.

wikigold.conll.txt is a manual NER annotation of some Wikipedia text as presented in Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran (2009), in Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources (http://www.aclweb.org/anthology/W/W09/W09-3302).

See also corpora produced similarly in an enhanced version of this work (Pan et al., "Cross-lingual Name Tagging and Linking for 282 Languages", ACL 2017) at http://nlp.cs.rpi.edu/wikiann/.
CoNLL++ Dataset
paperswithcode.com
Cite
Zihan Wang; Jingbo Shang; Liyuan Liu; Lihao Lu; Jiacheng Liu; Jiawei Han, CoNLL++ Dataset [Dataset]. https://paperswithcode.com/dataset/conll
Authors
Zihan Wang; Jingbo Shang; Liyuan Liu; Lihao Lu; Jiacheng Liu; Jiawei Han
Description
CoNLL++ is a corrected version of the CoNLL03 NER dataset where 5.38% of the test sentences have been fixed.
AlbNER Named Entity Recognition in Albanian
lindat.cz
live.european-language-grid.eu
+1 more
Updated Sep 19, 2023
Cite
Erion Çano (2023). AlbNER Named Entity Recognition in Albanian [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-5214
Dataset updated
Sep 19, 2023
Authors
Erion Çano
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled following the CoNLL-2003 shared task annotation scheme explained at https://aclanthology.org/W03-0419.pdf, which uses the I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O tags. AlbNER data are released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using the AlbNER corpus, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Recognition in Albanian. CoRR, abs/2309.08741, 2023. URL https://arxiv.org/abs/2309.08741.
CoNLL2003 Dataset
kaggle.com
Updated Jan 28, 2022
Cite
Julian Garratt (2022). CoNLL2003 Dataset [Dataset]. https://www.kaggle.com/datasets/juliangarratt/conll2003-dataset/discussion
Croissant
Dataset updated
Jan 28, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Julian Garratt
Description
Dataset

This dataset was created by Julian Garratt

Contents
conll2003
huggingface.co
Cite
Casanova, conll2003 [Dataset]. https://huggingface.co/datasets/Zarinah/conll2003
Croissant
Authors
Casanova
Description
Zarinah/conll2003 dataset hosted on Hugging Face and contributed by the HF Datasets community
