42 datasets found
  1. conll2003

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Dec 22, 2022
    Cite
    (2022). conll2003 [Dataset]. https://www.tensorflow.org/datasets/catalog/conll2003
    Description

    The shared task of CoNLL-2003 concerns language-independent named entity recognition and concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('conll2003', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

  2. conll2003

    • huggingface.co
    Cite
    TNER, conll2003 [Dataset]. https://huggingface.co/datasets/tner/conll2003
    Dataset authored and provided by
    TNER
    License

    https://choosealicense.com/licenses/other/

    Description
  3. test3

    • huggingface.co
    Updated Apr 23, 2024
    Cite
    Mehrab Hossain (2024). test3 [Dataset]. https://huggingface.co/datasets/mHossain/test3
    Authors
    Mehrab Hossain
    Description

    The shared task of CoNLL-2003 concerns language-independent named entity recognition. It concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

    The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line, and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only if two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this describes the original IOB1 scheme; this dataset itself uses the IOB2 tagging scheme.

    For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
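    The four-column layout described above can be read with a few lines of Python. This is a minimal sketch, not the loader used by any of the listed datasets: the function name `read_conll` and the sample lines are illustrative, and skipping `-DOCSTART-` marker lines is an assumption about the files that the description does not mention.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str, str]  # (word, POS tag, chunk tag, NE tag)


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of 4-tuples."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.strip()
        # An empty line ends a sentence. Skipping -DOCSTART- lines is an
        # assumption about document-boundary markers in the files.
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        # Columns are separated by a single space.
        word, pos, chunk, ner = line.split(" ")
        sentence.append((word, pos, chunk, ner))
    if sentence:  # the last sentence may lack a trailing empty line
        yield sentence


# Illustrative input in the described format (IOB1-style tags).
sample = """\
-DOCSTART- -X- O O

EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O

Peter NNP I-NP I-PER
Blackburn NNP I-NP I-PER
"""

sentences = list(read_conll(sample.splitlines()))
for sent in sentences:
    print([(word, ner) for word, _, _, ner in sent])
```

    Yielding sentences lazily keeps memory use flat for large files; pass an open file object instead of `sample.splitlines()` to read from disk.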

  4. English Model (CoNLL-2003) for NameTag

    • live.european-language-grid.eu
    Updated Apr 7, 2014
    Cite
    (2014). English Model (CoNLL-2003) for NameTag [Dataset]. https://live.european-language-grid.eu/catalogue/ld/18229
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves F-measure 84.73 on CoNLL-2003 test data.

  5. conll2003-mini

    • huggingface.co
    Updated Aug 22, 1996
    Cite
    Luca Bandelli (1996). conll2003-mini [Dataset]. https://huggingface.co/datasets/bandoos/conll2003-mini
    Authors
    Luca Bandelli
    Description

    !! forked version producing at most 10 items per split !!

    The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

    The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line, and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only if two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this describes the original IOB1 scheme; this dataset itself uses the IOB2 tagging scheme.

    For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
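    The difference between the two tagging schemes mentioned above is easiest to see in code. The following is a minimal sketch of an IOB1-to-IOB2 conversion for one sentence; the function name `iob1_to_iob2` is hypothetical, and this is not claimed to be the conversion the dataset authors actually used.

```python
from typing import List, Optional


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Convert one sentence's tags from IOB1 to IOB2.

    In IOB1 a phrase normally starts with I-TYPE, and B-TYPE appears only
    when a phrase directly follows another phrase of the same type.
    In IOB2 every phrase starts with B-TYPE.
    """
    converted: List[str] = []
    prev_type: Optional[str] = None  # type of the previous token, None after O
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, ent_type = tag.split("-", 1)
        # A phrase starts if the tag says so explicitly (B-), or if an I- tag
        # follows an O tag or a phrase of a different type.
        if prefix == "B" or ent_type != prev_type:
            converted.append("B-" + ent_type)
        else:
            converted.append("I-" + ent_type)
        prev_type = ent_type
    return converted


# Illustrative tag sequence, not taken from the dataset.
iob1 = ["I-ORG", "O", "I-PER", "I-PER", "B-PER", "I-LOC"]
print(iob1_to_iob2(iob1))
```

    The conversion only needs the previous token's entity type, so it runs in a single left-to-right pass per sentence.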

  6. CoNLL 2003 NER dataset

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). CoNLL 2003 NER dataset [Dataset]. https://service.tib.eu/ldmservice/dataset/conll-2003-ner-dataset
    Description

    The CoNLL 2003 shared task dataset is focused on named entity recognition tasks.

  7. CoNLL Dataset

    • paperswithcode.com
    • library.toponeai.link
    Cite
    CoNLL Dataset [Dataset]. https://paperswithcode.com/dataset/conll-1
    Description

    The CoNLL dataset is a widely used resource in the field of natural language processing (NLP). The term “CoNLL” stands for the Conference on Computational Natural Language Learning, and the dataset originates from a series of shared tasks organized at that conference.

  8. CoNLL-2003

    • kaggle.com
    Updated Aug 15, 2021
    Cite
    Zhou He (2021). CoNLL-2003 [Dataset]. https://www.kaggle.com/henavajov/conll2003/activity
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Zhou He
    Description

    Dataset

    This dataset was created by Zhou He

    Contents

  9. CoNLL 2003

    • kaggle.com
    Updated Mar 14, 2021
    Cite
    GONG ZEQUN (2021). CoNLL 2003 [Dataset]. https://www.kaggle.com/gongzequn/conll-2003
    Available download formats: zip (941044 bytes)
    Authors
    GONG ZEQUN
    Description

    Dataset

    This dataset was created by GONG ZEQUN

    Contents

    It contains the following files:

  10. CoNLL 2003 - Dataset - LDM

    • service.tib.eu
    Updated Nov 25, 2024
    Cite
    (2024). CoNLL 2003 - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/conll-2003
    Description

    The CoNLL 2003 dataset contains 1393 articles with about 34K mentions, and the standard performance metric is mention-averaged accuracy.
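    As a rough illustration of the metric named above, the sketch below computes accuracy averaged over mentions and, for contrast, accuracy averaged over documents. It assumes that "mention-averaged" means every mention counts equally regardless of which article it appears in; the function names and the toy entity identifiers are hypothetical.

```python
from typing import List, Tuple

# One document = list of (gold entity, predicted entity) pairs, one per mention.
Doc = List[Tuple[str, str]]


def mention_averaged_accuracy(docs: List[Doc]) -> float:
    """Correct mentions divided by all mentions, pooled across documents."""
    total = sum(len(doc) for doc in docs)
    correct = sum(gold == pred for doc in docs for gold, pred in doc)
    return correct / total if total else 0.0


def document_averaged_accuracy(docs: List[Doc]) -> float:
    """Mean of per-document accuracies (shown only for contrast)."""
    per_doc = [sum(g == p for g, p in doc) / len(doc) for doc in docs if doc]
    return sum(per_doc) / len(per_doc) if per_doc else 0.0


# Toy data: a long document with one error and a short one that is all wrong.
docs = [
    [("A", "A"), ("B", "B"), ("C", "X"), ("D", "D")],
    [("E", "Y")],
]
print(mention_averaged_accuracy(docs))   # 3 correct out of 5 mentions
print(document_averaged_accuracy(docs))  # mean of 3/4 and 0/1
```

    The two averages diverge whenever documents differ in length, which is why it matters which one a reported score uses.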

  11. conll2003-generative

    • huggingface.co
    Cite
    Ana Areias, conll2003-generative [Dataset]. https://huggingface.co/datasets/areias/conll2003-generative
    Authors
    Ana Areias
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for CoNLL-2003 with NER Workflow Enhancements

    This dataset is a modified version of the CoNLL-2003 dataset, enhanced to support an LLM-based Named Entity Recognition (NER) workflow. Two new columns, sentence and entities, have been added. You can find the code used to generate this version together with the data files.

    Named Entities

    As in the original CoNLL-2003 task, this dataset focuses on four types of named entities:

    Persons (PER) Locations (LOC)… See the full description on the dataset page: https://huggingface.co/datasets/areias/conll2003-generative.

  12. AIDA CoNLL-YAGO Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Johannes Hoffart; Mohamed Amir Yosef; Ilaria Bordino; Hagen Fürstenau; Manfred Pinkal; Marc Spaniol; Bilyana Taneva; Stefan Thater; Gerhard Weikum, AIDA CoNLL-YAGO Dataset [Dataset]. https://paperswithcode.com/dataset/aida-conll-yago
    Authors
    Johannes Hoffart; Mohamed Amir Yosef; Ilaria Bordino; Hagen Fürstenau; Manfred Pinkal; Marc Spaniol; Bilyana Taneva; Stefan Thater; Gerhard Weikum
    Description

    AIDA CoNLL-YAGO contains assignments of entities to the mentions of named entities annotated for the original CoNLL 2003 entity recognition task. The entities are identified by YAGO2 entity name, by Wikipedia URL, or by Freebase mid.

  13. NameTag 3 Multilingual CoNLL Model

    • lindat.cz
    • live.european-language-grid.eu
    Updated Aug 30, 2024
    Cite
    Jana Straková (2024). NameTag 3 Multilingual CoNLL Model [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-5678
    Authors
    Jana Straková
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a trained model for the supervised machine learning tool NameTag 3 (https://ufal.mff.cuni.cz/nametag/3/), trained jointly on several NE corpora: English CoNLL-2003, German CoNLL-2003, Dutch CoNLL-2002, Spanish CoNLL-2002, Ukrainian Lang-uk, and Czech CNEC 2.0, all harmonized to flat NEs with 4 labels PER, ORG, LOC, and MISC. NameTag 3 is an open-source tool for both flat and nested named entity recognition (NER). NameTag 3 identifies proper names in text and classifies them into a set of predefined categories, such as names of persons, locations, organizations, etc. The model documentation can be found at https://ufal.mff.cuni.cz/nametag/3/models#multilingual-conll.

  14. FIN Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 18, 2021
    Cite
    Julio Cesar Salinas Alvarado; Karin Verspoor; Timothy Baldwin (2021). FIN Dataset [Dataset]. https://paperswithcode.com/dataset/fin
    Authors
    Julio Cesar Salinas Alvarado; Karin Verspoor; Timothy Baldwin
    Description

    A dataset of financial agreements made public through U.S. Securities and Exchange Commission (SEC) filings. Eight documents (totalling 54,256 words) were randomly selected for manual annotation, based on the four NE types provided in the CoNLL-2003 dataset: LOCATION (LOC), ORGANISATION (ORG), PERSON (PER), and MISCELLANEOUS (MISC).

  15. demo

    • huggingface.co
    Updated Apr 16, 2023
    Cite
    Marco Montelatici (2023). demo [Dataset]. https://huggingface.co/datasets/marcolin/demo
    Authors
    Marco Montelatici
    Description

    The shared task of CoNLL-2003 concerns language-independent named entity recognition. It concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

    The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word is on a separate line, and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE, which means that the word is inside a phrase of type TYPE. Only if two phrases of the same type immediately follow each other does the first word of the second phrase get the tag B-TYPE, to show that it starts a new phrase. A word with tag O is not part of a phrase. Note that this describes the original IOB1 scheme; this dataset itself uses the IOB2 tagging scheme.

    For more details see https://www.clips.uantwerpen.be/conll2003/ner/ and https://www.aclweb.org/anthology/W03-0419
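    Because this dataset is tagged in IOB2, every phrase begins with a B-TYPE tag, which makes it straightforward to recover entity spans from a tag sequence. The sketch below shows one way to do that; the function name `iob2_spans` and the example tags are illustrative, and ignoring stray I- tags that have no preceding B- tag is a design choice made here, not something the dataset specifies.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def iob2_spans(tags: List[str]) -> List[Span]:
    """Collect (type, start, end) spans from one sentence of IOB2 tags."""
    spans: List[Span] = []
    start: Optional[int] = None
    ent_type: Optional[str] = None
    # The trailing "O" is a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == ent_type
        if start is not None and not continues:
            spans.append((ent_type, start, i))
            start, ent_type = None, None
        if tag.startswith("B-"):
            start, ent_type = i, tag[2:]
        # Stray I- tags with no open span are ignored (assumption).
    return spans


# Illustrative tokens and tags, not taken from the dataset.
tokens = ["EU", "rejects", "Peter", "Blackburn", "Smith", "Brussels"]
tags = ["B-ORG", "O", "B-PER", "I-PER", "B-PER", "B-LOC"]
for ent_type, start, end in iob2_spans(tags):
    print(ent_type, tokens[start:end])
```

    Spans in this form are convenient for entity-level scoring, where a prediction counts as correct only if both its boundaries and its type match the gold span.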

  16. Data from: Learning multilingual named entity recognition from Wikipedia

    • figshare.com
    Updated May 30, 2023
    Cite
    Joel Nothman; Nicky Ringland; Will Radford; Tara Murphy; James R Curran (2023). Learning multilingual named entity recognition from Wikipedia [Dataset]. http://doi.org/10.6084/m9.figshare.5462500.v1
    Available download formats: bz2
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Joel Nothman; Nicky Ringland; Will Radford; Tara Murphy; James R Curran
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data associated with Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy and James R. Curran (2013), "Learning multilingual named entity recognition from Wikipedia", Artificial Intelligence 194 (DOI: 10.1016/j.artint.2012.03.006). A preprint is included here as wikiner-preprint.pdf.

    This data was originally available at http://schwa.org/resources (which linked to http://schwa.org/projects/resources/wiki/Wikiner).

    The .bz2 files are NER training corpora produced as reported in the Artificial Intelligence paper. wp2 and wp3 are differentiated by wp3 using a higher level of link inference. They use a pipe-delimited format that can be converted to CoNLL 2003 format with system2conll.pl.

    nothman08types.tsv is a manual classification of articles first used in Joel Nothman, James R. Curran and Tara Murphy (2008), "Transforming Wikipedia into Named Entity Training Data", in Proceedings of the Australasian Language Technology Association Workshop 2008. http://aclanthology.coli.uni-saarland.de/pdf/U/U08/U08-1016.pdf

    popular.tsv and random.tsv are manual article classifications developed for the Artificial Intelligence paper based on different strategies for sampling articles from Wikipedia in order to account for Wikipedia's biased distribution (see that paper). scheme.tsv maps these fine-grained labels to coarser annotations including CoNLL 2003-style.

    wikigold.conll.txt is a manual NER annotation of some Wikipedia text as presented in Dominic Balasuriya, Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran (2009), in Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources (http://www.aclweb.org/anthology/W/W09/W09-3302).

    See also corpora produced similarly in an enhanced version of this work (Pan et al., "Cross-lingual Name Tagging and Linking for 282 Languages", ACL 2017) at http://nlp.cs.rpi.edu/wikiann/.

  17. CoNLL++ Dataset

    • paperswithcode.com
    Cite
    Zihan Wang; Jingbo Shang; Liyuan Liu; Lihao Lu; Jiacheng Liu; Jiawei Han, CoNLL++ Dataset [Dataset]. https://paperswithcode.com/dataset/conll
    Authors
    Zihan Wang; Jingbo Shang; Liyuan Liu; Lihao Lu; Jiacheng Liu; Jiawei Han
    Description

    CoNLL++ is a corrected version of the CoNLL03 NER dataset where 5.38% of the test sentences have been fixed.

  18. AlbNER Named Entity Recognition in Albanian

    • lindat.cz
    • live.european-language-grid.eu
    • +1 more
    Updated Sep 19, 2023
    Cite
    Erion Çano (2023). AlbNER Named Entity Recognition in Albanian [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-5214
    Authors
    Erion Çano
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    AlbNER is a Named Entity Recognition corpus of Wikipedia sentences in Albanian, consisting of 900 records. The sentence tokens are manually labeled in compliance with the CoNLL-2003 shared task annotation scheme, explained at https://aclanthology.org/W03-0419.pdf, which uses the I-ORG, B-ORG, I-PER, B-PER, I-LOC, B-LOC, I-MISC, B-MISC and O tags. AlbNER data are released under the CC-BY license (https://creativecommons.org/licenses/by/4.0/). If using the AlbNER corpus, please cite the following paper: Çano Erion. AlbNER: A Corpus for Named Entity Recognition in Albanian. CoRR, abs/2309.08741, 2023. URL https://arxiv.org/abs/2309.08741.

  19. CoNLL2003 Dataset

    • kaggle.com
    Updated Jan 28, 2022
    Cite
    Julian Garratt (2022). CoNLL2003 Dataset [Dataset]. https://www.kaggle.com/datasets/juliangarratt/conll2003-dataset/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Julian Garratt
    Description

    Dataset

    This dataset was created by Julian Garratt

    Contents

  20. conll2003

    • huggingface.co
    Cite
    Casanova, conll2003 [Dataset]. https://huggingface.co/datasets/Zarinah/conll2003
    Authors
    Casanova
    Description

    Zarinah/conll2003 dataset hosted on Hugging Face and contributed by the HF Datasets community
