3 datasets found
  1. hallmarks_of_cancer

    • huggingface.co
    Updated Apr 4, 2023
    Cite
    BigScience Biomedical Datasets (2023). hallmarks_of_cancer [Dataset]. https://huggingface.co/datasets/bigbio/hallmarks_of_cancer
    Dataset updated
    Apr 4, 2023
    Dataset authored and provided by
    BigScience Biomedical Datasets
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    The Hallmarks of Cancer (HOC) Corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to a taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus. The labels are found under the "labels" directory, while the tokenized text can be found under the "text" directory. The filenames are the corresponding PubMed IDs (PMID).

  2. HoC

    • huggingface.co
    Updated Feb 22, 2023
    Cite
    yanis labrak (2023). HoC [Dataset]. https://huggingface.co/datasets/qanastek/HoC
    Dataset updated
    Feb 22, 2023
    Authors
    yanis labrak
    Description

    The Hallmarks of Cancer Corpus for text classification

    The Hallmarks of Cancer (HOC) Corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to a taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus. The labels are found under the "labels" directory, while the tokenized text can be found under the "text" directory. The filenames are the corresponding PubMed IDs (PMID).

    In addition to the HOC corpus, we also have the Cancer Hallmarks Analytics Tool, which classifies all of PubMed according to the HoC taxonomy.

  3. HOC (Hallmarks of Cancer)

    • opendatalab.com
    zip
    Updated Jul 1, 2024
    Cite
    Karolinska Institutet (2024). HOC (Hallmarks of Cancer) [Dataset]. https://opendatalab.com/OpenDataLab/HOC
    Available download formats: zip (3193597 bytes)
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    University of Cambridge
    Karolinska Institutet
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    The Hallmarks of Cancer (HOC) corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to the Hallmarks of Cancer taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus. The labels are found under the “labels” directory, while the tokenized text can be found under the “text” directory. The filenames are the corresponding PubMed IDs (PMID).
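
The directory layout described in these entries (a "labels" directory and a "text" directory, with files named by PMID) could be read along the following lines. This is only a sketch under assumptions the listing does not confirm: that files carry a `.txt` extension, that each text file holds one tokenized sentence per line, that the matching label file holds one line per sentence, and that multiple labels on a line are separated by `LABEL_SEP` (a hypothetical delimiter chosen for illustration). The function names `parse_labels` and `load_hoc` are likewise made up here.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# ASSUMPTION: labels on one line are separated by this delimiter.
# The listing does not state the real delimiter; adjust to the actual files.
LABEL_SEP = ";"


def parse_labels(line: str, sep: str = LABEL_SEP) -> List[str]:
    """Split one label line into zero or more class labels."""
    return [part.strip() for part in line.split(sep) if part.strip()]


def load_hoc(root: Path, sep: str = LABEL_SEP) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Map each PMID to a list of (sentence, labels) pairs.

    ASSUMPTION: text/<PMID>.txt and labels/<PMID>.txt are line-aligned,
    one sentence per line and one label line per sentence.
    """
    corpus: Dict[str, List[Tuple[str, List[str]]]] = {}
    for text_file in sorted((root / "text").glob("*.txt")):
        pmid = text_file.stem  # filenames are the PubMed IDs
        label_file = root / "labels" / text_file.name
        sentences = text_file.read_text(encoding="utf-8").splitlines()
        label_lines = label_file.read_text(encoding="utf-8").splitlines()
        if len(sentences) != len(label_lines):
            raise ValueError(f"{pmid}: sentence and label line counts differ")
        corpus[pmid] = [
            (sentence, parse_labels(label_line, sep))
            for sentence, label_line in zip(sentences, label_lines)
        ]
    return corpus
```

Calling `load_hoc(Path("HOC"))` on an unpacked copy would then give, per abstract, each sentence with its (possibly empty) label list. The length check is there to fail early if the line-alignment assumption does not hold for the real files.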


hallmarks_of_cancer

Hallmarks of Cancer

bigbio/hallmarks_of_cancer

19 scholarly articles cite this dataset.