2 datasets found
  1. bnlearn datasets

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 29, 2025
    Cite
    Zenodo (2025). bnlearn datasets [Dataset]. http://doi.org/10.5281/zenodo.7676616
    Available download formats: zip
    Dataset updated: Jan 29, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This collection consists of 5 structure learning datasets from the Bayesian Network Repository (Scutari, 2010).

    Task: The dataset collection can be used to study causal discovery algorithms.

    Summary:

    • Size of collection: 5 datasets of various sizes, with 3 to 56 columns each
    • Task: Causal Discovery
    • Data Type: Discrete
    • Dataset Scope: Collection
    • Ground Truth: Known / Estimated
    • Temporal Structure: No
    • License: TBD
    • Missing Values: No

    Missingness Statement: There are no missing values.

    Collection:

    The alarm dataset contains the following 37 variables:

    • CVP (central venous pressure): a three-level factor with levels LOW, NORMAL and HIGH.
    • PCWP (pulmonary capillary wedge pressure): a three-level factor with levels LOW, NORMAL and HIGH.
    • HIST (history): a two-level factor with levels TRUE and FALSE.
    • TPR (total peripheral resistance): a three-level factor with levels LOW, NORMAL and HIGH.
    • ... (33 more variables, see the corresponding .html file)

    The binary synthetic asia dataset:

    • D (dyspnoea), a two-level factor with levels yes and no.
    • T (tuberculosis), a two-level factor with levels yes and no.
    • L (lung cancer), a two-level factor with levels yes and no.
    • B (bronchitis), a two-level factor with levels yes and no.
    • A (visit to Asia), a two-level factor with levels yes and no.
    • S (smoking), a two-level factor with levels yes and no.
    • X (chest X-ray), a two-level factor with levels yes and no.
    • E (tuberculosis versus lung cancer/bronchitis), a two-level factor with levels yes and no.

    The binary coronary dataset:

    • Smoking (smoking): a two-level factor with levels no and yes.
    • M. Work (strenuous mental work): a two-level factor with levels no and yes.
    • P. Work (strenuous physical work): a two-level factor with levels no and yes.
    • Pressure (systolic blood pressure): a two-level factor with levels <140 and >140.
    • Proteins (ratio of beta and alpha lipoproteins): a two-level factor with levels <3 and >3.
    • Family (family anamnesis of coronary heart disease): a two-level factor with levels neg and pos.

    The hailfinder dataset contains the following 56 variables:

    • N07muVerMo (10.7mu vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
    • SubjVertMo (subjective judgment of vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
    • QGVertMotion (quasigeostrophic vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
    • CombVerMo (combined vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
    • AreaMesoALS (area of meso-alpha): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
    • SatContMoist (satellite contribution to moisture): a four-level factor with levels VeryWet, Wet, Neutral and Dry.
    • ... (50 more variables, see the corresponding .html file)

    The lizards dataset contains the following 3 variables:

    • Species (the species of the lizard): a two-level factor with levels Sagrei and Distichus.
    • Height (perch height): a two-level factor with levels high (greater than 4.75 feet) and low (less than or equal to 4.75 feet).
    • Diameter (perch diameter): a two-level factor with levels narrow (greater than 4 inches) and wide (less than or equal to 4 inches).
  2. 100 samples of 5000 instances of categorical BNs from bnlearn's Bayesian...

    • zenodo.org
    zip
    Updated Mar 26, 2025
    Cite
    Pablo Torrijos (2025). 100 samples of 5000 instances of categorical BNs from bnlearn's Bayesian Network Repository [Dataset]. http://doi.org/10.5281/zenodo.14917796
    Available download formats: zip
    Dataset updated: Mar 26, 2025
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors
    Pablo Torrijos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We provide 100 samples, each containing 5000 instances, of discrete Bayesian Networks from bnlearn's Bayesian Network Repository. Specifically, the BNs, along with their characteristics, are:

    NETWORK     #NODES  #EDGES  #PARAMETERS  MAX. PARENTS  MEAN DEGREE
    Cancer           5       4           10             2         2.00
    Earthquake       5       4           10             2         2.00
    Survey           6       6           21             2         2.00
    Asia             8       8           18             2         2.00
    Sachs           11      17          178             3         3.09
    Child           20      25          230             4         2.50
    Insurance       27      52          230             4         2.50
    Water           32      66        10083             5         4.12
    Mildew          35      46       540150             3         2.63
    Alarm           37      46          509             5         2.49
    Barley          48      84       114005             4         3.50
    Hailfinder      56      66         2656             4         2.36
    Hepar2          70     123         1453             6         3.51
    Win95pts        76     112          574             7         2.95
    Pathfinder     109     195        72079             5         3.58
    Munin1         186     273        15622             3         2.94
    Andes          223     338         1157             6         3.03
    Diabetes       413     602       429409             2         2.92
    Pigs           441     592         5618             2         2.68
    Link           724    1125        14211             3         3.11
    Munin2        1003    1244        69431             3         2.48
    Munin4        1038    1388        80352             3         2.67
    Munin3        1041    1306        71059             3         2.51
    Munin         1041    1397        80592             3         2.68
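
    The description does not say how the MEAN DEGREE column was computed. A minimal sketch, assuming it is the usual graph quantity 2 × edges / nodes rounded to two decimals, reproduces the values listed for several rows, such as Sachs (11 nodes, 17 edges) and Hailfinder (56 nodes, 66 edges). The function name mean_degree is hypothetical, and not every row is guaranteed to follow this formula.

```python
def mean_degree(n_nodes: int, n_edges: int) -> float:
    """Average number of incident edges per node, rounded to two decimals.

    Assumption: this is how the MEAN DEGREE column was derived; the
    dataset description does not confirm it.
    """
    # Each edge touches two nodes, so the total degree is 2 * n_edges.
    return round(2 * n_edges / n_nodes, 2)


sachs = mean_degree(11, 17)        # node and edge counts from the Sachs row
hailfinder = mean_degree(56, 66)   # node and edge counts from the Hailfinder row
print(sachs, hailfinder)
```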

    Each dataset is sampled in Python with the bnlearn package. The BN is loaded from its .bif file using bif = bnlearn.import_DAG(path), and samples are generated with bnlearn.sampling(bif, n=5000, methodtype='bayes'). Post-processing then replaces the generated numerical values with the categorical labels specified in the .bif file.
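
    The post-processing step can be illustrated with a minimal sketch. It assumes each sampled value is an integer index into the variable's list of state labels, in the order the .bif file declares them; the description does not confirm this encoding. The function decode_states, the variable names, and the labels below are hypothetical and stand in for whatever the authors actually used.

```python
from typing import Dict, List


def decode_states(
    rows: List[Dict[str, int]],
    state_names: Dict[str, List[str]],
) -> List[Dict[str, str]]:
    """Replace integer state codes with their categorical labels.

    Assumption: code k for a variable maps to the k-th label listed
    for that variable.
    """
    return [
        {var: state_names[var][code] for var, code in row.items()}
        for row in rows
    ]


# Hypothetical labels, as they might be read from a .bif file.
STATE_NAMES = {"smoke": ["yes", "no"], "lung": ["yes", "no"]}

# Hypothetical sampled rows holding numeric codes.
sampled = [{"smoke": 0, "lung": 1}, {"smoke": 1, "lung": 1}]

decoded = decode_states(sampled, STATE_NAMES)
print(decoded)
```

    In this sketch the rows are plain dictionaries so that it runs without bnlearn installed; output from bnlearn.sampling would first need converting to this shape.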

    Additionally, ten extra, older samples of most of these BNs can be found on OpenML.

