MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This collection consists of 5 structure learning datasets from the Bayesian Network Repository (Scutari, 2010).
Task: The dataset collection can be used to study causal discovery algorithms.
Summary:
Missingness Statement: There are no missing values.
Collection:
The alarm dataset contains the following 37 variables:
The binary synthetic asia dataset:
The binary coronary dataset:
The hailfinder dataset contains the following 56 variables:
The lizards dataset contains the following 3 variables:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We provide 100 samples, each containing 5000 instances, of discrete Bayesian Networks from bnlearn's Bayesian Network Repository. Specifically, the BNs, along with their characteristics, are:
NETWORK | #NODES | #EDGES | #PARAMETERS | MAX. PARENTS | MEAN DEGREE |
Cancer | 5 | 4 | 10 | 2 | 1.60 |
Earthquake | 5 | 4 | 10 | 2 | 1.60 |
Survey | 6 | 6 | 21 | 2 | 2.00 |
Asia | 8 | 8 | 18 | 2 | 2.00 |
Sachs | 11 | 17 | 178 | 3 | 3.09 |
Child | 20 | 25 | 230 | 4 | 2.50 |
Insurance | 27 | 52 | 230 | 4 | 3.85 |
Water | 32 | 66 | 10083 | 5 | 4.12 |
Mildew | 35 | 46 | 540150 | 3 | 2.63 |
Alarm | 37 | 46 | 509 | 5 | 2.49 |
Barley | 48 | 84 | 114005 | 4 | 3.50 |
Hailfinder | 56 | 66 | 2656 | 4 | 2.36 |
Hepar2 | 70 | 123 | 1453 | 6 | 3.51 |
Win95pts | 76 | 112 | 574 | 7 | 2.95 |
Pathfinder | 109 | 195 | 72079 | 5 | 3.58 |
Munin1 | 186 | 273 | 15622 | 3 | 2.94 |
Andes | 223 | 338 | 1157 | 6 | 3.03 |
Diabetes | 413 | 602 | 429409 | 2 | 2.92 |
Pigs | 441 | 592 | 5618 | 2 | 2.68 |
Link | 724 | 1125 | 14211 | 3 | 3.11 |
Munin2 | 1003 | 1244 | 69431 | 3 | 2.48 |
Munin4 | 1038 | 1388 | 80352 | 3 | 2.67 |
Munin3 | 1041 | 1306 | 71059 | 3 | 2.51 |
Munin | 1041 | 1397 | 80592 | 3 | 2.68 |
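The MEAN DEGREE column is not defined in the text. It appears to equal twice the number of edges divided by the number of nodes, since every edge touches two nodes. This is an assumption inferred from rows such as Sachs and Child, not something the source states. A minimal sketch of that check, with `mean_degree` as a hypothetical helper name:

```python
def mean_degree(n_nodes: int, n_edges: int) -> float:
    """Average number of edges per node, assuming each edge counts for both endpoints."""
    return round(2 * n_edges / n_nodes, 2)


# (#NODES, #EDGES) pairs copied from the table above.
for name, nodes, edges in [("Sachs", 11, 17), ("Child", 20, 25), ("Alarm", 37, 46)]:
    print(f"{name}: {mean_degree(nodes, edges):.2f}")
```

The same function can be applied to any other row to confirm that its node and edge counts agree with the listed degree.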
Each dataset is sampled using Python and the bnlearn package. The BN structure is loaded from the .bif file using bif = bnlearn.import_DAG(path), and samples are generated with bnlearn.sampling(bif, n=5000, methodtype='bayes'). Post-processing then replaces the generated numeric values with the categorical labels specified in the .bif structure file.
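The sampling and post-processing steps described here could look roughly like the sketch below. The two bnlearn calls are the ones named in the description. Everything else is an assumption made for illustration: the helper names `sample_network` and `relabel`, the list-of-dicts row format, the treatment of the sampled numbers as zero-based state indices, and the state labels in the demo. The authors' actual post-processing code is not shown in the source.

```python
from typing import Dict, List, Sequence


def sample_network(path: str, n: int = 5000):
    """Load a .bif network and draw n samples using the calls named in the description."""
    import bnlearn  # third-party package; imported lazily so relabel() works without it

    bif = bnlearn.import_DAG(path)
    return bnlearn.sampling(bif, n=n, methodtype='bayes')


def relabel(rows: List[Dict[str, int]],
            state_names: Dict[str, Sequence[str]]) -> List[Dict[str, str]]:
    """Replace each numeric state index with its categorical label.

    state_names maps a variable name to its ordered labels, as they would be
    listed in the .bif file (assumed to be supplied by the caller).
    """
    return [
        {var: state_names[var][int(idx)] for var, idx in row.items()}
        for row in rows
    ]


# Illustrative values only; real labels come from the .bif file of each network.
demo_states = {"smoke": ["yes", "no"], "lung": ["yes", "no"]}
demo_rows = [{"smoke": 0, "lung": 1}, {"smoke": 1, "lung": 1}]
print(relabel(demo_rows, demo_states))
```

Keeping the relabelling separate from the sampling call means the mapping can be tested on small hand-written rows without installing bnlearn or loading a network.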
Additionally, ten older samples of most of the BNs can be found on OpenML.