Databases for learning of Bayesian networks

 

This page contains links to various databases of cases in Hugin format. Each database consists of cases sampled from a Bayesian network presented in the book. The tasks below are to use a BN-learning system of your own choice and to investigate the result. In general, you should not expect to retrieve exactly the original network. To evaluate the result, you may first compare the d-separation properties of the networks. If the networks differ but their d-separation properties are identical, you cannot hope for anything better, since the data alone cannot distinguish between such networks. If the learned network has d-separation properties not shared with the original network, this may be a starting point for looking closer into why your system goes wrong.
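The d-separation comparison described above can be sketched in a few lines of Python. The sketch relies on the standard result that two DAGs over the same variables have identical d-separation properties exactly when they share the same skeleton (links with directions ignored) and the same v-structures (a -> c <- b with a and b not adjacent). The edge-set representation and the function names are assumptions made for this illustration; they do not come from any of the tools listed on this page.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """All triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def same_d_separation(edges1, edges2):
    """True if the two DAGs (over the same variables) are Markov equivalent."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

# Hypothetical three-variable structures, for illustration only.
chain = {("A", "B"), ("B", "C")}
reversed_chain = {("C", "B"), ("B", "A")}
collider = {("A", "B"), ("C", "B")}

print(same_d_separation(chain, reversed_chain))  # equivalent structures
print(same_d_separation(chain, collider))        # different d-separation properties
```

A learned network that passes this check against the original is as good as structure learning from data can be expected to get; one that fails it is worth a closer look.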

Another way to evaluate the differences is to run the cases through both networks and compare the scores.
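As a rough illustration of comparing scores, the sketch below computes the log-likelihood of a set of complete cases under two small networks. Log-likelihood is only one possible score and is chosen here as an assumption, since the text does not name one. The dictionary-based network representation, the variable names, and the probabilities are all hypothetical.

```python
import math

def log_likelihood(network, cases):
    """Sum of log P(case) over all complete cases.

    `network` maps each variable to (parents, cpt), where `cpt` maps a tuple
    of parent states to a dict {state: probability}. A zero probability for
    an observed state would raise an error in math.log.
    """
    total = 0.0
    for case in cases:
        for var, (parents, cpt) in network.items():
            parent_states = tuple(case[p] for p in parents)
            total += math.log(cpt[parent_states][case[var]])
    return total

# Hypothetical "original" network with the link A -> B.
original = {
    "A": ((), {(): {"y": 0.3, "n": 0.7}}),
    "B": (("A",), {("y",): {"y": 0.9, "n": 0.1},
                   ("n",): {"y": 0.2, "n": 0.8}}),
}
# Hypothetical "learned" network in which the link was missed.
learned = {
    "A": ((), {(): {"y": 0.3, "n": 0.7}}),
    "B": ((), {(): {"y": 0.41, "n": 0.59}}),
}
cases = [
    {"A": "y", "B": "y"},
    {"A": "n", "B": "n"},
    {"A": "n", "B": "n"},
    {"A": "y", "B": "y"},
]

print(log_likelihood(original, cases))
print(log_likelihood(learned, cases))
```

The network with the higher log-likelihood explains the cases better. With real learned networks it may be preferable to score a held-out set of cases, or to use a score that penalizes the number of parameters.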

 

You may download tools for BN-learning at the following sites:

http://b-course.cs.helsinki.fi, http://bayesware.com, www.hugin.com, http://research.microsoft.com/~dmax/WinMine/Tool.doc.htm

On the web page Datamine you will find further links to machine learning tools.

 

Complete cases

The links below give access to databases consisting of 10 000 complete cases (for each case, the state of each variable is known).

Angina | Pregnancy | Poker (Opponent's hand) | Poker (Best hand)

 

Missing values

The links below give access to databases containing 10 000 cases in which some values are missing. The values are missing completely at random. A suffix "1" indicates that the probability of a missing value is 0.1, and a suffix "3" indicates that it is 0.3.

Angina1 | Angina3
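To make "missing completely at random" concrete, here is a minimal sketch of how such a database could be produced from complete cases: each value is dropped independently with a fixed probability, regardless of the variable or of any other value. The function name, the use of `None` as the missing-value marker, and the example variables are assumptions for illustration; the sketch does not reproduce how the databases on this page were actually generated or how Hugin encodes missing values.

```python
import random

def mask_mcar(cases, p_missing, seed=0, missing=None):
    """Return a copy of `cases` where each value is independently replaced
    by `missing` with probability `p_missing` (missing completely at random)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    return [
        {var: (missing if rng.random() < p_missing else value)
         for var, value in case.items()}
        for case in cases
    ]

# Hypothetical complete cases over two variables.
complete = [{"A": "y", "B": "n"} for _ in range(10000)]

masked = mask_mcar(complete, p_missing=0.3)
n_missing = sum(value is None for case in masked for value in case.values())
print(n_missing / (2 * len(masked)))  # observed fraction of missing values
```

Because the masking ignores the data, the observed fraction of missing values should be close to the chosen probability for every variable.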

 

Hidden variable

This database contains 10 000 cases sampled from Pregnancy with the variable Ho never observed. The real learning task is to detect a hidden variable. None of the tools above is able to detect hidden variables, nor do they offer facilities for looking for indications of hidden variables.

Pregnance

 

Structural constraint

This database contains 100 000 complete cases sampled from a three-time-slice model of Infected milk (2.2.1). The model used is the one in Figure 2.12. The links for the variables Corrcti are known in advance, and so are their potentials. The task is to investigate whether your BN-learning system has a facility to clamp structure as well as potentials before learning. Note: It seems that only Hugin has the option of fixing parts of the structure. Even for Hugin it is not straightforward; you have to use the API. The Microsoft tool allows you to specify variables as being without parents or without children.

Infected milk
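If your tool cannot clamp structure, one fallback is to check after learning whether the result respects the known constraints. The sketch below is tool-independent and does not use the Hugin API or the Microsoft tool; the function name, the edge-set representation, and the variable names X1, Y1, X2 are hypothetical and chosen only for illustration.

```python
def constraint_violations(edges, required=(), no_parents=(), no_children=()):
    """List the ways a learned structure breaks the given constraints.

    `edges` is a collection of (parent, child) pairs. `required` lists links
    that must be present; `no_parents` and `no_children` list variables that
    must be roots or leaves, respectively.
    """
    edges = set(edges)
    problems = []
    for parent, child in required:
        if (parent, child) not in edges:
            problems.append(f"missing required link {parent} -> {child}")
    for parent, child in sorted(edges):
        if child in no_parents:
            problems.append(
                f"{child} must not have parents, found {parent} -> {child}")
        if parent in no_children:
            problems.append(
                f"{parent} must not have children, found {parent} -> {child}")
    return problems

# Hypothetical learned structure and constraints.
learned = {("X1", "Y1"), ("Y1", "X2")}
problems = constraint_violations(
    learned,
    required=[("X1", "Y1"), ("X1", "X2")],
    no_children=["Y1"],
)
for problem in problems:
    print(problem)
```

An empty list means the learned structure is consistent with the constraints; this only verifies the outcome and is no substitute for fixing the links and potentials before learning.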