# An Introduction to Bayesian Networks.

(I do not guarentee for miscalculations !!)

At the disc accompanying the book you find solutions to exercises requring HUGIN.
• Chapter 2
• Chapter 3
• Chapter 4
• Chapter 5
• Chapter 6

### Chapter 2

#### Exercise 1.

A converging connection is a counter example for transitivity

#### Exercise 2.

In (a) all variables are d-connected to A. In (b) all variables except C and F are d-connected to A.

#### Exercise 3.

Consider a path from A to B, and let C be the neighbour to A on that path. Then C is instantiated. If the C-connection is serial or diverging, then C blocks the path. If the link is converging, consider C's neigbour, D, on the path. D is instantiated. Since the direction of the link is from D to C, the D-connection on the path can not be converging, and therfore D blocks the path.

#### Exercise 4.

(a), (b), and (c) are I-equivalent

#### Exercise 5.

P(A) = (0.2, 0.4 0.4) ; P(B ) = (0.3, 0.3, 0.4);
P(A | b1 ) = (0.167, 0.5, 0.333); P(A | b2 ) = (0.333, 0, 0.667); P(A | b3 ) = (0.125, 0.625, 0.25);
P(B | a1 ) = (0.25, 0.5, 0.25); P(B | a2 ) = (0.375, 0, 0.625); P(B | a3 ) = (0.25, 0.5, 0.25)

#### Exercise 6.

i) P(B, c1 ) = (0.02, 0.08); P(B, c2 ) = (0.18, 0.72); P(B ) = (0.2, 0.8)

ii) P(A | b1 , c1 ) = (0.3, 0.7) = P(A | b1 , c2 ); P(A | b2 , c1 ) = (0.6, 0.4) = P(A | b2 , c2 )

#### Exercise 7.

P(A, B, C) = P(A)P(B |A )P(C| A ). For example P(A, b1 , c1 ) = (0.01, 0.09)

#### Exercise 8.

Check your results by use of HUGIN

#### Exercise 9.

P(A | B, C ) = P(A | B ) <=> P(A | B, C )P(C | B ) = P(A | B )P(C | B)
<=> P(A, C | B ) = P(A | B ) P(C | B)

### Chapter 3

#### Exercise 1.

Solution on the disc

#### Exercise 2.

From the model we have that the findings are independent given H:
P(f1, ..., fn | H ) = P(f1 |H ). . . . . P(f2 | H ) = L(H | f1, ..., fn ), and Bayes' rule then yields
P(H | f1, ..., fn ) = m L(H | f1, ..., fn )P(H ), where m = 1/ P( f1, ..., fn )

#### Exercise 3.

Solution on the disc

#### Exercise 4.

Solution on the disc. For question iii), the most probable symbols are, b, a, a, a, and a. To find the most probable word, you can enter a word as evidence and use the probability of the evidence (displayed in the bottom left corner). Do this for all possible words. The result is baaba. In chapter 5, section 2 a more efficient method is presented.

#### Exercise 5.

Solution on the disc

#### Exercise 6.

Solution on the disc

#### Exercise 7.

i) There are three possible configurations, BG, GG, GB. All are equally likely.
ii) Solution on the disc. Information does not change the probability. The reason is that the two states BG and GG are not equally likely given the information that the girl is youngest (use Bayes' rule).
Note: Subquestion i) is not formulated sufficiently correct. It is not stated in the formulation how "I notice that one of them is a girl". If I see dolls and skirts, the conclusion will be as above. However, if I see one of the children and notices that it is a girl, then GG is twice as likely as BG and GB.

#### Exercise 8.

Solution on the disc

#### Exercise 9.

Give each pair (Pi,Ci ) a common child Constri , and let the conditional probabilities express the pairing of colour and pattern: P(y | p1, c1 ) = P(y | p2, c2 ) = P(n | p1, c2 ) = P(n | p2, c1 ) = 1. Insert Constri = y

#### Exercise 10.

Solution on the disc

#### Exercise 11.

Solution on the disc

#### Exercise 12.

Solution on the disc

#### Exercise 13.

Show, that P(B | A1, . . . , An ) is the same in the two models. Assume that A1', . . . , Ak' are y and rest are n (call this configuration e ). The model of Figure 3.16 yields P(B = y |e ) = 1- Pqi' . It is enough to show that we get the same probability in the model of Figure 3.17.

The calculations in the model of Figure 3.17 go as follows: P(Bi' |e ) = (1-qi', qi' ), and for remaining Bi the probability for (y, n ) are (0, 1). Now, the probability for B = y is the sum of the probability of all Bi-configurations with at least one Bi = y . This sum is 1 minus the probability of the configurations with all Bi = n, and it is 1- Pqi' .

#### Exercise 14.

Solution on the disc

#### Exercise 15.

Solution on the disc

#### Exercise 16.

Solutions to i) and ii) on the disc. The possible configurations are exactly the configurations with a positive probability. The probability of a state a of a variable A is the sum of the probabilities of the configurations with A = a. The possible configurations can be found in the same way as described in the solution to Exercise 4. iii) If the Boolean expression is in conjunctive normal form, then the technique from question i) can be used to construct a Bayesian network. If any configuration in that network has a positive probability, then the expression is satisfiable. Take any variable A. If any state of A has a positive probability, then this state is part of a configuration satisfying the Boolean expression. If no state of A has a positive probability then the expression is a contradiction. So, the expression is satisfiable if and only if a state of A has a positive probability. Note: The HUGIN system gives a message if the network (or the evidence entered to the network) is inconsistent.

iv) The satisfiability problem can be reduced to probability calculation in Bayesian networks, and the satisfiability problem is NP-complete.

### Chapter 4

#### Exercise 1.

......... a1... a2... a3-------------- a1... a2... a3
c1 (6,54) (12,36) (18,18)-- c1 (6,6) (3,9) (2,18)
c2 (12,18) (24,12) (36, 6)-- c2 (12,2) (6,3) (4,6)
c3 (24,36) (48, 24) (72,12)-- c3 (24,4) (12,6)(8,12)
... tvtW--------------------... tW/tV

#### Exercise 2.

P(B | f1, f2 ) = 1/21(5, 9, 7); P(B | f1, f2 ) = 1/21(8, 0 , 13); P(f1) = 0.4; P(f2) = 0.6;
P(f1, f2 ) = 0.21

#### Exercise 3.

The induction step goes as follows: Assume the statement to be true for all trees consisting of n nodes, and let T be a tree with n+1 nodes. Suppose that anarchistic message passing has taken place for a while, and that we are in a situation where not all links have been used in both directions. We shall prove that a legal message can be sent.The message passing must have started with a leaf A sending to its single neighbour B. Consider the tree the tree T\{A}. It consists of n nodes, and after the message from A has been passed, message passing in T\{A} is legal if and only if it is legal in T. By the induction hypothesis, there is always a legal message to send in T\{A} unless all messages have been passed. In that case a legal message can be sent from B to A.

#### Exercise 4.

P(A, B, C ) = P(B | A, C )P(A, C) = P(B | A )P(A, C) = P(B, A )P(A, C)/P(A )>

#### Exercise 5.

All message passings are legal in the collect phase. After that, all message passings in the distribute phase are legal. After both calls, all links have passed a message in both directions.

#### Exercise 6.

The cliques in the junction tree are ABCD, FDH, DI, DJ, CE, and EGK

#### Exercise 7.

Let V and W be nodes in a locally consistent junction tree, and let the intersection be I. Then I is a subset of all nodes on the path between V and W. Since the tree is locally consistent the marginal on I is the same for all nodes in the path.

#### Exercise 8.

i) Let G be a clique, and let Ai be the member of G with lowest index. When Ai is eliminated the Ci must contain G. Morover, since G is a maximal complete set and the elimination does not create fill-ins, we must have that G = Ci .
ii) Let Aj be the member of G\{Ai} with lowest index. Reason as above.
iii) Let T be a junction tree for G, and assume that (Ci, Cj ) is not a link in T. Consider the path in T between Ci and Cj. The separator on the link, l, connected to Ci must be Ci\{Ai} (it contains Ci\{Ai} and is a proper subset of Ci ). Now, add (Ci, Cj ) and break the unique cycle by removing l. The result is a tree containing (Ci, Cj ). From theorem 4.11 we infer that the result is a junction tree.
iv) The junction tree is
145 - 345 - 235
|
456

#### Exercise 9.

i) The cliques are DEF, EIJ, BDE, ABE, ACEG, EHIK, CEHK, CEGH. Total space size 309.
ii) The cliques are (with the eliminated variable underlined) DEF, EIJ, ABD, ADE, ACEG, CEIG, CGHI, CHIK. Total space size 225.

#### Exercise 10.

i) P(y, y, C) = (0.054, 0.006), P(y, n, C) = (0.216, 0.024), P(n, y, C) = (0.140, 0.210) = P(n, n, C)
ii) P(y, y, D) = (0, 0.119), P(y, n, D) = (0.086, 0.130), P(n, y, D) = (0.249, 0.107),
P(n, n, C) = (0.117, 117)
iii) P(B ) = (0.429, 0.571)
iv) P(A=y, D=n) = 0.1344

#### Exercise 11

i) Since P(A, B, C, D) = P(A)P(B|A)P(C|A)P(D|B,C) we have P(B, C, D | A ) = P(B|A)P(C|A)P(D|B,C), and therefore
P(B, C, D | a ) = P(B|a)P(C|a)P(D|B,C) (*)
This yields that the model (b) in the Figure is a model for P(B, C, D | a ).
ii) Note the correction: Show that P(D | a ) = SB,CP(D | B, C )P(B |a )P(C |a ). Formula (*) yields the result.
iii) Initially we have P(A), and the propagations yield P(B, e | A). Then P(B, e) = SAP(B, e | A)P(A).
For each propagation, the normalization constant is P(e| a). Hence, the propagations yield P(e |A ), and a muliplication with P(A) will give you P(A, e).
iv) Using Bayes' rule on A and B we get P(A, B, C, D) = P(B)P(A|B)P(C|A)P(D|B,C) and hence
P(A, C, D | B ) = P(A|B)P(C|A)P(D|B,C). Therefore, P(A, C, D | b ) = P(A|b)P(C|A)P(D|b,C), and the model (c) in the Figure is a model for P(A, C, D | b ).
v) Bayes' inversion on D as used in iv) requires P(B, C).
vi) Conditioning on the variables A, and E yield a singly connected network ({A, E} is called a minimal cut set).
vii) Conditioning on A and E results in 10 networks, while conditioning on A,B and G results in 8 networks only.

#### Exercise 12.

The sample: P(A) = (0.39, 0.61); P(B) = (0.62, 0.38); P(C) = (0.51, 0.49);
P(D) = (0.32, 0.68); P(E) = (0.99, 0.01).
Exact: P(A) = (0.4, 0.6); P(B) = (0.6, 0.4); P(C) = (0.52, 0.48); P(D) = (0.34, 0.66);
P(A) = (0.98, 0.02).

#### Exercise 13.

If the initial configuration has A=B, then C=n. Any two of these states will determine the third, and therefore the Gibbs sampling procedure will never bring you out of this configuration. If A and B are in the opposite states, then C=y, and again any two of these states determine the third. If

#### Exercise 14.

Look at the solution to Exercise 3.16. Next, consider the following Gibbs sampling problem: You have a consistent network into which the evidence e has been entered. Is there a configuration consistent with e?
Take the construction from Exercise 3.16, and extend all variable with a third state, s. Let all conditional probabilities involving s be even. Put e = 'no variable is in state s'.

### Chapter 5

#### Exercise 1.

Use the model from Exercise 3.4. i) P(baaba| e ) = 0.182
ii)
T1 T2 T3 T4 T5
a a a a (0.020, 0.004)
a a a b (0.034, 0.005)
a a b a (0.030, 0.007)
a a b b (0.006, 0.001)
a b a a (0.007, 0.002)
a b a b (0.012, 0.002)
a b b a (0.001, 0.0003)
a b b b (0.0003, 0.00004)
b a a a (0.104, 0.024)
b a a b (0.182, 0.028)
b a b a (0.158, 0.036)
b a b b (0.034, 0.005)
b b a a (0.084, 0.019)
b b a b (0.146, 0.022)
b b b a (0.016, 0.004)
b b b b (0.003, 0.0005)

#### Exercise 2.

i) The probability is 0.187. ii) The probability for Fred and Gwenn to be pure is 0.374. The remaining two possibilities will contain at least one carrier.

#### Exercise 4.

Perform a max-prop in HUGIN on the model from Exercise 3.4. The most probable configuration is the set of states with "probability" 1: baaba

#### Exercise 5.

i) Perform a max-prop in HUGIN on the stdfrm model (in directory Examples on the disc). The most probable configuration is the set of states with "probability" 1. The following horses are carriers in the most probable configuration: Brian, Dorothy, Eric, Henry, Irene.
ii) The same result as in i)

#### Exercise 6.

We assume that the evidence on K is entered to the clique KL. The evidence is denoted by the variable's lower case letter.
i) Probabilities for the following sets are directly accessible through HUGIN propagation: a, g, i, j, k, ijk, agijk
ii) Cautious propagation will also give access to the probabilities of the following sets: gijk, aijk, agjk, agik, agij, ij, jk, ag, agi, agj, agk.
iii) We only count table muliplications and do not take the size of the tables into consideration. HUGIN propagation requires a separator multiplication and a clique multiplication for each message passed. The number of messages passed are two time the number of links in the junction tree. In the present case the number of multiplications is 32. Cautious propagation requires two multiplications for each separator (16), and for each clique with n neighbours cautious propagation requires n(n-1) multiplications ( 2+12+12). This gives for the present junction tree a total of 42 table multiplications.

#### Exercise 7.

i) Global conflict: log2 0.0092 = - 6.77
ii) The probability for sire incorrect jumps from 0.0125 to 0.6823. The log2 of this fraction is 5.77, so it does explain most of the conflict (as expected; the variable Parental error is introduced for the purpose of taking care of conflicting evidence).

iii) Conflict analysis on a "true" model might do the job.

#### Exercise 10.

We have P(Parental error = no | e ) = 0.2675. A sensitivity analysis can start by investigating what the result would be if a finding was retracted. Here are the results of such an investigation : The probabilities of Parental error = no are as follows when the finding indicated is retracted: Stated Dam f1 : 0.3214; Stated Dam f2 : 0.4847; Stated Sire f1 : 0.9733; Stated Sire f2 : 0.9730;
Factor1 absent: 0.4203; Factor2 absent: 0.9848; Factor3 present: 0.1477; Factor4 absent: 0.2677;

#### Exercise 12.

The expected change in WOE is 0.7(log 0.9939 - log0.4142)+0.3(log 0.0061-log 0.5858), and it is negative.

#### Exercise 13.

For both combined Blood and Urin test and for Scanning, no outcome will change the optimal action to repeat, and Theorem 5.5 yields that the expected benefit is zero (note the correction for page 123). If, however, we combine all three tests, a tripple no will yield repeat to be optimal, and hence the expected benefit of all three is positive.

### Chapter 6

#### Exercise 1.

i) EU(Gd) = 14.7; EU(SB) = 15.5; EU(Dg) = 15.1

ii) Let vi denote the utility attached to the mark i. Then EU(Gd)-EU(Sb) = 0.1vo-0.1v3. Hence Gd can only be better than Sb if the mark 0 is given higher utility than the mark 3.

#### Exercise 2.

On the disc you find two solutions. One solution is a Bayesian network using Cooper's trick, and the other solution is an influence diagram.

#### Exercise 3.

On the disc you find a solution using Cooper's trick. Note that the probabilities on the pure chance nodes are not correct in that type of networks.

#### Exercise 4.

Use the Sum-max-sum Rule (page 149): B and C are not known before any decision is taken. So, first average over B and C, then max over A2 and finally, max over A1

#### Exercise 5.

Use the Sum-max-sum Rule (page 149): C is not known before any decision is taken, B is known before deciding on A2 but after deciding on A1. Hence, first average over C, then max over A2, average over B and finally max over A1

#### Exercise 6.

An influence diagram on the disc

#### Exercise 9.

i) A solution on the disc. The questions ii) and iii) should circulate you into the problems of game theory. "I know that he knows that I know that ..." . To answer question iv) you should consult the litterature on game theory. What you are looking for is a fixed point for the process of building networks over "I know that he knows". Such fixed points are called Nash equilibria