Model Checking Reward Markov Models with Uncertainties

Authors: Giovanni Bacci, Mikkel Hansen, and Kim G. Larsen

Affiliation: Department of Computer Science, Aalborg University, Denmark
e-mail: {giovbacci,mhan,kgl}@cs.aau.dk

We consider the problem of analysing Markov reward models (MRMs) in the presence of imprecise or uncertain rewards.
Properties of interest for their analysis are (i) probabilistic bisimilarity, and (ii) specifications expressed as probabilistic reward CTL (PRCTL) formulae.
We consider two extensions of the notion of MRM, namely (a) constrained Markov reward models, i.e., MRMs whose rewards are parametric on a set of variables subject to constraints, and (b) stochastic Markov reward models, i.e., MRMs whose rewards are modelled as real-valued random variables rather than precise values.
Our approach is based on quantifier elimination for linear real arithmetic, which is used to compute a symbolic representation of the set of parameter valuations satisfying a given property, in the form of a quantifier-free first-order formula in the linear theory of the reals. Our work finds applications in model repair, where parameters need to be tuned so as to satisfy a desired specification, as well as in robustness analysis in the presence of stochastic perturbations.

Preliminaries

Markov Reward Models

In what follows we fix a finite set of atomic propositions AP.

Definition (Markov chain) A Markov chain is a tuple M = (S, τ, l) consisting of a finite nonempty set of states S, a transition probability function τ : S → D(S), and a labeling function l : S → 2^AP mapping states to sets of atomic propositions.

The transition probability function can be equivalently represented as a state-indexed matrix (τ(s)(t))_{s,t ∈ S}. Analogously, the labeling function can be alternatively represented as a state-indexed vector. Later, we will use this alternative representation to express our models.

Definition (Markov reward model) A Markov reward model is a tuple R = (S, τ, l, ρ) where (S,τ,l) is a Markov chain, and ρ : S → R is the reward function assigning a reward to each state.

A Markov reward model probabilistically generates infinite paths in S ω according to its underlying Markov chain; in addition, whenever a transition is performed, say from s to t, the system is rewarded by ρ(s). It is worth noting that the reward is given after leaving the current state.
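The tutorial's own code is in the Wolfram Language; purely as an illustration of the semantics just described (the reward ρ(s) is collected when leaving state s), here is a hypothetical Python sketch (the name `simulate` is ours) that walks a finite path prefix of an MRM and accumulates rewards.

```python
import random

def simulate(tau, rho, start, steps, rng):
    """Walk `steps` transitions of an MRM given by a transition
    function tau[s][t] and a reward function rho[s]; the reward
    rho(s) is collected on *leaving* state s, as defined above."""
    s, total = start, 0.0
    for _ in range(steps):
        total += rho[s]  # reward granted after leaving s
        targets, probs = zip(*tau[s].items())
        s = rng.choices(targets, weights=probs)[0]
    return s, total

# Tiny example: a single state looping forever with reward 3.
tau = {0: {0: 1.0}}
rho = {0: 3.0}
end, total = simulate(tau, rho, start=0, steps=4, rng=random.Random(42))
```

Four transitions from the looping state accumulate a reward of 12.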

Constrained Markov Reward Models

Constrained Markov reward models (CMRMs) model families of MRMs where both transition probabilities and state rewards are parametric on a set of real-valued variables subject to constraints expressed as a first-order formula in the linear theory of the reals.
Let x = (x_1, ..., x_k) be a vector of real-valued parameters. We denote by E the set of affine maps f : ℝ^k → ℝ of the form f(x) = a · x + b, where a ∈ ℚ^k and b ∈ ℚ.

Definition (constrained Markov reward model) A constrained Markov reward model is a tuple F = (S, τ, l, ρ, F) where S and l are defined as for Markov chains, τ : S → (S → E) is a parametric transition function, ρ : S → E is a parametric reward function, and F(x) is a linear first-order formula such that, for all s, t ∈ S, F(x) implies τ(s)(t) ≥ 0 and Σ_{t ∈ S} τ(s)(t) = 1.

Intuitively, a CMRM F = (S, τ, l, ρ, F) defines a family of MRMs arising by plugging in concrete values for the parameters. A parameter valuation v ∈ ℝ^k is admissible (or feasible) if F(v) holds true.
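As an illustration of instantiation and admissibility (the helper names `affine` and `instantiate` are ours, not part of the tutorial's code), a minimal Python sketch:

```python
def affine(a, b):
    """The affine map f(x) = a . x + b from the set E above."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x)) + b

# One-parameter family over x = (q,): tau(1)(1) = q, tau(1)(2) = 1 - q.
tau = {(1, 1): affine([1], 0), (1, 2): affine([-1], 1)}
F = lambda v: 0 < v[0] < 1  # feasibility constraint F(x)

def instantiate(tau, F, v):
    """Return the concrete transition entries at valuation v if v is
    admissible (F(v) holds), otherwise None."""
    if not F(v):
        return None
    return {edge: f(v) for edge, f in tau.items()}

concrete = instantiate(tau, F, (0.25,))
```

At the admissible valuation q = 0.25 this yields the concrete MRM entries; at q = 2 the constraint fails and no MRM is produced.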

Stochastic Markov Reward Models

It is common practice to model experimental measurements by means of real-valued random variables distributed according to well-studied families of distributions (e.g., normal or Student's t). Following this philosophy, we introduce the notion of stochastic Markov reward models (SMRMs), where state rewards are real-valued random variables.
Consider the probability space (Ω, Σ, P) representing the environment where the experiments are performed. We denote by Y the set of real-valued random variables of the form Y : Ω → ℝ.

Definition (stochastic Markov reward model) A stochastic Markov reward model is a tuple J = (S, τ, l, ρ) where (S, τ, l) is a Markov chain and ρ : S → Y is a reward function assigning a real-valued random variable to each state.

An SMRM can be intuitively interpreted as a measurable function J : Ω → MRM where, for a given event ω ∈ Ω, J(ω) is the MRM (S, τ, l, ρ_ω) with ρ_ω(s) = ρ(s)(ω) for all s ∈ S.
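This interpretation J : Ω → MRM can be mimicked operationally by sampling: drawing one outcome ω fixes a concrete reward function ρ_ω. A hypothetical Python sketch (the name `sample_mrm` is ours):

```python
import random

def sample_mrm(tau, rho_rv, rng):
    """Draw one outcome omega and return the concrete MRM J(omega):
    each entry of rho_rv is a sampler standing for the random
    variable rho(s), so sampling yields the rewards rho_omega(s)."""
    return tau, {s: rv(rng) for s, rv in rho_rv.items()}

tau = {0: {0: 1.0}}
rho_rv = {0: lambda g: g.gauss(1.0, 0.5)}  # reward ~ Normal(1, 0.5)
_, rho_omega = sample_mrm(tau, rho_rv, random.Random(0))
```

Each call with a fresh outcome produces a (generally different) concrete MRM, matching the definition above.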

Quantifier Elimination for Linear Real Arithmetic

Mathematica implements a built-in quantifier elimination algorithm, called Resolve. However, if the formula involves only polynomials of degree one, we suggest employing Mjollnir, which implements more efficient methods.

Mjollnir is a tool developed by David Monniaux that implements a quantifier elimination algorithm for the theory of linear real arithmetic. The tool is freely available at http://www-verimag.imag.fr/~monniaux/download/Mjollnir_2012-02-24.zip

Once installed, set the installation directory as shown below.

Options[QE] = {Path -> "/Users/giovbacci/AAU/QuantifierElimination/Mjollnir/trunk/", Method -> Mjollnir};

QE[expr_, OptionsPattern[]] :=
  If[MatchQ[OptionValue[Method], Mjollnir],
    Put[expr, "temp.m"]; (* export the expression to a temporary file *)
    Get["!" <> OptionValue[Path] <> "mjollnir --full-simplify temp.m"], (* execute Mjollnir *)
    Resolve[expr, Reals] (* execute Resolve *)
  ];

Test

By default, quantifier elimination is performed using Mjollnir. Run the following test to check if Mjollnir works correctly.

tutorial_1.gif

∀z ((x ≥ 0 ⟹ (x < 1 ∨ 4 ≤ x && x ≤ 9) && (x ≠ 0 ∨ 5 ≤ z && z ≤ 10)) && (x ≤ -1 ⟹ (x ≥ 1 ∨ 5 ≤ z && z ≤ 10) && (x ≠ 0 ∨ 3 ≤ x && x ≤ 5)))

(x < 0 && -1 - x < 0) || (-x < 0 && -1 + x < 0) || (9 - x ≥ 0 && -4 + x ≥ 0)

If the test does not work, you can set the Method option to Automatic and use Resolve:

QE[expr, Method -> Automatic]

(x < 0 && -1 - x < 0) || (-1 + x < 0 && -x < 0) || (-9 + x ≤ 0 && 4 - x ≤ 0)

Getting Started

Data Structures for CMRMs

CMRMs are represented by means of (symbolic) terms of the form CMRM[S, tau, ell, rho, F].
The following definitions allow one to access each component of a given CMRM.

tutorial_2.gif

The following definitions allow one to construct the underlying graph of a given CMRM.
The graph can be manipulated and visualized by using standard Mathematica functions for graphs.

(* toGraph[M : CMRM[S_, tau_, _, _, _]] := Graph[(DirectedEdge @@ # &) /@ (Most@ArrayRules[tau])[[All, 1]]]; *)
toGraph[M : CMRM[S_, tau_, ell_, rho_, _]] :=
  Graph[
    Labeled[DirectedEdge @@ #[[1]], #[[2]]] & /@ (Most@ArrayRules[tau]),
    VertexSize -> 0.3,
    VertexShapeFunction -> "Square",
    VertexLabels -> Table[i -> Placed[{ell[[i]], rho[[i]], i}, {Top, Bottom, Before}], {i, Length[S]}]
  ];

Reachability[M : CMRM[S_, tau_, _, _, _], u_] :=
  Append[Select[IncidenceList[TransitiveClosureGraph[toGraph[M]], u], #[[1]] == u &][[All, 2]], u];

Declaration of CMRMs and SMRMs

A CMRM F = (S, τ, l, ρ, F) with S = {1, ..., n} is represented by a term of the form CMRM[S,tau,ell,rho,F] where
- tau is an n × n probability matrix such that tau[[i,j]] = τ(i)(j), for all 1 ≤ i,j ≤ n,
- ell is a vector of length n such that ell[[i]] = l(i), for all 1 ≤ i ≤ n,
- rho is a vector of length n such that rho[[i]] = ρ(i), for all 1 ≤ i ≤ n.
- F is a first-order formula in the linear theory of the reals.
Notably, parameters are implicitly represented by undeclared symbols.

Example: We are now ready to construct our first CMRM. We model a gambler who tosses a coin according to the following rules.
Whenever the coin shows heads, the next toss gives heads again with probability q and tails with probability 1-q.
Whenever the coin shows tails, the next toss gives heads or tails with equal probability.
The reward gained after a toss is parametric on r. If the toss is made from heads, the reward is r-2; otherwise it is r+2.

tutorial_3.gif

We can visualize the above CMRM by plotting its underlying labeled graph.

toGraph [ Coin ]

tutorial_4.gif

An SMRM J = (S, τ, l, ρ) is represented by a term of the form SMRM[S,tau,ell,rho,decl] where S,tau, and ell are represented as for CMRMs and decl is a list of declarations of the form {x, distr} assigning each random variable x to its distribution. One can use any distribution provided in the Wolfram Language.

Example: Let us fix q = 1/2 and assume r to be a random variable distributed according to a normal distribution with mean μ = 1 and standard deviation σ = 0.5. Accordingly, the gambler model described above can be rephrased as an SMRM as follows.

tutorial_5.gif

Remark: one can also add constraints on the set of random rewards. These shall not be interpreted as feasibility constraints restricting the set of possible outcomes. Rather, they are conditioning constraints affecting the probability of our observations, which will be conditioned on such constraints.
For instance, the above example can be extended by adding the constraint 0 ≤ r ≤ 2, indicating that we are interested in observing the outcomes conditioned on the fact that the outcome of the random variable r lies in the closed interval [0,2].

tutorial_6.gif
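Conditioning on a constraint such as 0 ≤ r ≤ 2 can be realized operationally by rejection sampling: redraw the random rewards until the constraint holds. A Python sketch of this idea, not the tutorial's actual implementation (the name `sample_conditioned` is ours):

```python
import random

def sample_conditioned(draw, constraint, rng, max_tries=100_000):
    """Sample from `draw` conditioned on `constraint` by rejection:
    redraw until the constraint is satisfied."""
    for _ in range(max_tries):
        x = draw(rng)
        if constraint(x):
            return x
    raise RuntimeError("constraint region has (near-)zero probability")

rng = random.Random(7)
# r ~ Normal(mu = 1, sigma = 0.5), conditioned on 0 <= r <= 2
r = sample_conditioned(lambda g: g.gauss(1.0, 0.5),
                       lambda x: 0 <= x <= 2, rng)
```

Accepted samples are distributed exactly according to the conditional law, at the cost of discarding outcomes outside the constraint.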

Syntax for Probabilistic Reward CTL

PRCTL allows for state formulae, describing properties of states in an MRM, and path formulae, describing properties of paths in an MRM. State formulae Φ, Ψ and path formulae φ are constructed according to the following abstract syntax:

Φ, Ψ  ::= True | False | AP[a] | Rew[rel, r] | Not[Φ] | Φ && Ψ | Φ || Ψ | Pr[J, X[Φ]] | Pr[J, U[Φ, Ψ] ] | Ex[R, Φ]

where a ∈ AP, r ∈ Q, rel ∈ {Equal, Less, LessEqual, Greater, GreaterEqual}, and J and R are intervals of the form Interval[{min,max}].
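The term syntax above maps naturally onto an algebraic data type. As a language-neutral illustration, here is a hypothetical Python mirror of a fragment of the grammar (constructor and field names follow the syntax above but are our own choice):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AP: a: str               # atomic proposition AP[a]
@dataclass
class Rew: rel: str; r: float  # reward predicate Rew[rel, r]
@dataclass
class PrUntil:                 # Pr[J, U[phi, psi]]
    J: Tuple[float, float]
    phi: object
    psi: object
@dataclass
class Ex:                      # expected-reward predicate Ex[R, phi]
    R: Tuple[float, float]
    phi: object

# Pr_[0.5, 1][ Rew[LessEqual, 2] U AP[ths] ]:
phi2 = PrUntil((0.5, 1.0), Rew("LessEqual", 2), AP("ths"))
```

A model checker would then recurse over such terms, one case per constructor.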

Model checking Constrained Markov Reward Models

Encoding PRCTL satisfiability

The following function implements the encoding of the PRCTL model checking problem for CMRMs.

tutorial_7.gif

Model checking CMRMs against PRCTL

Models[P_CMRM, s_, Phi_, opt : OptionsPattern[]] :=
  Module[{tp, satformula},
    {tp, satformula} = Timing[QE[EncPRCTL[P, s, Phi] && getConstraints[P]]];
    If[OptionValue[Verbose], Print["Computation time: ", tp, " sec"];];
    satformula
  ];

Model checking Stochastic Markov Reward Models

Monte Carlo Estimation

Number of samples based on the Chernoff bound

SamplesNumber[err_, conf_] := Ceiling[-Log[conf/2]/(2 err^2)];
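SamplesNumber implements the Chernoff-Hoeffding bound N = ⌈ln(2/δ)/(2ε²)⌉, which guarantees that the Monte Carlo estimate deviates from the true probability by more than ε = err with probability at most δ = conf. The same computation in Python: for ε = 0.005 and 99.99% confidence (δ = 10⁻⁴) it reproduces the 198070 samples reported in the examples below.

```python
import math

def samples_number(err, conf):
    """Chernoff-Hoeffding bound: N = ceil(ln(2/conf) / (2 err^2))."""
    return math.ceil(-math.log(conf / 2) / (2 * err ** 2))

n = samples_number(0.005, 0.0001)  # error 0.005, confidence 99.99%
```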

Estimate the probability of the event by sampling the random variables.

tutorial_8.gif
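The estimation step computes P[X ∈ E | X ∈ F] by sampling: draw outcomes, discard those violating the feasibility formula F, and report the fraction of the remaining ones that satisfy the event E. A self-contained Python sketch of this scheme (all names are ours):

```python
import random

def estimate(sampler, in_E, in_F, n, rng):
    """Monte Carlo estimate of P[X in E | X in F] from n samples."""
    hits = kept = 0
    for _ in range(n):
        x = sampler(rng)
        if in_F(x):
            kept += 1
            hits += in_E(x)
    return hits / kept if kept else float("nan")

# Sanity check: X ~ Normal(0, 1), E = {x > 0}, F = everything,
# so the estimate should be close to 0.5.
p = estimate(lambda g: g.gauss(0, 1),
             lambda x: x > 0,
             lambda x: True,
             n=200_000, rng=random.Random(1))
```

The number of samples n is chosen with the Chernoff bound above to meet a prescribed error and confidence.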

Model Checking SMRMs against PRCTL

tutorial_9.gif

getDeclarations [ J_SMRM ] := Last [ J ] ;

tutorial_10.gif

Examples

Modelling the stress level of a PhD student

Consider the following CMRM modelling the stress level of a Ph.D. student in computer science. It has seven states, namely s1, ..., s7, annotated with the propositions “thk” (thinking), “ths” (thesis), “wr” (writing paper), “tc” (tool coding), and “pe” (paper evaluation). The stress level of the student is modelled by associating with each state a reward that represents the stress the student accumulates in that state at each time unit. After spending some time thinking, the student starts developing a tool with probability pc = 0.2, or writing a paper with probability pw = 0.5 (40% of the time for a journal and 60% for a conference); otherwise the student submits the thesis with probability 0.1. Once the tool is mature enough, the student starts writing a paper about it with probability pw (70% of the time for a journal, otherwise for a conference). When a paper has been completed, it is submitted for evaluation with probability ps = 0.3. The paper may be rejected or accepted (moving to s7 or s5, respectively) according to some acceptance rate. At any moment before the thesis is completed, the student may move back to s1 with probability pt = 0.2 to think about how to proceed with his/her Ph.D.
State rewards are parametric on h, c, j, a, and r. Here, the parameter h models the help of the supervisor, whereas c and j represent the stress that accumulates while writing a conference paper or a journal paper, respectively. The stress caused by having a paper accepted or rejected is modelled by the parameters a and r, respectively. These parameters are subject to the constraints F defined below.

tutorial_11.gif

toGraph [ R ]

tutorial_12.gif

We can express the property “the average level of stress accumulated by the student until the thesis is submitted doesn’t exceed 10” as the following PRCTL formula

phi1 = Ex[Interval[{0, 10}], AP[ths]];

Analogously, we can express the property “the student completes the thesis without passing through a state with stress level higher than 2 with probability greater than or equal to 0.5” by means of the following formula

phi2 = Pr[Interval[{0.5, 1}], U[Rew[LessEqual, 2], AP[ths]]];

The set of feasible valuations satisfying the first formula can be computed with the following expression.

FullSimplify@Models[R, 1, phi1, Verbose -> True]

Computation time: 0.016616 sec

-5 ≤ h ≤ 0 && c ≤ j && 0 ≤ 57 a + 575 c + 750 h + 175 j + 168 r ≤ 1750/3 && a ≤ r && -3 ≤ a + r ≤ 3 && c ≤ 1

Analogously, the set of feasible valuations satisfying the second formula is

FullSimplify@Models[R, 1, phi2, Verbose -> True]

Computation time: 0.01222 sec

-5 ≤ h ≤ 0 && c ≤ j && a ≤ r && -3 ≤ a + r ≤ 3 && c ≤ 1

Remark: our encoding allows one to have parameters in the formula. This feature can be used to express families of properties, but also to relate the probability of satisfying one formula with that of satisfying another one.
For instance, one can express complex properties such as “the student completes the thesis without passing through a state with stress level higher than x+2 with probability greater than or equal to x and can also complete the thesis without passing through a state with stress level higher than x+1 with probability smaller than or equal to x”.

tutorial_13.gif

FullSimplify@Models[R, 1, phi3 && phi4, Verbose -> True]

Computation time: 0.020978 sec

5 + h 0 && h 0 && a + r 3 && ( ( c j && ( ( 3 + a + r 0 && ( ( a r && ( ( c + h > 59 52 && 52 x 7 ) || ( c 1 && ( ( h + j 219 97 && 97 r > 122 && 97 x 25 ) || ( h + j 3 && r 3 && x 1 ) || ( h + j 2 + x && ( ( h + j > 1 + x && ( ( 97 x 25 && ( 4 x 1 || ( r > 1 + x && 136 x 25 ) ) ) || ( 4 x 1 && r 2 + x && x 1 ) ) ) || ( r 2 + x && r > 1 + x && x 1 && 97 x > 25 ) ) ) ) ) || ( c + h > 1 + x && 52 x > 7 && ( ( h + j 2 + x && ( 97 x 25 || ( r 2 + x && x 1 ) ) ) || ( c + h 2 + x && ( 136 x 25 || ( r 2 + x && 4 x 1 ) ) ) ) ) ) ) || ( c 1 && ( ( h + j 219 97 && 97 r > 219 && 97 x 25 ) || ( h + j 2 + x && r 2 + x && r > 1 + x && x 1 && 346 x > 175 ) ) ) ) ) || ( c 1 && a r && a > 1 + x && h + j 2 + x && ( 34 x 7 || ( h + j > 1 + x && 212 x 35 ) ) ) ) ) || ( c 1 && a r && ( ( 3 + a + r 0 && ( ( c + h 9 4 && h + j > 9 4 && 4 r 9 && 4 x 1 ) || ( c + h 297 136 && h + j > 297 136 && 136 r > 161 && 136 x 25 ) || ( c + h 2 + x && h + j > 2 + x && r 2 + x && r > 1 + x && 4 x 1 && 136 x > 25 ) ) ) || ( a > 1 + x && c + h 2 + x && h + j > 2 + x && 808 x 175 && 212 x 35 ) ) ) )

Assume now that in the model described above, we interpret the parameters h, c, j, a, and r as pairwise independent random variables distributed according to the following declaration

tutorial_14.gif

We can now estimate the probability of satisfying the formula phi1.

N@Models[J, 1, phi1, Verbose -> True]

Event: E = (57 a + 575 c + 750 h + 175 j + 168 r ≥ 0 && 1750 - 171 a - 1725 c - 2250 h - 525 j - 504 r ≥ 0) [computed in 0.009605 sec]

Feasible Valuations: F = ( True )

Samples: 198070 , Error: 0.005 , Confidence: 99.99 %

Estimation of P[X∈E|X∈F] [computed in 4.035516 sec]

0.15636896046852122

The SMRM above does not take into account the feasibility constraints F used earlier. We add such constraints as follows.

tutorial_15.gif

Event: E = (-h ≥ 0 && -c + j ≥ 0 && 57 a + 575 c + 750 h + 175 j + 168 r ≥ 0 && -a + r ≥ 0 && 1750 - 171 a - 1725 c - 2250 h - 525 j - 504 r ≥ 0 && 5 + h ≥ 0 && 3 + a + r ≥ 0 && 3 - a - r ≥ 0 && -1 + c ≤ 0) [computed in 0.007937 sec]

Feasible Valuations: F = (-5 ≤ h && h ≤ 0 && c ≤ j && c ≤ 1 && h ≤ 5 && -3 ≤ a + r && a + r ≤ 3 && a ≤ r)

Samples: 198070 , Error: 0.005 , Confidence: 99.99 %

Estimation of P[X∈E|X∈F] [computed in 8.958402 sec]

0.15567282321899736

Queueing Model

tutorial_16.gif

tutorial_17.gif

tutorial_18.gif

Computation time: 0.00867 sec

10205/43 ≤ 71 l + 580 r ≤ 20410/43

Created with the Wolfram Language