Make the necessary changes for the state space for it to represent a two person
Question
Make the necessary changes to the state space so that it represents a two-person game. Implement the two-person game for the state space using the minimax algorithm. Choose the evaluations of the terminal states arbitrarily. Any news?
Explanation / Answer
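Since the question asks for minimax on a two-person game with arbitrary terminal evaluations, here is a minimal sketch of that idea. The question does not give the state space, so the tree below is hypothetical: nested lists are internal nodes, numbers are terminal evaluations chosen arbitrarily, and the two players alternate by level (MAX moves at the root).

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree.

    node: a number (terminal evaluation) or a list of child nodes.
    maximizing: True if the player to move at this node is MAX.
    """
    if isinstance(node, (int, float)):
        return node  # terminal state: return its evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(node, maximizing=True):
    """Index of the child the player to move should choose at the root."""
    values = [minimax(child, not maximizing) for child in node]
    target = max(values) if maximizing else min(values)
    return values.index(target)


# Hypothetical two-ply tree with arbitrary terminal evaluations.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, True))  # 3
print(best_move(tree))      # 0
```

To turn a single-agent state space into a two-person game in this style, the usual step is to add "whose turn it is" to the state, so that successive levels of the tree belong to alternating players.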
1. $m = m + 1$.

2. Find the next critical time $t^*_m$, and the action $a^*$ and player $p^*$ associated with it. [3] This is done by comparing the potential benefits and costs for each move.

(a) We use some auxiliary definitions:

i. Let $q(a, p)$ be the first player who switches out of $a$ if player $\sim p$ is the first who moves. More precisely, let

$$q(a, p) = \begin{cases} \sim p & \text{if } AM_{m-1}(a, \sim p) \neq \emptyset \\ p & \text{if } AM_{m-1}(a, \sim p) = \emptyset \text{ and } AM_{m-1}(a, p) \neq \emptyset \\ \emptyset & \text{otherwise} \end{cases}$$

ii. Let $SM_{m-1}(a, p)$ be the longest ordered set of action profiles $(a^0, a^1, \ldots, a^{k-1}, a^k)$ such that $a^0 = a$ and, for $i > 0$,

$$a^i = \begin{cases} \left(a^{i-1}_{\sim q(a,q)},\ AM_{m-1}(a^{i-1}, q(a, q))\right) & \text{if } i \text{ is odd and } AM_{m-1}(a^{i-1}, q(a, q)) \neq \emptyset \\ \left(a^{i-1}_{q(a,q)},\ AM_{m-1}(a^{i-1}, \sim q(a, q))\right) & \text{if } i \text{ is even and } AM_{m-1}(a^{i-1}, q(a, q)) \neq \emptyset \end{cases}$$

This defines the sequence of consecutive switches within stage $m-1$ that starts at $a$ and ends at a profile from which there is no active move. We denote this final node by $SM_{m-1}(a, p)$. The sequence is finite and is solely a function of $AM_{m-1}$.

iii. Given $SM_{m-1}(a, p) = (a^0, \ldots, a^k)$, define

$$FS_{m-1}(a, p) = \sum_{i=1}^{k} I\left(a^{i-1}_p \neq a^i_p\right)$$

where $I(\cdot)$ is the indicator function. $FS_{m-1}$ computes the number of switches by player $p$ in the $SM_{m-1}(a, p)$ sequence.

iv. Let

$$\Delta V_{m-1}(a, p, b_p) = V_{m-1}(SM_{m-1}((b_p, a_{\sim p}), p)) - V_{m-1}(SM_{m-1}(a, p)).$$

This difference in values stands for the benefit (or loss) of a switch by player $p$ from profile $a$ to action $b_p \in A_p - \{a_p\}$, without accounting for the switching cost of such a move. Let also

$$\Delta FS_{m-1}(a, p, b_p) = FS_{m-1}((b_p, a_{\sim p}), p) - FS_{m-1}(a, p).$$

This difference stands for the difference in the number of subsequent immediate moves by player $p$, when considering a move from profile $a$ to action $b_p$.

(b) We now use these definitions to find the next critical time:

i. First, we compute $tt_m(a, p, b_p)$, the first point in time at which player $p$ would find it profitable to switch from profile $a$ to action $b_p$. This involves three different cases, as shown below. The first is when the switch gives a negative value (and therefore is never profitable).
The second is a case in which the difference in values favors a switch, and, moreover, if player $p$ does not switch, he will be making more immediate subsequent switches than if he switches. This means that player $p$ would prefer to switch right away rather than staying put, so $tt_m(a, p, b_p)$ kicks in immediately. The third case is the "standard" case, in which the critical time is the latest point on the grid at which the cost of switching is less than its benefit. [4]

$$tt_m(a, p, b_p) = \begin{cases} 0 & \text{if } \Delta V_{m-1}(a, p, b_p) < 0 \text{ or } \left(\Delta V_{m-1}(a, p, b_p) = 0 \text{ and } \Delta FS_{m-1}(a, p, b_p) = -1\right) \\ prev_p(t^*_{m-1}) & \text{if } \Delta V_{m-1}(a, p, b_p) \geq 0 \text{ and } \Delta FS_{m-1}(a, p, b_p) < 0, \text{ except for } \left(\Delta V_{m-1}(a, p, b_p) = 0 \text{ and } \Delta FS_{m-1}(a, p, b_p) = -1\right) \\ \max\{\, t \in g_p,\ t\ \ldots & \ldots\ 0 \text{ and } \Delta FS_{m-1}(a, p, b_p) \geq 0 \end{cases}$$

[5]

ii. Next, we compute for each profile $a$ the latest time at which it is profitable for each player to switch out of this profile. We make sure not to account for those switches which were already active. More precisely, for each $(a, p)$ we compute

$$t_m(a, p) = \begin{cases} 0 & \text{if } AM_{m-1}(a, p) = \arg\max_{b_p \in A_p - \{a_p\}} tt_m(a, p, b_p) \\ \max_{b_p \in A_p - \{a_p\}} tt_m(a, p, b_p) & \text{otherwise} \end{cases}$$

iii. Finally, we compute the next critical time by finding the latest such time across profiles and players. That is, we compute:

$$(a^*, p^*) = \arg\max_{(a,p)} \{t_m(a, p)\}, \quad \text{and} \quad t^* = \max_{(a,p)} t_m(a^*, p^*)$$

(c) Given $(a^*, p^*)$: Terminate if $t^* = 0$, and if so set also $\bar{m} = m - 1$. Abort if $|p^*| > 1$. [6] This implies that there are equal critical times for different players, and that the solution is not grid invariant. Otherwise, set $t^*_m = t^*$ and $p^*_m = p^*$, and continue to part 3.

[3] This part of the algorithm corresponds to the subroutine FindNextStage in the associated Matlab code.

[4] When $\Delta FS_{m-1}(a, p, b_p) > 0$ one has to account for multiple switches. It is in this last case when the assumption that $c(t)$ is common to all players and moves helps considerably.
To accommodate richer families of cost technologies, one would need to introduce cumbersome notation to keep track of the costs associated with switching along the $SM$ sequence, instead of simply counting them.

[5] Note that by having weak inequalities within the max operator we implicitly assume that a player switches whenever he is indifferent between switching or not.

[6] $\arg\max$ is a correspondence. Given the way we construct $t_m(a, p)$, the multiple solutions must be associated with a unique $p^*$ for any finite grid. In the limiting case, this is the only generic case. This is why the algorithm may abort in non-generic cases.

3. We now enter a ("short") phase in which the set of active switches gets computed at every point in the grid until it "stabilizes," as defined below. This is where (potentially) multiple stages of the game (as defined in Definition 2) are computed within a single step of the algorithm. This part of the algorithm is considerably different from the one described in Appendix B of the paper.

(a) First, let

$$V^{temp}(a, p) = V_{m-1}(SM_{m-1}(a, \sim p^*_m)) - \sum_{i=0}^{FS_{m-1}(a, \sim p^*_m)} c\left(nextround^i_p(t^*_m)\right)$$

These are the continuation values for each player and profile just after $t^*_m$.

(b) Second, use these continuation values and apply standard backward induction on the game grid from time $t^*_m$ backwards. At each point in time, record the full set of equilibrium strategies. Stop applying backward induction when the strategies for both players remain constant for a full round. More precisely, stop at the latest $t \in g_p$ that satisfies the following conditions: (i) $next(t) \in g_{\sim p}$; (ii) the strategies for $p$ at $t$ and $next_p(t)$ are the same; (iii) the strategies for $\sim p$ at $next(t)$ and $nextround_{\sim p}(next(t))$ are also the same; and (iv) $nextround_{\sim p}(next(t)) = t^*_m$.

(c) Finally, update $AM_m$ and $V_m$. The strategies for player $p$ at $t$ and those of $\sim p$ at $next(t)$ constitute $AM_m$. The continuation values for player $p$ at $next(p)$ and those of $\sim p$ at $t$ constitute $V_m$.
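The backward induction in step 3(b) can be illustrated with a minimal sketch. It is not the associated Matlab code, and it makes several simplifying assumptions: two players indexed 0 and 1, a grid given as a hypothetical list of `(time, mover)` pairs, a common switching cost `cost(t)`, and no stabilization check (it simply runs over the whole grid). All function and parameter names are made up for illustration.

```python
from itertools import product


def backward_induction(actions, terminal_values, grid, cost):
    """Backward induction on an alternating-move grid with switching costs.

    actions: (A0, A1), the action lists of players 0 and 1.
    terminal_values: dict mapping profile (a0, a1) -> (v0, v1) after the last grid point.
    grid: list of (time, mover) in increasing time order, mover in {0, 1}.
    cost: function time -> switching cost (assumed common to both players).
    Returns (values before the first grid point, list of (time, mover, strategy)).
    """
    values = dict(terminal_values)
    strategies = []
    for t, p in reversed(grid):
        new_values, strategy = {}, {}
        for a in product(*actions):
            best_b, best_payoff = a[p], values[a][p]  # staying put costs nothing
            for b in actions[p]:
                if b == a[p]:
                    continue
                target = (b, a[1]) if p == 0 else (a[0], b)
                payoff = values[target][p] - cost(t)
                if payoff >= best_payoff:  # weak inequality: switch when indifferent
                    best_b, best_payoff = b, payoff
            target = (best_b, a[1]) if p == 0 else (a[0], best_b)
            v = list(values[target])
            if best_b != a[p]:
                v[p] -= cost(t)  # only the mover pays the switching cost
            new_values[a] = tuple(v)
            strategy[a] = best_b
        values = new_values
        strategies.append((t, p, strategy))
    strategies.reverse()
    return values, strategies


# Hypothetical 2x2 coordination game; the cost grows as the deadline approaches.
actions = ([0, 1], [0, 1])
terminal = {(0, 0): (1, 1), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (3, 3)}
grid = [(0.5, 0), (1.0, 1)]
values, strategies = backward_induction(actions, terminal, grid, lambda t: t)
```

In this example, starting from profile `(0, 0)`, player 0 switches to action 1 at time 0.5 and player 1 follows at time 1.0. A stopping rule like conditions (i) to (iv) above would compare each recorded strategy with the same player's strategy one round later, instead of sweeping the full grid.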
Terminate if

$$\sum_{(a,p)} I\left(AM_m(a, p) \neq \emptyset\right) = K_1(K_2 - 1) + K_2(K_1 - 1)$$

where $I(\cdot)$ is the indicator function. This condition implies that the maximal possible switches are active. Before termination, let also $\bar{m} = m$, $t^*_{\bar{m}+1} = 0$. Otherwise, go to part 1.

Output: The essential information of the algorithm consists of the number of stages of the game, $\bar{m}$, the critical points that define the end of each stage, $(t^*_m)_{m=0}^{\bar{m}}$, the strategies at every stage, $(AM_m)_{m=0}^{\bar{m}}$, and the continuation values at the earliest stage, $V_{\bar{m}}$. The initial equilibrium actions are given by the (generically) unique profile $a^{initial}$ that has no active switches at the earliest stage, i.e. $AM_{\bar{m}}(a^{initial}, p) = \emptyset$ for $p = 1, 2$. The equilibrium path can be computed by tracing the sequence along the $SM_m(a, p)$'s, stage by stage. In other words, start at $a^{initial}$, go to $SM_1(a^{initial}, p)$, then continue to $SM_2(SM_1(a^{initial}, p), p)$, and so on. The values for the game are given by $V_{\bar{m}}(a^{initial}, p)$ for each player $p$.
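Tracing a chain of active switches and checking the termination count can be sketched as follows. This is a simplified reading of the $SM$ and $FS$ definitions rather than a faithful port: players are indexed 0 and 1, the active moves are stored in a hypothetical dictionary keyed by `(profile, player)`, the designated mover goes first if he has an active move (otherwise the other player is tried), and $K_1$, $K_2$ are passed in as plain integers.

```python
def follow_active_moves(active_move, profile, first_mover, max_steps=1000):
    """Follow consecutive active switches until a profile with no active move.

    active_move: dict mapping (profile, player) -> the action that player switches to;
                 a missing key or None means "no active move".
    Returns the visited profiles, starting at `profile`.
    """
    path = [profile]
    mover = first_mover
    for _ in range(max_steps):
        current = path[-1]
        for p in (mover, 1 - mover):  # try the designated mover first
            b = active_move.get((current, p))
            if b is not None:
                nxt = (b, current[1]) if p == 0 else (current[0], b)
                path.append(nxt)
                mover = 1 - p  # the other player moves next
                break
        else:
            return path  # neither player has an active move here
    raise RuntimeError("no resting profile reached; active moves may cycle")


def count_switches(path, player):
    """Number of steps along the path at which `player` changes action."""
    return sum(1 for prev, cur in zip(path, path[1:]) if prev[player] != cur[player])


def all_switches_active(active_move, k1, k2):
    """Check the count condition: active moves == K1(K2 - 1) + K2(K1 - 1)."""
    active = sum(1 for b in active_move.values() if b is not None)
    return active == k1 * (k2 - 1) + k2 * (k1 - 1)


# Hypothetical stage: player 0 leaves (0, 0), then player 1 leaves (1, 0).
am = {((0, 0), 0): 1, ((1, 0), 1): 1}
path = follow_active_moves(am, (0, 0), first_mover=0)
print(path)  # [(0, 0), (1, 0), (1, 1)]
```

Applying `follow_active_moves` once per stage, feeding each stage's final profile into the next, mirrors the stage-by-stage tracing of the equilibrium path described in the output paragraph.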