

Question

(20 points) My house is in a recently identified coastal flood risk zone. In 31 days, the government will use its power of eminent domain to seize it and I will receive $100k (i.e., $100 thousand) compensation. Until then I may negotiate with the government, and they will make offers on the property so I may leave earlier. Each day I must decide whether to accept the offer or wait another day. If on one day I receive an offer P, I know that the next day the best offer will be P with probability 0.6, P - 20k with probability 0.15, and P + 20k with probability 0.25. You know that you will only receive offers between 140k and 200k, and offers are made in multiples of 20k. If the daily best offer ever reaches 140k, you know it will not go up again. If the daily best offer ever reaches 200k, then it will go down by 20k with probability 0.6 and not change with probability 0.4. Formulate this problem as an MDP, clearly stating the decision epochs, states, actions, transition probabilities and rewards.

Explanation / Answer

MDP = (S, A, T, R, γ),

where S is the set of states, A the set of actions, T the transition probabilities (i.e., the probabilities Pr(s' | s, a) of moving from state s to state s' when taking action a), R the rewards (a function of the state, and possibly the action), and γ a discount factor that reduces the weight given to future rewards.
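
As a minimal sketch of how these pieces map onto the problem above (the exact epoch and terminal-state conventions are modeling assumptions, not fixed by the problem statement): the decision epochs are days 1 through 31, a state is the current best offer in thousands, the actions are "accept" and "wait", and the transition probabilities and rewards follow directly from the problem text. In Python:

```python
# Sketch of one possible MDP formulation for the eminent-domain problem.
# Assumptions: states are today's best offer in thousands of dollars,
# accepting moves to an absorbing "done" state, and a fixed $100k is paid
# at seizure on day 31 if no offer was ever accepted.

STATES = [140, 160, 180, 200]   # offers come in multiples of 20k
ACTIONS = ["accept", "wait"]

def transition(p, action):
    """Pr(next state | current offer p, action). 'accept' ends the process."""
    if action == "accept":
        return {"done": 1.0}                 # absorbing terminal state
    if p == 140:
        return {140: 1.0}                    # 140k: the offer never rises again
    if p == 200:
        return {180: 0.6, 200: 0.4}          # 200k cap: drop w.p. 0.6, stay w.p. 0.4
    return {p - 20: 0.15, p: 0.6, p + 20: 0.25}

def reward(p, action):
    """Immediate reward in thousands for taking `action` at offer p."""
    return p if action == "accept" else 0

for p in STATES:
    print(p, transition(p, "wait"))
```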

Once the MDP is defined, an optimal policy can be computed with Value Iteration or Policy Iteration, which calculate the expected return for each state. The policy then gives, for each state, the best action to take (given the MDP model).
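
For a finite 31-day horizon, value iteration reduces to backward induction from the seizure day. Here is a sketch under the same assumptions as above (terminal payment of 100k, decisions possible on days 1 through 30), repeating the transition distribution so the block runs on its own:

```python
# Sketch: finite-horizon backward induction (the finite-horizon analogue of
# value iteration). Day counts and the 100k seizure payment are assumptions
# drawn from the problem statement, not the only possible convention.

OFFERS = [140, 160, 180, 200]
HORIZON = 30          # last day on which an accept/wait decision can be made
SEIZURE = 100         # compensation (thousands) if nothing was ever accepted

def wait_dist(p):
    """Distribution of tomorrow's best offer if we wait at offer p."""
    if p == 140:
        return {140: 1.0}
    if p == 200:
        return {180: 0.6, 200: 0.4}
    return {p - 20: 0.15, p: 0.6, p + 20: 0.25}

# Start from the seizure day's value and sweep backwards one day at a time:
# the value of an offer p is the better of accepting p now or waiting.
V = {p: SEIZURE for p in OFFERS}
for t in range(HORIZON, 0, -1):
    V = {p: max(p, sum(prob * V[s] for s, prob in wait_dist(p).items()))
         for p in OFFERS}

for p in OFFERS:
    print(f"offer {p}k on day 1: optimal value = {V[p]:.1f}k")
```

Because the horizon is finite, no discount factor γ is needed; the backward sweep simply terminates after 30 steps.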

In summary, an MDP is useful when you want to plan an efficient sequence of actions whose outcomes are not fully deterministic.