Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications

{{HTDB paper
| authors = Eric M. Wolff, Ufuk Topcu and Richard M. Murray
| title = Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications
| source = Submitted, 2012 American Control Conference (ACC)
| year = 2011
| type = Conference Paper
| funding = Boeing
| url = http://www.cds.caltech.edu/~murray/preprints/wtm12-acc_s.pdf
| abstract = 
We present a method for designing robust controllers for dynamical systems with linear temporal logic specifications. We abstract the original system by a finite Markov Decision Process (MDP) whose transition probabilities lie in a specified uncertainty set. A robust control policy for the MDP is generated that maximizes the worst-case probability of satisfying the specification over all transition probabilities in the uncertainty set. To do this, we use a procedure from probabilistic model checking to combine the system model with an automaton representing the specification. This new MDP is then transformed into an equivalent form that satisfies the assumptions of stochastic shortest path dynamic programming. A robust version of dynamic programming allows us to solve for an epsilon-suboptimal robust control policy with time complexity O(log(1/epsilon)) times that of the non-robust case. We then implement this control policy on the original dynamical system.
| flags = 
| tag = wtm12-acc
| id = 2011m
}}

Latest revision as of 06:15, 15 May 2016


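The robust dynamic programming step described above can be illustrated for the simple case of interval uncertainty sets. The sketch below is a hypothetical toy example, not the paper's implementation: all state/action names and the data layout are invented. It runs robust value iteration for worst-case reachability, where at each backup an adversary redistributes transition probability mass (within the given intervals) toward the lowest-value successor states.

```python
# Hypothetical sketch of robust value iteration with interval uncertainty sets.
# Not the paper's code; states, actions, and the data layout are invented.

def worst_case_expectation(values, lower, upper):
    """Adversary picks transition probabilities p, with lower[t] <= p[t] <= upper[t]
    and sum(p) = 1, to minimize sum_t p[t] * values[t]."""
    p = list(lower)
    budget = 1.0 - sum(lower)
    # Greedily push the remaining probability mass toward the lowest-value states.
    for t in sorted(range(len(values)), key=lambda t: values[t]):
        extra = min(upper[t] - lower[t], budget)
        p[t] += extra
        budget -= extra
    return sum(p[t] * values[t] for t in range(len(values)))

def robust_value_iteration(states, actions, intervals, target, eps=1e-9):
    """Maximize the worst-case probability of reaching `target`.
    intervals[(s, a)][t] = (lo, hi) bounds on the probability of s -> t under a."""
    V = {s: (1.0 if s in target else 0.0) for s in states}
    delta = float("inf")
    while delta > eps:
        delta = 0.0
        for s in states:
            if s in target:
                continue
            best = max(
                worst_case_expectation(
                    [V[t] for t in states],
                    [intervals[(s, a)][t][0] for t in states],
                    [intervals[(s, a)][t][1] for t in states],
                )
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
    return V

# Toy 4-state example: state 2 is the goal, state 3 is an absorbing failure.
states, actions, target = [0, 1, 2, 3], ["a"], {2}
intervals = {
    (0, "a"): {0: (0, 0), 1: (0.4, 0.6), 2: (0, 0), 3: (0.4, 0.6)},
    (1, "a"): {0: (0, 0), 1: (0, 0), 2: (0.7, 0.9), 3: (0.1, 0.3)},
    (2, "a"): {0: (0, 0), 1: (0, 0), 2: (1, 1), 3: (0, 0)},
    (3, "a"): {0: (0, 0), 1: (0, 0), 2: (0, 0), 3: (1, 1)},
}
V = robust_value_iteration(states, actions, intervals, target)
```

In this toy instance the adversary shifts as much mass as the intervals allow toward the failure state, so the worst-case probability of reaching the goal from state 0 is 0.4 × 0.7 = 0.28. The inner minimization here exploits the special structure of interval sets; richer uncertainty sets would need a different (e.g. convex-optimization-based) inner solver.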