# Difference between revisions of "SURF 2017: Scalable Robust Synthesis from Temporal Logic Specifications"

Line 18: | Line 18: | ||

'''2. Robust Synthesis using Reinforcement Learning to augment robustness''' | '''2. Robust Synthesis using Reinforcement Learning to augment robustness''' | ||

− | There has been recent work on employing reinforcement learning for enforcing | + | There has been recent work on employing reinforcement learning for enforcing temporal logic specifications [1,2]. In [4], the authors construct a product MDP that |

combines a Rabin automata generated from a LTL spec to | combines a Rabin automata generated from a LTL spec to | ||

synthesize reactive optimal controllers that satisfy LTL | synthesize reactive optimal controllers that satisfy LTL |

## Latest revision as of 06:38, 21 December 2016

**2017 SURF project description**

- Mentor: Richard M. Murray
- Co-mentors: Sumanth Dathathri

Several interesting problems are presented below and it is desirable that the SURF identifies one or more of the following problems in addition to the experimental task to develop some experience with working on theoretical problems. The primary task would be to develop an experimental test-bed for demonstration/evaluation of recent results and future work with regard to the optimality and robustness of controllers synthesized from temporal logic specifications.

The test bed would be a platform that allows for noise/failures to manifest in the form of sensing noise/actuation failures/unexpected behavior from the environment. The precise form of the test-bed is to be determined (after some preliminary research/exploring literature) -- it could consist of a swarm of quad rotors, RC cars or turtle bots for example. A good starting point would be the experimental set up proposed for the Formal Methods for Robotics Challenge at ICRA 2016. In addition to the design of the experimental facility, other interesting areas of work for the SURF to contribute towards are:

**1. Scalable Synthesis**
A current direction of research in the group is exploring the parallelizability of GR(1) synthesis [3]. The complexity of the algorithm scales as cube of the length of the formula (number of liveness formulas to be satisfied). For large problems, algorithms that are linear or quasi-linear algorithms are desirable. Parallelizing the synthesis algorithm could reduce the complexity, enabling synthesis for systems where the formulas are complex and large. Identifying scenarios in which the synthesis can be efficiently parallelized is an interesting topic. Once proofs are developed for the theory, the SURF would help with writing code to develop a tool that incorporates the parallelization into the synthesis algorithm. Additionally, coming up with heuristics for instances where parallelization cannot be done in a complete manner (completeness here refers to the guarantee that a solution is found whenever the problem is feasible) to parallelize the algorithm is an area to explore for the motivated SURF . Heuristics could potentially involve learning-based algorithms for decomposition or other algorithmic approaches (eg. greedy).

**2. Robust Synthesis using Reinforcement Learning to augment robustness**
There has been recent work on employing reinforcement learning for enforcing temporal logic specifications [1,2]. In [4], the authors construct a product MDP that
combines a Rabin automata generated from a LTL spec to
synthesize reactive optimal controllers that satisfy LTL
specifications. In general, for strategies synthesized from
logic specifications, the guarantees fail to hold in case of modeling uncertainty
or deviation in behavior of the environment from its assumed behavior.
They observe from
experiments that the controllers generated do reasonably
well even for scenarios when the formula cannot be satisfied.
Exploring this line of work, developing theoretical guarantees
regarding robustness for such a synthesis approach would be
useful.

Though not necessary, it would be useful for the student to have some knowledge of the following:

- Mathematical Logic, Formal Methods
- Some background in Algorithms (CS 2, Ma 5, CS21 at Caltech, CS38 at Caltech) -- not essential but would be useful for exploring research directions
- Programming Skills (Preferably Python, if not some experience with C or Java or other languages would be good)
- Familiarity with Linear Temporal Logic and Formal Verification (CS118) (useful, but not necessary)
- Familiarity with Reinforcement Learning (would be useful but not necessary)
- Familiarity with Robot Operating System (ROS) platform (useful not necessary, can be picked up during the course of the summer)

## References

[1] D. Aksaray, A. Jones, Z. Kong, M. Schwager, and C. Belta. Q-learning
for robust satisfaction of signal temporal logic specifications. *CoRR*,
abs/1609.07409, 2016.

[2] A. Jones, D. Aksaray, Z. Kong, M. Schwager, and C. Belta. Enforcing
temporal logic specifications via reinforcement learning. In *Proceedings*
of the 18th International Conference on Hybrid Systems: Computation
and Control*, HSCC ’15, pages 279–280, New York, NY, USA, 2015.*
ACM

[3] Y. Kesten, N. Piterman, and A. Pnueli. Bridging the gap between fair
simulation and trace inclusion. *Information and Computation*, 200:35–
61, 2005.

[4] D. Sadigh, E. S. Kim, S. Coogan, S. S. Sastry, and S. A. Seshia. A
learning based approach to control synthesis of markov decision processes
for linear temporal logic specifications. *CoRR*, abs/1409.5486, 2014.