SURF 2021: Test Design for Extremely Resilient System
2021 SURF: project description
- Mentor: Richard M. Murray
- Co-mentor: Josefine Graebener
Project Description
As future space missions become more advanced, targets further away from Earth become reachable, and with them come greater technical challenges. Spacecraft will have to downlink valuable science data while potentially encountering more uncertain, hostile, and isolated environments. Increasing distance from Earth results in a substantial light-time delay, while hostile environments take a toll on the spacecraft and limit the mission lifetime. For these reasons, time is a critical resource, and Earth-based problem resolution may be inadequate: the spacecraft needs to resolve problems autonomously and efficiently. It will need to be resilient in this uncertain and communication-constrained environment [1].
Functional redundancy enables a system to continue operating by leveraging incidental capabilities of its components, such as using a CPU to generate heat or a wheel as a rotation sensor, and thereby to gain resilience beyond what component redundancy and diversification provide. We are developing an “extreme resilience” concept for space missions that uses functional redundancy in spacecraft components to autonomously reconfigure the spacecraft in response to failures or environmental changes. This adaptation will be made via on-board reasoning that decides which action is most likely to yield the most valuable science data.
As a motivating application, we chose to develop a concept for a rover whose mission is to reach one particular location, or to get as close to it as possible. The spacecraft should redistribute computation tasks between the onboard computer and the payload computer, reroute commands within the spacecraft, and continue driving even when part of the system fails.
This system should be tested according to the principles of chaos engineering [2], a software testing practice that originated at Netflix. It defines a way of testing independently during production: parts of the system are deliberately broken to gain confidence in its capabilities.
This SURF project should test the extreme resilience concept by defining a metric to evaluate the system’s resilience, formulating a hypothesis of the expected system behavior, and designing test cases subject to the given constraints. For example: fail 5 components in any order, each for an arbitrary amount of time, and try to do the most damage to the system. These test cases shall then be implemented and analyzed according to the metric to identify weaknesses of the system. This can be done entirely in simulation or on hardware, depending on what is available and possible to use at that point. Familiarity with Python would be beneficial. If a hardware implementation is chosen, experience in experimental robotics and ROS is an advantage.
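To make the test-design task more concrete, the following is a minimal Python sketch of how such test cases might be generated and scored in simulation. Everything in it is an assumption made for illustration and not part of the project: the component names, the `Fault` record, the toy `simulate` model (the rover advances only while at least one computer and one wheel are working), the `resilience` metric (fraction of the distance to the goal that was covered), and the random search for a damaging test case.

```python
import random
from dataclasses import dataclass

# Hypothetical component list; the real rover's components are not specified here.
COMPONENTS = ["onboard_computer", "payload_computer", "wheel_1", "wheel_2",
              "imu", "camera", "heater"]
COMPUTERS = {"onboard_computer", "payload_computer"}
WHEELS = {"wheel_1", "wheel_2"}


@dataclass(frozen=True)
class Fault:
    component: str
    start: float     # time at which the component fails (s)
    duration: float  # how long the component stays failed (s)


def generate_test_case(rng, components, n_faults=5, horizon=100.0):
    """Sample n_faults distinct components, each failed for a random interval."""
    faults = []
    for component in rng.sample(components, n_faults):
        start = rng.uniform(0.0, horizon)
        duration = rng.uniform(0.0, horizon - start)
        faults.append(Fault(component, start, duration))
    return sorted(faults, key=lambda f: f.start)


def failed_at(faults, t):
    """Return the set of components that are failed at time t."""
    return {f.component for f in faults if f.start <= t < f.start + f.duration}


def simulate(faults, goal_distance=50.0, horizon=100.0, dt=1.0, speed=1.0):
    """Toy stand-in for a rover simulator; returns remaining distance to goal."""
    remaining = goal_distance
    t = 0.0
    while t < horizon and remaining > 0.0:
        down = failed_at(faults, t)
        has_computer = not COMPUTERS <= down  # at least one computer works
        has_wheel = not WHEELS <= down        # at least one wheel works
        if has_computer and has_wheel:
            remaining = max(0.0, remaining - speed * dt)
        t += dt
    return remaining


def resilience(remaining, goal_distance):
    """Placeholder metric: fraction of the goal distance covered, in [0, 1]."""
    return max(0.0, min(1.0, 1.0 - remaining / goal_distance))


def find_damaging_case(n_trials=200, seed=0, goal_distance=50.0):
    """Random search for the sampled test case with the lowest metric value."""
    rng = random.Random(seed)
    worst_score, worst_case = None, None
    for _ in range(n_trials):
        case = generate_test_case(rng, COMPONENTS)
        score = resilience(simulate(case, goal_distance), goal_distance)
        if worst_score is None or score < worst_score:
            worst_score, worst_case = score, case
    return worst_score, worst_case


if __name__ == "__main__":
    score, case = find_damaging_case()
    print(f"lowest resilience found: {score:.2f}")
    for fault in case:
        print(fault)
```

Random search is only one way to "try to do the most damage"; a real test design could replace it with a systematic or adversarial search, and would replace the toy model with the actual rover simulation or hardware.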

References
[1] 2018 Workshop on Autonomy for Future NASA Science Missions: Ocean Worlds Design Reference Mission Reports, https://science.nasa.gov/technology/2018-autonomy-workshop/, October 10-11, 2018, Pittsburgh, PA
[2] Ali Basiri, Niosha Behnam, Ruud de Rooij, Lorin Hochstein, Luke Kosewski, Justin Reynolds, Casey Rosenthal, "Chaos Engineering", IEEE Software, vol. 33, no. 3, pp. 35-41, May-June 2016, DOI: 10.1109/MS.2016.60

