The rapid growth of connected smart devices and IoT platforms has brought safety and security concerns for emerging parallel and distributed systems to the forefront. These highly complex systems operate in nondeterministic environments whose behaviour is unpredictable. If left unchecked, any fault in the system, whether caused by an internal error or imposed by the external environment, can lead to an unpredictable failure with potentially disastrous consequences. Preventing faults from turning into failures is therefore, and will remain, a crucial consideration for ensuring system resilience.