General idea

Associating a fault with a response

  • The fault is identified by an exception matcher. The default exception matcher allows matching against the task type, the event name and the exception class.
  • The response is an action.
  • The response can be attached at three points:
    • at the point of origin of the fault
    • at the first task(s) that have a planning task
    • at the topmost task(s) that have a planning task (e.g. missions)
  • By default, it is attached at the topmost task.


on_fault FAULT_MATCHER do
  # response commands
end

where FAULT_MATCHER is specified by a TASK, an EVENT and/or an EXCEPTION_CLASS, or by a symbol. The symbol refers to a fault list (see below).
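To make the matching idea concrete, here is a minimal stand-in written in plain Ruby. This is NOT Roby's actual matcher API; the `Fault` struct and `FaultMatcher` class are illustrative names, only showing how a matcher combining task type, event name and exception class could filter faults:

```ruby
# Illustrative sketch only, not Roby's API: a fault matcher that filters
# by task type, event name and exception class. A nil criterion means
# "match anything" for that field.
Fault = Struct.new(:task_type, :event_name, :exception)

class FaultMatcher
  def initialize(task_type: nil, event_name: nil, exception_class: nil)
    @task_type = task_type
    @event_name = event_name
    @exception_class = exception_class
  end

  # A fault matches if every criterion that was given is satisfied
  def ===(fault)
    (!@task_type       || fault.task_type == @task_type) &&
      (!@event_name      || fault.event_name == @event_name) &&
      (!@exception_class || fault.exception.is_a?(@exception_class))
  end
end

matcher = FaultMatcher.new(task_type: :dts_task, event_name: :power_off)
fault   = Fault.new(:dts_task, :power_off, RuntimeError.new("power lost"))
matcher === fault # => true
```

Using `===` makes such a matcher usable directly in a `case`/`when` dispatch, which is the idiomatic Ruby way to route a fault to its response.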

The response is an action script extended with the following commands:

  • retry: once the response action has finished successfully, retry whichever action was running when the fault occurred
  • retry(count): retry the action as with retry without arguments, but at most 'count' times
  • retry(count, reset_timer): retry at most 'count' times, but reset the attempt counter once 'reset_timer' seconds have elapsed since the last successful repair
  • locate_on_origins: apply the response at the origin of the fault
  • locate_on_missions: apply the response on the topmost action in the faulty dependency graph (this is the default)
  • locate_on_actions: apply the response on the first parent action of the fault origin in the dependency graph
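The retry(count, reset_timer) bookkeeping can be sketched as follows. This is an illustrative model only, not Roby's implementation; `RetryBudget` is a hypothetical name:

```ruby
# Illustrative sketch (not Roby's implementation) of the retry budget
# behind retry(count, reset_timer): at most 'count' attempts, but the
# counter is reset once 'reset_timer' seconds have elapsed since the
# last successful repair.
class RetryBudget
  def initialize(count, reset_timer)
    @count = count
    @reset_timer = reset_timer
    @attempts = 0
    @last_success = nil
  end

  # Record that a repair finished successfully at time 'now'
  def repair_succeeded(now)
    @last_success = now
  end

  # Returns true if another retry is allowed at time 'now'
  def retry?(now)
    if @last_success && (now - @last_success) >= @reset_timer
      @attempts = 0        # enough quiet time has passed: reset the budget
      @last_success = nil
    end
    return false if @attempts >= @count
    @attempts += 1
    true
  end
end

budget = RetryBudget.new(2, 60)
budget.retry?(0)    # => true  (first attempt)
budget.retry?(1)    # => true  (second attempt)
budget.retry?(2)    # => false (budget exhausted)
budget.repair_succeeded(3)
budget.retry?(100)  # => true  (60s since last repair: counter was reset)
```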

Fault response tables can be defined on:

  • action interfaces. They apply to the whole plan.
  • task models. They apply only to the graph of children of the matched tasks. This therefore changes the meaning of locate_on_missions and locate_on_actions.
  • task instances. They apply only to the graph of children of this task. This therefore changes the meaning of locate_on_missions and locate_on_actions.
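How the three locate_on_* policies pick the task that receives the response can be illustrated on a toy parent/child dependency chain. Again this is plain Ruby for illustration, not Roby's plan structures; `Task`, `ancestors` and `locate` are hypothetical names:

```ruby
# Illustrative sketch only: picking the response location in a
# parent/child dependency chain, for the three locate_on_* policies.
Task = Struct.new(:name, :parent, :has_planner) do
  # The task itself followed by its chain of parents, bottom-up
  def ancestors
    node, result = self, []
    while node
      result << node
      node = node.parent
    end
    result
  end
end

def locate(origin, policy)
  case policy
  when :origins  then origin                                         # the fault origin itself
  when :actions  then origin.ancestors.find { |t| t.has_planner }    # first planned parent
  when :missions then origin.ancestors.select { |t| t.has_planner }.last # topmost planned parent
  end
end

mission = Task.new("mission", nil, true)
action  = Task.new("action", mission, true)
leaf    = Task.new("leaf", action, false)

locate(leaf, :origins).name   # => "leaf"
locate(leaf, :actions).name   # => "action"
locate(leaf, :missions).name  # => "mission"
```

Scoping a table on a task model or instance would amount to cutting the ancestor walk at that task, which is why the meaning of locate_on_missions and locate_on_actions changes.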

Fault lists

Fault lists allow specifying a fault-to-symbol mapping that is then used in the fault response table to specify the responses. This decouples the definition of the faults from the definition of their responses. Additionally, it allows generating faults based on data predicates (exception matchers only match Roby exceptions).
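The page does not give the fault list syntax, so the following is only a hypothetical sketch of the decoupling idea, using plain Ruby hashes: faults are mapped to symbols by predicates (which may test data, not just exceptions), and responses are keyed by symbol:

```ruby
# Hypothetical sketch of the fault-list idea, not an actual Roby syntax:
# predicates name faults with symbols, and responses are defined
# separately, keyed by those symbols.
fault_list = {
  power_loss:  ->(fault) { fault[:event] == :power_off },
  low_battery: ->(fault) { fault[:data] && fault[:data][:battery] < 0.1 }
}

responses = {
  power_loss:  -> { "reboot the DTS node" },
  low_battery: -> { "go to recharge station" }
}

def respond(fault, fault_list, responses)
  name, _ = fault_list.find { |_, matcher| matcher.call(fault) }
  responses[name]&.call
end

# A data predicate (battery level) generates a fault even though no
# Roby exception was raised
respond({ event: :sample, data: { battery: 0.05 } }, fault_list, responses)
# => "go to recharge station"
```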


As a requirement in HROV, the Roby supervision layer should be able to restore the ROV to its previous state after a DTS system reboot. With this specification, this can be done with:

on_fault DtsnodeSchilling::Task.power_off_event do |exception|
  # The dts_boot action tries starting the DTS driver until it works
  execute dts_boot(:dev => exception.origin.device)
  # Retry once dts_boot has successfully finished
  retry
end
Last modified on 05/02/13 10:43:53