[cap-talk] Failure isolation (was: Re: images and capability security)

lists at notatla.org.uk lists at notatla.org.uk
Mon Mar 17 18:35:52 EDT 2008


Jed Donnelley <capability at webstart.com> wrote:

> The key to deriving security value from POLA, as Toby
> suggests, is ensuring that when programs execute they
> have only the authority they need to do what is being
> asked of them.  In the case of image rendering code,
> as Toby notes, it only needs the authority to read its
> input and to write its output.  Any other authority
> (e.g. to access other files, to access the network,
> etc.) can only cause problems, and we see exactly those
> problems coming up again and again in the security
> incidents that John noted.

What about
  - (finite) CPU time
  - (finite) working memory
  - to read any file format descriptions provided on the
    system but outside the program (e.g. /usr/share/image-desc/....)
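The first two items can be granted as bounded quantities rather than open-ended authority. Here is a minimal Python sketch for Unix, assuming the renderer is an external command (`run_renderer` and the command are hypothetical). It hands the child the image on stdin, takes the result from stdout, passes no environment and no other file descriptors, and caps CPU time and address space with `setrlimit`. Note that this does not revoke the child's ambient authority to open files or sockets; that would need an OS sandbox or a capability system. Read access to shared format descriptions could be passed in as one more open descriptor (not shown).

```python
import resource
import subprocess
import sys


def _cap(kind: int, value: int) -> None:
    """Lower both soft and hard limits for `kind`, never trying to raise them."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))


def run_renderer(cmd: list[str], image: bytes,
                 cpu_seconds: int = 5,
                 memory_bytes: int = 512 * 1024 * 1024) -> bytes:
    """Run a (hypothetical) renderer command in a child process.

    The child gets the image on stdin and returns its output on stdout;
    it inherits no environment variables and no other open file
    descriptors, and its CPU time and address space are capped.
    """
    def limit_child() -> None:
        # Runs in the child between fork() and exec().
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    result = subprocess.run(
        cmd,
        input=image,              # the renderer's only input
        capture_output=True,      # its output comes back through a pipe
        env={},                   # no inherited environment
        close_fds=True,           # no other inherited file descriptors
        preexec_fn=limit_child,
        timeout=cpu_seconds * 2,  # wall-clock backstop, e.g. if it blocks
        check=True,               # a crash or kill raises CalledProcessError
    )
    return result.stdout
```

A renderer that exceeds its CPU allowance is killed by the kernel, and `check=True` turns that into an exception for the caller instead of a silent partial result.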

 
> Here I'll share a snippet from a proposed prospectus
> for our Capability Systems Workshop to get some
> feedback (very high level for broad appeal):
> ________
> One straightforward approach to solving the computer
> security problem hasn't been tried in the modern era.
> This approach is that of Principle Of Least Authority
> (POLA) computing.  Just as shipbuilders learned that by
> having multiple watertight compartments in their ships,
> they could keep them from sinking if a compartment was
> breached, computer systems can be made resilient to
> breaches (e.g. viruses or other forms of "malware") if
> they are composed of multiple "watertight" (mutually
> suspicious) compartments.
> ________
> 
> I believe the approach of contained failure is rather
> widespread in engineering.  Another example is in
> civil engineering when structures are built.  Again
> they are constructed so that local failures are
> isolated and don't bring the whole structure down.
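The compartment idea maps fairly directly onto running each task in its own operating-system process. A minimal Python sketch follows (the helper `run_compartments` and the job sources are hypothetical): each job runs in a separate interpreter, and a crash or hang in one is recorded as that job's outcome instead of bringing down the supervisor or the other jobs. Separate processes contain crashes; keeping a breached compartment away from its neighbours' data still depends on the OS withholding that authority.

```python
import subprocess
import sys


def run_compartments(jobs: dict[str, str], timeout: float = 10.0) -> dict[str, str]:
    """Run each job's (hypothetical) Python source in its own interpreter process.

    Returns a per-job outcome; a crash or hang in one job is recorded for
    that job and never propagates to the supervisor or to the other jobs.
    """
    outcomes: dict[str, str] = {}
    for name, source in jobs.items():
        try:
            proc = subprocess.run([sys.executable, "-c", source],
                                  capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            outcomes[name] = "timed out"  # run() kills the child on timeout
            continue
        if proc.returncode == 0:
            outcomes[name] = "ok"
        else:
            outcomes[name] = f"failed (exit {proc.returncode})"
    return outcomes
```

For example, a job that raises `RuntimeError` is reported as `failed (exit 1)` while the jobs before and after it still run and report `ok`.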

Engineering goes beyond that.  Drawing on my earlier career
in nuclear reactor safety, the principles include:

Redundancy:  More of something than you expect to need
             so you are OK if some doesn't work.
             (protects against random failures)

Diversity:   Different kinds of resource in case one
             type doesn't work.
             (protects against common-mode failures)

Segregation: Physical separation - e.g. redundant units are
             in widely separated parts of the site and cables
             providing redundancy follow different routes.
             (protects against common-cause failures)

Containment:  A form of redundancy - another barrier between
              those radionuclides and the public.

Failsafe Defaults/Passive Safety:
This is where you get things like:
            - control-rods drop in under gravity if electric power is lost
            - convection maintains some circulation if coolant pumps stop
            - flywheels on pumps maintain some flow after loss of power
            - ice blocks pre-emptively in place to quench escaped steam
            - tanks under pressure supply water through valves if pressure drops
The aim is to get the safe thing happening without relying on activity or fancy
engineering.  You should be able to ignore the first 30 mins of warnings and
still come to no harm.
      
Feedback:   Many types of feedback can be engineered to be negative, to
            lead toward a stable reaction rather than a runaway one.
            Temperature coefficients, void coefficients,
            control-rod cable expansion, fuel pin geometry ...

ALARP:      Radiological safety involves the principle
            As Low As Reasonably Practicable - like POLA

Interlocks: Prevent unsafe things from being done.
            Sometimes combined with administrative measures such as
            booking out the right key before you can work on some equipment.

Monitoring/Assessment:
         -  safety systems initiate shutdown e.g. on excess temperature
         -  Metallurgy geeks will assess the largest metal defects that could
            have escaped detection at commissioning and will plot their growth
            to see that they remain too small to produce cracks.
         -  Various forms of monitoring detect failed fuel elements.
         -  Measurements (e.g. of transient behaviour) are used to validate models
         -  Computer modelling looks at what happens if you start in this state and
            that problem happens (leak, blockage, fire, control-rod error etc)
         -  Physical experiments are used (short of wrecking a real reactor)
         -  seismic assessment
         -  structural work (If the worst happens does the roof stay on?)
         -  fault trees and event trees
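Of the assessment tools listed, fault trees are the easiest to show in a few lines. A minimal Python sketch follows; the tree and its probabilities are invented for illustration, and the calculation assumes independent basic events that each appear only once in the tree (real fault-tree analysis works through minimal cut sets to handle shared events).

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Basic:
    """A basic event, e.g. one component failing, with its probability."""
    name: str
    p: float


@dataclass
class Gate:
    """An AND or OR gate over child events (basic events or other gates)."""
    kind: str
    inputs: list = field(default_factory=list)


def probability(node) -> float:
    """Probability of `node`, assuming all basic events are independent
    and that no basic event appears more than once in the tree."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":   # all inputs must fail
        return prod(ps)
    if node.kind == "OR":    # any one input failing is enough
        return 1.0 - prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")
```

With two redundant pumps at 0.01 each under an AND gate, the pair fails with probability 1e-4; OR that with loss of power at 0.001 and the top event comes to about 0.0011. The AND gate's product is only valid if the pumps fail independently, which is exactly what common-cause and common-mode failures break, and why segregation and diversity matter.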


The reactor safety problems are not the same as we face in computing.
At a low level, reactor safety is materials science - can you put what you want
to (meeting the constraints of the neutron economy) into a reactor for the
required duration with the harsh chemicals, neutron flux, high temperatures
and thermal cycling etc and still have it behave OK?  And carry out the rest
of the fuel cycle?  There are people trying to get the pebble-bed design back into
fashion because it is supposed to have exceptional passive safety.

Physical failures and disturbances are not the main problems in IT (although
they turn up in side channels).  I know that POLA, safe defaults and interlocks
are already well-known aspects of IT even if they're not practiced very heavily.
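As a small IT analogue of the "book out the key" interlock combined with a failsafe default, here is a minimal Python sketch (the `KeyBoard` class and `start_maintenance` are hypothetical). Each key is an unforgeable token in the capability sense, work is only permitted while the matching key is booked out, and any check that goes wrong answers "no" rather than "yes". Python does not enforce encapsulation, so this illustrates the pattern rather than a security boundary.

```python
class KeyBoard:
    """Hypothetical 'book out the key first' interlock: one key per item."""

    def __init__(self, equipment):
        # Each key is a fresh object(), so it cannot be forged by building
        # an equal-looking value; only the board hands keys out.
        self._keys = {name: object() for name in equipment}
        self._booked_out = set()

    def book_out(self, equipment):
        if equipment in self._booked_out:
            raise PermissionError(f"key for {equipment!r} is already booked out")
        key = self._keys[equipment]  # KeyError: no key exists for unknown equipment
        self._booked_out.add(equipment)
        return key

    def hand_back(self, equipment, key):
        if self._keys.get(equipment) is key:
            self._booked_out.discard(equipment)

    def permits(self, equipment, key):
        """Failsafe default: anything short of a positive match is a refusal."""
        try:
            return equipment in self._booked_out and self._keys[equipment] is key
        except Exception:
            return False  # e.g. a malformed request fails closed, not open


def start_maintenance(board, equipment, key):
    if not board.permits(equipment, key):
        raise PermissionError(f"interlock: no booked-out key for {equipment!r}")
    return f"maintenance started on {equipment}"
```

Holding the key for one pump grants nothing on another, and once the key is handed back the same token no longer permits any work.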

Diversity might mean having 2 different firewalls in sequence.
It is also sometimes used in N-way independently-written voting
systems where the majority of the programs can reject what they
think is a wrong answer.  It turns out they often aren't as
independent as you might have hoped.
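The N-way voting arrangement can be sketched in a few lines of Python. The three integer-square-root "versions" are hypothetical stand-ins for independently written programs; one of them goes through floating point and so can return a wrong answer once its input needs more than 53 bits. A version that raises casts no vote, and an answer is accepted only if a strict majority of all versions agree on it.

```python
import math
from collections import Counter


class NoMajority(Exception):
    """Raised when no answer is produced by a strict majority of versions."""


def majority_vote(versions, x):
    """Run every version on x and return the answer a strict majority agrees on.

    A version that raises simply casts no vote; answers must be hashable.
    """
    answers = []
    for version in versions:
        try:
            answers.append(version(x))
        except Exception:
            pass  # a crashed version gets no vote
    if answers:
        answer, votes = Counter(answers).most_common(1)[0]
        if votes > len(versions) // 2:
            return answer
    raise NoMajority(f"no majority among {len(versions)} versions for input {x!r}")


# Three hypothetical "independently written" integer square roots.
def isqrt_library(n):
    return math.isqrt(n)


def isqrt_newton(n):
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_float(n):
    # Converts n to a double, so it can be wrong once n needs more than 53 bits.
    return int(math.sqrt(n))


VERSIONS = [isqrt_library, isqrt_newton, isqrt_float]
```

For n = 10**16 - 1 the float version returns 100000000 while the other two return 99999999, and the vote picks 99999999. The scheme only helps while the versions' bugs are uncorrelated: if two versions had made the same floating-point shortcut, the wrong answer would win the vote.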

There is the remainder fault-detection stuff for RSA calculations that comes
into the monitoring category.

Is there a more complete and coherent engineering model to be found here?

