[cap-talk] Autonomous maintenance architecture

William Pearson wil.pearson at gmail.com
Tue Feb 19 16:45:24 EST 2008


This is the intro to the specification I have been writing. The spec
itself is still very much in a state of flux, but this should give an
idea of what I am aiming to build. It should also clear up some of the
misunderstandings that I introduced in some of my previous messages.

Modern computer systems have to be micro-managed during maintenance to
ensure that they keep the correct functionality and security. This is
partly due to architectural choices that encourage this approach. This
document outlines the design of a system that is geared towards
automated computer maintenance.

Computer Maintenance - The process of changing the settings of current
programs, introducing new programs or program variants to the system
(whether endogenous or exogenous in origin) and removing old ones, in
order to improve the performance of the system.

As much of the system's programming as possible should be changeable,
so that it can be maintained (including schedulers, device drivers,
etc.).
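A minimal sketch of what "changeable" could mean in practice. It assumes a hypothetical `ComponentRegistry` of named slots (such as a scheduler) and shows the three maintenance operations from the definition: changing settings, introducing a new program or variant, and removing an old one. All names here (`ComponentRegistry`, `install`, `configure`, `remove`, `fifo`, `shortest_first`) are illustrative and are not part of the spec.

```python
from typing import Any, Callable, Dict, List

Task = Dict[str, Any]  # a run-queue entry, e.g. {"name": "a", "cost": 5}


def fifo(queue: List[Task]) -> Task:
    # Hypothetical scheduler variant: run the oldest task first.
    return queue[0]


def shortest_first(queue: List[Task]) -> Task:
    # Hypothetical scheduler variant: run the cheapest task first.
    return min(queue, key=lambda task: task["cost"])


class ComponentRegistry:
    """Named slots (e.g. "scheduler", "disk_driver") whose implementations
    and settings can be changed while the system is running."""

    def __init__(self) -> None:
        self._impls: Dict[str, Callable] = {}
        self._settings: Dict[str, Dict[str, Any]] = {}

    def install(self, slot: str, impl: Callable, **settings: Any) -> None:
        # Introduce a new program, or a variant that replaces the current
        # one; the variant's settings replace the old settings.
        self._impls[slot] = impl
        self._settings[slot] = dict(settings)

    def configure(self, slot: str, **settings: Any) -> None:
        # Change the settings of a program that is already installed.
        self._settings[slot].update(settings)

    def remove(self, slot: str) -> None:
        # Remove an old program and its settings.
        self._impls.pop(slot, None)
        self._settings.pop(slot, None)

    def get(self, slot: str) -> Callable:
        return self._impls[slot]

    def settings(self, slot: str) -> Dict[str, Any]:
        return dict(self._settings[slot])

    def __contains__(self, slot: str) -> bool:
        return slot in self._impls
```

Callers look the scheduler up through `get` on each decision, so a replacement installed with `install` takes effect on the next call.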

Goal

This architecture should be appropriate for autonomous robotics and
autonomic computing: the systems should be able to be maintained and
controlled automatically, or with minimal human oversight. The
architecture must also support real-time constraints on activity.

Modern autonomous robotics has concentrated on getting systems to act
autonomously; this architecture's focus is on enabling the system to
maintain itself autonomously.

The two pillars of the design are capability-based security and
reinforcement learning. Traditionally, reinforcement learning is
defined as follows (Kaelbling, Littman and Moore, "Reinforcement
Learning: A Survey", 1996):

    an agent is connected to its environment via perception and action
    ... On each step of interaction the agent receives as input, i, some
    indication of the current state, s, of the environment; the agent
    then chooses an action, a, to generate as output. The action changes
    the state of the environment, and the value of this state transition
    is communicated to the agent through a scalar reinforcement signal,
    r. The agent's behavior, B, should choose actions that tend to
    increase the long-run sum of values of the reinforcement signal. It
    can learn to do this over time by systematic trial and error, guided
    by a wide variety of algorithms that are the subject of later
    sections of this paper. Formally, the model consists of

        * a discrete set of environment states, S;
        * a discrete set of agent actions, A; and
        * a set of scalar reinforcement signals; typically {0,1}, or the real numbers.
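To make the quoted model concrete, here is a minimal sketch of the interaction loop using tabular Q-learning, which is just one of the many algorithms that fit it. The two-state environment and its reward rule, the discount factor `gamma` (one common way of making the "long-run sum" finite), and the epsilon-greedy behaviour are all assumptions chosen for illustration.

```python
import random

STATES = [0, 1]   # S: a discrete set of environment states
ACTIONS = [0, 1]  # A: a discrete set of agent actions


def step(state, action, rng):
    """Toy environment (hypothetical): reward r = 1 when the action matches
    the state, else 0. The next state is random, independent of the action."""
    reward = 1.0 if action == state else 0.0
    return rng.choice(STATES), reward


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; the agent observes the state directly (i = s)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Behaviour B: epsilon-greedy trial and error over current estimates.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action, rng)
        # Move the estimate toward reward + discounted value of the next
        # state, so actions with a high long-run return come to be preferred.
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    # The learned behaviour: the best-valued action in each state.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Here S and A are fixed, small and known in advance, which is exactly the assumption the next paragraph rejects for a real computer system.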

We will not define S or the set of actions explicitly, because both may
change over the lifetime of a computer system (for example, when new
hardware is added). Moreover, if everything that affects the
reinforcement signal counts as an action, then a full definition of the
agent's potential actions is prohibitive anyway, since computation
itself can affect the reinforcement signal (e.g. too much computation
can run down the battery, so that the system cannot achieve its goals).
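Because the set of actions is not fixed in advance, a learner in this setting has to cope with actions appearing and disappearing. A minimal sketch, assuming a stateless epsilon-greedy value estimator (a bandit, which drops S for brevity) and hypothetical action names: actions are registered when, say, hardware is added, and removed when it goes away.

```python
import random


class OpenActionAgent:
    """Epsilon-greedy learner whose action set can grow or shrink at run time.
    Hypothetical sketch: actions are identified by plain string names."""

    def __init__(self, epsilon=0.1, alpha=0.1, seed=0):
        self.values = {}        # action name -> running estimate of its reward
        self.epsilon = epsilon  # probability of trying a random action
        self.alpha = alpha      # step size for the running estimate
        self.rng = random.Random(seed)

    def add_action(self, name, initial_value=0.0):
        # e.g. newly attached hardware makes a new action available
        self.values.setdefault(name, initial_value)

    def remove_action(self, name):
        # e.g. hardware is detached; its learned value is discarded
        self.values.pop(name, None)

    def choose(self):
        actions = list(self.values)
        if not actions:
            raise RuntimeError("no actions currently available")
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=self.values.__getitem__)

    def update(self, action, reward):
        # The action may have been removed between choose() and update().
        if action in self.values:
            self.values[action] += self.alpha * (reward - self.values[action])
```

A newly added action starts at a neutral value and is only discovered through occasional random trials, so the exploration rate controls how quickly the system notices new capabilities.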

The method we will use to make untrusted programs attempt to maximise
the reinforcement signal is to auction off resources for fictitious
credits, which are analogous to reward. If a program loses all its
credits, it loses any potential to gain control of the system,
including its own memory. The reinforcement signal gives credits to the
active programs that control certain system resources for a certain
amount of time, and these credits can then be spread conservatively
from them to other programs.

Resource-based selection has precedents in artificial life projects
such as Tierra, where it is used to create evolving systems through
random mutation. In Eric Baum's Hayek systems it is used to create a
learning system in a traditional reinforcement learning setting. In
Mark Miller's agoric vision, programs bid for resources with real
money, and this bidding is used to adapt the system to earn the most
money.*
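A minimal sketch of the credit economy described above, for a single resource auctioned once per round. Several details are assumptions, not part of the spec: each hypothetical `Program` always bids up to a fixed `max_bid`; the auction is first-price; holding the resource costs a small `tax`; and "spread conservatively" is read as "credits are conserved when they change hands", implemented loosely in the spirit of Baum's Hayek by having the winner pay the previous holder. Only the reinforcement signal creates new credits, and a program that reaches zero credits is evicted.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    credits: int
    max_bid: int  # hypothetical fixed strategy: always bid up to this much

    def bid(self) -> int:
        return min(self.credits, self.max_bid)


class ResourceMarket:
    """Auctions control of one resource (e.g. a CPU time slice) each round."""

    def __init__(self, programs, tax=1):
        self.programs = {p.name: p for p in programs}
        self.holder = None  # program that won the previous round
        self.treasury = 0   # credits held by the system rather than a program
        self.tax = tax      # hypothetical per-round cost of holding the resource

    def total_credits(self):
        return self.treasury + sum(p.credits for p in self.programs.values())

    def run_round(self, reward_fn):
        bidders = [p for p in self.programs.values() if p.bid() > 0]
        if not bidders:
            self.holder = None
            return None
        # First-price auction; ties go to the earliest-registered program.
        winner = max(bidders, key=Program.bid)
        price = winner.bid()
        winner.credits -= price
        # Conserved transfer: the price goes to the previous holder (or to
        # the treasury if there was none), so the auction creates no credits.
        if self.holder is not None:
            self.holder.credits += price
        else:
            self.treasury += price
        self.holder = winner
        # Holding the resource costs something, so a program that earns no
        # reward gradually runs out of credits.
        paid = min(self.tax, winner.credits)
        winner.credits -= paid
        self.treasury += paid
        # The reinforcement signal is the only source of new credits.
        winner.credits += reward_fn(winner)
        self._evict_bankrupt()
        return winner.name

    def _evict_bankrupt(self):
        for name in [n for n, p in self.programs.items() if p.credits <= 0]:
            # A program with no credits loses all control, including its memory.
            del self.programs[name]
            if self.holder is not None and self.holder.name == name:
                self.holder = None
```

For example, with a "good" program that bids modestly but earns reward while it holds the resource, and a "bad" program that bids aggressively but earns nothing, the bad program wins the first few rounds, spends its credits, and is evicted, after which the good one keeps the resource.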

Capabilities are used for communication between programs and provide
fine-grained security between processes and system resources.
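A minimal sketch of capability-style access, assuming a hypothetical `Memory` resource. A program receives only the references it is handed (no ambient authority); `ReadOnly` is an attenuated facet that exposes reads but no writes; and `make_revocable` follows the caretaker pattern from the object-capability literature, returning a forwarder plus a separate revoke function. Python does not enforce encapsulation (a program could reach into `_target`), so this only illustrates the pattern; a real system would enforce it at the kernel or language level.

```python
class Memory:
    """A system resource: a small block of memory cells."""

    def __init__(self, size):
        self._cells = [0] * size

    def read(self, index):
        return self._cells[index]

    def write(self, index, value):
        self._cells[index] = value


class ReadOnly:
    """Attenuated capability: forwards reads to the target, exposes no write."""

    def __init__(self, target):
        self._target = target

    def read(self, index):
        return self._target.read(index)


def make_revocable(target):
    """Caretaker pattern: return a forwarding capability and a separate revoke
    function, so whoever created the capability can later cut off access."""
    cell = {"target": target}

    class Forwarder:
        def read(self, index):
            if cell["target"] is None:
                raise PermissionError("capability revoked")
            return cell["target"].read(index)

    def revoke():
        cell["target"] = None

    return Forwarder(), revoke


def reader_program(mem_cap):
    # The program is meant to use only the capability it was handed; with a
    # ReadOnly facet there is no write method for it to call.
    return mem_cap.read(0)
```

Keeping `revoke` separate from the forwarder means the program holding the capability cannot restore its own access once it has been withdrawn.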


  Will Pearson

* I am sometimes confused by the use of the term reward in the agorics
literature. Does it always refer to money?

