[EROS-Arch] Error logging

Ben Laurie ben@algroup.co.uk
Tue, 25 Sep 2001 22:15:31 +0100


Bill Frantz wrote:
> I think there might be a place for rate limiting log entries in an audit
> log, on the assumption that, if a component is producing log entries too
> fast (for some definition of "too"), it is behaving abnormally, and should
> not be permitted to flood the log.  Also, as a practical matter, there
> probably needs to be a method of copying the audit log to cheaper/better
> protected/offline storage.

Ooo, this reminds me of a fascinating conversation I was having with
someone who built a really reliable system by rate-limiting absolutely
all intercomponent communications. I'm racking my brains for who it was.
I think I know, but I should check first...

Anyway, it seemed to make a great deal of sense to me, and it still pays
off even if the rate limits are set very conservatively (that is, unlikely
to disrupt any correctly operating system but still useful for restraining
one that's gone mad).
Of course, the rate-limited component is not helped, but the damage it
does to the rest of the system (and possibly the rest of the world[1])
is limited, and that is the point.
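A minimal sketch of per-component rate limiting for log entries, assuming a token bucket. The thread doesn't name a scheme, and TokenBucket, RateLimitedLog, rate and burst are hypothetical names chosen for illustration. Each writing component gets its own bucket, so a component that has gone mad only exhausts its own allowance. Refused entries are counted rather than silently lost, since the count itself is evidence of abnormal behaviour.

```python
import time


class TokenBucket:
    """Hypothetical per-component limiter: `rate` tokens/second, at most `burst` held."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock          # injectable so the behaviour can be tested
        self.tokens = burst         # start full: a short burst is allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimitedLog:
    """Hypothetical log front end that drops entries once the writer's bucket is empty."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.entries = []
        self.dropped = 0            # entries refused for exceeding the rate limit

    def append(self, entry):
        if self.bucket.allow():
            self.entries.append(entry)
            return True
        self.dropped += 1
        return False
```

With a fixed clock, `RateLimitedLog(TokenBucket(rate=1.0, burst=2, clock=lambda: 0.0))` accepts the first two of five appends and counts the other three in `dropped`. The well-behaved rest of the system never notices the limit, which is the point of setting it conservatively.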
> 
> >    3. In a decomposed system, how useful is an audit log?
> 
> I think it will help people gain confidence that the system is working as
> it is supposed to be working (which may be different from how it was
> designed to work, or how it was implemented to work).  In other words, an
> audit log can highlight design and implementation errors.

You still have to define what happens when the log fills, and I say
either kill the component writing to it or block it (configurably). Add
rate limitation to the equation and you
can make guarantees like "so long as you look at me once a day, I
guarantee I'll still be running and the log will not be full". Which
would be cool.
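One way to make that guarantee concrete, assuming each writer is limited by a token bucket with rate r and burst b: in any window of length T it can append at most b + r*T entries, so a log holding at least that many entries cannot fill between daily reviews, provided "looking at" the log means draining it (say, to the cheaper or offline storage Bill mentions). For a shared log, sum the bounds over all writers. BoundedLog, LogFull, required_capacity and the "kill"/"block" policy names below are hypothetical, a sketch rather than anything EROS provides.

```python
import math
import threading

SECONDS_PER_DAY = 86_400


def required_capacity(rate_per_sec, burst, review_interval_s=SECONDS_PER_DAY):
    """Entries a log must hold so a token-bucket-limited writer cannot fill it
    between reviews: in any window of length T it admits at most burst + rate * T."""
    return burst + math.ceil(rate_per_sec * review_interval_s)


class LogFull(Exception):
    """Raised under the "kill" policy; a supervisor would then terminate the writer."""


class BoundedLog:
    """Hypothetical fixed-capacity audit log with a configurable overflow policy."""

    def __init__(self, capacity, on_full="block"):
        if on_full not in ("kill", "block"):
            raise ValueError("on_full must be 'kill' or 'block'")
        self.capacity = capacity
        self.on_full = on_full
        self.entries = []
        self._cond = threading.Condition()

    def append(self, entry):
        with self._cond:
            if len(self.entries) >= self.capacity and self.on_full == "kill":
                raise LogFull("audit log full")
            # "block" policy: the writer sleeps until a reader drains the log.
            while len(self.entries) >= self.capacity:
                self._cond.wait()
            self.entries.append(entry)

    def drain(self):
        """The daily look: hand back all entries (e.g. for offline storage) and empty the log."""
        with self._cond:
            taken, self.entries = self.entries, []
            self._cond.notify_all()
            return taken
```

For example, a writer limited to one entry every two seconds with a burst of 10 gives `required_capacity(0.5, 10) == 43210`, so a log of that size that is drained at least once a day never reaches either overflow policy.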

Cheers,

Ben.

[1] Consider mailing list expanders.

--
http://www.apache-ssl.org/ben.html

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff