[cap-talk] search capability
David-Sarah Hopwood
david.hopwood at industrial-designers.co.uk
Thu Oct 2 18:29:23 CDT 2008
Sam Mason wrote:
> On Mon, Sep 29, 2008 at 12:09:51AM +0100, David-Sarah Hopwood wrote:
>> Rob Meijer wrote:
>>> 1) A single indexer would require an abundance of authority in order to be
>>> able to create an index of the full graph.
>> The user's shell has sufficient authority.
>
> As far as I know, most search systems tend to have a single global
> index that's built by a privileged entity (i.e. root in a unix system).
> The results of searches on this global index are filtered before
> being returned to the user to ensure that they can actually see the
> node/document in question.
Note that this approach does the wrong thing in respect of:
- resource accounting (the resources used by the shared indexing
service are not accounted to users);
- denial of service (it might be possible for a user to create a subgraph
that causes the shared indexer to get stuck);
- logging (it is possible to derive information about whether a file
contains a search term without having been logged as having read it);
- content-based exploits (the shared indexer must be perfectly bug-free
when run on arbitrary files, since it runs as root);
- duplication of access control decisions (the search service has to
only return information that its client could have obtained without
it -- which is a very difficult thing to determine correctly);
- changes in access restrictions between when a file is indexed and
when the filtering decision is made, as discussed below.
> The most fiddly case to me seems to be the content management system
> (CMS) of some large organization. Here there would be many thousands of
> users and if each was forced to maintain their own index the majority of
> it would be identical between users. Storing thousands of copies of an
> almost identical multi-gigabyte index seems wasteful.
Suppose that we ignore the resource accounting and denial of service
issues, and try to view the sharing of the indexer as a memoization
optimization. The intuition would be that the result of indexing a
given subgraph is the same whatever user it is done on behalf of, and
so the results should be able to be shared.
Note however that for mutable subgraphs, this intuition is wrong: if
the search system filters results for a given user based on whether
that user can *currently* see a file, but the index contains information
about the file's previous contents, then that information is leaked when
it should not be.
This makes it clear that the optimization must only be applied to
immutable subgraphs. (We only ever have to index the contents of a
particular version of a file once, since each version is immutable.)
Given this fix, the use of a pure capability architecture helps to
reason about the correctness of the optimization and any remaining
weaknesses, precisely because pure capability systems do not make
access control decisions based on what subject is performing a given
action. In an ACL system (or other system with ambient authority), an
indexing process running under Alice's userid will have different
behaviour from an indexing process running under Bob's userid, even
when they are running the same code on the same data. In other words,
in an ambient authority system there is no sound way to perform
memoization across computations run under different userids -- whereas
in a capability system there is, if we are prepared to ignore resource
accounting and denial of service issues.
To be more explicit, what I'm suggesting is that different users should
(must, if we care about not leaking information as described above) use
different Indexers, but most of the effect on performance of using a
shared Indexer can instead be obtained by granting the Indexers access
to a shared Memoizer. The Memoizer will only share results of two or
more Indexers applying the same (according to EQ) confined deterministic
function to the same confined deterministic subgraph.
OTOH, if users only have access to principal-specific capabilities
wrapped by a mechanism such as Horton, then this optimization will not
be effective, because the wrappers held by different users will not be EQ.
(You can of course view this as doing precisely the right thing, because
the capabilities held by different users to the same underlying object
are deliberately not equivalent in that case.)
--
David-Sarah Hopwood
More information about the cap-talk
mailing list