[cap-talk] Designation linux kernel patch concept

Mark Seaborn mrs at mythic-beasts.com
Tue Dec 4 15:13:38 EST 2007


David Hopwood <david.hopwood at industrial-designers.co.uk> wrote:

> It seems to me that using fds as (essentially) capabilities for files
> does not quite work, because:
> 
>  - an fd to an open file can't be safely used to designate the file
>    itself, if duplicates of the fd are to be shared between processes.
>    That is because the duplicates share attributes that you don't want
>    to be shared, such as the O_NONBLOCK flag (see
>    <http://plash.beasts.org/wiki/UsefulKernelChanges>
>    and <http://lkml.org/lkml/2007/8/14/135>).

I've discovered that you can use /proc/self/fd/N to re-open a pipe FD
and that gives you another FD on which O_NONBLOCK can be set
independently.  That's not ideal for the use case I had in mind
(http://plash.beasts.org/wiki/EventLoopAndFDs) because you would have
to treat file FDs and pipe FDs differently -- re-opening a file FD
would give you an FD with an independent seek position -- and I'm a
bit reluctant to change behaviour based on the FD type reported by
fstat().
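
A minimal sketch of the pipe behaviour described above, assuming a
Linux system with /proc mounted. The helper names reopen_via_proc and
demo_nonblock_sharing are hypothetical, chosen only for illustration.
It contrasts dup(), which shares the O_NONBLOCK flag, with a re-open
through /proc/self/fd/N, which does not.

```python
import os

def reopen_via_proc(fd, flags):
    """Re-open an existing descriptor through /proc/self/fd (Linux-only)."""
    return os.open(f"/proc/self/fd/{fd}", flags)

def demo_nonblock_sharing():
    r, w = os.pipe()
    dup_r = os.dup(r)                             # same open file description as r
    reopened_r = reopen_via_proc(r, os.O_RDONLY)  # new open file description
    os.set_blocking(r, False)                     # sets O_NONBLOCK via r
    result = {
        "dup_blocking": os.get_blocking(dup_r),            # False: flag is shared
        "reopened_blocking": os.get_blocking(reopened_r),  # True: flag is independent
    }
    for fd in (r, w, dup_r, reopened_r):
        os.close(fd)
    return result

if __name__ == "__main__":
    print(demo_nonblock_sharing())
```

The write end is kept open while the read end is re-opened, so the
open() on the pipe does not block waiting for a writer.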

>  - there is no fd type that designates just a file inode.

You can open a file and use the resulting FD to re-open the file using
/proc/self/fd/N.  Most operations available through pathname-based
calls can also be done on file FDs -- assuming you can acquire the FD
in the first place (e.g. you can't do an lchmod() by doing open() +
fchmod() if the file has its permission bits unset to start with).
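
As an illustration of re-opening a regular file, here is a small
sketch, again assuming Linux with /proc mounted. The function name
demo_seek_independence and the temporary file are made up for the
example. It shows that a dup()ed FD shares the seek position while an
FD re-opened via /proc/self/fd/N starts at offset 0.

```python
import os
import tempfile

def demo_seek_independence():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        dup_fd = os.dup(fd)                                     # shares the offset
        reopened = os.open(f"/proc/self/fd/{fd}", os.O_RDONLY)  # own offset
        os.read(fd, 4)                                          # advance fd by 4 bytes
        offsets = {
            "dup": os.lseek(dup_fd, 0, os.SEEK_CUR),
            "reopened": os.lseek(reopened, 0, os.SEEK_CUR),
        }
        for f in (fd, dup_fd, reopened):
            os.close(f)
        return offsets

if __name__ == "__main__":
    print(demo_seek_independence())
```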

If /proc isn't directly available (as under Plash) you could have a
trusted intermediary for /proc that re-opens an FD for you, provided
you ask for only a subset of the access the original FD already has.
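
A rough sketch of the check such an intermediary might perform,
assuming Linux with /proc mounted in the intermediary's own namespace.
This is not Plash's implementation: reopen_with_subset is a
hypothetical name, and reading "subset of the file mode flags" as
"subset of the access mode (read/write)" is an assumption. In a real
intermediary the FD would arrive from the client over a Unix domain
socket, which is omitted here.

```python
import fcntl
import os

# Access modes that may be requested, keyed by the mode of the original FD.
_ALLOWED = {
    os.O_RDONLY: {os.O_RDONLY},
    os.O_WRONLY: {os.O_WRONLY},
    os.O_RDWR: {os.O_RDONLY, os.O_WRONLY, os.O_RDWR},
}

def reopen_with_subset(fd, requested_accmode):
    """Re-open fd via /proc, refusing any access the original FD lacks."""
    current = fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    if requested_accmode not in _ALLOWED[current]:
        raise PermissionError("requested access exceeds that of the original FD")
    return os.open(f"/proc/self/fd/{fd}", requested_accmode)
```

The explicit check matters because the re-open is subject to the
file's ordinary permission checks rather than to the access mode of
the original FD, so without it a read-only FD could be re-opened
read-write whenever the intermediary itself may write to the file.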

That doesn't work for inodes that are not files, such as Unix domain
sockets, though.

>  - to do system call interposition in a way that is safe against
>    race conditions, you *really* need an fd type that designates a
>    file inode. Let's say that you are interposing on one of the "at"
>    calls, and you do a security check based on the dirfd and the
>    relative path. The check passes, so you forward the call to the
>    kernel. But how do you know that by the time the kernel
>    interprets the dirfd and path, it is pointing to the same file?
>    You don't. As pointed out in
>    <https://db.usenix.org/events/woot07/tech/full_papers/watson/watson.pdf>,
>    such race conditions are quite exploitable in practice.

What sort of checks do you have in mind?  The main problem Plash faces
in this area is doing operations relative to a dir FD without
following symlinks.  connect() on Unix domain sockets is the main
problem here because there is no way to switch its symlink-following
off (http://plash.beasts.org/wiki/PlashIssues/ConnectRaceCondition).
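
For comparison, here is a sketch of how open() can be done relative to
a dir FD without following symlinks, by walking one path component at
a time with O_NOFOLLOW. It is not Plash's code: the name
open_beneath_nofollow is hypothetical, and rejecting "." and ".."
outright is a simplifying assumption. connect() offers no equivalent
flag, which is the gap described above.

```python
import os

def open_beneath_nofollow(dir_fd, relpath, flags=os.O_RDONLY):
    """Open relpath under dir_fd, refusing a symlink in any component."""
    if relpath.startswith("/"):
        raise ValueError("path must be relative")
    parts = [p for p in relpath.split("/") if p]
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError("unsupported path component")
    cur = os.dup(dir_fd)  # private copy, so the caller's FD stays open
    try:
        for name in parts[:-1]:
            # O_NOFOLLOW makes the open fail if this component is a symlink.
            nxt = os.open(name,
                          os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                          dir_fd=cur)
            os.close(cur)
            cur = nxt
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=cur)
    finally:
        os.close(cur)
```

Each component is resolved against an FD that is already held, so
replacing a directory with a symlink between the check and the use
makes the open fail rather than redirecting it.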

> However, the "at" calls don't have the right interface to be used with
> an inode fd. It would be bordering on insane to add yet another set of
> API calls and/or syscalls (and it would beg the question, "how many
> more iterations will we need to get this right?").

Yes, Linux seems to be facing that problem at the moment: there is
pressure to add interfaces, but no good framework for doing so.  Now
if only they had some sort of generic object invocation
interface... :-)

Regards,
Mark

