Jonathan S. Shapiro
Tue, 4 Jul 2000 18:24:56 -0400
> IMO, when you begin to need a code generator, it's a very strong sign
> that your underlying interface is too complicated.
I disagree. First, I don't care if you are doing the code generation in C or
assembler. You are still doing code generation. Second, there is a lot of
research on this stuff now that says you need code generation for any sort
of reasonable performance because of the need to reduce data motion. There
is a paper on this from the SawMill folks at IBM, but the link is broken
because their website failed to push the PostScript document to the external
website. I've asked them to fix the site.
Remember that the IPC has arguments that are passed in registers. There
simply isn't a way to specify this from a high-level language, and these
days the performance of stub code really is a significant system bottleneck.
It should also be noted that the code generator isn't especially complicated
*except* on the x86, and it is complicated there primarily because there are
so few registers. This creates a situation in which the order of assignments
matters. The code generation is therefore hard because you don't have a lot
of temporaries to work with. You really don't want to move this state onto
the stack. I've tried it that way, and the performance sucks.
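
To make the ordering problem concrete, here is a minimal sketch in C of
sequencing a set of register-to-register assignments when only one spare
temporary is available. It is not the EROS code generator; NREGS,
sequence_moves, run_moves and the register numbering are all hypothetical
names chosen for illustration, and the "registers" are just array slots.

```c
#include <assert.h>

#define NREGS  4        /* hypothetical tiny register file */
#define NO_SRC (-1)     /* this register is not the target of any move */

struct move { int dst, src; };   /* one emitted instruction: dst <- src */

/*
 * Turn the parallel assignment "reg[d] <- reg[src_of[d]]" into a sequence
 * of ordinary moves. 'scratch' is a register that no move reads or writes.
 * 'out' needs room for 2 * NREGS entries. Returns the number of moves.
 */
int sequence_moves(const int src_of[NREGS], int scratch, struct move *out)
{
    int pending[NREGS];
    int n = 0, remaining = 0;

    assert(src_of[scratch] == NO_SRC);
    for (int d = 0; d < NREGS; d++) {
        assert(src_of[d] != scratch);
        /* A move of a register to itself needs no instruction. */
        pending[d] = (src_of[d] == d) ? NO_SRC : src_of[d];
        if (pending[d] != NO_SRC)
            remaining++;
    }

    while (remaining > 0) {
        int progressed = 0;

        for (int d = 0; d < NREGS; d++) {
            int still_read = 0;

            if (pending[d] == NO_SRC)
                continue;
            /* d may be overwritten only once nobody still reads it. */
            for (int e = 0; e < NREGS; e++)
                if (pending[e] == d)
                    still_read = 1;
            if (still_read)
                continue;
            out[n++] = (struct move){ d, pending[d] };
            pending[d] = NO_SRC;
            remaining--;
            progressed = 1;
        }

        if (!progressed) {
            /* Only cycles are left: park one value in the scratch register
               and redirect its readers there, which breaks the cycle. */
            int d = 0;

            while (pending[d] == NO_SRC)
                d++;
            out[n++] = (struct move){ scratch, d };
            for (int e = 0; e < NREGS; e++)
                if (pending[e] == d)
                    pending[e] = scratch;
        }
    }
    return n;
}

/* Execute the emitted moves against a simulated register file. */
void run_moves(int reg[NREGS], const struct move *m, int n)
{
    for (int i = 0; i < n; i++)
        reg[m[i].dst] = reg[m[i].src];
}
```

Calling sequence_moves with src_of = { 1, 0, 0, NO_SRC } and scratch = 3
asks for registers 0 and 1 to be swapped while register 2 also receives the
old value of register 0. The swap is a cycle, so it costs one extra move
through the scratch register. With fewer free registers there are fewer
choices for that scratch, which is the squeeze described above.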
There is also a semantic gotcha hiding here, which is that if you define the
IPC arguments as coming from memory, then you have to ask what happens when
the memory locations are modified between the time the send completes and
the receive happens. Getting the semantics of this right is actually quite
challenging, as it appears necessary to divide the send and the receive into
two units of operation, and this has really, really ugly consequences for new
process states that must now be made visible.
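
The difference between the two argument models can be shown with a small
single-threaded simulation in C. This is only a sketch under assumptions:
the message types and the send/receive functions below are hypothetical
stand-ins, not the EROS interface, and the "window" between send and
receive is modelled by an ordinary store in the middle of one function.

```c
#include <assert.h>
#include <string.h>

#define NARGS 4   /* hypothetical number of argument words */

/* Register-like model: the argument words are captured when the send runs. */
struct msg_by_value { unsigned long w[NARGS]; };

/* Memory model: the message only records where the arguments live. */
struct msg_by_ref { const unsigned long *w; };

struct msg_by_value send_by_value(const unsigned long args[NARGS])
{
    struct msg_by_value m;

    memcpy(m.w, args, sizeof m.w);   /* snapshot taken at send time */
    return m;
}

struct msg_by_ref send_by_ref(const unsigned long args[NARGS])
{
    struct msg_by_ref m = { args };  /* nothing copied yet */

    return m;
}

unsigned long receive_value(const struct msg_by_value *m, int i)
{
    return m->w[i];
}

unsigned long receive_ref(const struct msg_by_ref *m, int i)
{
    return m->w[i];                  /* read happens at receive time */
}

/* Returns 1 if the receiver sees the word that was in place at send time. */
int receiver_sees_send_time_value(int by_ref)
{
    unsigned long args[NARGS] = { 1, 2, 3, 4 };
    struct msg_by_value v = send_by_value(args);
    struct msg_by_ref r = send_by_ref(args);
    unsigned long seen;

    args[0] = 99;   /* sender touches memory after the send, before the receive */
    seen = by_ref ? receive_ref(&r, 0) : receive_value(&v, 0);
    return seen == 1;
}
```

In the by-value case the receiver still gets the original word; in the
by-reference case it gets whatever is in memory when the receive finally
runs. A design that takes arguments from memory has to pick one of these
behaviours and then define what the sender is allowed to do in between.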
A middle ground would be to use the strategy that the current EROS stubs
use. There are three low-level assembler stubs that expect to get passed a
pointer to a C-level structure, and that rearrange the arguments to/from the
structure into the appropriate registers. The cost of this is a complete
extra copy of the arguments (often two, since the on-stack representation
frequently turns out to be correct already). Data motion at this point in
the design hurts very badly.
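
As a rough illustration of where the extra data motion comes from, the
sketch below models the structure-based approach in portable C. Everything
in it is an assumption made for the example: struct ipc_msg, stub_send,
call_via_struct and the fake_regs array (which stands in for the machine
registers the real assembler stubs would load) are hypothetical, and the
counter simply tallies word-sized stores.

```c
#include <assert.h>

#define NREGS_IPC 4   /* hypothetical number of register-passed words */

/* Hypothetical C-level structure handed to the low-level stub. */
struct ipc_msg { unsigned long w[NREGS_IPC]; };

unsigned long fake_regs[NREGS_IPC];   /* stands in for machine registers */
int words_moved;                      /* counts word-sized copies */

/* Stand-in for the assembler stub: load each register from the structure. */
void stub_send(const struct ipc_msg *m)
{
    for (int i = 0; i < NREGS_IPC; i++) {
        fake_regs[i] = m->w[i];
        words_moved++;
    }
}

/* C-level wrapper: the arguments are first stored into the structure,
   and only then moved a second time by the stub. */
void call_via_struct(unsigned long a, unsigned long b,
                     unsigned long c, unsigned long d)
{
    struct ipc_msg m;

    m.w[0] = a; words_moved++;
    m.w[1] = b; words_moved++;
    m.w[2] = c; words_moved++;
    m.w[3] = d; words_moved++;
    stub_send(&m);
}
```

After one call_via_struct the counter reads twice the number of argument
words: once into the structure and once out of it. Generated stubs that
place each argument directly in its register avoid the first of those
passes, which is the saving being argued for here.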