> IMO, when you begin to need a code generator, it's a very strong sign
> that your underlying interface is too complicated.
I disagree. First, I don't care whether you do the code generation in C or in assembler. You are still doing code generation. Second, there is now a lot of research saying that you need code generation for any sort of reasonable performance, because of the need to reduce data motion. There is a paper on this from the SawMill folks at IBM, but the link is broken because their website failed to push the PostScript document to the external site. I've asked them to fix it.
Remember that the IPC has arguments that are passed in registers. There simply isn't a way to specify this from a high-level language, and these days the performance of stub code really is a significant system bottleneck.
It should also be noted that the code generator isn't especially complicated *except* on the x86, and it is complicated there primarily because there are so few registers. This creates a situation in which the order of assignments matters: with few temporaries to work with, loading one argument register can destroy a value that another argument still needs. You really don't want to move this state onto the stack. I've tried it that way, and the performance sucks.
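To make the ordering problem concrete, here is a minimal sketch in C of how a generator might sequence a set of register assignments when only one spare temporary is available. It is a model under stated assumptions, not the actual EROS generator: the register count NREG, the SCRATCH slot, and the names sequence_moves and apply_moves are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NREG 8            /* hypothetical register count for this model */
#define SCRATCH NREG      /* index of the single spare temporary */

/* One emitted instruction: regs[dst] = regs[src]. */
struct move { int dst, src; };

/*
 * Order a "parallel" assignment so that no source is overwritten before
 * it has been read. want[d] names the register whose OLD value must end
 * up in d, or -1 if d is not assigned. Writes the moves to out (room for
 * 2 * NREG entries is enough) and returns how many were emitted.
 */
size_t sequence_moves(const int want[NREG], struct move *out)
{
    int src_of[NREG];
    size_t n = 0;

    for (int d = 0; d < NREG; d++) {
        assert(want[d] >= -1 && want[d] < NREG);
        src_of[d] = (want[d] == d) ? -1 : want[d];   /* drop no-op moves */
    }

    for (;;) {
        int progress = 0, pending = -1;

        for (int d = 0; d < NREG; d++) {
            if (src_of[d] < 0)
                continue;
            pending = d;

            /* d may be overwritten only if no pending move still reads it. */
            int still_read = 0;
            for (int e = 0; e < NREG; e++)
                if (src_of[e] == d)
                    still_read = 1;

            if (!still_read) {
                out[n++] = (struct move){ d, src_of[d] };
                src_of[d] = -1;
                progress = 1;
            }
        }

        if (pending < 0)
            break;                      /* every assignment has been emitted */

        if (!progress) {
            /* Only cycles remain: park one value in the scratch slot and
             * point its readers there, which turns the cycle into a chain. */
            out[n++] = (struct move){ SCRATCH, pending };
            for (int e = 0; e < NREG; e++)
                if (src_of[e] == pending)
                    src_of[e] = SCRATCH;
        }
    }
    return n;
}

/* Execute the emitted moves on a model register file (NREG + scratch). */
void apply_moves(int regs[NREG + 1], const struct move *m, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regs[m[i].dst] = regs[m[i].src];
}
```

A rotation such as r0 <- r1, r1 <- r2, r2 <- r0 cannot be emitted in any naive order without losing a value. The sketch handles it with one extra move through the scratch slot, and that slot is exactly the resource that is in short supply on the x86.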
There is also a semantic gotcha hiding here. If you define the IPC arguments as coming from memory, then you have to ask what happens when the memory locations are modified between the time the send completes and the time the receive happens. Getting the semantics of this right is actually quite challenging. It appears necessary to divide the send and the receive into two units of operation, and this has really ugly consequences in the form of new process states that must now be made visible.
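The gotcha can be illustrated with a small user-level model in C. This is only a sketch of the semantic difference, not kernel code: struct pending_ipc, NWORDS, and the model_* functions are hypothetical names, and the "kernel" here is just a struct that either keeps a pointer to the sender's memory or captures the values at send time, the way register-passed arguments are captured.

```c
#include <assert.h>
#include <string.h>

#define NWORDS 4   /* hypothetical number of argument words */

/* State the model "kernel" holds between the send and the receive. */
struct pending_ipc {
    const unsigned *args_in_memory;   /* memory-sourced: only a pointer is kept */
    unsigned args_snapshot[NWORDS];   /* register-like: values fixed at send time */
};

/* The send completes here; the receive happens at some later time. */
void model_send(struct pending_ipc *p, const unsigned args[NWORDS])
{
    p->args_in_memory = args;
    memcpy(p->args_snapshot, args, sizeof p->args_snapshot);
}

/* Receive under memory-sourced semantics: reads whatever is there NOW. */
unsigned model_receive_from_memory(const struct pending_ipc *p, int i)
{
    assert(i >= 0 && i < NWORDS);
    return p->args_in_memory[i];
}

/* Receive under register-like semantics: reads what was sent. */
unsigned model_receive_snapshot(const struct pending_ipc *p, int i)
{
    assert(i >= 0 && i < NWORDS);
    return p->args_snapshot[i];
}
```

If the sender stores to args[0] after model_send returns, model_receive_from_memory observes the new value while model_receive_snapshot still observes the value that was sent. With memory-sourced arguments, a design has to say which of the two the receiver is entitled to see.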
A middle ground would be to use the strategy that the current EROS stubs use. There are three low-level assembler stubs that expect to get passed a pointer to a C-level structure, and that rearrange the arguments to/from the structure into the appropriate registers. The cost of this is a complete extra copy of the arguments (often two, as it frequently proves that the on-stack representation is already correct). Data motion in this place in the design hurts very badly.
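As a rough picture of where the extra copy comes from, here is a C model of a structure-based stub. Everything in it is an assumption made for illustration: the layout of struct ipc_msg, the word count, and the model_regs array and model_trap function that stand in for the machine registers and the kernel invocation. The real stubs are written in assembler, and this is not their code.

```c
#include <assert.h>
#include <stddef.h>

#define IPC_WORDS 4                       /* hypothetical argument word count */

/* Hypothetical C-level structure handed to the low-level stub. */
struct ipc_msg { unsigned long w[IPC_WORDS]; };

unsigned long model_regs[IPC_WORDS];      /* stand-in for the argument registers */
unsigned words_copied;                    /* counts data motion done by the stub */

/* Stand-in for the kernel invocation: bumps the first word as a "reply". */
void model_trap(void)
{
    model_regs[0] += 1;
}

/* The stub: structure -> registers, trap, registers -> structure. */
void stub_call(struct ipc_msg *m)
{
    assert(m != NULL);

    for (int i = 0; i < IPC_WORDS; i++) {     /* copy on the way in */
        model_regs[i] = m->w[i];
        words_copied++;
    }

    model_trap();

    for (int i = 0; i < IPC_WORDS; i++) {     /* copy on the way out */
        m->w[i] = model_regs[i];
        words_copied++;
    }
}
```

The counter only covers the stub itself. The caller has usually also moved its arguments into the structure before the call, which is a further pass over the same data that a generated stub loading the registers directly would not need.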
Jonathan