Checkpoint mechanics Vadim Lebedev (vlebedev@aplio.fr)
Sun, 2 Jul 2000 19:28:16 +0200

Hello,

First of all, I'd like to say that I'm extremely impressed by EROS, both by the design and by the implementation. Everyone working on this system deserves congratulations for a great job.

After reading the documentation and browsing through the sources, I have a couple
of ideas.
I'm pretty new to the subject, so please bear with me if the ideas that follow are stupid...

As far as I understand the checkpoint mechanics, when the checkpoint task is awakened it spends some time selecting pages to write out and marking them as COW (copy-on-write).
During this time no other activity is allowed. This constraint is a real PITA
for real-time software. I think I have an idea to overcome the problem.

This is the outline:

Suppose we mark all pages WRITE PROTECTED. Then, on a write access to a page,
the page fault handler stores a pointer to the offending PTE in the DIRTYLIST and increments a DIRTYCOUNTER. The page is then marked RW and the offending instruction is restarted. When DIRTYCOUNTER reaches some DIRTYTHRESHOLD, the checkpoint task is awakened.
The task simply scans the DIRTYLIST and marks the pages accessible through the PTEs on the list as COW. This scan could be done in parallel with other
activities.
Once the marking phase is done, the task can proceed with writing the pages to disk.
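
To make the outline concrete, here is a minimal sketch of it as a user-level Python simulation. It is not EROS code and does not reflect how the EROS kernel is structured. The names (DirtyTracker, PTE, Prot, write, mark_phase) are all hypothetical, and the real work (handling the hardware fault, copying a COW page, writing to disk) is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Prot(Enum):
    WRITE_PROTECTED = "write-protected"
    RW = "read-write"
    COW = "copy-on-write"


@dataclass
class PTE:
    """Hypothetical stand-in for a page table entry."""
    page: int
    prot: Prot = Prot.WRITE_PROTECTED


class DirtyTracker:
    """Simulates the DIRTYLIST / DIRTYCOUNTER scheme from the outline."""

    def __init__(self, num_pages: int, dirty_threshold: int):
        # Every page starts out write protected.
        self.ptes = [PTE(i) for i in range(num_pages)]
        self.dirty_threshold = dirty_threshold
        self.dirty_list = []              # DIRTYLIST
        self.dirty_count = 0              # DIRTYCOUNTER
        self.checkpoint_requested = False

    def write(self, page: int) -> None:
        """Simulate a write access; plays the role of the fault handler."""
        pte = self.ptes[page]
        if pte.prot is Prot.WRITE_PROTECTED:
            # First write since the page was protected: record it, then
            # open the page up so the "instruction" can be restarted.
            self.dirty_list.append(pte)
            self.dirty_count += 1
            pte.prot = Prot.RW
            if self.dirty_count >= self.dirty_threshold:
                self.checkpoint_requested = True  # wake the checkpoint task
        # Writes to RW pages need no bookkeeping. Writes to COW pages
        # would require a page copy, which this sketch does not model.

    def mark_phase(self) -> list:
        """Checkpoint task: mark only the pages on the dirty list as COW."""
        snapshot, self.dirty_list = self.dirty_list, []
        self.dirty_count = 0
        for pte in snapshot:
            pte.prot = Prot.COW
        self.checkpoint_requested = False
        return [pte.page for pte in snapshot]  # pages to write out
```

The point of the sketch is that mark_phase touches only the pages on the dirty list, so its cost is bounded by DIRTYTHRESHOLD rather than by the total number of pages. How the scan would be synchronized with concurrently running fault handlers is not addressed here.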

Any comments?

Vadim