OBSOLETE CONTENT

This wiki has been archived and the content is no longer updated.

Usage FAQ

Can this checkpoint/restart implementation be used with liblxc?

Yes. The instructions are at LXC-USERCR.

Design FAQ

What is the big deal? There are several implementations of C/R (see www.checkpointing.org). How is this one different?
Several implementations listed on checkpointing.org are tailored to specific applications. Some require running applications with a special LD_PRELOAD value. Some do not restore all resources (sockets, files, pipes, signals, the process hierarchy, etc.). This implementation aims to be general purpose and tries to restore all of the resources listed, and more. See also http://lists.linux-foundation.org/pipermail/clusters_sig/attachments/20051025/88ed8b30/Comparison-CR-0001.pdf. A more detailed comparison with some of the other C/R implementations is given below.
Why not checkpoint/restart from user space?
C/R from user space would require a lot of kernel state to be exported to user space via ioctls, /proc, or new system calls. Such interfaces would become permanent and complicate maintenance.
How does this implementation differ from BLCR (http://ftg.lbl.gov/checkpoint)?
TBD
How does this implementation differ from Cryopid (http://cryopid.berlios.de/)?
TBD
How does this implementation differ from Zap?
TBD
How does this implementation differ from OpenVZ?
TBD
Can non-root users checkpoint/restart an application?
For now, only users with the CAP_SYS_ADMIN capability can checkpoint/restart an application. This is to ensure that the checkpoint image has not been tampered with; the image is treated like a loadable kernel module.
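The kernel performs the privilege check itself, but a user-space C/R tool might want to fail early with a clear message. A minimal sketch, assuming Linux's /proc/self/status format (the hex CapEff line) and a hypothetical helper name has_cap_sys_admin(); it is not part of this implementation. libcap's cap_get_proc() is an alternative way to read the same set.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as CAP_SYS_ADMIN in <linux/capability.h>. */
#define CAP_SYS_ADMIN_BIT 21

/* Hypothetical helper: returns 1 if CAP_SYS_ADMIN is in this process's
   effective capability set, 0 if it is not, or -1 if /proc/self/status
   cannot be read or has no CapEff line. */
static int has_cap_sys_admin(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    int result = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Line format: "CapEff:\t<hex bitmask>" */
        if (strncmp(line, "CapEff:", 7) == 0) {
            unsigned long long mask = strtoull(line + 7, NULL, 16);
            result = (int)((mask >> CAP_SYS_ADMIN_BIT) & 1ULL);
            break;
        }
    }
    fclose(f);
    return result;
}
```

A tool could call has_cap_sys_admin() before starting a checkpoint and print an explanatory error when it returns 0, instead of relying only on the error returned by the kernel.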
Freezer: Why not use SIGSTOP rather than the freezer to stop the application?
SIGCONT can be caught or blocked by the application, so using SIGSTOP/SIGCONT would not be transparent to the application.
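To illustrate why SIGSTOP/SIGCONT is not transparent, here is a minimal stand-alone sketch (not code from this C/R implementation; the function name stop_and_continue_child() is hypothetical). A child process installs a SIGCONT handler; its parent stops it with SIGSTOP and resumes it with SIGCONT, and the child's handler runs, so the child can tell it was stopped. The child also blocks SIGCONT outside sigsuspend(): blocking does not keep a process stopped, but the signal stays pending and the handler still runs once it is unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigcont = 0;

static void on_sigcont(int sig)
{
    (void)sig;
    got_sigcont = 1; /* the application now knows it was stopped and resumed */
}

/* Stops and resumes a child with SIGSTOP/SIGCONT.
   Returns 1 if the child's SIGCONT handler ran, 0 if not, -1 on error. */
static int stop_and_continue_child(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {                       /* child: plays the "application" */
        sigset_t block, old;
        struct sigaction sa = {0};
        char c = 'r';

        close(fds[0]);
        /* Block SIGCONT while testing the flag, to avoid a lost wakeup. */
        sigemptyset(&block);
        sigaddset(&block, SIGCONT);
        sigprocmask(SIG_BLOCK, &block, &old);
        sa.sa_handler = on_sigcont;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGCONT, &sa, NULL);

        if (write(fds[1], &c, 1) != 1)    /* tell the parent we are ready */
            _exit(1);
        while (!got_sigcont)
            sigsuspend(&old);             /* SIGCONT is unblocked only while waiting */
        c = 'c';
        if (write(fds[1], &c, 1) != 1)    /* report that the handler ran */
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    char c;
    int status, noticed = -1;
    if (read(fds[0], &c, 1) == 1) {       /* child is ready */
        kill(pid, SIGSTOP);
        waitpid(pid, &status, WUNTRACED); /* wait until it is actually stopped */
        kill(pid, SIGCONT);
        noticed = (read(fds[0], &c, 1) == 1 && c == 'c');
    }
    waitpid(pid, &status, 0);             /* reap the child */
    close(fds[0]);
    return noticed;
}
```

A return value of 1 means the stop/continue cycle was visible to the stopped process, which is the side effect the freezer is meant to avoid.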
Pros/Cons of recreating the process tree in the kernel (as in OpenVZ)
1. Pros:
  1. A single task calls sys_restart(); all other processes are created in the kernel.
  2. The kernel has complete control of all processes until sys_restart() returns.
2. Cons:
  1. Can only restart an entire container. To restart a subset of the checkpointed processes, the checkpoint image must be chopped up from user space.
Pros/Cons of recreating processes in user space (as in Zap)
1. Pros:
  1. Leverage existing calls like fork(), clone()
  2. Flexibility
  3. Allow partial-container checkpoint (i.e., subtree checkpoint)
2. Cons:
  1. Needs kernel glue/synchronization for all processes that call sys_restart()
  2. Needs a system call like clone_with_pids() to create processes with specific pid.
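The user-space approach can be sketched with plain fork(). This is a minimal illustration under stated assumptions, not the restart code of this implementation: the image is reduced to a hypothetical tree_parent[] array giving each task's parent index, and restore_tree()/restore_subtree() are hypothetical names. Because fork() lets the kernel choose each pid, the sketch cannot reproduce the original pids; that gap is what a call like clone_with_pids() would fill. Per-task state restoration and the sys_restart() synchronization are left as a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical process-tree summary read from a checkpoint image:
   tree_parent[i] is the index of task i's parent; -1 marks the root. */
static const int tree_parent[] = { -1, 0, 0, 1, 1, 2 };
#define NTASKS ((int)(sizeof tree_parent / sizeof tree_parent[0]))

/* Runs in the process playing task `self`: fork each of its children,
   then wait for them. The kernel picks the new pids; restoring the
   original pids would need something like clone_with_pids(). */
static void restore_subtree(int self, int report_fd)
{
    char c = 't';
    if (write(report_fd, &c, 1) != 1)   /* report: this task exists */
        _exit(1);

    for (int i = 0; i < NTASKS; i++) {
        if (tree_parent[i] != self)
            continue;
        pid_t pid = fork();
        if (pid == -1)
            _exit(1);
        if (pid == 0) {                 /* child becomes task i */
            restore_subtree(i, report_fd);
            _exit(0);
        }
    }

    /* A real restart would restore this task's state here and then
       synchronize with the other restarting tasks via sys_restart(). */
    while (wait(NULL) > 0)
        ;
}

/* Recreates the whole tree under a fresh root process and returns the
   number of tasks that reported in, or -1 on error. */
static int restore_tree(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t root = fork();
    if (root == -1)
        return -1;
    if (root == 0) {
        close(fds[0]);
        restore_subtree(0, fds[1]);
        _exit(0);
    }

    close(fds[1]);
    int count = 0;
    char c;
    while (read(fds[0], &c, 1) == 1)    /* EOF once every task has exited */
        count++;
    close(fds[0]);
    waitpid(root, NULL, 0);
    return count;
}
```

Restoring only a subtree amounts to starting restore_subtree() at a non-root index, which is the flexibility listed among the pros.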
Should we allow/deny subtree (partial-container) checkpoint ?
1. Useful for development
2. Useful for some workloads, as long as the user understands the limitations and leaks
3. Useful for gathering more information during core dumps -- perhaps enough to restart the application from the core dump, attach with a debugger, and then observe/fix the problem.
