From Linux Checkpoint / Restart Wiki
Revision as of 04:35, 9 July 2010 by Mhelsley

Usage FAQ

  1. Can this checkpoint/restart implementation be used with liblxc?

    Yes. The instructions are on the LXC-USERCR page.

Design FAQ

  1. What is the big deal? There are several implementations of C/R (see www.checkpointing.org). How is this one different?

    Several implementations listed on checkpointing.org are tailored to specific applications. Some require running applications with special LD_PRELOAD values. Some do not restore all resources (sockets, files, pipes, signals, the process hierarchy, etc.). This implementation aims to be general purpose and tries to restore all of the resources listed above and more. See also http://lists.linux-foundation.org/pipermail/clusters_sig/attachments/20051025/88ed8b30/Comparison-CR-0001.pdf. A more detailed comparison with some of the C/R implementations is given below.

  2. Why not checkpoint/restart from user space?

    C/R from user space would require a lot of kernel state to be exported to user space via ioctls, /proc, or new system calls. Such interfaces would become permanent and complicate maintenance.

  3. How does this implementation differ from BLCR (http://ftg.lbl.gov/checkpoint)?


  4. How does this implementation differ from Cryopid (http://cryopid.berlios.de/)?


  5. How does this implementation differ from Zap?


  6. How does this implementation differ from OpenVZ?


  7. Can non-root users checkpoint/restart an application?

    For now, only users with the CAP_SYS_ADMIN capability can checkpoint or restart an application. A checkpoint image is treated like a loadable kernel module: because it could have been tampered with, only a privileged user may restart from it.

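  8. Freezer: Why not use SIGSTOP to stop the application rather than the freezer?

    SIGCONT can be handled or blocked by the application, and a stopped task is also visible to its parent (through SIGCHLD and waitpid()). Stopping and resuming the application with SIGSTOP/SIGCONT would therefore not be transparent to it.
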
  9. Pros and cons of recreating the process tree in the kernel (as in OpenVZ)
    1. Pros:
      1. A single task calls sys_restart(); all other processes are created in the kernel.
      2. The kernel has complete control of all processes until sys_restart() returns.
    2. Cons:
      1. Only an entire container can be restarted. To restart a subset of the processes that were checkpointed, the checkpoint image must first be chopped up from user space.

  10. Pros and cons of recreating processes in user space (as in Zap)
    1. Pros:
      1. Leverages existing calls like fork() and clone()
      2. Flexibility
      3. Allows partial-container checkpoint (i.e., subtree checkpoint)
    2. Cons:
      1. Needs kernel glue and synchronization for all processes that call sys_restart()
      2. Needs a system call like clone_with_pids() to create processes with specific pids

  11. Should we allow or deny subtree (partial-container) checkpoint?
    1. Useful for development
    2. Useful for some workloads, as long as the user understands the limitations and potential leaks
    3. Useful for gathering more information during core dumps, perhaps enough to restart the application from the core dump, attach a debugger, and then observe and fix the problem