July 10, 2006

Sysenter Based System Call Mechanism in Linux 2.6

Reporting from Linux kernel land,

Starting with version 2.5, the Linux kernel introduced a new system call entry mechanism on Pentium II+ processors. Why a new mechanism? Well, somebody reported performance issues with system calls on Pentium IV processors. Apparently, the existing system call mechanism, based on a software interrupt (int 0x80), was responsible for this performance lag. Linux, or Linus more specifically, responded by implementing an alternative system call mechanism.

This mechanism uses the SYSENTER/SYSEXIT instructions available on Pentium II+ processors to implement system call entry and exit. This article explores the new mechanism. Wherever I make a general statement without naming a specific architecture, please make a mental note that I am talking about Pentium processors. Also, all source code listings are based on the kernel that ships with BackTrack v1.0.

Here is the link:

I explored this mechanism for awareness and, of course, for fun ;) I wrote this article as a reference for other explorers. You can send me a note (manugarg at gmail dot com) if you find this article of any use.

Happy exploring,
"Journey is the destination of life"



  1. thanks a lot. your paper is very cool.

    keep up the good work and don't forget to post them on Kernelnewbies :-)


  2. Hi,
    your article helped me a lot to understand how system calls work in Linux. Thank you!

    I'm adding my own system calls following some online tutorials and it's quite mechanical, so I didn't have much trouble with them.

    However, when trying to implement a system call similar to clone(), it doesn't work at all!

    as a user, the call is:
    int clone(int (*fn) (void *arg), void *child_stack, int flags, void *arg)

    but at kernel level it is defined as (in arch/i386/kernel/process.c):
    asmlinkage int sys_clone(struct pt_regs regs)
    {
        unsigned long clone_flags;
        unsigned long newsp;
        int __user *parent_tidptr, *child_tidptr;

        clone_flags = regs.ebx;
        newsp = regs.ecx;
        parent_tidptr = (int __user *)regs.edx;
        child_tidptr = (int __user *)regs.edi;
        if (!newsp)
            newsp = regs.esp;
        return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
    }

    Where did the *fn parameter go? I'm assuming it's in the regs struct, but since it's the first parameter, shouldn't it be in ebx (the first register in the pt_regs struct as defined in include/asm-i386/ptrace.h)?

    I try to call clone(), by using syscall(__NR_CLONE, int (*fn) (void *arg), void *child_stack, int flags, void *arg)
    but it doesn't work either.
    Are fork, clone, etc. different from all other system calls?

    Now, I'm starting to look at the assembler code trying to figure it out. Could you please give me a hint...

    Thanks again for the article (the elf vectors too!) :-)

    btw, my mail is vitobcn (a*t) gmail

  3. Extremely informative article. Keep up the good work! Your knowledge of Linux is awesome, and makes me feel I've got miles to go :)


  4. Good article. Thanks