Contents
- 0x01. How seccomp rules are added
- A. Default rules
- B. Custom rules
- 0x02. Format of seccomp sandbox "instructions"
- Examples
- Task 01
- Task 02
- 0x03. Summary
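I played ACTF-2023 today and was shocked to find that I no longer recognized seccomp. After a whole day of being tortured by a blind-exploitation challenge, I really did not want to keep cramming challenge by challenge. But seccomp clearly has to be relearned systematically; my knowledge cannot stay limited to orw (open/read/write).
Learning goals: understand how seccomp protection works, master the common seccomp bypass techniques, and learn to write seccomp BPF instructions by hand.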
0x01. How seccomp rules are added
Everyone knows that seccomp restricts the system calls a process may make. But a Linux system runs a great many processes, so how does seccomp intercept illegal system calls in exactly those processes that have rules defined?
This brings us to an uncomfortable step: reading the Linux source code.
In current Linux there are two system calls related to seccomp: prctl and seccomp. On x86-64 their system call numbers are 157 and 317, and the corresponding kernel functions are sys_prctl and sys_seccomp:
SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned int, flags,
                void __user *, uargs)
{
    return do_seccomp(op, flags, uargs);
}

SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
                unsigned long, arg4, unsigned long, arg5)
{
    ...
    switch (option) {
    ...
    case PR_GET_SECCOMP:
        error = prctl_get_seccomp();
        break;
    case PR_SET_SECCOMP:
        error = prctl_set_seccomp(arg2, (char __user *)arg3);
        break;
    ...
    }
    ...
}

long prctl_set_seccomp(unsigned long seccomp_mode, void __user *filter)
{
    unsigned int op;
    void __user *uargs;

    switch (seccomp_mode) {
    case SECCOMP_MODE_STRICT:
        op = SECCOMP_SET_MODE_STRICT;
        /*
         * Setting strict mode through prctl always ignored filter,
         * so make sure it is always NULL here to pass the internal
         * check in do_seccomp().
         */
        uargs = NULL;
        break;
    case SECCOMP_MODE_FILTER:
        op = SECCOMP_SET_MODE_FILTER;
        uargs = filter;
        break;
    default:
        return -EINVAL;
    }

    /* prctl interface doesn't have flags, so they are always zero. */
    return do_seccomp(op, 0, uargs);
}
As you can see, when the first argument of the prctl system call is PR_SET_SECCOMP, it ends up calling the same function as sys_seccomp, namely do_seccomp. This is the entry point for setting seccomp rules.
/* Common entry point for both prctl and syscall. */
static long do_seccomp(unsigned int op, unsigned int flags,
                       void __user *uargs)
{
    switch (op) {
    case SECCOMP_SET_MODE_STRICT:
        if (flags != 0 || uargs != NULL)
            return -EINVAL;
        return seccomp_set_mode_strict();
    case SECCOMP_SET_MODE_FILTER:
        return seccomp_set_mode_filter(flags, uargs);
    case SECCOMP_GET_ACTION_AVAIL:
        if (flags != 0)
            return -EINVAL;
        return seccomp_get_action_avail(uargs);
    case SECCOMP_GET_NOTIF_SIZES:
        if (flags != 0)
            return -EINVAL;
        return seccomp_get_notif_sizes(uargs);
    default:
        return -EINVAL;
    }
}
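That is the definition of do_seccomp. We focus on the first two switch branches: SECCOMP_SET_MODE_STRICT, which installs the default rules (section A), and SECCOMP_SET_MODE_FILTER, which installs custom rules (section B).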
A. Default rules
The logic for adding the default rules is implemented in seccomp_set_mode_strict:
static long seccomp_set_mode_strict(void)
{
    const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
    long ret = -EINVAL;

    spin_lock_irq(&current->sighand->siglock);

    if (!seccomp_may_assign_mode(seccomp_mode))
        goto out;

#ifdef TIF_NOTSC
    disable_TSC();
#endif
    seccomp_assign_mode(current, seccomp_mode, 0);
    ret = 0;

out:
    spin_unlock_irq(&current->sighand->siglock);

    return ret;
}

static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode)
{
    assert_spin_locked(&current->sighand->siglock);

    if (current->seccomp.mode && current->seccomp.mode != seccomp_mode)
        return false;

    return true;
}

/* Mode values stored in seccomp.mode; 0 means seccomp is not in use. */
#define SECCOMP_MODE_DISABLED 0
#define SECCOMP_MODE_STRICT 1
#define SECCOMP_MODE_FILTER 2
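The current in this function is a task_struct instance representing the task that is currently running in the kernel. After taking the lock, the function calls seccomp_may_assign_mode as a check. That check shows that once rules have been defined with BPF (the mode is then SECCOMP_MODE_FILTER), the task can no longer switch to strict mode: the function returns false and the rule-modification path is skipped entirely.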
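The transition rule can be restated as a small user-space model. This is only a sketch of the check's logic, not kernel code: may_assign_mode is a hypothetical name, and current_mode stands in for current->seccomp.mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/seccomp.h>   /* SECCOMP_MODE_DISABLED / STRICT / FILTER */

/* User-space model of the kernel's seccomp_may_assign_mode() check.
 * current_mode plays the role of current->seccomp.mode. */
static bool may_assign_mode(unsigned long current_mode, unsigned long new_mode)
{
    /* A task with seccomp disabled (mode 0) may enter any mode.
     * Otherwise only a request for the mode already in effect passes. */
    if (current_mode && current_mode != new_mode)
        return false;
    return true;
}
```

Under this model a task already in filter mode may add further filters, but it can never move to strict mode, and a strict-mode task can never move to filter mode.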
Next comes the main rule-adding logic, the seccomp_assign_mode function:
static inline void seccomp_assign_mode(struct task_struct *task,
                                       unsigned long seccomp_mode,
                                       unsigned long flags)
{
    assert_spin_locked(&task->sighand->siglock);

    task->seccomp.mode = seccomp_mode;
    /*
     * Make sure SYSCALL_WORK_SECCOMP cannot be set before the mode (and
     * filter) is set.
     */
    smp_mb__before_atomic();
    /* Assume default seccomp processes want spec flaw mitigation. */
    if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0)
        arch_seccomp_spec_mitigate(task);
    set_task_syscall_work(task, SECCOMP);
}

/* Valid flags for SECCOMP_SET_MODE_FILTER */
#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
#define SECCOMP_FILTER_FLAG_LOG (1UL << 1)
#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH (1UL << 4)
/* Received notifications wait in killable state (only respond to fatal signals) */
#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (1UL << 5)

#define set_task_syscall_work(t, fl) \
    set_bit(SYSCALL_WORK_BIT_##fl, &task_thread_info(t)->syscall_work)

enum syscall_work_bit {
    SYSCALL_WORK_BIT_SECCOMP,
    SYSCALL_WORK_BIT_SYSCALL_TRACEPOINT,
    SYSCALL_WORK_BIT_SYSCALL_TRACE,
    SYSCALL_WORK_BIT_SYSCALL_EMU,
    SYSCALL_WORK_BIT_SYSCALL_AUDIT,
    SYSCALL_WORK_BIT_SYSCALL_USER_DISPATCH,
    SYSCALL_WORK_BIT_SYSCALL_EXIT_TRAP,
};
This function sets the mode of the current task and then performs a check: if SECCOMP_FILTER_FLAG_SPEC_ALLOW is not set, it calls arch_seccomp_spec_mitigate. The internals of that function are fairly complex, so we skip them for now. Finally it calls set_task_syscall_work, a macro defined as shown above, which simply sets one bit to record that seccomp checking is enabled for this thread.
B. Custom rules
For custom rules, the process of adding them is considerably more involved.
static long seccomp_set_mode_filter(unsigned int flags,
                                    const char __user *filter)
{
    const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
    struct seccomp_filter *prepared = NULL;
    long ret = -EINVAL;
    int listener = -1;
    struct file *listener_f = NULL;

    /* Validate flags. */
    if (flags & ~SECCOMP_FILTER_FLAG_MASK)
        return -EINVAL;

    /*
     * In the successful case, NEW_LISTENER returns the new listener fd.
     * But in the failure case, TSYNC returns the thread that died. If you
     * combine these two flags, there's no way to tell whether something
     * succeeded or failed. So, let's disallow this combination if the user
     * has not explicitly requested no errors from TSYNC.
     */
    if ((flags & SECCOMP_FILTER_FLAG_TSYNC) &&
        (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) &&
        ((flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH) == 0))
        return -EINVAL;

    /*
     * The SECCOMP_FILTER_FLAG_WAIT_KILLABLE_SENT flag doesn't make sense
     * without the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
     */
    if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
        ((flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) == 0))
        return -EINVAL;

    /* Prepare the new filter before holding any locks. */
    prepared = seccomp_prepare_user_filter(filter);
    if (IS_ERR(prepared))
        return PTR_ERR(prepared);

    if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
        listener = get_unused_fd_flags(O_CLOEXEC);
        if (listener < 0) {
            ret = listener;
            goto out_free;
        }

        listener_f = init_listener(prepared);
        if (IS_ERR(listener_f)) {
            put_unused_fd(listener);
            ret = PTR_ERR(listener_f);
            goto out_free;
        }
    }

    /*
     * Make sure we cannot change seccomp or nnp state via TSYNC
     * while another thread is in the middle of calling exec.
     */
    if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
        mutex_lock_killable(&current->signal->cred_guard_mutex))
        goto out_put_fd;

    spin_lock_irq(&current->sighand->siglock);

    if (!seccomp_may_assign_mode(seccomp_mode))
        goto out;

    if (has_duplicate_listener(prepared)) {
        ret = -EBUSY;
        goto out;
    }

    ret = seccomp_attach_filter(flags, prepared);
    if (ret)
        goto out;
    /* Do not free the successfully attached filter. */
    prepared = NULL;

    seccomp_assign_mode(current, seccomp_mode, flags);
out:
    spin_unlock_irq(&current->sighand->siglock);
    if (flags & SECCOMP_FILTER_FLAG_TSYNC)
        mutex_unlock(&current->signal->cred_guard_mutex);
out_put_fd:
    if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
        if (ret) {
            listener_f->private_data = NULL;
            fput(listener_f);
            put_unused_fd(listener);
            seccomp_notify_detach(prepared);
        } else {
            fd_install(listener, listener_f);
            ret = listener;
        }
    }
out_free:
    seccomp_filter_free(prepared);
    return ret;
}
The function contains many checks, and it returns an error value immediately when any of them fails. Note the requirement (flags & ~SECCOMP_FILTER_FLAG_MASK) == 0: every bit of flags other than the lowest 6 must be 0.
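The three flag checks at the top of seccomp_set_mode_filter can be modelled in user space as follows. This is a sketch under assumptions: the flag values are copied from the header quoted earlier and redefined locally, FLAG_MASK is assumed to be the OR of those six flags, and check_filter_flags is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>

/* Flag values as listed in the kernel header quoted earlier, redefined
 * locally so the sketch also compiles against older headers. */
#define FLAG_TSYNC              (1UL << 0)
#define FLAG_LOG                (1UL << 1)
#define FLAG_SPEC_ALLOW         (1UL << 2)
#define FLAG_NEW_LISTENER       (1UL << 3)
#define FLAG_TSYNC_ESRCH        (1UL << 4)
#define FLAG_WAIT_KILLABLE_RECV (1UL << 5)
/* Assumption: the mask is the OR of the six flags above. */
#define FLAG_MASK               0x3fUL

/* Returns 0 if flags would pass the three checks, -EINVAL otherwise. */
static int check_filter_flags(unsigned long flags)
{
    if (flags & ~FLAG_MASK)
        return -EINVAL;          /* a bit outside the low 6 is set */
    if ((flags & FLAG_TSYNC) && (flags & FLAG_NEW_LISTENER) &&
        !(flags & FLAG_TSYNC_ESRCH))
        return -EINVAL;          /* return value would be ambiguous */
    if ((flags & FLAG_WAIT_KILLABLE_RECV) && !(flags & FLAG_NEW_LISTENER))
        return -EINVAL;          /* WAIT_KILLABLE_RECV needs a listener */
    return 0;
}
```

For example, check_filter_flags(FLAG_TSYNC | FLAG_NEW_LISTENER) is rejected, while adding FLAG_TSYNC_ESRCH to the same combination makes it pass.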
After these 3 checks pass, seccomp_prepare_user_filter is called to initialize a struct seccomp_filter instance.
struct seccomp_filter {
    refcount_t refs;
    refcount_t users;
    bool log;
    bool wait_killable_recv;
    struct action_cache cache;
    struct seccomp_filter *prev;
    struct bpf_prog *prog;
    struct notification *notif;
    struct mutex notify_lock;
    wait_queue_head_t wqh;
};

static struct seccomp_filter *
seccomp_prepare_user_filter(const char __user *user_filter)
{
    struct sock_fprog fprog;
    struct seccomp_filter *filter = ERR_PTR(-EFAULT);

#ifdef CONFIG_COMPAT
    if (in_compat_syscall()) {
        struct compat_sock_fprog fprog32;
        if (copy_from_user(&fprog32, user_filter, sizeof(fprog32)))
            goto out;
        fprog.len = fprog32.len;
        fprog.filter = compat_ptr(fprog32.filter);
    } else /* falls through to the if below. */
#endif
    if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
        goto out;
    filter = seccomp_prepare_filter(&fprog);
out:
    return filter;
}

struct sock_fprog {    /* Required for SO_ATTACH_FILTER. */
    unsigned short len;    /* Number of filter blocks */
    struct sock_filter __user *filter;
};

struct sock_filter {    /* Filter block */
    __u16 code;    /* Actual filter code */
    __u8 jt;       /* Jump true */
    __u8 jf;       /* Jump false */
    __u32 k;       /* Generic multiuse field */
};
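From these structure and function definitions we can see that the user-space pointer we pass in must point to a sock_fprog instance. Linux limits a seccomp program to 4096 instructions, so len must lie in (0, 4096]; each sock_filter can be thought of as one "instruction" of the seccomp sandbox. seccomp_prepare_filter, which seccomp_prepare_user_filter calls, performs some further checks, and their return values tell us what each one is about. The last two return EACCES and ENOMEM: one concerns permissions and the other a lack of memory, and normally neither occurs. After that, the contents of the filter supplied by the user are stored in a seccomp_filter instance, which is returned.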
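On the user-space side, the same length constraint can be enforced before the program is ever handed to the kernel. A minimal sketch, assuming the uapi header <linux/filter.h> provides BPF_MAXINSNS (4096); make_prog is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_MAXINSNS */

/* Fills *prog so that it describes insns[0..count-1].
 * Returns 0 on success, or -EINVAL if count is outside (0, 4096]. */
static int make_prog(struct sock_fprog *prog, struct sock_filter *insns,
                     size_t count)
{
    if (count == 0 || count > BPF_MAXINSNS)
        return -EINVAL;
    prog->len = (unsigned short)count;
    prog->filter = insns;
    return 0;
}
```

The resulting struct sock_fprog is what gets passed as the third argument of prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...).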
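Once the seccomp_filter has been initialized, we skip over the special handling of some flags that follows. The function checks whether the rule may be loaded and then calls seccomp_attach_filter. That function mainly handles the flags that are set and then puts the new filter at the head of the list, using the prev field to link the filters into a singly linked list, as shown below.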
static long seccomp_attach_filter(unsigned int flags,
                                  struct seccomp_filter *filter)
{
    unsigned long total_insns;
    struct seccomp_filter *walker;

    assert_spin_locked(&current->sighand->siglock);

    /* Validate resulting filter length. */
    total_insns = filter->prog->len;
    for (walker = current->seccomp.filter; walker; walker = walker->prev)
        total_insns += walker->prog->len + 4;  /* 4 instr penalty */
    if (total_insns > MAX_INSNS_PER_PATH)
        return -ENOMEM;

    ...

    /*
     * If there is an existing filter, make it the prev and don't drop its
     * task reference.
     */
    filter->prev = current->seccomp.filter;
    seccomp_cache_prepare(filter);
    current->seccomp.filter = filter;
    atomic_inc(&current->seccomp.filter_count);

    /* Now that the new filter is in place, synchronize to all threads. */
    if (flags & SECCOMP_FILTER_FLAG_TSYNC)
        seccomp_sync_threads(flags);

    return 0;
}
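The length accounting and the head insertion can be reproduced with a simplified list in user space. This is only a model: struct model_filter stands in for struct seccomp_filter, attach_filter is a hypothetical name, and the MAX_INSNS_PER_PATH value is recalled from kernel/seccomp.c as (1 << 18) / sizeof(struct sock_filter), which should be treated as an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed limit: 256 KiB worth of 8-byte instructions. */
#define MAX_INSNS_PER_PATH ((1 << 18) / 8)

/* Simplified stand-in for struct seccomp_filter. */
struct model_filter {
    unsigned int len;            /* plays the role of prog->len */
    struct model_filter *prev;   /* next older filter, or NULL */
};

/* Pushes f onto the list headed by *head if the combined length fits. */
static int attach_filter(struct model_filter **head, struct model_filter *f)
{
    unsigned long total = f->len;

    for (struct model_filter *w = *head; w; w = w->prev)
        total += w->len + 4;     /* 4 instruction penalty per older filter */
    if (total > MAX_INSNS_PER_PATH)
        return -ENOMEM;

    f->prev = *head;             /* the newest filter becomes the head */
    *head = f;
    return 0;
}
```

Because each new filter is linked in front of the old ones, walking prev from the head visits filters from newest to oldest.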
That is the rough flow of adding a filter.
0x02. Format of seccomp sandbox "instructions"
Every instruction of the seccomp sandbox is 8 bytes long and has 4 fields: code, jt, jf and k.
struct sock_filter {    /* Filter block */
    __u16 code;    /* Actual filter code */
    __u8 jt;       /* Jump true */
    __u8 jf;       /* Jump false */
    __u32 k;       /* Generic multiuse field */
};
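Linux defines some macros that make seccomp code easier to write (the meanings of code are defined in /include/uapi/linux/bpf_common.h). The annotations below are quoted from the reference material to make them easier to understand: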
#ifndef BPF_STMT
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#endif
#ifndef BPF_JUMP
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
#endif

/* Instruction classes */
#define BPF_CLASS(code) ((code) & 0x07) // selects the class of the operation
#define BPF_LD 0x00 // copy a value into the accumulator
#define BPF_LDX 0x01 // load a value into the index register
#define BPF_ST 0x02 // store the accumulator into scratch memory
#define BPF_STX 0x03 // store the index register into scratch memory
#define BPF_ALU 0x04 // arithmetic or logic on the accumulator, with the index register or a constant as operand
#define BPF_JMP 0x05 // jump
#define BPF_RET 0x06 // return
#define BPF_MISC 0x07 // miscellaneous

/* ld/ldx fields */
#define BPF_SIZE(code) ((code) & 0x18)
#define BPF_W 0x00 /* 32-bit */ // word
#define BPF_H 0x08 /* 16-bit */ // half word
#define BPF_B 0x10 /* 8-bit */ // byte
/* eBPF BPF_DW 0x18 64-bit */ // double word
#define BPF_MODE(code) ((code) & 0xe0)
#define BPF_IMM 0x00 // constant
#define BPF_ABS 0x20 // packet data at a fixed offset (absolute offset)
#define BPF_IND 0x40 // packet data at a variable offset (relative offset)
#define BPF_MEM 0x60 // a word in scratch memory
#define BPF_LEN 0x80 // packet length
#define BPF_MSH 0xa0

/* alu/jmp fields */
#define BPF_OP(code) ((code) & 0xf0) // for class ALU, selects the operator
#define BPF_ADD 0x00
#define BPF_SUB 0x10
#define BPF_MUL 0x20
#define BPF_DIV 0x30
#define BPF_OR 0x40
#define BPF_AND 0x50
#define BPF_LSH 0x60
#define BPF_RSH 0x70
#define BPF_NEG 0x80
#define BPF_MOD 0x90
#define BPF_XOR 0xa0

// for class JMP, selects the jump type
#define BPF_JA 0x00
#define BPF_JEQ 0x10
#define BPF_JGT 0x20
#define BPF_JGE 0x30
#define BPF_JSET 0x40
#define BPF_SRC(code) ((code) & 0x08)
#define BPF_K 0x00 // constant
#define BPF_X 0x08 // index register
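To see how these bit fields combine, the code field of an instruction can be split back into its parts with the same macros. A small sketch, assuming the uapi header <linux/filter.h>; struct decoded and decode are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <linux/filter.h>   /* BPF_CLASS, BPF_SIZE, BPF_MODE, BPF_OP, BPF_SRC */

/* The individual bit fields of a classic BPF opcode. */
struct decoded {
    unsigned int cls;   /* BPF_LD, BPF_JMP, BPF_RET, ... */
    unsigned int size;  /* meaningful for ld/ldx: BPF_W, BPF_H, BPF_B */
    unsigned int mode;  /* meaningful for ld/ldx: BPF_IMM, BPF_ABS, ... */
    unsigned int op;    /* meaningful for alu/jmp: BPF_SUB, BPF_JEQ, ... */
    unsigned int src;   /* meaningful for alu/jmp: BPF_K or BPF_X */
};

/* Splits a 16-bit code into its fields; which fields matter depends on cls. */
static struct decoded decode(unsigned short code)
{
    struct decoded d = {
        .cls  = BPF_CLASS(code),
        .size = BPF_SIZE(code),
        .mode = BPF_MODE(code),
        .op   = BPF_OP(code),
        .src  = BPF_SRC(code),
    };
    return d;
}
```

For instance, decode(0x20) yields class BPF_LD with size BPF_W and mode BPF_ABS, and decode(0x15) yields class BPF_JMP with operation BPF_JEQ and source BPF_K.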
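While researching this, I found that BPF is not only used for writing seccomp rules. It is more like a fairly mature combination of assembly language and glue language, and in 2014 it gained its own execution engine, eBPF. That is an entire body of knowledge in its own right.
Most material online obtains BPF code by compiling it from C or similar languages, but for seccomp we write the BPF code directly. Besides the general BPF syntax, the BPF used for seccomp has a few extra definitions: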
/*
 * All BPF programs must return a 32-bit value.
 * The bottom 16-bits are for optional return data.
 * The upper 16-bits are ordered from least permissive values to most,
 * as a signed value (so 0x8000000 is negative).
 *
 * The ordering ensures that a min_t() over composed return values always
 * selects the least permissive choice.
 */
#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* kill the process */
#define SECCOMP_RET_KILL_THREAD 0x00000000U /* kill the thread */
#define SECCOMP_RET_KILL SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */
#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */
#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */
#define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */
#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */

/* Masks for the return value sections. */
#define SECCOMP_RET_ACTION_FULL 0xffff0000U
#define SECCOMP_RET_ACTION 0x7fff0000U
#define SECCOMP_RET_DATA 0x0000ffffU
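These are the return values of seccomp BPF. As the comment says, the low 16 bits of the return value carry optional data, while the upper 16 bits encode the action and thereby its priority. When several filters are attached to a task, each of them is evaluated for every system call and the least permissive result wins: SECCOMP_RET_KILL_PROCESS has the highest priority and SECCOMP_RET_ALLOW the lowest. If a system call yields SECCOMP_RET_KILL from one filter and SECCOMP_RET_ALLOW from another, SECCOMP_RET_KILL is chosen as the final return value, which kills the thread that made the system call.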
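The "least permissive wins" rule follows from comparing the action bits as a signed 32-bit value, as the header comment describes. The sketch below models that selection in user space; combine_results and action_only are hypothetical names, and the fallback defines are only there for older headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS 0x80000000U
#endif
#ifndef SECCOMP_RET_KILL_THREAD
#define SECCOMP_RET_KILL_THREAD 0x00000000U
#endif
#ifndef SECCOMP_RET_ACTION_FULL
#define SECCOMP_RET_ACTION_FULL 0xffff0000U
#endif

/* Action bits reinterpreted as signed, so KILL_PROCESS sorts below everything. */
static int32_t action_only(uint32_t ret)
{
    return (int32_t)(ret & SECCOMP_RET_ACTION_FULL);
}

/* Picks the least permissive of n filter results; starts from ALLOW. */
static uint32_t combine_results(const uint32_t *rets, size_t n)
{
    uint32_t best = SECCOMP_RET_ALLOW;

    for (size_t i = 0; i < n; i++)
        if (action_only(rets[i]) < action_only(best))
            best = rets[i];   /* keep the data bits of the winning result */
    return best;
}
```

With results { SECCOMP_RET_ALLOW, SECCOMP_RET_KILL_THREAD } the function returns SECCOMP_RET_KILL_THREAD, matching the example in the text.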
/**
 * struct seccomp_data - the format the BPF program executes over.
 * @nr: the system call number
 * @arch: indicates system call convention as an AUDIT_ARCH_* value
 *        as defined in <linux/audit.h>.
 * @instruction_pointer: at the time of the system call.
 * @args: up to 6 system call arguments always stored as 64-bit values
 *        regardless of the architecture.
 */
struct seccomp_data {
    int nr;
    __u32 arch;
    __u64 instruction_pointer;
    __u64 args[6];
};
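This structure defines what seccomp BPF code can read. According to the comment, BPF code can obtain the following about the system call: its number, the processor architecture, the instruction address, and the values of its 6 arguments. Which one is read is selected by the k field, which acts as an offset into the seccomp_data structure: k=0 reads nr, the system call number; k=4 reads the processor architecture; and so on.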
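The remaining offsets follow directly from the structure layout. Since a BPF_W load reads 32 bits, a 64-bit argument has to be read as two halves; the sketch below computes the k value for each half, assuming a little-endian host such as x86-64. arg_lo_offset and arg_hi_offset are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>   /* struct seccomp_data */

/* k value for the low 32 bits of argument i (little-endian layout assumed). */
static uint32_t arg_lo_offset(unsigned int i)
{
    assert(i < 6);   /* seccomp_data carries exactly six arguments */
    return (uint32_t)(offsetof(struct seccomp_data, args) + i * sizeof(uint64_t));
}

/* k value for the high 32 bits of argument i (little-endian layout assumed). */
static uint32_t arg_hi_offset(unsigned int i)
{
    return arg_lo_offset(i) + 4;
}
```

So the first argument sits at offsets 16 (low half) and 20 (high half), after nr at 0, arch at 4 and instruction_pointer at 8.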
Let us work through an example to understand seccomp BPF code, trying to recover the code itself from the machine code.
line  CODE  JT    JF    K
=================================
0000: 0x20  0x00  0x00  0x00000004   LD | ABS | Word, R0 = arch
0001: 0x15  0x00  0x19  0xc000003e   JMP | JEQ after 0x19, R0 == AUDIT_ARCH_X86_64 ?
0002: 0x20  0x00  0x00  0x00000000   LD | ABS | Word, R0 = nr
0003: 0x35  0x00  0x01  0x40000000   JMP | JGE after 0x01, R0 >= 0x40000000 ?
0004: 0x15  0x00  0x16  0xffffffff   JMP | JEQ after 0x16, R0 == 0xFFFFFFFF ?
0005: 0x15  0x15  0x00  0x00000000   JMP | JEQ after 0x15, R0 == 0 ?
0006: 0x15  0x14  0x00  0x00000001   JMP | JEQ after 0x14, R0 == 1 ?
0007: 0x15  0x13  0x00  0x00000002   JMP | JEQ after 0x13, R0 == 2 ?
...
0026: 0x06  0x00  0x00  0x7fff0000   return SECCOMP_RET_ALLOW
0027: 0x06  0x00  0x00  0x00000000   return SECCOMP_RET_KILL
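Note the K field on the second line (0001). Here K is AUDIT_ARCH_X86_64, defined in /include/uapi/linux/audit.h, which defines a unique identifier for every architecture; 0xc000003e is the value of AUDIT_ARCH_X86_64. For seccomp code as a whole, seccomp_data is essentially the only external data it may need.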
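When reading such a listing, it helps to resolve each jump by hand. JT and JF count instructions to skip relative to the instruction that follows the jump, so the destination can be computed as in this small sketch (jump_target is a hypothetical helper name):

```c
#include <assert.h>

/* Destination line of a conditional jump located at `line`.
 * `offset` is the JT or JF field: it is relative to the next instruction. */
static unsigned int jump_target(unsigned int line, unsigned char offset)
{
    return line + 1 + offset;
}
```

In the listing above, line 0001 with JF = 0x19 resolves to 1 + 1 + 25 = 27, the SECCOMP_RET_KILL return, and line 0005 with JT = 0x15 resolves to the same line 27.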
Next, let us consolidate what we have learned with some concrete programs, using seccomp BPF code to implement custom filter rules.
Examples
Task 01
Implement a seccomp BPF filter that blocks every system call on all architectures other than x86-64, and blocks execve.
Implementation:
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <linux/audit.h>
#include <stddef.h>

int main(){
    struct sock_filter filter[] = {
        /* A = arch; anything other than x86-64 skips 4 instructions to KILL */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ, AUDIT_ARCH_X86_64, 0, 4),
        /* A = nr - 59, where 59 is execve on x86-64 */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_STMT(BPF_ALU | BPF_K | BPF_SUB, 59),
        /* A == 0 means execve: skip ALLOW and land on KILL; otherwise fall through */
        BPF_JUMP(BPF_JMP | BPF_JEQ, 0, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL)
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(struct sock_filter)),
        .filter = filter,
    };
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    system("echo HELLO");
}
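The code above checks the processor architecture and execve. It uses an ALU instruction to subtract 59 from the system call number and then compares the result with 0; a match means the call is execve and must end at the KILL return.
For seccomp BPF code, a single register is in fact enough. When there are several return values, we can define them together in the last few lines of the BPF code. While writing the earlier instructions, the number of jump instructions is not yet known, so it is sometimes necessary to leave the jump offsets open and compute them once the code is complete. For a filter with several checks, we can treat everything except the returns as a series of slices: each slice performs one check, the slices do not affect one another, and each slice needs only one register. The whole seccomp BPF program therefore needs only one register as well, which means we can write seccomp BPF filters without knowing every BPF instruction.
Before loading the seccomp rule, the code also makes one more prctl call. Quoting the reference material:
PR_SET_NO_NEW_PRIVS: a feature introduced in Linux 3.5. Once a process or its child processes have the PR_SET_NO_NEW_PRIVS attribute set, they can no longer gain privileges through operations such as setuid or chroot. A program that configures seccomp-BPF must either hold CAP_SYS_ADMIN among its capabilities or have already set the no_new_privs attribute. Otherwise seccomp protection does not take effect when a non-root user runs the program; setting the PR_SET_NO_NEW_PRIVS bit ensures that seccomp works for all users.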
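Neither example checks the return value of prctl, so a failure to set the attribute would go unnoticed and the later filter installation could fail with the EACCES error mentioned earlier. A minimal sketch of a checked version, assuming a kernel of at least 3.5; enable_no_new_privs is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Sets no_new_privs for the calling thread, then reads the bit back.
 * Returns 1 if the bit is set, 0 if it is not, or -1 on error. */
static int enable_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_NO_NEW_PRIVS)");
        return -1;
    }
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

Note that the attribute cannot be cleared again once it is set, so a helper like this is meant to be called just before installing the filter.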
Task 02
Implement a seccomp BPF filter that blocks every system call on all architectures other than x86-64, and disallows read system calls whose first argument is 3.
Implementation:
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <linux/audit.h>
#include <stddef.h>
#include <fcntl.h>

int main(){
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ, AUDIT_ARCH_X86_64, 0, 5),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ, 0, 0, 2),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ, 3, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL)
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(struct sock_filter)),
        .filter = filter,
    };
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    int fd = open("/bin/ls", 0);
    char buffer[8];
    printf("%d\n", fd);
    read(fd, buffer, 8);
}
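Note how the BPF_JUMP macro is used: its last two arguments give the number of following instructions to skip when the condition holds and when it does not. The code above first checks the processor architecture and jumps to KILL if it is not x86_64. It then checks whether the system call number is 0 (read): if not, it jumps to ALLOW; if so, execution continues and checks whether the first argument is 3, jumping to KILL if it is.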
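Jump offsets are easy to get wrong, so it can be useful to dry-run a filter in user space before installing it. The sketch below is a minimal evaluator for only the instruction subset used in this article; it is not the kernel's interpreter. run_filter and verdict are hypothetical names, the array repeats the Task 02 filter, and the argument load assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Evaluates LD|W|ABS, ALU|SUB|K, JMP|JEQ|K and RET|K over *d.
 * Anything else, or running off the end, is treated as KILL. */
static uint32_t run_filter(const struct sock_filter *f, size_t len,
                           const struct seccomp_data *d)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *i = &f[pc];

        switch (i->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if (i->k > sizeof(*d) - sizeof(acc))
                return SECCOMP_RET_KILL;          /* out-of-range offset */
            memcpy(&acc, (const char *)d + i->k, sizeof(acc));
            break;
        case BPF_ALU | BPF_SUB | BPF_K:
            acc -= i->k;
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == i->k) ? i->jt : i->jf;  /* loop adds the final +1 */
            break;
        case BPF_RET | BPF_K:
            return i->k;
        default:
            return SECCOMP_RET_KILL;              /* unsupported in this sketch */
        }
    }
    return SECCOMP_RET_KILL;
}

/* The Task 02 filter, copied from the example above. */
static const struct sock_filter task02[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ, AUDIT_ARCH_X86_64, 0, 5),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ, 0, 0, 2),
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    BPF_JUMP(BPF_JMP | BPF_JEQ, 3, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};

/* Verdict of the Task 02 filter for one hypothetical system call. */
static uint32_t verdict(uint32_t arch, int nr, uint64_t arg0)
{
    struct seccomp_data d = { .nr = nr, .arch = arch };

    d.args[0] = arg0;
    return run_filter(task02, sizeof(task02) / sizeof(task02[0]), &d);
}
```

Calling verdict(AUDIT_ARCH_X86_64, 0, 3) models read(3, ...) and yields SECCOMP_RET_KILL, while the same call with descriptor 4 yields SECCOMP_RET_ALLOW. The same evaluator accepts the Task 01 filter if the array is swapped.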
0x03. Summary
This article briefly analyzed how seccomp rules are added and how to write seccomp BPF.
In later articles we will try to cover as many seccomp-related bypass techniques from CTF pwn challenges as possible, and study them through concrete examples.