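Tags: 趣谈Linux操作系统 notes, linux, kernel, glibc
This walkthrough uses open() as the running example. Because the analysis targets Linux, the relevant directory in the glibc source tree is sysdeps\unix.
That directory contains syscalls.list, which lists the glibc functions whose wrappers are generated automatically, together with the system call each one maps to: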
# File name Caller (name of the calling file) Syscall name (system call name) Args Strong name Weak names
# ...... #
open - open Ci:siv __libc_open __open open
# ...... #
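The Args column packs the call's signature into a short string: the part before the colon describes the return value (optionally with prefix letters), and each letter after the colon stands for one argument. As a rough sketch of how an argument count falls out of that string, the hypothetical helper below (it is not part of glibc) counts the letters after the colon. For the open entry, Ci:siv has three letters after the colon (s, i, v).

```c
#include <string.h>

/* Hypothetical helper, not part of glibc: derive the argument count
   from a syscalls.list signature such as "i:siv".
   Returns -1 if the string is not a recognisable signature. */
int count_syscall_args(const char *sig)
{
    /* A bare digit ("3") gives the count directly. */
    if (sig[0] >= '0' && sig[0] <= '9' && sig[1] == '\0')
        return sig[0] - '0';

    const char *colon = strchr(sig, ':');
    if (colon == NULL)
        return -1;

    /* One key letter per argument follows the colon. */
    return (int)strlen(colon + 1);
}
```

Calling count_syscall_args("Ci:siv") yields 3, matching the three parameters of open (path, flags, optional mode).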
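The same directory holds the script make-syscalls.sh, the wrapper generator. It reads the syscalls.list of the directory it is given and, for every system call listed there that is not already implemented by a source file, emits a make rule that builds a wrapper for it; the wrapping rules themselves are in syscall-template.S. For open(), the generated source is piped to the compiler rather than written out as a standalone open.c file: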
#!/bin/sh
# Usage: make-syscalls.sh ../sysdeps/unix/common
# Expects $sysdirs in environment.
##############################################################################
#
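# This script processes the system call data encoded in the various
# syscalls.list files to produce thin assembly wrappers around the
# appropriate OS system call. See syscall-template.S for more details
# on the actual wrappers.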
#
# Syscall Signature Prefixes:
#
# E: errno and return value are not set by the call
# V: errno is not set, but errno or zero (success) is returned from the call
#
# Syscall Signature Key Letters:
#
# a: unchecked address (e.g., 1st arg to mmap)
# b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
# B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
# f: buffer of 2 ints (e.g., 4th arg to socketpair)
# F: 3rd arg to fcntl
# i: scalar (any signedness & size: int, long, long long, enum, whatever)
# I: 3rd arg to ioctl
# n: scalar buffer length (e.g., 3rd arg to read)
# N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
# p: non-NULL pointer to typed object (e.g., any non-void* arg)
# P: optionally-NULL pointer to typed object (e.g., 2nd argument to gettimeofday)
# s: non-NULL string (e.g., 1st arg to open)
# S: optionally-NULL string (e.g., 1st arg to acct)
# v: vararg scalar (e.g., optional 3rd arg to open)
# V: byte-per-page vector (3rd arg to mincore)
# W: wait status, optionally-NULL pointer to int (e.g., 2nd arg of wait4)
#
##############################################################################
thisdir=$1; shift
echo ''
echo \#### DIRECTORY = $thisdir
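# Check each sysdep dir with higher priority than this one,
# and remove from $calls all the functions found in other dirs.
# Punt when we reach the directory defining these syscalls.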
sysdirs=`for dir in $sysdirs; do
test $dir = $thisdir && break; echo $dir; done`
echo \#### SYSDIRS = $sysdirs
# Get the list of system calls for this directory.
calls=`sed 's/#.*$//
/^[ ]*$/d' $thisdir/syscalls.list`
calls=`echo "$calls" |
while read file caller rest; do
# Remove each syscall that is implemented by a file in $dir.
# If a syscall specified a "caller", then only compile that syscall
# if the caller function is also implemented in this directory.
srcfile=-;
for dir in $sysdirs; do
  { test -f $dir/$file.c && srcfile=$dir/$file.c; } ||
  { test -f $dir/$file.S && srcfile=$dir/$file.S; } ||
  { test x$caller != x- &&
    { { test -f $dir/$caller.c && srcfile=$dir/$caller.c; } ||
      { test -f $dir/$caller.S && srcfile=$dir/$caller.S; }; }; } && break;
done;
echo $file $srcfile $caller $rest;
done`
# Any calls left?
test -n "$calls" || exit 0
# This uses variables $weak, $strong, and $any_versioned.
emit_weak_aliases()
{
# A shortcoming in the current gas is that it will only allow one
# version-alias per symbol. So we create new strong aliases as needed.
vcount=""
# We use the <shlib-compat.h> macros to generate the versioned aliases
# so that the version sets can be mapped to the configuration's
# minimum version set as per shlib-versions DEFAULT lines. If an
# entry point is specified in the form NAME@VERSION:OBSOLETED, a
# SHLIB_COMPAT conditional is generated.
if [ $any_versioned = t ]; then
echo " echo '#include <shlib-compat.h>'; \\"
fi
for name in $weak; do
case $name in
*@@*)
base=`echo $name | sed 's/@@.*//'`
ver=`echo $name | sed 's/.*@@//;s/\./_/g'`
echo " echo '#if IS_IN (libc)'; \\"
if test -z "$vcount" ; then
source=$strong
vcount=1
else
source="${strong}_${vcount}"
vcount=`expr $vcount + 1`
echo " echo 'strong_alias ($strong, $source)'; \\"
fi
echo " echo 'versioned_symbol (libc, $source, $base, $ver)'; \\"
echo " echo '#else'; \\"
echo " echo 'weak_alias ($strong, $base)'; \\"
echo " echo '#endif'; \\"
;;
*@*)
base=`echo $name | sed 's/@.*//'`
ver=`echo $name | sed 's/.*@//;s/\./_/g'`
case $ver in
*:*)
compat_ver=${ver#*:}
ver=${ver%%:*}
compat_cond=" && SHLIB_COMPAT (libc, $ver, $compat_ver)"
;;
*)
compat_cond=
;;
esac
echo " echo '#if defined SHARED && IS_IN (libc)$compat_cond'; \\"
if test -z "$vcount" ; then
source=$strong
vcount=1
else
source="${strong}_${vcount}"
vcount=`expr $vcount + 1`
echo " echo 'strong_alias ($strong, $source)'; \\"
fi
echo " echo 'compat_symbol (libc, $source, $base, $ver)'; \\"
echo " echo '#endif'; \\"
;;
!*)
name=`echo $name | sed 's/.//'`
echo " echo 'strong_alias ($strong, $name)'; \\"
echo " echo 'hidden_def ($name)'; \\"
;;
*)
echo " echo 'weak_alias ($strong, $name)'; \\"
echo " echo 'hidden_weak ($name)'; \\"
;;
esac
done
}
# Emit rules to compile the syscalls remaining in $calls.
echo "$calls" |
while read file srcfile caller syscall args strong weak; do
vdso_syscall=
case x"$syscall" in
*:*@*)
vdso_syscall="${syscall#*:}"
syscall="${syscall%:*}"
;;
esac
case x"$syscall" in
x-) callnum=_ ;;
*)
# Figure out if $syscall is defined with a number in syscall.h.
callnum=-
eval `{
echo "#include <sysdep.h>";
echo "callnum=SYS_ify ($syscall)"; } |
$asm_CPP -D__OPTIMIZE__ - |
sed -n -e "/^callnum=.*$syscall/d" \
-e "/^\(callnum=\)[ ]*\(.*\)/s//\1'\2'/p"`
;;
esac
noerrno=0
errval=0
case $args in
E*) noerrno=1; args=`echo $args | sed 's/E:\?//'`;;
V*) errval=1; args=`echo $args | sed 's/V:\?//'`;;
esac
# Derive the number of arguments from the argument signature.
case $args in
[0-9]) nargs=$args;;
?:) nargs=0;;
?:?) nargs=1;;
?:??) nargs=2;;
?:???) nargs=3;;
?:????) nargs=4;;
?:?????) nargs=5;;
?:??????) nargs=6;;
?:???????) nargs=7;;
?:????????) nargs=8;;
?:?????????) nargs=9;;
esac
# Make sure only the first syscall rule is used, if multiple dirs
# define the same syscall.
echo ''
echo "#### CALL=$file NUMBER=$callnum ARGS=$args SOURCE=$srcfile"
# If there are versioned aliases the entry is only generated for the
# shared library, unless it is a default version.
any_versioned=f
shared_only=f
case $weak in
*@@*) any_versioned=t ;;
*@*) any_versioned=t shared_only=t ;;
esac
case x$srcfile"$callnum" in
x--)
# Undefined callnum for an extra syscall.
if [ x$caller != x- ]; then
if [ $noerrno != 0 ]; then
echo >&2 "$0: no number for $fileno, no-error syscall ($strong $weak)"
exit 2
fi
echo "unix-stub-syscalls += $strong $weak"
fi
;;
x*-) ;; ### Do nothing for undefined callnum
x-*)
echo "ifeq (,\$(filter $file,\$(unix-syscalls)))"
if test $shared_only = t; then
# The versioned symbols are only in the shared library.
echo "ifneq (,\$(filter .os,\$(object-suffixes)))"
fi
# Accumulate the list of syscall files for this directory.
echo "unix-syscalls += $file"
test x$caller = x- || echo "unix-extra-syscalls += $file"
# Emit a compilation rule for this syscall.
if test $shared_only = t; then
# The versioned symbols are only in the shared library.
echo "shared-only-routines += $file"
test -n "$vdso_syscall" || echo "\$(objpfx)${file}.os: \\"
else
object_suffixes='$(object-suffixes)'
test -z "$vdso_syscall" || object_suffixes='$(object-suffixes-noshared)'
echo "\
\$(foreach p,\$(sysd-rules-targets),\
\$(foreach o,${object_suffixes},\$(objpfx)\$(patsubst %,\$p,$file)\$o)): \\"
fi
echo " \$(..)sysdeps/unix/make-syscalls.sh"
case x"$callnum" in
x_)
echo "\
\$(make-target-directory)
(echo '/* Dummy module requested by syscalls.list */'; \\"
;;
x*)
echo "\
\$(make-target-directory)
(echo '#define SYSCALL_NAME $syscall'; \\
echo '#define SYSCALL_NARGS $nargs'; \\
echo '#define SYSCALL_SYMBOL $strong'; \\
echo '#define SYSCALL_NOERRNO $noerrno'; \\
echo '#define SYSCALL_ERRVAL $errval'; \\
echo '#include <syscall-template.S>'; \\"
;;
esac
# Append any weak aliases or versions defined for this syscall function.
emit_weak_aliases
# And finally, pipe this all into the compiler.
echo ' ) | $(compile-syscall) '"\
\$(foreach p,\$(patsubst %$file,%,\$(basename \$(@F))),\$(\$(p)CPPFLAGS))"
if test -n "$vdso_syscall"; then
# In the shared library, we're going to emit an IFUNC using a vDSO function.
# $vdso_syscall looks like "name@KERNEL_X.Y" where "name" is the symbol
# name in the vDSO and KERNEL_X.Y is its symbol version.
vdso_symbol="${vdso_syscall%@*}"
vdso_symver="${vdso_syscall#*@}"
vdso_symver=`echo "$vdso_symver" | sed 's/\./_/g'`
cat <<EOF
\$(foreach p,\$(sysd-rules-targets),\$(objpfx)\$(patsubst %,\$p,$file).os): \\
\$(..)sysdeps/unix/make-syscalls.sh
\$(make-target-directory)
(echo '#define ${strong} __redirect_${strong}'; \\
echo '#include <dl-vdso.h>'; \\
echo '#undef ${strong}'; \\
echo '#define vdso_ifunc_init() \\'; \\
echo ' PREPARE_VERSION_KNOWN (symver, ${vdso_symver})'; \\
echo '__ifunc (__redirect_${strong}, ${strong},'; \\
echo ' _dl_vdso_vsym ("${vdso_symbol}", &symver), void,'; \\
echo ' vdso_ifunc_init)'; \\
EOF
# This is doing "hidden_def (${strong})", but the compiler
# doesn't know that we've defined ${strong} in the same file, so
# we can't do it the normal way.
cat <<EOF
echo 'asm (".globl __GI_${strong}");'; \\
echo 'asm ("__GI_${strong} = ${strong}");'; \\
EOF
emit_weak_aliases
cat <<EOF
) | \$(compile-stdin.c) \
\$(foreach p,\$(patsubst %$file,%,\$(basename \$(@F))),\$(\$(p)CPPFLAGS))
EOF
fi
if test $shared_only = t; then
# The versioned symbols are only in the shared library.
echo endif
fi
echo endif
;;
esac
done
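The weak_alias and strong_alias lines that emit_weak_aliases writes give one piece of code several symbol names. A minimal sketch of the underlying idea, using the GNU alias attribute directly rather than glibc's macros; all names below are made up for illustration, and the attribute is assumed to be available (GCC or Clang on an ELF target).

```c
/* Sketch only: the names below are made up for illustration.
   Relies on the GNU alias attribute (GCC/Clang, ELF targets). */

/* The "strong" symbol carries the real implementation. */
int demo_libc_open(void)
{
    return 42;
}

/* A second strong name for the same code. */
extern int demo_strong_open(void) __attribute__((alias("demo_libc_open")));

/* A weak name for the same code: another definition of demo_weak_open
   elsewhere in the program may override it at link time. */
extern int demo_weak_open(void) __attribute__((weak, alias("demo_libc_open")));
```

All three names resolve to the same function here, which is how open, __open and __libc_open can share one wrapper.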
The wrapping rules themselves are in syscall-template.S:
/* Assembly code template for system call stubs.
Copyright (C) 2009-2019 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
/* The real guts of this work are in the macros defined in the
machine- and kernel-specific sysdep.h header file. Cancellable syscalls
should be implemented using C implementation with SYSCALL_CANCEL macro.
Each system call's object is built by a rule in sysd-syscalls
generated by make-syscalls.sh that #include's this file after
defining a few macros:
SYSCALL_NAME syscall name
SYSCALL_NARGS number of arguments this call takes
SYSCALL_SYMBOL primary symbol name
SYSCALL_NOERRNO 1 to define a no-errno version (see below)
SYSCALL_ERRVAL 1 to define an error-value version (see below)
We used to simply pipe the correct three lines below through cpp into
the assembler. The main reason to have this file instead is so that
stub objects can be assembled with -g and get source line information
that leads a user back to a source file and these fine comments. The
average user otherwise has a hard time knowing which "syscall-like"
functions in libc are plain stubs and which have nontrivial C wrappers.
Some versions of the "plain" stub generation macros are more than a few
instructions long and the untrained eye might not distinguish them from
some compiled code that inexplicably lacks source line information. */
#include <sysdep.h>
/* This indirection is needed so that SYMBOL gets macro-expanded. */
#define syscall_hidden_def(SYMBOL) hidden_def (SYMBOL)
#define T_PSEUDO(SYMBOL, NAME, N) PSEUDO (SYMBOL, NAME, N)
#define T_PSEUDO_NOERRNO(SYMBOL, NAME, N) PSEUDO_NOERRNO (SYMBOL, NAME, N)
#define T_PSEUDO_ERRVAL(SYMBOL, NAME, N) PSEUDO_ERRVAL (SYMBOL, NAME, N)
#define T_PSEUDO_END(SYMBOL) PSEUDO_END (SYMBOL)
#define T_PSEUDO_END_NOERRNO(SYMBOL) PSEUDO_END_NOERRNO (SYMBOL)
#define T_PSEUDO_END_ERRVAL(SYMBOL) PSEUDO_END_ERRVAL (SYMBOL)
#if SYSCALL_NOERRNO
/* This kind of system call stub never returns an error.
We return the return value register to the caller unexamined. */
T_PSEUDO_NOERRNO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
ret_NOERRNO
T_PSEUDO_END_NOERRNO (SYSCALL_SYMBOL)
#elif SYSCALL_ERRVAL
/* This kind of system call stub returns the errno code as its return
value, or zero for success. We may massage the kernel's return value
to meet that ABI, but we never set errno here. */
T_PSEUDO_ERRVAL (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
ret_ERRVAL
T_PSEUDO_END_ERRVAL (SYSCALL_SYMBOL)
#else
/* This is a "normal" system call stub: if there is an error,
it returns -1 and sets errno. */
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
ret
T_PSEUDO_END (SYSCALL_SYMBOL)
#endif
syscall_hidden_def (SYSCALL_SYMBOL)
Reading the source shows that an ordinary system call takes the following branch of the template, where T_PSEUDO is defined as #define T_PSEUDO(SYMBOL, NAME, N) PSEUDO (SYMBOL, NAME, N):
/* This is a "normal" system call stub: if there is an error,
it returns -1 and sets errno. */
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
ret
T_PSEUDO_END (SYSCALL_SYMBOL)
#endif
syscall_hidden_def (SYSCALL_SYMBOL)
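The implementation of the PSEUDO macro lives in a per-architecture sysdep.h. The version below comes from sysdeps\unix\sysv\linux\sh\sysdep.h, which is the SuperH port (hence the r0/r1/r2 registers); other architectures define their own equivalent: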
#define PSEUDO(name, syscall_name, args) \
.text; \
ENTRY (name); \
DO_CALL (syscall_name, args); \
mov r0,r1; \
mov _IMM12,r2; \
shad r2,r1; \
not r1,r1; \
tst r1,r1; \
bf .Lpseudo_end; \
SYSCALL_ERROR_HANDLER; \
.Lpseudo_end:
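The template comment says a normal stub "returns -1 and sets errno" on error. The C sketch below shows that convention in isolation. It assumes the usual Linux rule that the kernel reports failure as a small negative value in the range -4095 to -1; the function name is hypothetical and this is not glibc's code.

```c
#include <errno.h>

/* Sketch of the "normal" stub's return-value handling.
   Assumption: the kernel reports failure as a value in [-4095, -1]. */
long demo_syscall_result(long raw)
{
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;   /* e.g. raw == -ENOENT sets errno to ENOENT */
        return -1;
    }
    return raw;              /* success: pass the value through */
}
```

The unsigned comparison is a common way to test for that narrow negative range with a single branch; values below -4095 are treated as ordinary results.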
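Almost every system call therefore ends up in the DO_CALL macro, but the macro is defined differently for 32-bit and 64-bit.
For 32-bit, DO_CALL is defined in sysdep.h under sysdeps\unix\sysv\linux\i386.
The listing shows which register holds each argument (the system call number goes in eax); the last step is ENTER_KERNEL, which traps into the kernel: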
# define ENTER_KERNEL int $0x80
/* Linux takes system call arguments in registers:
syscall number %eax call-clobbered
arg 1 %ebx call-saved
arg 2 %ecx call-clobbered
arg 3 %edx call-clobbered
arg 4 %esi call-saved
arg 5 %edi call-saved
arg 6 %ebp call-saved
......
*/
#undef DO_CALL
#define DO_CALL(syscall_name, args) \
PUSHARGS_##args \
DOARGS_##args \
movl $SYS_ify (syscall_name), %eax; \
ENTER_KERNEL \
POPARGS_##args
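The register assignment in the comment above can be written down as a small lookup table. This is only an illustration of the convention; the helper name is hypothetical.

```c
#include <stddef.h>

/* Register assignment for int $0x80 on i386, taken from the comment above.
   Index 0 is the system call number, 1..6 are the arguments. */
static const char *const i386_syscall_regs[] = {
    "eax", "ebx", "ecx", "edx", "esi", "edi", "ebp"
};

/* Hypothetical helper: which register carries slot n (0 = number)?
   Returns NULL for slots beyond the sixth argument. */
const char *i386_syscall_reg(size_t n)
{
    size_t count = sizeof i386_syscall_regs / sizeof i386_syscall_regs[0];
    return n < count ? i386_syscall_regs[n] : NULL;
}
```

For open(pathname, flags, mode) this means pathname travels in ebx, flags in ecx and mode in edx.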
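As the macro shows, ENTER_KERNEL is simply int $0x80: it raises a software interrupt, and that interrupt traps into the kernel.
In the kernel this lands in the assembly routine ENTRY(entry_INT80_32). To follow it we need the Linux 4.19 source, specifically linux-4.19-rc3\arch\x86\entry\entry_32.S.
Its header comment lists the same register assignment as the comment above glibc's DO_CALL: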
/* Arguments:
* eax system call number
* ebx arg1
* ecx arg2
* edx arg3
* esi arg4
* edi arg5
* ebp arg6
*/
ENTRY(entry_INT80_32)
ASM_CLAC
pushl %eax /* pt_regs->orig_ax */
SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1 /* save rest */
/*
* User mode is traced as though IRQs are on, and the interrupt gate
* turned them off.
*/
TRACE_IRQS_OFF
movl %esp, %eax
call do_int80_syscall_32
.Lsyscall_32_done:
/*......*/
INTERRUPT_RETURN
The flow is:
1. push and SAVE_ALL save the user-mode registers into a struct pt_regs;
2. do_int80_syscall_32 is called to carry out the system call;
3. INTERRUPT_RETURN returns to user mode.
Definition of struct pt_regs:
struct pt_regs {
/*
* NB: 32-bit x86 CPUs are inconsistent as what happens in the
* following cases (where %seg represents a segment register):
*
* - pushl %seg: some do a 16-bit write and leave the high
* bits alone
* - movl %seg, [mem]: some do a 16-bit write despite the movl
* - IDT entry: some (e.g. 486) will leave the high bits of CS
* and (if applicable) SS undefined.
*
* Fortunately, x86-32 doesn't read the high bits on POP or IRET,
* so we can just treat all of the segment registers as 16-bit
* values.
*/
unsigned long bx;
unsigned long cx;
unsigned long dx;
unsigned long si;
unsigned long di;
unsigned long bp;
unsigned long ax;
unsigned short ds;
unsigned short __dsh;
unsigned short es;
unsigned short __esh;
unsigned short fs;
unsigned short __fsh;
unsigned short gs;
unsigned short __gsh;
unsigned long orig_ax;
unsigned long ip;
unsigned short cs;
unsigned short __csh;
unsigned long flags;
unsigned long sp;
unsigned short ss;
unsigned short __ssh;
};
Implementation of do_int80_syscall_32(). It looks up the handler in the system call table (#define ia32_sys_call_table sys_call_table) and calls it:
/* Handles int $0x80 */
__visible void do_int80_syscall_32(struct pt_regs *regs)
{
enter_from_user_mode();
local_irq_enable();
do_syscall_32_irqs_on(regs);
}
/* ...... */
static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
unsigned int nr = (unsigned int)regs->orig_ax;
/* ...... */
/*
* It's possible that a 32-bit syscall implementation
* takes a 64-bit parameter but nonetheless assumes that
* the high bits are zero. Make sure we zero-extend all
* of the args.
*/
regs->ax = ia32_sys_call_table[nr](
(unsigned int)regs->bx, (unsigned int)regs->cx,
(unsigned int)regs->dx, (unsigned int)regs->si,
(unsigned int)regs->di, (unsigned int)regs->bp);
syscall_return_slowpath(regs);
}
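The dispatch step can be modelled in user space. The sketch below uses a cut-down register structure and a toy table; every demo_ name is hypothetical, and the explicit bounds and NULL checks are part of the sketch rather than a copy of the kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pt_regs (illustration only). */
struct demo_regs {
    unsigned long bx, cx, dx, si, di, bp;
    unsigned long ax;       /* receives the return value */
    unsigned long orig_ax;  /* system call number as passed in eax */
};

typedef long (*demo_sys_call_ptr_t)(unsigned long, unsigned long,
                                    unsigned long, unsigned long,
                                    unsigned long, unsigned long);

/* A toy "system call" that adds its first two arguments. */
static long demo_sys_add(unsigned long a, unsigned long b, unsigned long c,
                         unsigned long d, unsigned long e, unsigned long f)
{
    (void)c; (void)d; (void)e; (void)f;
    return (long)(a + b);
}

#define DEMO_NR_SYSCALLS 4

/* Unlisted slots stay NULL here; the kernel fills them with sys_ni_syscall. */
static const demo_sys_call_ptr_t demo_sys_call_table[DEMO_NR_SYSCALLS] = {
    [1] = demo_sys_add,
};

/* Same shape as do_syscall_32_irqs_on(): index the table with the saved
   number and feed the saved registers in as arguments. */
void demo_do_syscall(struct demo_regs *regs)
{
    unsigned long nr = regs->orig_ax;

    regs->ax = (unsigned long)-ENOSYS;  /* default result, like pt_regs_ax=$-ENOSYS */
    if (nr < DEMO_NR_SYSCALLS && demo_sys_call_table[nr] != NULL)
        regs->ax = (unsigned long)demo_sys_call_table[nr](
            regs->bx, regs->cx, regs->dx, regs->si, regs->di, regs->bp);
}
```

With orig_ax set to 1, bx to 2 and cx to 3, the call leaves 5 in ax; an unknown number leaves -ENOSYS there.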
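Definition of INTERRUPT_RETURN:
The iret instruction restores the user-mode context saved on entry, including the code segment and the instruction pointer, after which the user process resumes execution.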
#define INTERRUPT_RETURN iret
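Putting it together, a 32-bit system call, with open as the example, proceeds as follows:
1. User code calls open(const char *pathname, int flags, mode_t mode);
2. Control enters glibc:
2.1 The wrapper ends up in DO_CALL(syscall_name, args), which:
2.1.1 loads the system call number and the arguments into registers;
2.1.2 executes ENTER_KERNEL to trap into the kernel:
2.1.2.1 push and SAVE_ALL save the user-mode registers into struct pt_regs;
2.1.2.2 do_int80_syscall_32 is called to carry out the system call (now inside the kernel):
2.1.2.2.1 the system call number is read from the saved eax (regs->orig_ax);
2.1.2.2.2 the number indexes the system call table (#define ia32_sys_call_table sys_call_table) to find the function to call;
2.1.2.2.3 the arguments saved from the registers are passed as the function's parameters;
2.1.3 INTERRUPT_RETURN returns and restores user mode.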
For 64-bit, DO_CALL is defined in sysdeps\unix\sysv\linux\x86_64\sysdep.h and traps into the kernel with the syscall instruction:
/* The Linux/x86-64 kernel expects the system call parameters in
registers according to the following table:
syscall number rax
arg 1 rdi
arg 2 rsi
arg 3 rdx
arg 4 r10
arg 5 r8
arg 6 r9
*/
# undef DO_CALL
# define DO_CALL(syscall_name, args) \
DOARGS_##args \
movl $SYS_ify (syscall_name), %eax; \
syscall;
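To see the syscall instruction in action from user space, the sketch below issues a zero-argument system call by hand. The function name is hypothetical. On x86-64 it uses inline assembly with the number in rax, as in the table above; on any other architecture it falls back to glibc's generic syscall() function so the example still builds.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue a zero-argument system call. On x86-64 this uses the same
   instruction as DO_CALL; elsewhere it falls back to glibc's syscall(). */
long demo_raw_syscall0(long nr)
{
#if defined(__x86_64__)
    long ret;
    /* The number goes in rax; the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(nr);
#endif
}
```

For example, demo_raw_syscall0(SYS_getpid) should return the same value as getpid().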
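How syscall in DO_CALL traps into the kernel:
The syscall instruction relies on a special kind of register, the model-specific register (MSR).
MSRs are registers the CPU provides for particular control functions, and system call entry is one of them.
The rdmsr and wrmsr instructions read and write MSRs. The kernel sets one of them up like this: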
wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
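MSR_LSTAR is one such register: when the syscall instruction executes, the CPU takes the function address stored in it and jumps there, which means it enters entry_SYSCALL_64.
entry_SYSCALL_64 is defined in arch/x86/entry/entry_64.S: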
ENTRY(entry_SYSCALL_64)
/* ....... */
/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
pushq %r11 /* pt_regs->flags */
pushq $__USER_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq %rax /* pt_regs->orig_ax */
PUSH_AND_CLEAR_REGS rax=$-ENOSYS
TRACE_IRQS_OFF
/* IRQs are off. */
movq %rax, %rdi
movq %rsp, %rsi
call do_syscall_64 /* returns with IRQs disabled */
/* ...... */
syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
UNWIND_HINT_EMPTY
POP_REGS pop_rdi=0 skip_r11rcx=1
/*
* Now all regs are restored except RSP and RDI.
* Save old stack pointer and switch to trampoline stack.
*/
movq %rsp, %rdi
movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
pushq RSP-RDI(%rdi) /* RSP */
pushq (%rdi) /* RDI */
/*
* We are on the trampoline stack. All regs except RDI are live.
* We can do future final exit work right here.
*/
SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
popq %rdi
popq %rsp
USERGS_SYSRET64
END(entry_SYSCALL_64)
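The flow is:
1. The pushq instructions and PUSH_AND_CLEAR_REGS save the user-mode registers into a struct pt_regs on the stack;
2. do_syscall_64 is called to carry out the system call;
3. USERGS_SYSRET64 returns to user mode.
Implementation of do_syscall_64. Note that the entry code above passes the system call number in rdi and the pt_regs pointer in rsi, whereas the listing below takes only regs and reads the number from regs->orig_ax, so it appears to come from a slightly different kernel version: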
__visible void do_syscall_64(struct pt_regs *regs)
{
struct thread_info *ti = current_thread_info();
unsigned long nr = regs->orig_ax;
/* ...... */
if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
regs->ax = sys_call_table[nr & __SYSCALL_MASK](
regs->di, regs->si, regs->dx,
regs->r10, regs->r8, regs->r9);
}
syscall_return_slowpath(regs);
}
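The nr & __SYSCALL_MASK step strips a marker bit before the number is used as a table index. A small sketch of that arithmetic, assuming the values used by the x86 headers when the x32 ABI is enabled (__X32_SYSCALL_BIT equal to 0x40000000 and __SYSCALL_MASK its complement); the demo_ names are hypothetical.

```c
/* Values assumed from the x86 UAPI headers with the x32 ABI enabled. */
#define DEMO_X32_SYSCALL_BIT 0x40000000UL
#define DEMO_SYSCALL_MASK    (~DEMO_X32_SYSCALL_BIT)

/* Table index used for a raw system call number. */
unsigned long demo_syscall_index(unsigned long nr)
{
    return nr & DEMO_SYSCALL_MASK;
}

/* Nonzero if the number carries the x32 marker bit. */
int demo_is_x32_syscall(unsigned long nr)
{
    return (nr & DEMO_X32_SYSCALL_BIT) != 0;
}
```

So a number of 0x40000000 + 2 and a plain 2 both select slot 2 of the table.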
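So whether the call comes from 32-bit or 64-bit code, it ends up at the system call table sys_call_table.
Definition of USERGS_SYSRET64:
The sysretq instruction restores the user-mode context saved on entry, including the code segment and the instruction pointer, after which the user process resumes execution.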
#define USERGS_SYSRET64 \
swapgs; \
sysretq;
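Putting it together, a 64-bit system call, with open as the example, proceeds as follows:
1. User code calls open(const char *pathname, int flags, mode_t mode);
2. Control enters glibc:
2.1 The wrapper ends up in DO_CALL(syscall_name, args), which:
2.1.1 loads the system call number and the arguments into registers;
2.1.2 executes syscall to trap into the kernel:
2.1.2.1 pushq instructions save the user-mode registers into struct pt_regs;
2.1.2.2 do_syscall_64 is called to carry out the system call (now inside the kernel):
2.1.2.2.1 the system call number is read from the saved rax;
2.1.2.2.2 the number indexes the system call table sys_call_table to find the function to call;
2.1.2.2.3 the arguments saved from the registers are passed as the function's parameters;
2.1.3 USERGS_SYSRET64 returns and restores user mode.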
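The whole path can be exercised from user space without the open() wrapper by going through glibc's generic syscall() function. The sketch below is an illustration with a hypothetical function name; it uses SYS_openat with AT_FDCWD instead of SYS_open because SYS_open is not defined on every architecture.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a file without going through glibc's open() wrapper.
   SYS_openat is used because SYS_open is not defined on every architecture. */
int demo_open(const char *pathname, int flags, mode_t mode)
{
    /* syscall() loads the number and arguments into registers, traps into
       the kernel, and turns a negative result into -1 plus errno. */
    return (int)syscall(SYS_openat, AT_FDCWD, pathname, flags, mode);
}
```

On success the return value is a file descriptor that can be passed to read() and close(); on failure it is -1 with errno set, just as with open().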
Location: arch\x86\entry\syscalls\syscall_32.tbl
Source:
#
# 32-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
# <system call number> <ABI> <system call name> <kernel implementation function (entry point)> <compat entry point>
#
# The __ia32_sys and __ia32_compat_sys stubs are created on-the-fly for
# sys_*() system calls and compat_sys_*() compat system calls if
# IA32_EMULATION is defined, and expect struct pt_regs *regs as their only
# parameter.
#
# The abi is always "i386" for this file.
#
0 i386 restart_syscall sys_restart_syscall __ia32_sys_restart_syscall
1 i386 exit sys_exit __ia32_sys_exit
2 i386 fork sys_fork __ia32_sys_fork
3 i386 read sys_read __ia32_sys_read
4 i386 write sys_write __ia32_sys_write
5 i386 open sys_open __ia32_compat_sys_open
6 i386 close sys_close __ia32_sys_close
7 i386 waitpid sys_waitpid __ia32_sys_waitpid
8 i386 creat sys_creat __ia32_sys_creat
Location: include\linux\syscalls.h
Source:
/* __ARCH_WANT_SYSCALL_NO_AT */
asmlinkage long sys_open(const char __user *filename,
int flags, umode_t mode);
asmlinkage long sys_link(const char __user *oldname,
const char __user *newname);
asmlinkage long sys_unlink(const char __user *pathname);
Taking open as the example:
Location: fs\open.c
Source:
/* ...... */
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(AT_FDCWD, filename, flags, mode);
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
umode_t, mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(dfd, filename, flags, mode);
}
/* ...... */
The definitions look unusual at first. The declarations in syscalls.h show that the macro is chosen by the number of arguments the call takes:
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE_MAXARGS 6
#define SYSCALL_DEFINEx(x, sname, ...) \
SYSCALL_METADATA(sname, x, __VA_ARGS__) \
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
/*
* The asmlinkage stub is aliased to a function named __se_sys_*() which
* sign-extends 32-bit ints to longs whenever needed. The actual work is
* done within __do_sys_*().
*/
#ifndef __SYSCALL_DEFINEx
#define __SYSCALL_DEFINEx(x, name, ...) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
"Type aliasing is used to sanitize syscall arguments");\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
__attribute__((alias(__stringify(__se_sys##name)))); \
ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
{ \
long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
__MAP(x,__SC_TEST,__VA_ARGS__); \
__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
return ret; \
} \
__diag_pop(); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
#endif /* __SYSCALL_DEFINEx */
Expanding SYSCALL_DEFINE3 for open gives, in simplified form, the following implementation:
asmlinkage long sys_open(const char __user * filename, int flags, umode_t mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(AT_FDCWD, filename, flags, mode);
asmlinkage_protect(3, ret, filename, flags, mode);
return ret;
}
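The core trick behind SYSCALL_DEFINEn is token pasting: the macro glues the call's name onto a prefix and turns the (type, name) pairs into a parameter list. The toy macro below shows only that idea; it is not the kernel macro, and it leaves out the metadata, sign-extension and aliasing layers seen above.

```c
/* Toy version of the idea behind SYSCALL_DEFINEn (not the kernel macro):
   paste the name onto a "demo_sys_" prefix and turn the (type, name) pairs
   into an ordinary parameter list. */
#define DEMO_SYSCALL_DEFINE3(name, t1, a1, t2, a2, t3, a3) \
    long demo_sys_##name(t1 a1, t2 a2, t3 a3)

/* Expands to: long demo_sys_add3(int a, int b, int c) { ... } */
DEMO_SYSCALL_DEFINE3(add3, int, a, int, b, int, c)
{
    return (long)a + b + c;
}
```

After preprocessing, demo_sys_add3 is an ordinary function, in the same way that SYSCALL_DEFINE3(open, ...) ends up providing sys_open.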
The tables are turned into headers by arch/x86/entry/syscalls/Makefile:
# SPDX-License-Identifier: GPL-2.0
# Output directories
out := arch/$(SRCARCH)/include/generated/asm
uapi := arch/$(SRCARCH)/include/generated/uapi/asm
# Create the output directories if they do not exist yet
_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
$(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')
# Input tables: syscall_32.tbl for 32-bit, syscall_64.tbl for 64-bit
syscall32 := $(srctree)/$(src)/syscall_32.tbl
syscall64 := $(srctree)/$(src)/syscall_64.tbl
# Scripts that do the generation
syshdr := $(srctree)/$(src)/syscallhdr.sh
systbl := $(srctree)/$(src)/syscalltbl.sh
quiet_cmd_syshdr = SYSHDR $@
cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
'$(syshdr_abi_$(basetarget))' \
'$(syshdr_pfx_$(basetarget))' \
'$(syshdr_offset_$(basetarget))'
quiet_cmd_systbl = SYSTBL $@
cmd_systbl = $(CONFIG_SHELL) '$(systbl)' $< $@
quiet_cmd_hypercalls = HYPERCALLS $@
cmd_hypercalls = $(CONFIG_SHELL) '$<' $@ $(filter-out $<,$^)
# Dependencies and the ABI selected for each header
syshdr_abi_unistd_32 := i386
$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_32_ia32 := i386
syshdr_pfx_unistd_32_ia32 := ia32_
$(out)/unistd_32_ia32.h: $(syscall32) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_x32 := common,x32
syshdr_offset_unistd_x32 := __X32_SYSCALL_BIT
$(uapi)/unistd_x32.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_64 := common,64
$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_64_x32 := x32
syshdr_pfx_unistd_64_x32 := x32_
# Output file names and locations
$(out)/unistd_64_x32.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
$(out)/syscalls_32.h: $(syscall32) $(systbl)
$(call if_changed,systbl)
$(out)/syscalls_64.h: $(syscall64) $(systbl)
$(call if_changed,systbl)
$(out)/xen-hypercalls.h: $(srctree)/scripts/xen-hypercalls.sh
$(call if_changed,hypercalls)
$(out)/xen-hypercalls.h: $(srctree)/include/xen/interface/xen*.h
# Collect the targets and generate the output files
uapisyshdr-y += unistd_32.h unistd_64.h unistd_x32.h
syshdr-y += syscalls_32.h
syshdr-$(CONFIG_X86_64) += unistd_32_ia32.h unistd_64_x32.h
syshdr-$(CONFIG_X86_64) += syscalls_64.h
syshdr-$(CONFIG_XEN) += xen-hypercalls.h
targets += $(uapisyshdr-y) $(syshdr-y)
PHONY += all
all: $(addprefix $(uapi)/,$(uapisyshdr-y))
all: $(addprefix $(out)/,$(syshdr-y))
@:
The Makefile relies on two scripts.
The first, arch/x86/entry/syscalls/syscallhdr.sh, emits lines such as #define __NR_open into the header:
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
in="$1"
out="$2"
my_abis=`echo "($3)" | tr ',' '|'`
prefix="$4"
offset="$5"
fileguard=_ASM_X86_`basename "$out" | sed \
-e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
-e 's/[^A-Z0-9_]/_/g' -e 's/__/_/g'`
grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
echo "#ifndef ${fileguard}"
echo "#define ${fileguard} 1"
echo ""
# Emit "#define __NR_<name> <number>" for each row
while read nr abi name entry ; do
if [ -z "$offset" ]; then
echo "#define __NR_${prefix}${name} $nr"
else
echo "#define __NR_${prefix}${name} ($offset + $nr)"
fi
done
echo ""
echo "#endif /* ${fileguard} */"
) > "$out"
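The read loop in syscallhdr.sh can be mirrored in a few lines of C. The hypothetical helper below takes one table row and produces the corresponding define; unlike the script it does not filter rows by ABI or apply an offset.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the loop in syscallhdr.sh: turn one table
   row into a "#define __NR_<prefix><name> <nr>" line.
   Returns 0 on success, -1 if the row cannot be parsed or does not fit. */
int demo_format_nr_define(const char *row, const char *prefix,
                          char *out, size_t outlen)
{
    unsigned nr;
    char abi[32], name[64];

    /* Columns: number, abi, name; the remaining columns are ignored. */
    if (sscanf(row, "%u %31s %63s", &nr, abi, name) != 3)
        return -1;

    int n = snprintf(out, outlen, "#define __NR_%s%s %u", prefix, name, nr);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Feeding it the open row of syscall_32.tbl with an empty prefix produces "#define __NR_open 5".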
The second, arch/x86/entry/syscalls/syscalltbl.sh, emits table entries such as __SYSCALL_I386(5, sys_open, ):
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
in="$1"
out="$2"
syscall_macro() {
abi="$1"
nr="$2"
entry="$3"
# Entry can be either just a function name or "function/qualifier"
real_entry="${entry%%/*}"
if [ "$entry" = "$real_entry" ]; then
qualifier=
else
qualifier=${entry#*/}
fi
# Emit __SYSCALL_<abi>(<nr>, <entry>, <qualifier>)
echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
}
emit() {
abi="$1"
nr="$2"
entry="$3"
compat="$4"
umlentry=""
if [ "$abi" = "64" -a -n "$compat" ]; then
echo "a compat entry for a 64-bit syscall makes no sense" >&2
exit 1
fi
# For CONFIG_UML, we need to strip the __x64_sys prefix
if [ "$abi" = "64" -a "${entry}" != "${entry#__x64_sys}" ]; then
umlentry="sys${entry#__x64_sys}"
fi
if [ -z "$compat" ]; then
if [ -n "$entry" -a -z "$umlentry" ]; then
syscall_macro "$abi" "$nr" "$entry"
elif [ -n "$umlentry" ]; then # implies -n "$entry"
echo "#ifdef CONFIG_X86"
syscall_macro "$abi" "$nr" "$entry"
echo "#else /* CONFIG_UML */"
syscall_macro "$abi" "$nr" "$umlentry"
echo "#endif"
fi
else
echo "#ifdef CONFIG_X86_32"
if [ -n "$entry" ]; then
syscall_macro "$abi" "$nr" "$entry"
fi
echo "#else"
syscall_macro "$abi" "$nr" "$compat"
echo "#endif"
fi
}
grep '^[0-9]' "$in" | sort -n | (
while read nr abi name entry compat; do
abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
if [ "$abi" = "COMMON" -o "$abi" = "64" ]; then
# COMMON is the same as 64, except that we don't expect X32
# programs to use it. Our expectation has nothing to do with
# any generated code, so treat them the same.
emit 64 "$nr" "$entry" "$compat"
elif [ "$abi" = "X32" ]; then
# X32 is equivalent to 64 on an X32-compatible kernel.
echo "#ifdef CONFIG_X86_X32_ABI"
emit 64 "$nr" "$entry" "$compat"
echo "#endif"
elif [ "$abi" = "I386" ]; then
emit "$abi" "$nr" "$entry" "$compat"
else
echo "Unknown abi $abi" >&2
exit 1
fi
done
) > "$out"
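The syscall_macro function in the script splits an entry of the form "function/qualifier" and prints one macro call per row. A C sketch of the same string handling, with a hypothetical function name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring syscall_macro() in syscalltbl.sh:
   split "function/qualifier" and print the table-entry macro call.
   Returns 0 on success, -1 if the result does not fit in out. */
int demo_format_syscall_macro(const char *abi, unsigned nr, const char *entry,
                              char *out, size_t outlen)
{
    const char *slash = strchr(entry, '/');
    /* Everything before the first '/' is the real entry point... */
    int entry_len = slash ? (int)(slash - entry) : (int)strlen(entry);
    /* ...and everything after it is the qualifier (may be empty). */
    const char *qualifier = slash ? slash + 1 : "";

    int n = snprintf(out, outlen, "__SYSCALL_%s(%u, %.*s, %s)",
                     abi, nr, entry_len, entry, qualifier);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With abi "I386", number 5 and entry "sys_open" it produces "__SYSCALL_I386(5, sys_open, )", the empty third argument being the missing qualifier.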
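The generated files establish the mapping between system call numbers and the functions that implement them.
unistd_32.h is generated from syscall_32.tbl. Per the Makefile above, the x86 output goes under arch/x86/include/generated/uapi/asm. The listing below is instead taken from arch\sh\include\uapi\asm\unistd_32.h (the SuperH header, as its include guard shows); its first entries carry the same numbers as the i386 table: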
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __ASM_SH_UNISTD_32_H
#define __ASM_SH_UNISTD_32_H
/*
* Copyright (C) 1999 Niibe Yutaka
*/
/*
* This file contains the system call numbers.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
/* ...... */
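unistd_64.h is generated from syscall_64.tbl in the same way. The listing below comes from arch\sh\include\uapi\asm\unistd_64.h, which is again the SuperH header, so its numbers (for example __NR_open 5) are not the x86-64 ones; in syscall_64.tbl, open is number 2: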
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __ASM_SH_UNISTD_64_H
#define __ASM_SH_UNISTD_64_H
/*
* include/asm-sh/unistd_64.h
*
* This file contains the system call numbers.
*
* Copyright (C) 2000, 2001 Paolo Alberelli
* Copyright (C) 2003 - 2007 Paul Mundt
* Copyright (C) 2004 Sean McGoogan
*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
/* ...... */
Location: arch\x86\entry\syscall_32.c
Source:
// SPDX-License-Identifier: GPL-2.0
/* System call table for i386. */
/* ...... */
__visible const sys_call_ptr_t ia32_sys_call_table[__NR_syscall_compat_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
* when the & below is removed.
*/
[0 ... __NR_syscall_compat_max] = &sys_ni_syscall,
#include <asm/syscalls_32.h>
};
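Including a generated header in the middle of an array initializer works because each generated line is a macro call that can be defined to expand into one array element. The sketch below reproduces the pattern in miniature, assuming the macro expands to a designated initializer of the form [nr] = sym; all demo_ names are hypothetical, and the list macro stands in for the generated syscalls_32.h.

```c
#include <stddef.h>

typedef long (*demo_call_ptr_t)(void);

static long demo_sys_read(void)  { return 100; }
static long demo_sys_write(void) { return 200; }

/* Stand-in for the generated syscalls_32.h: one macro call per table row. */
#define DEMO_SYSCALL_LIST          \
    DEMO_SYSCALL(3, demo_sys_read) \
    DEMO_SYSCALL(4, demo_sys_write)

/* Define the macro so that each row becomes a designated initializer,
   then expand the list inside the array definition. */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,
static const demo_call_ptr_t demo_table[8] = {
    DEMO_SYSCALL_LIST
};
#undef DEMO_SYSCALL

/* Look up and run entry nr; -1 stands in for "not implemented". */
long demo_call(size_t nr)
{
    if (nr >= sizeof demo_table / sizeof demo_table[0] || demo_table[nr] == NULL)
        return -1;
    return demo_table[nr]();
}
```

The kernel pre-fills every slot with sys_ni_syscall using a range initializer instead of leaving unused slots NULL, so it needs no NULL check at dispatch time.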
Location: arch\x86\entry\syscalls\syscall_64.tbl
Source (compared with the 32-bit table, the system call numbers and the abi column differ, and there is no compat entry point):
#
# 64-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
# The abi is "common", "64" or "x32" for this file.
#
0 common read __x64_sys_read
1 common write __x64_sys_write
2 common open __x64_sys_open
3 common close __x64_sys_close
4 common stat __x64_sys_newstat
5 common fstat __x64_sys_newfstat
6 common lstat __x64_sys_newlstat
7 common poll __x64_sys_poll
8 common lseek __x64_sys_lseek
9 common mmap __x64_sys_mmap
10 common mprotect __x64_sys_mprotect
11 common munmap __x64_sys_munmap
## ......
Location: include\linux\syscalls.h
Source (identical for 32-bit and 64-bit):
/* __ARCH_WANT_SYSCALL_NO_AT */
asmlinkage long sys_open(const char __user *filename,
int flags, umode_t mode);
asmlinkage long sys_link(const char __user *oldname,
const char __user *newname);
asmlinkage long sys_unlink(const char __user *pathname);
Taking open as the example:
Location: fs\open.c
Source (identical for 32-bit and 64-bit):
/* ...... */
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(AT_FDCWD, filename, flags, mode);
}
SYSCALL_DEFINE4(openat, int, dfd, const char __user *, filename, int, flags,
umode_t, mode)
{
if (force_o_largefile())
flags |= O_LARGEFILE;
return do_sys_open(dfd, filename, flags, mode);
}
/* ...... */
The definitions look unusual at first. The declarations in syscalls.h show that the macro is chosen by the number of arguments the call takes:
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE_MAXARGS 6
#define SYSCALL_DEFINEx(x, sname, ...) \
SYSCALL_METADATA(sname, x, __VA_ARGS__) \
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
/*
* The asmlinkage stub is aliased to a function named __se_sys_*() which
* sign-extends 32-bit ints to longs whenever needed. The actual work is
* done within __do_sys_*().
*/
#ifndef __SYSCALL_DEFINEx
#define __SYSCALL_DEFINEx(x, name, ...) \
__diag_push(); \
__diag_ignore(GCC, 8, "-Wattribute-alias", \
"Type aliasing is used to sanitize syscall arguments");\
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
__attribute__((alias(__stringify(__se_sys##name)))); \
ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
{ \
long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\
__MAP(x,__SC_TEST,__VA_ARGS__); \
__PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
return ret; \
} \
__diag_pop(); \
static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
#endif /* __SYSCALL_DEFINEx */
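The layering is easier to follow on a reduced model. The following user-space sketch imitates only two ideas of __SYSCALL_DEFINEx: token pasting builds the function names, and a wrapper that receives every argument as long casts it back to the declared type before calling the typed body. All `demo_*` and `DEMO_*` names are hypothetical; the real macro additionally handles metadata, aliasing, type checks and error injection.

```c
/*
 * User-space imitation of the SYSCALL_DEFINEx layering (illustration only).
 *
 * DEMO_DEFINE2(name, t1, a1, t2, a2) produces:
 *   - demo_se_<name>(): takes every argument as long and casts it back to
 *     its declared type, in the spirit of __se_sys_*()
 *   - demo_do_<name>(): the typed function whose body follows the macro,
 *     in the spirit of __do_sys_*()
 */
#define DEMO_DEFINE2(name, t1, a1, t2, a2)              \
    static long demo_do_##name(t1 a1, t2 a2);           \
    long demo_se_##name(long a1, long a2)               \
    {                                                   \
        return demo_do_##name((t1)a1, (t2)a2);          \
    }                                                   \
    static long demo_do_##name(t1 a1, t2 a2)

/* Reads like a function definition, just as SYSCALL_DEFINE3(open, ...) does. */
DEMO_DEFINE2(add, int, x, int, y)
{
    return (long)x + (long)y;
}
```

Callers use `demo_se_add(2, 3)`; the braces after `DEMO_DEFINE2(...)` become the body of `demo_do_add`, because the macro ends with that function's header.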
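After macro expansion, SYSCALL_DEFINE3(open, ...) yields roughly the following implementation. The listing is simplified: it folds everything into a single function instead of showing the __se_sys_open() / __do_sys_open() pair that the macro above generates.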
asmlinkage long sys_open(const char __user * filename, int flags, umode_t mode)
{
long ret;
if (force_o_largefile())
flags |= O_LARGEFILE;
ret = do_sys_open(AT_FDCWD, filename, flags, mode);
asmlinkage_protect(3, ret, filename, flags, mode);
return ret;
}
The headers and the table contents are generated by arch/x86/entry/syscalls/Makefile:
# SPDX-License-Identifier: GPL-2.0
# Output directory for the generated headers
out := arch/$(SRCARCH)/include/generated/asm
uapi := arch/$(SRCARCH)/include/generated/uapi/asm
# Create the output directories if they do not exist yet
_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \
$(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')
# Input table files: syscall_32.tbl for 32-bit, syscall_64.tbl for 64-bit
syscall32 := $(srctree)/$(src)/syscall_32.tbl
syscall64 := $(srctree)/$(src)/syscall_64.tbl
# Paths of the generator scripts
syshdr := $(srctree)/$(src)/syscallhdr.sh
systbl := $(srctree)/$(src)/syscalltbl.sh
quiet_cmd_syshdr = SYSHDR $@
cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@' \
'$(syshdr_abi_$(basetarget))' \
'$(syshdr_pfx_$(basetarget))' \
'$(syshdr_offset_$(basetarget))'
quiet_cmd_systbl = SYSTBL $@
cmd_systbl = $(CONFIG_SHELL) '$(systbl)' $< $@
quiet_cmd_hypercalls = HYPERCALLS $@
cmd_hypercalls = $(CONFIG_SHELL) '$<' $@ $(filter-out $<,$^)
# Dependencies of each generated header and the ABI selected for it
syshdr_abi_unistd_32 := i386
$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_32_ia32 := i386
syshdr_pfx_unistd_32_ia32 := ia32_
$(out)/unistd_32_ia32.h: $(syscall32) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_x32 := common,x32
syshdr_offset_unistd_x32 := __X32_SYSCALL_BIT
$(uapi)/unistd_x32.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_64 := common,64
$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
syshdr_abi_unistd_64_x32 := x32
syshdr_pfx_unistd_64_x32 := x32_
# Output file names and locations
$(out)/unistd_64_x32.h: $(syscall64) $(syshdr)
$(call if_changed,syshdr)
$(out)/syscalls_32.h: $(syscall32) $(systbl)
$(call if_changed,systbl)
$(out)/syscalls_64.h: $(syscall64) $(systbl)
$(call if_changed,systbl)
$(out)/xen-hypercalls.h: $(srctree)/scripts/xen-hypercalls.sh
$(call if_changed,hypercalls)
$(out)/xen-hypercalls.h: $(srctree)/include/xen/interface/xen*.h
# Collect the targets and generate the output files
uapisyshdr-y += unistd_32.h unistd_64.h unistd_x32.h
syshdr-y += syscalls_32.h
syshdr-$(CONFIG_X86_64) += unistd_32_ia32.h unistd_64_x32.h
syshdr-$(CONFIG_X86_64) += syscalls_64.h
syshdr-$(CONFIG_XEN) += xen-hypercalls.h
targets += $(uapisyshdr-y) $(syshdr-y)
PHONY += all
all: $(addprefix $(uapi)/,$(uapisyshdr-y))
all: $(addprefix $(out)/,$(syshdr-y))
@:
The Makefile depends on two scripts.
The first, arch/x86/entry/syscalls/syscallhdr.sh, writes lines of the form #define __NR_open <number> into the generated header:
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
in="$1"
out="$2"
my_abis=`echo "($3)" | tr ',' '|'`
prefix="$4"
offset="$5"
fileguard=_ASM_X86_`basename "$out" | sed \
-e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
-e 's/[^A-Z0-9_]/_/g' -e 's/__/_/g'`
grep -E "^[0-9A-Fa-fXx]+[[:space:]]+${my_abis}" "$in" | sort -n | (
echo "#ifndef ${fileguard}"
echo "#define ${fileguard} 1"
echo ""
# Emit one "#define __NR_<name> <nr>" line per table row
while read nr abi name entry ; do
if [ -z "$offset" ]; then
echo "#define __NR_${prefix}${name} $nr"
else
echo "#define __NR_${prefix}${name} ($offset + $nr)"
fi
done
echo ""
echo "#endif /* ${fileguard} */"
) > "$out"
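The loop body of the script is small enough to restate in C. The sketch below is a hypothetical rendering, not kernel code: `demo_format_nr_define` formats one output line, treating a NULL or empty prefix or offset as "not set", which is how the shell test `[ -z "$offset" ]` treats an empty string.

```c
#include <stdio.h>

/*
 * Hypothetical C rendering of the loop body in syscallhdr.sh: format one
 * "#define __NR_<prefix><name> <value>" line into buf.
 * prefix and offset may be NULL or "" (meaning: not set).
 * Returns the number of characters snprintf would have written.
 */
int demo_format_nr_define(char *buf, size_t size, const char *prefix,
                          const char *name, int nr, const char *offset)
{
    if (prefix == NULL)
        prefix = "";

    /* Without an offset the number is emitted as is. */
    if (offset == NULL || offset[0] == '\0')
        return snprintf(buf, size, "#define __NR_%s%s %d", prefix, name, nr);

    /* With an offset (used for the x32 header) the number is added to it. */
    return snprintf(buf, size, "#define __NR_%s%s (%s + %d)",
                    prefix, name, offset, nr);
}
```

For the row `2 common open __x64_sys_open` and no prefix or offset, the formatted line is `#define __NR_open 2`.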
The second, arch/x86/entry/syscalls/syscalltbl.sh, writes lines of the form __SYSCALL_64(2, __x64_sys_open, ) into the generated table header:
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
in="$1"
out="$2"
syscall_macro() {
abi="$1"
nr="$2"
entry="$3"
# Entry can be either just a function name or "function/qualifier"
real_entry="${entry%%/*}"
if [ "$entry" = "$real_entry" ]; then
qualifier=
else
qualifier=${entry#*/}
fi
# Emit __SYSCALL_<abi>(<nr>, <entry>, <qualifier>)
echo "__SYSCALL_${abi}($nr, $real_entry, $qualifier)"
}
emit() {
abi="$1"
nr="$2"
entry="$3"
compat="$4"
umlentry=""
if [ "$abi" = "64" -a -n "$compat" ]; then
echo "a compat entry for a 64-bit syscall makes no sense" >&2
exit 1
fi
# For CONFIG_UML, we need to strip the __x64_sys prefix
if [ "$abi" = "64" -a "${entry}" != "${entry#__x64_sys}" ]; then
umlentry="sys${entry#__x64_sys}"
fi
if [ -z "$compat" ]; then
if [ -n "$entry" -a -z "$umlentry" ]; then
syscall_macro "$abi" "$nr" "$entry"
elif [ -n "$umlentry" ]; then # implies -n "$entry"
echo "#ifdef CONFIG_X86"
syscall_macro "$abi" "$nr" "$entry"
echo "#else /* CONFIG_UML */"
syscall_macro "$abi" "$nr" "$umlentry"
echo "#endif"
fi
else
echo "#ifdef CONFIG_X86_32"
if [ -n "$entry" ]; then
syscall_macro "$abi" "$nr" "$entry"
fi
echo "#else"
syscall_macro "$abi" "$nr" "$compat"
echo "#endif"
fi
}
grep '^[0-9]' "$in" | sort -n | (
while read nr abi name entry compat; do
abi=`echo "$abi" | tr '[a-z]' '[A-Z]'`
if [ "$abi" = "COMMON" -o "$abi" = "64" ]; then
# COMMON is the same as 64, except that we don't expect X32
# programs to use it. Our expectation has nothing to do with
# any generated code, so treat them the same.
emit 64 "$nr" "$entry" "$compat"
elif [ "$abi" = "X32" ]; then
# X32 is equivalent to 64 on an X32-compatible kernel.
echo "#ifdef CONFIG_X86_X32_ABI"
emit 64 "$nr" "$entry" "$compat"
echo "#endif"
elif [ "$abi" = "I386" ]; then
emit "$abi" "$nr" "$entry" "$compat"
else
echo "Unknown abi $abi" >&2
exit 1
fi
done
) > "$out"
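The core of the script is syscall_macro(), which splits an entry of the form "function/qualifier" and prints one macro invocation. A hypothetical C rendering of just that step (the `demo_` name is an assumption, and the CONFIG_UML and compat branches of emit() are left out):

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C rendering of syscall_macro() from syscalltbl.sh.
 * entry is either "function" or "function/qualifier"; the output has the
 * form "__SYSCALL_<abi>(<nr>, <function>, <qualifier>)".
 * Returns the number of characters snprintf would have written.
 */
int demo_format_syscall_macro(char *buf, size_t size, const char *abi,
                              int nr, const char *entry)
{
    const char *slash = strchr(entry, '/');
    /* Function name: everything before the first '/'. */
    int fn_len = slash ? (int)(slash - entry) : (int)strlen(entry);
    /* Qualifier: everything after the first '/', or empty. */
    const char *qualifier = slash ? slash + 1 : "";

    return snprintf(buf, size, "__SYSCALL_%s(%d, %.*s, %s)",
                    abi, nr, fn_len, entry, qualifier);
}
```

With ABI `64`, number 2 and entry `__x64_sys_open`, the result is `__SYSCALL_64(2, __x64_sys_open, )`; the empty third argument is the missing qualifier.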
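The generated files establish the mapping between system call numbers and the functions that implement them.
syscall_32.tbl is turned into unistd_32.h. On x86 the Makefile above writes it to arch/x86/include/generated/uapi/asm/unistd_32.h. The excerpt below is taken from arch/sh/include/uapi/asm/unistd_32.h, the hand-maintained header of the SuperH architecture. It uses the same #define __NR_<name> <number> format, and the entries shown happen to carry the same numbers as on i386: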
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __ASM_SH_UNISTD_32_H
#define __ASM_SH_UNISTD_32_H
/*
* Copyright (C) 1999 Niibe Yutaka
*/
/*
* This file contains the system call numbers.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
/* ...... */
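syscall_64.tbl is turned into unistd_64.h, which on x86 is written to arch/x86/include/generated/uapi/asm/unistd_64.h. The excerpt below is again the SuperH file, arch/sh/include/uapi/asm/unistd_64.h, so its numbers do not match x86-64: in syscall_64.tbl above, read is 0, write is 1 and open is 2, not 3, 4 and 5.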
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __ASM_SH_UNISTD_64_H
#define __ASM_SH_UNISTD_64_H
/*
* include/asm-sh/unistd_64.h
*
* This file contains the system call numbers.
*
* Copyright (C) 2000, 2001 Paolo Alberelli
* Copyright (C) 2003 - 2007 Paul Mundt
* Copyright (C) 2004 Sean McGoogan
*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
/* ...... */
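User-space programs see these numbers through <sys/syscall.h>, which exposes each __NR_<name> as SYS_<name>. The sketch below assumes a libc that ships this header; `demo_open_syscall_number` is a hypothetical helper. On x86-64 Linux the value should agree with the open row of syscall_64.tbl (2), on i386 with 5. Some architectures, arm64 for example, provide only openat and do not define SYS_open at all.

```c
#include <sys/syscall.h>

/*
 * Hypothetical helper: report the system call number of open() for the
 * architecture this file is compiled for, or -1 when the architecture
 * has no separate open system call.
 */
long demo_open_syscall_number(void)
{
#ifdef SYS_open
    return SYS_open;
#else
    return -1;
#endif
}
```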
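Location: arch/x86/entry/syscall_64.c (arch/x86/entry/syscall_32.c builds the 32-bit table the same way from asm/syscalls_32.h)
Source:
// SPDX-License-Identifier: GPL-2.0
/* System call table for x86-64. */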
/* ...... */
#define __SYSCALL_64(nr, sym, qual) [nr] = sym,
asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
/*
* Smells like a compiler bug -- it doesn't work
* when the & below is removed.
*/
[0 ... __NR_syscall_max] = &sys_ni_syscall,
#include <asm/syscalls_64.h>
};
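The table is built by defining a macro and then including a file that consists of nothing but invocations of that macro. A reduced user-space model of the technique follows. All `demo_*` and `DEMO_*` names are hypothetical, the row list is written inline instead of coming from a generated header, and unused slots are left NULL and checked at dispatch time rather than pre-filled with a GCC range initializer as the kernel does.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_call_ptr_t)(long, long, long);

static long demo_sys_add(long a, long b, long c) { (void)c; return a + b; }
static long demo_sys_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

/* Stand-in for sys_ni_syscall: "not implemented". */
static long demo_ni_syscall(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return -ENOSYS;
}

#define DEMO_NR_MAX 7

/* Stand-in for the generated syscalls_64.h: one DEMO_SYSCALL() per row. */
#define DEMO_SYSCALL_LIST           \
    DEMO_SYSCALL(0, demo_sys_add)   \
    DEMO_SYSCALL(3, demo_sys_neg)

/* Same trick as __SYSCALL_64(nr, sym, qual): each row becomes "[nr] = sym,". */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,
static const demo_call_ptr_t demo_call_table[DEMO_NR_MAX + 1] = {
    DEMO_SYSCALL_LIST
};
#undef DEMO_SYSCALL

/* Look up nr in the table and call the handler, like do_syscall_64 does. */
long demo_dispatch(unsigned long nr, long a, long b, long c)
{
    if (nr > DEMO_NR_MAX || demo_call_table[nr] == NULL)
        return demo_ni_syscall(a, b, c);
    return demo_call_table[nr](a, b, c);
}
```

`demo_dispatch(0, 2, 3, 0)` reaches `demo_sys_add`, while an unassigned number such as 1 falls through to the "not implemented" handler and returns `-ENOSYS`.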
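- A user process calls open()
  - glibc's syscalls.list lists the system call behind each glibc function
  - glibc's make-syscalls.sh script reads syscalls.list and generates the matching macro definitions (mapping each function to its system call)
  - glibc's syscall-template.S uses these macros to define how the system call is invoked (again through macros)
  - The template invokes DO_CALL (also a macro), whose 32-bit and 64-bit implementations differ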
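- 32-bit DO_CALL (in sysdep.h under the i386 directory)
  - Puts the call arguments into registers, derives the system call number from the system call name, and stores it in eax
  - Executes ENTER_KERNEL (a macro), which corresponds to int $0x80 and raises a software interrupt to enter the kernel
  - The kernel runs the software interrupt handler entry_INT80_32 (installed by trap_init() at boot)
  - entry_INT80_32 saves the user-mode registers into pt_regs (preserving the context and the system call arguments) and calls do_syscall_32_irqs_on
  - do_syscall_32_irqs_on reads the system call number (eax) from pt_regs, looks up the implementing function in the system call table, fetches the arguments stored in pt_regs, and calls the function
  - entry_INT80_32 then invokes INTERRUPT_RETURN (a macro), which corresponds to the iret instruction; the result of the system call is stored in the eax slot of pt_regs, and the user process is restored from pt_regs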
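- 64-bit DO_CALL (in sysdep.h under the x86_64 directory)
  - Derives the system call number from the system call name and stores it in rax; no interrupt is used, the syscall instruction is executed instead
  - MSRs (model-specific registers) assist with certain functions, system calls among them
  - trap_init() calls cpu_init -> syscall_init to set up the relevant MSR
  - The syscall instruction takes the function address from the MSR and jumps to it, that is, to entry_SYSCALL_64
  - entry_SYSCALL_64 first saves the user-mode registers into pt_regs
  - It then calls entry_SYSCALL64_slow_path -> do_syscall_64
  - do_syscall_64 reads the system call number from rax, looks up the implementing function in the system call table, fetches the arguments stored in pt_regs, and calls the function
  - On return, USERGS_SYSRET64 (a macro) executes the swapgs and sysretq instructions; the result of the system call is stored in the ax slot of pt_regs, and the user process is restored from pt_regs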
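- The system call table sys_call_table
  - The 32-bit table is described in arch/x86/entry/syscalls/syscall_32.tbl
  - The 64-bit table is described in arch/x86/entry/syscalls/syscall_64.tbl
  - Each syscall_*.tbl row holds the system call number, the system call name, and the name of the kernel function that implements it (starting with sys, or with __x64_sys in the 64-bit table)
  - Declarations of the implementing functions: include/linux/syscalls.h
  - Definitions of the implementing functions: some .c file; sys_open, for example, is implemented in fs/open.c
  - In the .c file a macro stands in for the function name, and several layers of macros build the function header
  - During the build, syscall_*.tbl is turned into unistd_*.h, which holds the __NR_* system call numbers
  - The generated syscalls_*.h holds the mapping between system call numbers and implementing functions
  - syscall_*.c includes syscalls_*.h and defines the system call table (an array)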
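The whole path summarized above can be exercised without any per-function wrapper: glibc's generic syscall() function takes a system call number and lets the architecture-specific code issue the trap. A minimal sketch, assuming Linux with glibc; `demo_raw_getpid` is a hypothetical name, and getpid is chosen only because it takes no arguments and has no side effects.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Ask the kernel for the process ID without the getpid() wrapper.
 * syscall() places the number in the register the ABI expects and
 * executes the entry instruction (syscall on x86-64), after which the
 * kernel dispatches through its system call table as described above.
 */
long demo_raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The value returned should equal what getpid() reports for the same process, since both end up in the same kernel function.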