Conversation

@kernel-patches-daemon-bpf

Pull request for series with
subject: bpf trampoline support "jmp" mode
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844

@kernel-patches-daemon-bpf
Author

Upstream branch: 7dc211c
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844
version: 3

@kernel-patches-review-bot

commit dc216e32a9fd39212bcfaaf8a2e88100b709d61a
Author: Menglong Dong <[email protected]>

ftrace: introduce FTRACE_OPS_FL_JMP

This commit adds infrastructure for using JMP instead of CALL instructions
for ftrace hooks to avoid RSB corruption. It introduces a flag and helper
functions that encode the JMP bit in the LSB of direct call addresses.

> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 07f8c309e..015dd1049 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h

[ ... ]

> @@ -577,6 +578,38 @@ static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
>  						 unsigned long addr) { }
>  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */
>
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_JMP
> +static inline bool ftrace_is_jmp(unsigned long addr)
> +{
> +	return addr & 1;
> +}
> +
> +static inline unsigned long ftrace_jmp_set(unsigned long addr)
> +{
> +	return addr | 1UL;
> +}
> +
> +static inline unsigned long ftrace_jmp_get(unsigned long addr)
> +{
> +	return addr & ~1UL;
> +}
> +#else
> +static inline bool ftrace_is_jmp(unsigned long addr)
> +{
> +	return false;
> +}
> +
> +static inline unsigned long ftrace_jmp_set(unsigned long addr)
> +{
> +	return addr;
> +}
> +
> +static inline unsigned long ftrace_jmp_get(unsigned long addr)
> +{
> +	return addr;
> +}
> +#endif /* CONFIG_DYNAMIC_FTRACE_WITH_JMP */
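
For reference, the encoding round-trips like this (a hypothetical usage
sketch, not part of the patch); it relies on call targets being at least
2-byte aligned, so the LSB is free to use as a flag:

	unsigned long addr = (unsigned long)trampoline;	/* hypothetical target */

	addr = ftrace_jmp_set(addr);	/* mark: reach this address with "jmp" */
	ftrace_is_jmp(addr);		/* -> true */
	ftrace_jmp_get(addr);		/* -> the original, untagged address */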

[ ... ]

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 59cfacb8a..bbb37c0f8 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c

[ ... ]

> @@ -5951,7 +5951,8 @@ static void remove_direct_functions_hash(struct ftrace_hash *hash, unsigned long
>  	for (i = 0; i < size; i++) {
>  		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
>  			del = __ftrace_lookup_ip(direct_functions, entry->ip);
> -			if (del && del->direct == addr) {
> +			if (del && ftrace_jmp_get(del->direct) ==
> +				   ftrace_jmp_get(addr)) {
>  				remove_hash_entry(direct_functions, del);
>  				kfree(del);
>  			}

[ ... ]

> @@ -6016,8 +6017,15 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
>  	if (ftrace_hash_empty(hash))
>  		return -EINVAL;
>
> +	/* This is a "raw" address, and this should never happen. */
> +	if (WARN_ON_ONCE(ftrace_is_jmp(addr)))
> +		return -EINVAL;
> +
>  	mutex_lock(&direct_mutex);
>
> +	if (ops->flags & FTRACE_OPS_FL_JMP)
> +		addr = ftrace_jmp_set(addr);
> +
>  	/* Make sure requested entries are not already registered.. */

[ ... ]

>  	/* ... and add the new entries */
>  	size = 1 << hash->size_bits;
>  	for (i = 0; i < size; i++) {
>  		hlist_for_each_entry(entry, &hash->buckets[i], hlist) {
>  			new = add_hash_entry(new_hash, entry->ip);
>  			if (!new)
>  				goto out_unlock;
>  			/* Update both the copy and the hash entry */
>  			new->direct = addr;
>  			entry->direct = addr;
>  		}
>  	}

After addr is potentially encoded with ftrace_jmp_set(), it gets stored in
entry->direct. However, ftrace_find_rec_direct() returns entry->direct
without masking:

unsigned long ftrace_find_rec_direct(unsigned long ip)
{
	struct ftrace_func_entry *entry;

	entry = __ftrace_lookup_ip(direct_functions, ip);
	if (!entry)
		return 0;

	return entry->direct;
}

This encoded address flows to two consumers:

1. ftrace_get_addr_new() and ftrace_get_addr_curr() return it directly,
   which eventually reaches architecture-specific ftrace_call_replace().
   Can the architecture code handle addresses with the LSB set?

2. The debug output at kernel/trace/ftrace.c in ftrace_enabled_show():
   seq_printf(m, "\n\tdirect-->%pS", (void *)direct);

   Won't this print the wrong symbol due to the LSB being set?
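
If masking is the intended fix for the debug output, it could be as simple
as (hypothetical, not in the patch):

	seq_printf(m, "\n\tdirect-->%pS", (void *)ftrace_jmp_get(direct));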

>  	ops->func = call_direct_funcs;
>  	ops->flags = MULTI_FLAGS;
>  	ops->trampoline = FTRACE_REGS_ADDR;
>  	ops->direct_call = addr;

Similarly, ops->direct_call stores the encoded address. In
call_direct_funcs():

static void call_direct_funcs(unsigned long ip, unsigned long pip,
			      struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
	unsigned long addr = READ_ONCE(ops->direct_call);

	if (!addr)
		return;

	arch_ftrace_set_direct_caller(fregs, addr);
}

The encoded address is passed directly to arch_ftrace_set_direct_caller()
without masking. Looking at arch implementations like x86:

static inline void
__arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr)
{
	/* Emulate a call */
	regs->orig_ax = addr;
}

Does arch_ftrace_set_direct_caller() expect addresses with the LSB set?
The implementations directly store the address into registers without any
masking.

The commit message says "we can tell if we should use 'jmp' for the
callback in ftrace_call_replace()", but no architecture code is updated
to check the LSB. Should ftrace_find_rec_direct() and call_direct_funcs()
mask the JMP bit before returning addresses to architecture code?
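
A minimal sketch of the masking suggested above for call_direct_funcs(),
assuming the JMP bit is only meant to be consumed by ftrace_call_replace()
(hypothetical, not from the patch):

static void call_direct_funcs(unsigned long ip, unsigned long pip,
			      struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
	unsigned long addr = READ_ONCE(ops->direct_call);

	if (!addr)
		return;

	/* Strip the JMP marker before handing the address to arch code. */
	arch_ftrace_set_direct_caller(fregs, ftrace_jmp_get(addr));
}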

> @@ -6136,8 +6146,13 @@ __modify_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
>
>  	lockdep_assert_held_once(&direct_mutex);
>
> +	/* This is a "raw" address, and this should never happen. */
> +	if (WARN_ON_ONCE(ftrace_is_jmp(addr)))
> +		return -EINVAL;
> +
> +	if (ops->flags & FTRACE_OPS_FL_JMP)
> +		addr = ftrace_jmp_set(addr);

The same encoding happens here, and the same downstream flow concerns apply.



AI reviewed your patch. Please fix the bug or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: ftrace: introduce FTRACE_OPS_FL_JMP
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/19466635856

@kernel-patches-daemon-bpf
Author

Forwarding comment 3547646897 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf
Author

Upstream branch: ec12ab2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844
version: 3

@kernel-patches-daemon-bpf
Author

Upstream branch: d6ec090
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844
version: 3

@kernel-patches-daemon-bpf
Author

Upstream branch: d088da9
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844
version: 3

Currently, the "nop" is replaced with a "call" instruction when a function
is hooked by ftrace. However, the "call" can sometimes break the RSB and
introduce extra overhead. Therefore, introduce the flag FTRACE_OPS_FL_JMP,
which indicates that the ftrace_ops should be called with a "jmp" instead
of a "call". For now, it is only used in the direct call case.

When a direct ftrace_ops is marked with FTRACE_OPS_FL_JMP, the
least-significant bit of ops->direct_call will be set to 1. This lets
ftrace_call_replace() tell whether it should use a "jmp" for the callback.

Signed-off-by: Menglong Dong <[email protected]>
Implement DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace(),
use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address should
be reached with a "jmp".
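
Based on the existing x86 helper, the opcode selection could look roughly
like this (a sketch of the described change, not the literal patch):

static const char *ftrace_call_replace(unsigned long ip, unsigned long addr)
{
	/* Addresses tagged via ftrace_jmp_set() are reached with a "jmp". */
	if (ftrace_is_jmp(addr))
		return text_gen_insn(JMP32_INSN_OPCODE, (void *)ip,
				     (void *)ftrace_jmp_get(addr));

	return text_gen_insn(CALL_INSN_OPCODE, (void *)ip, (void *)addr);
}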

Meanwhile, adjust the direct call in ftrace_regs_caller. The RSB stays
balanced in "jmp" mode. Take the function "foo" as an example:

 original_caller:
 call foo -> foo:
         call fentry -> fentry:
                 [do ftrace callbacks ]
                 move tramp_addr to stack
                 RET -> tramp_addr
                         tramp_addr:
                         [..]
                         call foo_body -> foo_body:
                                 [..]
                                 RET -> back to tramp_addr
                         [..]
                         RET -> back to original_caller

Signed-off-by: Menglong Dong <[email protected]>
@kernel-patches-daemon-bpf
Author

Upstream branch: e0940c6
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024844
version: 3

Some places compute origin_call by checking whether BPF_TRAMP_F_SKIP_FRAME
is set. However, BPF_TRAMP_F_ORIG_STACK should be used for this purpose.
Fix them.
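
One example of the kind of site being fixed, in the x86 trampoline code
(illustrative; the exact hunks are in the patch):

	/* in arch_prepare_bpf_trampoline(): skip the patched call insn and
	 * point orig_call at the actual body of the kernel function
	 */
	if (flags & BPF_TRAMP_F_ORIG_STACK)	/* was: BPF_TRAMP_F_SKIP_FRAME */
		orig_call += X86_PATCH_SIZE;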

Signed-off-by: Menglong Dong <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
In the origin-call case, if BPF_TRAMP_F_SKIP_FRAME is not set, it means
the trampoline is entered with a "jmp" rather than a "call".

Introduce the function bpf_trampoline_use_jmp() to check if the trampoline
is in "jmp" mode.

Make some adjustments to the "jmp" mode for x86_64. The main adjustment is
for the case where parameters are passed on the stack: without the return
address ("rip") pushed, the stack alignment logic changes in "jmp" mode,
and the location of the parameters on the stack changes as well.

Signed-off-by: Menglong Dong <[email protected]>
In the original logic, bpf_arch_text_poke() assumes that the old and new
instructions have the same opcode. However, they can have different
opcodes if we want to replace a "call" insn with a "jmp" insn.

Therefore, add the new function parameter "old_t" alongside "new_t";
together they indicate the old and new poke types. Meanwhile, adjust the
implementation of bpf_arch_text_poke() for all the archs.

"BPF_MOD_NOP" is added to make the code more readable. In
bpf_arch_text_poke(), we still check if the new and old address is NULL to
determine if nop insn should be used, which I think is more safe.
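
The extended interface described above would look something like this
(a sketch; the exact parameter order is an assumption):

enum bpf_text_poke_type {
	BPF_MOD_CALL,
	BPF_MOD_JUMP,
	BPF_MOD_NOP,	/* new: makes the nop case explicit */
};

int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
		       enum bpf_text_poke_type new_t,
		       void *old_addr, void *new_addr);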

Signed-off-by: Menglong Dong <[email protected]>
Implement the "jmp" mode for the bpf trampoline. For the ftrace_managed
case, we need only to set the FTRACE_OPS_FL_JMP on the tr->fops if "jmp"
is needed.

For the bpf poke case, we determine the old poke type from "origin_flags"
and the current poke type from "tr->flags". The function
bpf_trampoline_update_fentry() is introduced to do this job.

The "jmp" mode will only be enabled with CONFIG_DYNAMIC_FTRACE_WITH_JMP
enabled and BPF_TRAMP_F_SHARE_IPMODIFY is not set. With
BPF_TRAMP_F_SHARE_IPMODIFY, we need to get the origin call ip from the
stack, so we can't use the "jmp" mode.
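
For the ftrace-managed path, the flag update could be as simple as
(hypothetical sketch):

	if (bpf_trampoline_use_jmp(tr->flags))
		tr->fops->flags |= FTRACE_OPS_FL_JMP;
	else
		tr->fops->flags &= ~FTRACE_OPS_FL_JMP;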

Signed-off-by: Menglong Dong <[email protected]>