Skip to content

Conversation

@kernel-patches-daemon-bpf
Copy link

Pull request for series with
subject: bpf: Fix FIONREAD and copied_seq issues
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 4722981
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: 7dc211c
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: ec12ab2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: d6ec090
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224
version: 1

@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: d6ec090
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024224
version: 1

A socket using sockmap has its own independent receive queue: ingress_msg.
This queue may contain data from its own protocol stack or from other
sockets.

The issue is that when reading from ingress_msg, we update tp->copied_seq
by default. However, if the data is not from its own protocol stack,
tcp->rcv_nxt is not increased. Later, if we convert this socket to a
native socket, reading from this socket may fail because copied_seq might
be significantly larger than rcv_nxt.

This fix also addresses the syzkaller-reported bug referenced in the
Closes tag.

This patch marks the skmsg objects in ingress_msg. When reading, we update
copied_seq only if the data is from its own protocol stack.

                                                     FD1:read()
                                                     --  FD1->copied_seq++
                                                         |  [read data]
                                                         |
                                [enqueue data]           v
                  [sockmap]     -> ingress to self ->  ingress_msg queue
FD1 native stack  ------>                                 ^
-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
                                       |                  |
                                       |             ingress to FD1
                                       v                  ^
                                      ...                 |  [sockmap]
                                                     FD2 native stack

Closes: https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983
Fixes: 04919be ("tcp: Introduce tcp_read_skb()")
Signed-off-by: Jiayuan Chen <[email protected]>
A socket using sockmap has its own independent receive queue: ingress_msg.
This queue may contain data from its own protocol stack or from other
sockets.

Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to
calculate FIONREAD is not enough.

This patch adds a new ingress_size field in the psock structure to record
the data length in ingress_msg. Additionally, we implement new ioctl
interfaces for TCP and UDP to intercept FIONREAD operations. While Unix
and VSOCK also support sockmap and have similar FIONREAD calculation
issues, fixing them would require more extensive changes
(please let me know if modifications are needed). I believe it's not
appropriate to include those changes under this fix patch.

Previous work by John Fastabend made some efforts towards FIONREAD support:
commit e5c6de5 ("bpf, sockmap: Incorrectly handling copied_seq")
Although the current patch is based on the previous work by John Fastabend,
it is acceptable for our Fixes tag to point to the same commit.

                                                     FD1:read()
                                                     --  FD1->copied_seq++
                                                         |  [read data]
                                                         |
                                [enqueue data]           v
                  [sockmap]     -> ingress to self ->  ingress_msg queue
FD1 native stack  ------>                                 ^
-- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
                                       |                  |
                                       |             ingress to FD1
                                       v                  ^
                                      ...                 |  [sockmap]
                                                     FD2 native stack

Fixes: 04919be ("tcp: Introduce tcp_read_skb()")
Signed-off-by: Jiayuan Chen <[email protected]>
This commit adds two new test functions: one to reproduce the bug reported
by syzkaller [1], and another to cover the calculation of copied_seq.

The tests primarily involve installing  and uninstalling sockmap on
sockets, then reading data to verify proper functionality.

Additionally, extend the do_test_sockmap_skb_verdict_fionread() function
to support UDP FIONREAD testing.

[1] https://syzkaller.appspot.com/bug?extid=06dbd397158ec0ea4983

Signed-off-by: Jiayuan Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants