Commit a432c77

Merge branch 'sctp-packetization-path-MTU'
Xin Long says:

====================
sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport

Overview (from RFC8899):

  In contrast to PMTUD, Packetization Layer Path MTU Discovery
  (PLPMTUD) [RFC4821] introduces a method that does not rely upon
  reception and validation of PTB messages. It is therefore more
  robust than Classical PMTUD. This has become the recommended
  approach for implementing discovery of the PMTU [BCP145].

  It uses a general strategy in which the PL sends probe packets to
  search for the largest size of unfragmented datagram that can be sent
  over a network path. Probe packets are sent to explore using a
  larger packet size. If a probe packet is successfully delivered (as
  determined by the PL), then the PLPMTU is raised to the size of the
  successful probe. If a black hole is detected (e.g., where packets
  of size PLPMTU are consistently not received), the method reduces the
  PLPMTU.

SCTP Probe Packets:

  As the RFC suggests, the probe packets consist of an SCTP common header
  followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to
  control the length of the probe packet. The HEARTBEAT chunk is used to
  trigger the sending of a HEARTBEAT ACK chunk to confirm this probe on
  the HEARTBEAT sender.

  The HEARTBEAT chunk also carries a Heartbeat Information parameter that
  includes the probe size to help an implementation associate a HEARTBEAT
  ACK with the size of probe that was sent. The sender uses the nonce and
  the probe size to verify the information returned.

Detailed Implementation on SCTP:

                           +------+
                  +------->| Base |-----------------+ Connectivity
                  |        +------+                 | or BASE_PLPMTU
                  |           |                     | confirmation failed
                  |           |                     v
                  |           | Connectivity    +-------+
                  |           | and BASE_PLPMTU | Error |
                  |           | confirmed       +-------+
                  |           |                     | Consistent
                  |           v                     | connectivity
     Black Hole   |       +--------+                | and BASE_PLPMTU
      detected    |       | Search |<---------------+ confirmed
                  |       +--------+
                  |          ^  |
                  |          |  |
                  |    Raise |  | Search
                  |    timer |  | algorithm
                  |  expired |  | completed
                  |          |  |
                  |          |  v
                  |   +-----------------+
                  +---| Search Complete |
                      +-----------------+

  When PLPMTUD is enabled, it starts in Base state and probes with
  BASE_PLPMTU (1200). If this probe succeeds, it goes to Search state.
  If this probe fails, it goes to Error state, in which pl.pmtu drops to
  MIN_PLPMTU (512) and probing with BASE_PLPMTU continues until it
  succeeds and the state moves to Search.

  In Search state, the probe size initially grows by a Big step (32) each
  time the last probe succeeds. Once a probe (such as 1420) fails after
  MAX_PROBES (3) attempts, probe_size goes back to the last successful
  size (1420 - 32 = 1388), 'probe_high' is set to 1420, and the growing
  step becomes a Small one (4). Probing then continues, growing by a Small
  step each round. When the probe with the next size (1404) fails, the
  optimal size (such as 1400) has been found: it is synced to pathmtu and
  the state goes to Complete.

  In Complete state, it only does a probe check for the pathmtu just set.
  If that probe fails, a Black Hole is detected and it goes back to Base
  state. If it succeeds, it goes back to Search state and probing
  continues with a Small step (1400 + 4). If this probe fails, probe_high
  is set, probe_size goes back to 1388, and the state returns to Complete,
  which is normally a kind of loop. However, if the environment's pathmtu
  somehow grows, this probe succeeds and probing continues with a Big
  step (1400 + 32) each round until another probe fails.

PTB Messages Process:

  PLPMTUD doesn't rely on these packets to find the pmtu, and shouldn't
  trust them either. When processing them, it only changes probe_size
  to PL_PTB_SIZE (info - hlen) if 'pl.pmtu < PL_PTB_SIZE < the current
  probe_size' during Search state, as this can help probe_size reach the
  optimal size faster. For example:

    pl.pmtu = 1388, probe_size = 1420, while the environment's pathmtu = 1400.

  When probe_size is 1420, a Toobig packet with 1400 comes back. If
  probe_size changes to 1400, it saves quite a few rounds to get there.
  But of course, after getting this value, PLPMTUD still verifies it on
  its own before using it.

Patches:

  - Patch 1-6: introduce some new constants/variables from the RFC, sysctl
    and members in transport, APIs for the following patches, chunks and
    a timer for the probe sending and some code for the probe receiving.

  - Patch 7-9: implement the state transition on the tx path, rx path
    and toobig ICMP packet processing. This is the main algorithm part.

  - Patch 10: activate this feature.

  - Patch 11-14: improve the processing of ICMP packets for SCTP over UDP,
    so that it can also be covered by this feature.

Tests:

  - do sysctl and setsockopt tests for this feature's enabling and
    disabling.

  - get the pr_debug points for this feature with

      # cat /sys/kernel/debug/dynamic_debug/control | grep PLP

    and enable them on kernel dynamic debug, then play with the pathmtu
    and check whether the state transitions and plpmtu changes match the RFC.

  - do the above tests for SCTP over IPv4/IPv6 and SCTP over UDP.

v1->v2:
  - See Patch 06/14.
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents aff0824 + 9e47df0 commit a432c77
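
The PTB rule described in the commit message can be sketched in a few lines of C. This is only an illustration of the stated condition ('pl.pmtu < PL_PTB_SIZE < probe_size' in Search state), not the kernel code: `struct pl_sketch`, `enum pl_state` and `pl_handle_ptb()` are hypothetical names, and the real handler may cover further cases that the message does not mention.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PLPMTUD state and fields. */
enum pl_state { PL_DISABLED, PL_BASE, PL_SEARCH, PL_COMPLETE, PL_ERROR };

struct pl_sketch {
	enum pl_state state;
	int pmtu;        /* last confirmed PLPMTU */
	int probe_size;  /* size of the probe currently being tried */
};

/*
 * Apply the PTB rule from the commit message. icmp_mtu is the MTU
 * reported by the Toobig packet and hlen is the header overhead, so
 * PL_PTB_SIZE = icmp_mtu - hlen. Returns true if probe_size changed.
 */
static bool pl_handle_ptb(struct pl_sketch *pl, int icmp_mtu, int hlen)
{
	int ptb_size = icmp_mtu - hlen;

	if (pl->state != PL_SEARCH)
		return false;
	if (ptb_size <= pl->pmtu || ptb_size >= pl->probe_size)
		return false;

	/* Only a hint: the new size still has to be confirmed by a probe. */
	pl->probe_size = ptb_size;
	return true;
}
```

With the numbers from the example (pl.pmtu = 1388, probe_size = 1420) and an assumed hlen of 32, a Toobig packet reporting 1432 moves probe_size to 1400, while a report outside the open interval is ignored.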

23 files changed: +779 -127 lines changed

Documentation/networking/ip-sysctl.rst

Lines changed: 8 additions & 0 deletions
@@ -2834,6 +2834,14 @@ encap_port - INTEGER

 	Default: 0

+plpmtud_probe_interval - INTEGER
+	The time interval (in milliseconds) for sending PLPMTUD probe chunks.
+	These chunks are sent at the specified interval with a variable size
+	to probe the mtu of a given path between 2 endpoints. PLPMTUD will
+	be disabled when 0 is set, and other values for it must be >= 5000.
+
+	Default: 0
+

 ``/proc/sys/net/core/*``
 ========================
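
The value constraint documented for plpmtud_probe_interval (0 disables PLPMTUD, any other value must be at least 5000 ms) can be written as a small check. This is a sketch for illustration only: `probe_interval_valid()` and `PROBE_TIMER_MIN_MS` are hypothetical names that mirror the documented rule and the SCTP_PROBE_TIMER_MIN constant, not the kernel's own validation code.

```c
#include <stdbool.h>

/* Mirrors SCTP_PROBE_TIMER_MIN (5000 ms); the name is an assumption. */
#define PROBE_TIMER_MIN_MS 5000

/* 0 means "PLPMTUD disabled"; other values must be >= the minimum. */
static bool probe_interval_valid(unsigned int ms)
{
	return ms == 0 || ms >= PROBE_TIMER_MIN_MS;
}
```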

include/linux/sctp.h

Lines changed: 7 additions & 0 deletions
@@ -98,6 +98,7 @@ enum sctp_cid {
 	SCTP_CID_I_FWD_TSN = 0xC2,
 	SCTP_CID_ASCONF_ACK = 0x80,
 	SCTP_CID_RECONF = 0x82,
+	SCTP_CID_PAD = 0x84,
 }; /* enum */


@@ -410,6 +411,12 @@ struct sctp_heartbeat_chunk {
 };


+/* PAD chunk could be bundled with heartbeat chunk to probe pmtu */
+struct sctp_pad_chunk {
+	struct sctp_chunkhdr uh;
+};
+
+
 /* For the abort and shutdown ACK we must carry the init tag in the
  * common header. Just the common header is all that is needed with a
  * chunk descriptor.
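
Since the PAD chunk exists to stretch a probe packet to a chosen length, the padding arithmetic can be sketched as follows. This is a simplified model under stated assumptions: the target length is counted from the SCTP common header onward, the HEARTBEAT chunk length is supplied by the caller, and `pad_chunk_len()` is a hypothetical helper rather than a kernel function. The kernel's actual accounting of probe_size may differ.

```c
#define SCTP_COMMON_HDR_LEN 12  /* SCTP common header */
#define SCTP_CHUNK_HDR_LEN   4  /* chunk type, flags and length */

/*
 * Length of the PAD chunk (its own header included) needed so that
 * common header + HEARTBEAT chunk + PAD chunk == target_len.
 * Returns -1 if the target cannot be met with a 4-byte aligned chunk.
 */
static int pad_chunk_len(int target_len, int hb_chunk_len)
{
	int pad = target_len - SCTP_COMMON_HDR_LEN - hb_chunk_len;

	if (pad < SCTP_CHUNK_HDR_LEN || pad % 4 != 0)
		return -1;
	return pad;
}
```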

include/net/netns/sctp.h

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ struct netns_sctp {
 	/* HB.interval - 30 seconds */
 	unsigned int hb_interval;

+	/* The interval for PLPMTUD probe timer */
+	unsigned int probe_interval;
+
 	/* Association.Max.Retrans - 10 attempts
 	 * Path.Max.Retrans - 5 attempts (per destination address)
 	 * Max.Init.Retransmits - 8 attempts

include/net/sctp/command.h

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ enum sctp_verb {
 	SCTP_CMD_HB_TIMERS_START, /* Start the heartbeat timers. */
 	SCTP_CMD_HB_TIMER_UPDATE, /* Update a heartbeat timers. */
 	SCTP_CMD_HB_TIMERS_STOP, /* Stop the heartbeat timers. */
+	SCTP_CMD_PROBE_TIMER_UPDATE, /* Update a probe timer. */
 	SCTP_CMD_TRANSPORT_HB_SENT, /* Reset the status of a transport. */
 	SCTP_CMD_TRANSPORT_IDLE, /* Do manipulations on idle transport */
 	SCTP_CMD_TRANSPORT_ON, /* Mark the transport as active. */

include/net/sctp/constants.h

Lines changed: 20 additions & 0 deletions
@@ -77,6 +77,7 @@ enum sctp_event_timeout {
 	SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD,
 	SCTP_EVENT_TIMEOUT_HEARTBEAT,
 	SCTP_EVENT_TIMEOUT_RECONF,
+	SCTP_EVENT_TIMEOUT_PROBE,
 	SCTP_EVENT_TIMEOUT_SACK,
 	SCTP_EVENT_TIMEOUT_AUTOCLOSE,
 };
@@ -200,6 +201,23 @@ enum sctp_sock_state {
 	SCTP_SS_CLOSING = TCP_CLOSE_WAIT,
 };

+enum sctp_plpmtud_state {
+	SCTP_PL_DISABLED,
+	SCTP_PL_BASE,
+	SCTP_PL_SEARCH,
+	SCTP_PL_COMPLETE,
+	SCTP_PL_ERROR,
+};
+
+#define SCTP_BASE_PLPMTU 1200
+#define SCTP_MAX_PLPMTU 9000
+#define SCTP_MIN_PLPMTU 512
+
+#define SCTP_MAX_PROBES 3
+
+#define SCTP_PL_BIG_STEP 32
+#define SCTP_PL_MIN_STEP 4
+
 /* These functions map various type to printable names. */
 const char *sctp_cname(const union sctp_subtype id); /* chunk types */
 const char *sctp_oname(const union sctp_subtype id); /* other events */
@@ -424,4 +442,6 @@ enum {
  */
 #define SCTP_AUTH_RANDOM_LENGTH 32

+#define SCTP_PROBE_TIMER_MIN 5000
+
 #endif /* __sctp_constants_h__ */
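
To see how these constants interact, the search described in the commit message (Big steps until a probe fails, then Small steps up to probe_high) can be modeled in user space. This is a simplified sketch, not the kernel's state machine: `plpmtu_search()` and `probe_ok()` are hypothetical, a probe is assumed to succeed exactly when it fits a fixed path limit, and retries, timers and black-hole handling are left out.

```c
#include <stdbool.h>

/* Values copied from the constants added in this hunk. */
#define BASE_PLPMTU 1200
#define MAX_PLPMTU  9000
#define MIN_PLPMTU   512
#define BIG_STEP      32
#define MIN_STEP       4

/* Hypothetical path model: a probe is acknowledged iff it fits. */
static bool probe_ok(int size, int path_limit)
{
	return size <= path_limit;
}

/* Returns the PLPMTU the search settles on for a fixed path limit. */
static int plpmtu_search(int path_limit)
{
	int pmtu, probe_size, probe_high = 0;

	/* Base state: BASE_PLPMTU must be confirmed first. */
	if (!probe_ok(BASE_PLPMTU, path_limit))
		return MIN_PLPMTU;	/* Error state */

	pmtu = BASE_PLPMTU;
	probe_size = pmtu + BIG_STEP;

	for (;;) {
		/* Nothing left to try between pmtu and probe_high. */
		if (probe_high && probe_size >= probe_high)
			return pmtu;

		if (!probe_ok(probe_size, path_limit)) {
			if (probe_high)
				return pmtu;	/* Search Complete */
			/* First failure: remember it, switch to Small steps. */
			probe_high = probe_size;
			probe_size = pmtu + MIN_STEP;
			continue;
		}

		pmtu = probe_size;	/* probe confirmed, raise PLPMTU */
		if (pmtu >= MAX_PLPMTU)
			return pmtu;

		probe_size = pmtu + (probe_high ? MIN_STEP : BIG_STEP);
		if (probe_size > MAX_PLPMTU)
			probe_size = MAX_PLPMTU;
	}
}
```

In this model a path limit of 1400 is reached through Big steps up to 1392, a failed probe at 1424, and then Small steps 1396 and 1400 before the 1404 probe fails. The intermediate sizes differ from the 1388/1420 example in the commit message because the model starts exactly at 1200 with no other adjustments.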

include/net/sctp/sctp.h

Lines changed: 54 additions & 3 deletions
@@ -145,6 +145,8 @@ struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
 			     struct sctphdr *, struct sctp_association **,
 			     struct sctp_transport **);
 void sctp_err_finish(struct sock *, struct sctp_transport *);
+int sctp_udp_v4_err(struct sock *sk, struct sk_buff *skb);
+int sctp_udp_v6_err(struct sock *sk, struct sk_buff *skb);
 void sctp_icmp_frag_needed(struct sock *, struct sctp_association *,
 			   struct sctp_transport *t, __u32 pmtu);
 void sctp_icmp_redirect(struct sock *, struct sctp_transport *,
@@ -573,14 +575,15 @@ static inline struct dst_entry *sctp_transport_dst_check(struct sctp_transport *
 /* Calculate max payload size given a MTU, or the total overhead if
  * given MTU is zero
  */
-static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp,
-				     __u32 mtu, __u32 extra)
+static inline __u32 __sctp_mtu_payload(const struct sctp_sock *sp,
+				       const struct sctp_transport *t,
+				       __u32 mtu, __u32 extra)
 {
 	__u32 overhead = sizeof(struct sctphdr) + extra;

 	if (sp) {
 		overhead += sp->pf->af->net_header_len;
-		if (sp->udp_port)
+		if (sp->udp_port && (!t || t->encap_port))
 			overhead += sizeof(struct udphdr);
 	} else {
 		overhead += sizeof(struct ipv6hdr);
@@ -592,6 +595,12 @@ static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp,
 	return mtu ? mtu - overhead : overhead;
 }

+static inline __u32 sctp_mtu_payload(const struct sctp_sock *sp,
+				     __u32 mtu, __u32 extra)
+{
+	return __sctp_mtu_payload(sp, NULL, mtu, extra);
+}
+
 static inline __u32 sctp_dst_mtu(const struct dst_entry *dst)
 {
 	return SCTP_TRUNC4(max_t(__u32, dst_mtu(dst),
@@ -615,6 +624,48 @@ static inline __u32 sctp_min_frag_point(struct sctp_sock *sp, __u16 datasize)
 	return sctp_mtu_payload(sp, SCTP_DEFAULT_MINSEGMENT, datasize);
 }

+static inline int sctp_transport_pl_hlen(struct sctp_transport *t)
+{
+	return __sctp_mtu_payload(sctp_sk(t->asoc->base.sk), t, 0, 0);
+}
+
+static inline void sctp_transport_pl_reset(struct sctp_transport *t)
+{
+	if (t->probe_interval && (t->param_flags & SPP_PMTUD_ENABLE) &&
+	    (t->state == SCTP_ACTIVE || t->state == SCTP_UNKNOWN)) {
+		if (t->pl.state == SCTP_PL_DISABLED) {
+			t->pl.state = SCTP_PL_BASE;
+			t->pl.pmtu = SCTP_BASE_PLPMTU;
+			t->pl.probe_size = SCTP_BASE_PLPMTU;
+			sctp_transport_reset_probe_timer(t);
+		}
+	} else {
+		if (t->pl.state != SCTP_PL_DISABLED) {
+			if (del_timer(&t->probe_timer))
+				sctp_transport_put(t);
+			t->pl.state = SCTP_PL_DISABLED;
+		}
+	}
+}
+
+static inline void sctp_transport_pl_update(struct sctp_transport *t)
+{
+	if (t->pl.state == SCTP_PL_DISABLED)
+		return;
+
+	if (del_timer(&t->probe_timer))
+		sctp_transport_put(t);
+
+	t->pl.state = SCTP_PL_BASE;
+	t->pl.pmtu = SCTP_BASE_PLPMTU;
+	t->pl.probe_size = SCTP_BASE_PLPMTU;
+}
+
+static inline bool sctp_transport_pl_enabled(struct sctp_transport *t)
+{
+	return t->pl.state != SCTP_PL_DISABLED;
+}
+
 static inline bool sctp_newsk_ready(const struct sock *sk)
 {
 	return sock_flag(sk, SOCK_DEAD) || sk->sk_socket;
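
The overhead arithmetic in __sctp_mtu_payload() can be tried out in user space with fixed header sizes. The sketch below is an approximation for illustration: `mtu_payload_sketch()` is a hypothetical function, the header lengths are the usual option-free sizes, the `udp_encap` flag stands in for the `sp->udp_port && (!t || t->encap_port)` test, and the guard for an MTU no larger than the overhead is an assumption of this sketch because that part of the function lies outside the hunks shown.

```c
#include <stdbool.h>

#define SCTP_HDR_LEN 12  /* SCTP common header */
#define UDP_HDR_LEN   8  /* UDP header, used for SCTP over UDP */
#define IPV4_HDR_LEN 20  /* IPv4 header without options */
#define IPV6_HDR_LEN 40  /* IPv6 fixed header */

/*
 * With mtu == 0, return the total header overhead; otherwise return
 * the payload that fits into mtu after subtracting the overhead.
 */
static unsigned int mtu_payload_sketch(unsigned int net_hdr_len,
				       bool udp_encap,
				       unsigned int mtu, unsigned int extra)
{
	unsigned int overhead = SCTP_HDR_LEN + extra + net_hdr_len;

	if (udp_encap)
		overhead += UDP_HDR_LEN;

	/* Sketch-only guard: avoid unsigned underflow for tiny MTUs. */
	if (mtu && mtu <= overhead)
		return 0;

	return mtu ? mtu - overhead : overhead;
}
```

Called with mtu = 0, as sctp_transport_pl_hlen() does, the model yields 32 bytes for plain IPv4, 52 for plain IPv6, and 40 for IPv4 with UDP encapsulation.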

include/net/sctp/sm.h

Lines changed: 5 additions & 1 deletion
@@ -151,6 +151,7 @@ sctp_state_fn_t sctp_sf_cookie_wait_icmp_abort;
 /* Prototypes for timeout event state functions. */
 sctp_state_fn_t sctp_sf_do_6_3_3_rtx;
 sctp_state_fn_t sctp_sf_send_reconf;
+sctp_state_fn_t sctp_sf_send_probe;
 sctp_state_fn_t sctp_sf_do_6_2_sack;
 sctp_state_fn_t sctp_sf_autoclose_timer_expire;

@@ -225,11 +226,13 @@ struct sctp_chunk *sctp_make_new_encap_port(
 					const struct sctp_association *asoc,
 					const struct sctp_chunk *chunk);
 struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
-				       const struct sctp_transport *transport);
+				       const struct sctp_transport *transport,
+				       __u32 probe_size);
 struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc,
 					   const struct sctp_chunk *chunk,
 					   const void *payload,
 					   const size_t paylen);
+struct sctp_chunk *sctp_make_pad(const struct sctp_association *asoc, int len);
 struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 				      const struct sctp_chunk *chunk,
 				      __be16 cause_code, const void *payload,
@@ -310,6 +313,7 @@ int sctp_do_sm(struct net *net, enum sctp_event_type event_type,
 void sctp_generate_t3_rtx_event(struct timer_list *t);
 void sctp_generate_heartbeat_event(struct timer_list *t);
 void sctp_generate_reconf_event(struct timer_list *t);
+void sctp_generate_probe_event(struct timer_list *t);
 void sctp_generate_proto_unreach_event(struct timer_list *t);

 void sctp_ootb_pkt_free(struct sctp_packet *packet);

include/net/sctp/structs.h

Lines changed: 19 additions & 0 deletions
@@ -177,6 +177,7 @@ struct sctp_sock {
 	 * will be inherited by all new associations.
 	 */
 	__u32 hbinterval;
+	__u32 probe_interval;

 	__be16 udp_port;
 	__be16 encap_port;
@@ -385,6 +386,7 @@ struct sctp_sender_hb_info {
 	union sctp_addr daddr;
 	unsigned long sent_at;
 	__u64 hb_nonce;
+	__u32 probe_size;
 };

 int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt,
@@ -656,6 +658,7 @@ struct sctp_chunk {
 		data_accepted:1, /* At least 1 chunk accepted */
 		auth:1, /* IN: was auth'ed | OUT: needs auth */
 		has_asconf:1, /* IN: have seen an asconf before */
+		pmtu_probe:1, /* Used by PLPMTUD, can be set in s HB chunk */
 		tsn_missing_report:2, /* Data chunk missing counter. */
 		fast_retransmit:2; /* Is this chunk fast retransmitted? */
 };
@@ -858,6 +861,7 @@ struct sctp_transport {
 	 * the destination address every heartbeat interval.
 	 */
 	unsigned long hbinterval;
+	unsigned long probe_interval;

 	/* SACK delay timeout */
 	unsigned long sackdelay;
@@ -934,6 +938,9 @@ struct sctp_transport {
 	/* Timer to handler reconf chunk rtx */
 	struct timer_list reconf_timer;

+	/* Timer to send a probe HB packet for PLPMTUD */
+	struct timer_list probe_timer;
+
 	/* Since we're using per-destination retransmission timers
 	 * (see above), we're also using per-destination "transmitted"
 	 * queues. This probably ought to be a private struct
@@ -976,6 +983,14 @@ struct sctp_transport {
 		char cacc_saw_newack;
 	} cacc;

+	struct {
+		__u16 pmtu;
+		__u16 probe_size;
+		__u16 probe_high;
+		__u8 probe_count;
+		__u8 state;
+	} pl; /* plpmtud related */
+
 	/* 64-bit random number sent with heartbeat. */
 	__u64 hb_nonce;

@@ -993,6 +1008,7 @@ void sctp_transport_free(struct sctp_transport *);
 void sctp_transport_reset_t3_rtx(struct sctp_transport *);
 void sctp_transport_reset_hb_timer(struct sctp_transport *);
 void sctp_transport_reset_reconf_timer(struct sctp_transport *transport);
+void sctp_transport_reset_probe_timer(struct sctp_transport *transport);
 int sctp_transport_hold(struct sctp_transport *);
 void sctp_transport_put(struct sctp_transport *);
 void sctp_transport_update_rto(struct sctp_transport *, __u32);
@@ -1007,6 +1023,8 @@ bool sctp_transport_update_pmtu(struct sctp_transport *t, u32 pmtu);
 void sctp_transport_immediate_rtx(struct sctp_transport *);
 void sctp_transport_dst_release(struct sctp_transport *t);
 void sctp_transport_dst_confirm(struct sctp_transport *t);
+void sctp_transport_pl_send(struct sctp_transport *t);
+void sctp_transport_pl_recv(struct sctp_transport *t);


 /* This is the structure we use to queue packets as they come into
@@ -1795,6 +1813,7 @@ struct sctp_association {
 	 * will be inherited by all new transports.
 	 */
 	unsigned long hbinterval;
+	unsigned long probe_interval;

 	__be16 encap_port;


include/uapi/linux/sctp.h

Lines changed: 8 additions & 0 deletions
@@ -141,6 +141,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131
 #define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE
 #define SCTP_REMOTE_UDP_ENCAPS_PORT 132
+#define SCTP_PLPMTUD_PROBE_INTERVAL 133

 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE 0x0000
@@ -1213,4 +1214,11 @@ enum sctp_sched_type {
 	SCTP_SS_MAX = SCTP_SS_RR
 };

+/* Probe Interval socket option */
+struct sctp_probeinterval {
+	sctp_assoc_t spi_assoc_id;
+	struct sockaddr_storage spi_address;
+	__u32 spi_interval;
+};
+
 #endif /* _UAPI_SCTP_H */
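
From user space, the new socket option would be set with setsockopt() and the structure added in this hunk. The sketch below is a hedged illustration: to stay self-contained it defines a local mirror of struct sctp_probeinterval and of the option number 133 instead of including the SCTP uapi header, and `set_probe_interval()` is a hypothetical helper. It leaves spi_address zeroed on the assumption that this means no specific peer address is selected; real code should use the definitions from the SCTP headers.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Local copy of SCTP_PLPMTUD_PROBE_INTERVAL from the hunk above. */
#define SKETCH_PLPMTUD_PROBE_INTERVAL 133

/* Local mirror of struct sctp_probeinterval (same field order). */
struct probeinterval_sketch {
	int32_t spi_assoc_id;
	struct sockaddr_storage spi_address;
	uint32_t spi_interval;
};

/*
 * Ask for a PLPMTUD probe interval in milliseconds on an SCTP socket.
 * Returns the setsockopt() result: 0 on success, -1 with errno set.
 */
static int set_probe_interval(int fd, int32_t assoc_id, uint32_t interval_ms)
{
	struct probeinterval_sketch params;

	memset(&params, 0, sizeof(params));
	params.spi_assoc_id = assoc_id;
	params.spi_interval = interval_ms;

	return setsockopt(fd, IPPROTO_SCTP, SKETCH_PLPMTUD_PROBE_INTERVAL,
			  &params, sizeof(params));
}
```

A caller would pass an open SCTP socket, for example `set_probe_interval(fd, 0, 5000)`; per the sysctl documentation in this commit, 0 disables probing and other values must be at least 5000.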

net/sctp/associola.c

Lines changed: 6 additions & 0 deletions
@@ -98,6 +98,7 @@ static struct sctp_association *sctp_association_init(
 	 * sock configured value.
 	 */
 	asoc->hbinterval = msecs_to_jiffies(sp->hbinterval);
+	asoc->probe_interval = msecs_to_jiffies(sp->probe_interval);

 	asoc->encap_port = sp->encap_port;

@@ -625,6 +626,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	 * association configured value.
 	 */
 	peer->hbinterval = asoc->hbinterval;
+	peer->probe_interval = asoc->probe_interval;

 	peer->encap_port = asoc->encap_port;

@@ -714,6 +716,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 		return NULL;
 	}

+	sctp_transport_pl_reset(peer);
+
 	/* Attach the remote transport to our asoc. */
 	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
 	asoc->peer.transport_count++;
@@ -812,6 +816,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 			spc_state = SCTP_ADDR_CONFIRMED;

 		transport->state = SCTP_ACTIVE;
+		sctp_transport_pl_reset(transport);
 		break;

 	case SCTP_TRANSPORT_DOWN:
@@ -821,6 +826,7 @@ void sctp_assoc_control_transport(struct sctp_association *asoc,
 		 */
 		if (transport->state != SCTP_UNCONFIRMED) {
 			transport->state = SCTP_INACTIVE;
+			sctp_transport_pl_reset(transport);
 			spc_state = SCTP_ADDR_UNREACHABLE;
 		} else {
 			sctp_transport_dst_release(transport);
