Skip to content

Commit 2f5dc00

Browse files
vladimirolteandavem330
authored andcommitted
net: bridge: switchdev: let drivers inform which bridge ports are offloaded
On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it is, it assigns the source hardware domain of that skb based on the hardware domain of the ingress port. Then during forwarding, it enforces that the egress port must have a different hardware domain than the ingress one (this is done in nbp_switchdev_allowed_egress). Non-switchdev drivers don't report any physical switch id (neither through devlink nor .ndo_get_port_parent_id), therefore the bridge assigns them a hardware domain of 0, and packets coming from them will always have skb->offload_fwd_mark = 0. So there aren't any restrictions. Problems appear due to the fact that DSA would like to perform software fallback for bonding and team interfaces that the physical switch cannot offload. +-- br0 ---+ / / | \ / / | \ / | | bond0 / | | / \ swp0 swp1 swp2 swp3 swp4 There, it is desirable that the presence of swp3 and swp4 under a non-offloaded LAG does not preclude us from doing hardware bridging beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high enough that software bridging between {swp0,swp1,swp2} and bond0 is not impractical. But this creates an impossible paradox given the current way in which port hardware domains are assigned. When the driver receives a packet from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to something. - If we set it to 0, then the bridge will forward it towards swp1, swp2 and bond0. But the switch has already forwarded it towards swp1 and swp2 (not to bond0, remember, that isn't offloaded, so as far as the switch is concerned, ports swp3 and swp4 are not looking up the FDB, and the entire bond0 is a destination that is strictly behind the CPU). But we don't want duplicated traffic towards swp1 and swp2, so it's not ok to set skb->offload_fwd_mark = 0. - If we set it to 1, then the bridge will not forward the skb towards the ports with the same switchdev mark, i.e. not to swp1, swp2 and bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should have forwarded the skb there. So the real issue is that bond0 will be assigned the same hardware domain as {swp0,swp1,swp2}, because the function that assigns hardware domains to bridge ports, nbp_switchdev_add(), recurses through bond0's lower interfaces until it finds something that implements devlink (calls dev_get_port_parent_id with bool recurse = true). This is a problem because the fact that bond0 can be offloaded by swp3 and swp4 in our example is merely an assumption. A solution is to give the bridge explicit hints as to what hardware domain it should use for each port. Currently, the bridging offload is very 'silent': a driver registers a netdevice notifier, which is put on the netns's notifier chain, and which sniffs around for NETDEV_CHANGEUPPER events where the upper is a bridge, and the lower is an interface it knows about (one registered by this driver, normally). Then, from within that notifier, it does a bunch of stuff behind the bridge's back, without the bridge necessarily knowing that there's somebody offloading that port. It looks like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v call_netdevice_notifiers | v dsa_slave_netdevice_event | v oh, hey! it's for me! | v .port_bridge_join What we do to solve the conundrum is to be less silent, and change the switchdev drivers to present themselves to the bridge. Something like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | hardware domain for v | this port, and zero dsa_slave_netdevice_event | if I got nothing. | | v | oh, hey! it's for me! | | | v | .port_bridge_join | | | +------------------------+ switchdev_bridge_port_offload(swp0, swp0) Then stacked interfaces (like bond0 on top of swp3/swp4) would be treated differently in DSA, depending on whether we can or cannot offload them. The offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | switchdev mark for v | bond0. dsa_slave_netdevice_event | Coincidentally (or not), | | bond0 and swp0, swp1, swp2 v | all have the same switchdev hmm, it's not quite for me, | mark now, since the ASIC but my driver has already | is able to forward towards called .port_lag_join | all these ports in hw. for it, because I have | a port with dp->lag_dev == bond0. | | | v | .port_bridge_join | for swp3 and swp4 | | | +------------------------+ switchdev_bridge_port_offload(bond0, swp3) switchdev_bridge_port_offload(bond0, swp4) And the non-offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge waiting: call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload | | wasn't called, okay, I'll use a v | hwdom of zero for this one. dsa_slave_netdevice_event : Then packets received on swp0 will | : not be software-forwarded towards v : swp1, but they will towards bond0. it's not for me, but bond0 is an upper of swp3 and swp4, but their dp->lag_dev is NULL because they couldn't offload it. Basically we can draw the conclusion that the lowers of a bridge port can come and go, so depending on the configuration of lowers for a bridge port, it can dynamically toggle between offloaded and unoffloaded. Therefore, we need an equivalent switchdev_bridge_port_unoffload too. This patch changes the way any switchdev driver interacts with the bridge. From now on, everybody needs to call switchdev_bridge_port_offload and switchdev_bridge_port_unoffload, otherwise the bridge will treat the port as non-offloaded and allow software flooding to other ports from the same ASIC. Note that these functions lay the ground for a more complex handshake between switchdev drivers and the bridge in the future. For drivers that will request a replay of the switchdev objects when they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we place the call to switchdev_bridge_port_unoffload() strategically inside the NETDEV_PRECHANGEUPPER notifier's code path, and not inside NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers need the netdev adjacency lists to be valid, and that is only true in NETDEV_PRECHANGEUPPER. Cc: Vadym Kochan <[email protected]> Cc: Taras Chornyi <[email protected]> Cc: Ioana Ciornei <[email protected]> Cc: Lars Povlsen <[email protected]> Cc: Steen Hegelund <[email protected]> Cc: [email protected] Cc: Claudiu Manoil <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Grygorii Strashko <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Tested-by: Ioana Ciornei <[email protected]> # dpaa2-switch: regression Acked-by: Ioana Ciornei <[email protected]> # dpaa2-switch Tested-by: Horatiu Vultur <[email protected]> # ocelot-switch Signed-off-by: David S. Miller <[email protected]>
1 parent 8582661 commit 2f5dc00

File tree

17 files changed

+298
-57
lines changed

17 files changed

+298
-57
lines changed

drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1930,8 +1930,13 @@ static int dpaa2_switch_port_bridge_join(struct net_device *netdev,
19301930
if (err)
19311931
goto err_egress_flood;
19321932

1933+
err = switchdev_bridge_port_offload(netdev, netdev, extack);
1934+
if (err)
1935+
goto err_switchdev_offload;
1936+
19331937
return 0;
19341938

1939+
err_switchdev_offload:
19351940
err_egress_flood:
19361941
dpaa2_switch_port_set_fdb(port_priv, NULL);
19371942
return err;
@@ -1957,6 +1962,11 @@ static int dpaa2_switch_port_restore_rxvlan(struct net_device *vdev, int vid, vo
19571962
return dpaa2_switch_port_vlan_add(arg, vlan_proto, vid);
19581963
}
19591964

1965+
static void dpaa2_switch_port_pre_bridge_leave(struct net_device *netdev)
1966+
{
1967+
switchdev_bridge_port_unoffload(netdev);
1968+
}
1969+
19601970
static int dpaa2_switch_port_bridge_leave(struct net_device *netdev)
19611971
{
19621972
struct ethsw_port_priv *port_priv = netdev_priv(netdev);
@@ -2078,6 +2088,9 @@ static int dpaa2_switch_port_netdevice_event(struct notifier_block *nb,
20782088
if (err)
20792089
goto out;
20802090

2091+
if (!info->linking)
2092+
dpaa2_switch_port_pre_bridge_leave(netdev);
2093+
20812094
break;
20822095
case NETDEV_CHANGEUPPER:
20832096
upper_dev = info->upper_dev;

drivers/net/ethernet/marvell/prestera/prestera_main.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -746,7 +746,8 @@ static int prestera_netdev_port_event(struct net_device *lower,
746746
case NETDEV_CHANGEUPPER:
747747
if (netif_is_bridge_master(upper)) {
748748
if (info->linking)
749-
return prestera_bridge_port_join(upper, port);
749+
return prestera_bridge_port_join(upper, port,
750+
extack);
750751
else
751752
prestera_bridge_port_leave(upper, port);
752753
} else if (netif_is_lag_master(upper)) {

drivers/net/ethernet/marvell/prestera/prestera_switchdev.c

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -480,7 +480,8 @@ prestera_bridge_1d_port_join(struct prestera_bridge_port *br_port)
480480
}
481481

482482
int prestera_bridge_port_join(struct net_device *br_dev,
483-
struct prestera_port *port)
483+
struct prestera_port *port,
484+
struct netlink_ext_ack *extack)
484485
{
485486
struct prestera_switchdev *swdev = port->sw->swdev;
486487
struct prestera_bridge_port *br_port;
@@ -500,6 +501,10 @@ int prestera_bridge_port_join(struct net_device *br_dev,
500501
goto err_brport_create;
501502
}
502503

504+
err = switchdev_bridge_port_offload(br_port->dev, port->dev, extack);
505+
if (err)
506+
goto err_switchdev_offload;
507+
503508
if (bridge->vlan_enabled)
504509
return 0;
505510

@@ -510,6 +515,8 @@ int prestera_bridge_port_join(struct net_device *br_dev,
510515
return 0;
511516

512517
err_port_join:
518+
switchdev_bridge_port_unoffload(br_port->dev);
519+
err_switchdev_offload:
513520
prestera_bridge_port_put(br_port);
514521
err_brport_create:
515522
prestera_bridge_put(bridge);
@@ -584,6 +591,8 @@ void prestera_bridge_port_leave(struct net_device *br_dev,
584591
else
585592
prestera_bridge_1d_port_leave(br_port);
586593

594+
switchdev_bridge_port_unoffload(br_port->dev);
595+
587596
prestera_hw_port_learning_set(port, false);
588597
prestera_hw_port_flood_set(port, BR_FLOOD | BR_MCAST_FLOOD, 0);
589598
prestera_port_vid_stp_set(port, PRESTERA_VID_ALL, BR_STATE_FORWARDING);

drivers/net/ethernet/marvell/prestera/prestera_switchdev.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,8 @@ int prestera_switchdev_init(struct prestera_switch *sw);
88
void prestera_switchdev_fini(struct prestera_switch *sw);
99

1010
int prestera_bridge_port_join(struct net_device *br_dev,
11-
struct prestera_port *port);
11+
struct prestera_port *port,
12+
struct netlink_ext_ack *extack);
1213

1314
void prestera_bridge_port_leave(struct net_device *br_dev,
1415
struct prestera_port *port);

drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

Lines changed: 19 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -335,14 +335,16 @@ mlxsw_sp_bridge_port_find(struct mlxsw_sp_bridge *bridge,
335335

336336
static struct mlxsw_sp_bridge_port *
337337
mlxsw_sp_bridge_port_create(struct mlxsw_sp_bridge_device *bridge_device,
338-
struct net_device *brport_dev)
338+
struct net_device *brport_dev,
339+
struct netlink_ext_ack *extack)
339340
{
340341
struct mlxsw_sp_bridge_port *bridge_port;
341342
struct mlxsw_sp_port *mlxsw_sp_port;
343+
int err;
342344

343345
bridge_port = kzalloc(sizeof(*bridge_port), GFP_KERNEL);
344346
if (!bridge_port)
345-
return NULL;
347+
return ERR_PTR(-ENOMEM);
346348

347349
mlxsw_sp_port = mlxsw_sp_port_dev_lower_find(brport_dev);
348350
bridge_port->lagged = mlxsw_sp_port->lagged;
@@ -359,12 +361,23 @@ mlxsw_sp_bridge_port_create(struct mlxsw_sp_bridge_device *bridge_device,
359361
list_add(&bridge_port->list, &bridge_device->ports_list);
360362
bridge_port->ref_count = 1;
361363

364+
err = switchdev_bridge_port_offload(brport_dev, mlxsw_sp_port->dev,
365+
extack);
366+
if (err)
367+
goto err_switchdev_offload;
368+
362369
return bridge_port;
370+
371+
err_switchdev_offload:
372+
list_del(&bridge_port->list);
373+
kfree(bridge_port);
374+
return ERR_PTR(err);
363375
}
364376

365377
static void
366378
mlxsw_sp_bridge_port_destroy(struct mlxsw_sp_bridge_port *bridge_port)
367379
{
380+
switchdev_bridge_port_unoffload(bridge_port->dev);
368381
list_del(&bridge_port->list);
369382
WARN_ON(!list_empty(&bridge_port->vlans_list));
370383
kfree(bridge_port);
@@ -390,9 +403,10 @@ mlxsw_sp_bridge_port_get(struct mlxsw_sp_bridge *bridge,
390403
if (IS_ERR(bridge_device))
391404
return ERR_CAST(bridge_device);
392405

393-
bridge_port = mlxsw_sp_bridge_port_create(bridge_device, brport_dev);
394-
if (!bridge_port) {
395-
err = -ENOMEM;
406+
bridge_port = mlxsw_sp_bridge_port_create(bridge_device, brport_dev,
407+
extack);
408+
if (IS_ERR(bridge_port)) {
409+
err = PTR_ERR(bridge_port);
396410
goto err_bridge_port_create;
397411
}
398412

drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -93,9 +93,12 @@ static int sparx5_port_attr_set(struct net_device *dev, const void *ctx,
9393
}
9494

9595
static int sparx5_port_bridge_join(struct sparx5_port *port,
96-
struct net_device *bridge)
96+
struct net_device *bridge,
97+
struct netlink_ext_ack *extack)
9798
{
9899
struct sparx5 *sparx5 = port->sparx5;
100+
struct net_device *ndev = port->ndev;
101+
int err;
99102

100103
if (bitmap_empty(sparx5->bridge_mask, SPX5_PORTS))
101104
/* First bridged port */
@@ -109,19 +112,29 @@ static int sparx5_port_bridge_join(struct sparx5_port *port,
109112

110113
set_bit(port->portno, sparx5->bridge_mask);
111114

115+
err = switchdev_bridge_port_offload(ndev, ndev, extack);
116+
if (err)
117+
goto err_switchdev_offload;
118+
112119
/* Port enters in bridge mode therefor don't need to copy to CPU
113120
* frames for multicast in case the bridge is not requesting them
114121
*/
115-
__dev_mc_unsync(port->ndev, sparx5_mc_unsync);
122+
__dev_mc_unsync(ndev, sparx5_mc_unsync);
116123

117124
return 0;
125+
126+
err_switchdev_offload:
127+
clear_bit(port->portno, sparx5->bridge_mask);
128+
return err;
118129
}
119130

120131
static void sparx5_port_bridge_leave(struct sparx5_port *port,
121132
struct net_device *bridge)
122133
{
123134
struct sparx5 *sparx5 = port->sparx5;
124135

136+
switchdev_bridge_port_unoffload(port->ndev);
137+
125138
clear_bit(port->portno, sparx5->bridge_mask);
126139
if (bitmap_empty(sparx5->bridge_mask, SPX5_PORTS))
127140
sparx5->hw_bridge_dev = NULL;
@@ -139,11 +152,15 @@ static int sparx5_port_changeupper(struct net_device *dev,
139152
struct netdev_notifier_changeupper_info *info)
140153
{
141154
struct sparx5_port *port = netdev_priv(dev);
155+
struct netlink_ext_ack *extack;
142156
int err = 0;
143157

158+
extack = netdev_notifier_info_to_extack(&info->info);
159+
144160
if (netif_is_bridge_master(info->upper_dev)) {
145161
if (info->linking)
146-
err = sparx5_port_bridge_join(port, info->upper_dev);
162+
err = sparx5_port_bridge_join(port, info->upper_dev,
163+
extack);
147164
else
148165
sparx5_port_bridge_leave(port, info->upper_dev);
149166

drivers/net/ethernet/mscc/ocelot_net.c

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1216,17 +1216,28 @@ static int ocelot_netdevice_bridge_join(struct net_device *dev,
12161216

12171217
ocelot_port_bridge_join(ocelot, port, bridge);
12181218

1219+
err = switchdev_bridge_port_offload(brport_dev, dev, extack);
1220+
if (err)
1221+
goto err_switchdev_offload;
1222+
12191223
err = ocelot_switchdev_sync(ocelot, port, brport_dev, bridge, extack);
12201224
if (err)
12211225
goto err_switchdev_sync;
12221226

12231227
return 0;
12241228

12251229
err_switchdev_sync:
1230+
switchdev_bridge_port_unoffload(brport_dev);
1231+
err_switchdev_offload:
12261232
ocelot_port_bridge_leave(ocelot, port, bridge);
12271233
return err;
12281234
}
12291235

1236+
static void ocelot_netdevice_pre_bridge_leave(struct net_device *brport_dev)
1237+
{
1238+
switchdev_bridge_port_unoffload(brport_dev);
1239+
}
1240+
12301241
static int ocelot_netdevice_bridge_leave(struct net_device *dev,
12311242
struct net_device *brport_dev,
12321243
struct net_device *bridge)
@@ -1279,6 +1290,18 @@ static int ocelot_netdevice_lag_join(struct net_device *dev,
12791290
return err;
12801291
}
12811292

1293+
static void ocelot_netdevice_pre_lag_leave(struct net_device *dev,
1294+
struct net_device *bond)
1295+
{
1296+
struct net_device *bridge_dev;
1297+
1298+
bridge_dev = netdev_master_upper_dev_get(bond);
1299+
if (!bridge_dev || !netif_is_bridge_master(bridge_dev))
1300+
return;
1301+
1302+
ocelot_netdevice_pre_bridge_leave(bond);
1303+
}
1304+
12821305
static int ocelot_netdevice_lag_leave(struct net_device *dev,
12831306
struct net_device *bond)
12841307
{
@@ -1355,6 +1378,43 @@ ocelot_netdevice_lag_changeupper(struct net_device *dev,
13551378
return NOTIFY_DONE;
13561379
}
13571380

1381+
static int
1382+
ocelot_netdevice_prechangeupper(struct net_device *dev,
1383+
struct net_device *brport_dev,
1384+
struct netdev_notifier_changeupper_info *info)
1385+
{
1386+
if (netif_is_bridge_master(info->upper_dev) && !info->linking)
1387+
ocelot_netdevice_pre_bridge_leave(brport_dev);
1388+
1389+
if (netif_is_lag_master(info->upper_dev) && !info->linking)
1390+
ocelot_netdevice_pre_lag_leave(dev, info->upper_dev);
1391+
1392+
return NOTIFY_DONE;
1393+
}
1394+
1395+
static int
1396+
ocelot_netdevice_lag_prechangeupper(struct net_device *dev,
1397+
struct netdev_notifier_changeupper_info *info)
1398+
{
1399+
struct net_device *lower;
1400+
struct list_head *iter;
1401+
int err = NOTIFY_DONE;
1402+
1403+
netdev_for_each_lower_dev(dev, lower, iter) {
1404+
struct ocelot_port_private *priv = netdev_priv(lower);
1405+
struct ocelot_port *ocelot_port = &priv->port;
1406+
1407+
if (ocelot_port->bond != dev)
1408+
return NOTIFY_OK;
1409+
1410+
err = ocelot_netdevice_prechangeupper(dev, lower, info);
1411+
if (err)
1412+
return err;
1413+
}
1414+
1415+
return NOTIFY_DONE;
1416+
}
1417+
13581418
static int
13591419
ocelot_netdevice_changelowerstate(struct net_device *dev,
13601420
struct netdev_lag_lower_state_info *info)
@@ -1382,6 +1442,17 @@ static int ocelot_netdevice_event(struct notifier_block *unused,
13821442
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
13831443

13841444
switch (event) {
1445+
case NETDEV_PRECHANGEUPPER: {
1446+
struct netdev_notifier_changeupper_info *info = ptr;
1447+
1448+
if (ocelot_netdevice_dev_check(dev))
1449+
return ocelot_netdevice_prechangeupper(dev, dev, info);
1450+
1451+
if (netif_is_lag_master(dev))
1452+
return ocelot_netdevice_lag_prechangeupper(dev, info);
1453+
1454+
break;
1455+
}
13851456
case NETDEV_CHANGEUPPER: {
13861457
struct netdev_notifier_changeupper_info *info = ptr;
13871458

drivers/net/ethernet/rocker/rocker.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,8 @@ struct rocker_world_ops {
119119
int (*port_obj_fdb_del)(struct rocker_port *rocker_port,
120120
u16 vid, const unsigned char *addr);
121121
int (*port_master_linked)(struct rocker_port *rocker_port,
122-
struct net_device *master);
122+
struct net_device *master,
123+
struct netlink_ext_ack *extack);
123124
int (*port_master_unlinked)(struct rocker_port *rocker_port,
124125
struct net_device *master);
125126
int (*port_neigh_update)(struct rocker_port *rocker_port,

drivers/net/ethernet/rocker/rocker_main.c

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1670,13 +1670,14 @@ rocker_world_port_fdb_del(struct rocker_port *rocker_port,
16701670
}
16711671

16721672
static int rocker_world_port_master_linked(struct rocker_port *rocker_port,
1673-
struct net_device *master)
1673+
struct net_device *master,
1674+
struct netlink_ext_ack *extack)
16741675
{
16751676
struct rocker_world_ops *wops = rocker_port->rocker->wops;
16761677

16771678
if (!wops->port_master_linked)
16781679
return -EOPNOTSUPP;
1679-
return wops->port_master_linked(rocker_port, master);
1680+
return wops->port_master_linked(rocker_port, master, extack);
16801681
}
16811682

16821683
static int rocker_world_port_master_unlinked(struct rocker_port *rocker_port,
@@ -3107,6 +3108,7 @@ struct rocker_port *rocker_port_dev_lower_find(struct net_device *dev,
31073108
static int rocker_netdevice_event(struct notifier_block *unused,
31083109
unsigned long event, void *ptr)
31093110
{
3111+
struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr);
31103112
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
31113113
struct netdev_notifier_changeupper_info *info;
31123114
struct rocker_port *rocker_port;
@@ -3123,7 +3125,8 @@ static int rocker_netdevice_event(struct notifier_block *unused,
31233125
rocker_port = netdev_priv(dev);
31243126
if (info->linking) {
31253127
err = rocker_world_port_master_linked(rocker_port,
3126-
info->upper_dev);
3128+
info->upper_dev,
3129+
extack);
31273130
if (err)
31283131
netdev_warn(dev, "failed to reflect master linked (err %d)\n",
31293132
err);

0 commit comments

Comments
 (0)