Commit f0c227c

Merge tag 'mlx5-updates-2021-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2021-06-14

1) Trivial lag refactoring in preparation for the upcoming Single FDB lag
   feature - first 3 patches.

2) Scalable IRQ distribution for sub-functions.

A subfunction (SF) is a lightweight function that has a parent PCI
function (PF) on which it is deployed.

Currently, mlx5 subfunctions share the IRQs (MSI-X) of their parent PCI
function. Before this series the PF allocates enough IRQs to cover all
the cores in the system, and newly created SFs re-use all the IRQs that
the PF has allocated for itself. Hence, the more SFs are created, the
more EQs there are per IRQ. Therefore, whenever we handle an interrupt,
we need to poll all SF EQs in addition to the PF EQs, instead of only
the PF EQs when no SFs are present. This has a hard impact on the
performance of both SFs and PF.

For example, on a machine with:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.

test case: iperf TX BW, single CPU, app and IRQ pinned to the same core.
PF only: no SFs on the system, 56 IRQs.
SF (before): 250 SFs sharing the same 56 IRQs.
SF (now): 250 SFs + 255 available IRQs for the NIC (see the IRQ spread
scheme below).

              application  SF-IRQ   channel  BW(Gb/sec)   interrupts/sec
              iperf TX     affinity
PF only       cpu={0}      cpu={0}  cpu={0}  79           8200
SF (before)   cpu={0}      cpu={0}  cpu={0}  51.3 (-35%)  9500
SF (now)      cpu={0}      cpu={0}  cpu={0}  78 (-2%)     8200

command:
$ taskset -c 0 iperf -c 11.1.1.1 -P 3 -i 6 -t 30 | grep SUM

The difference between the SF examples is that before this series we
allocated num_cpus (56) IRQs, all of them shared among the PF and the
SFs. After this series, we allocate 255 IRQs and spread the SFs among
them. This has significantly decreased the load on each IRQ; the number
of EQs per IRQ is down by 95% (251->11).

The solution proposed in this patchset is a dedicated IRQ pool for SFs
to use. The pool allocates a large number of IRQs for SFs to grab from,
in order to minimize IRQ sharing between the different SFs. IRQs are
not requested from the OS until they are first requested by an SF
consumer, and are eventually released when the last SF consumer
releases them.

For the detailed IRQ spread and allocation scheme please see the last
patch: ("net/mlx5: Round-Robin EQs over IRQs")
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents 08ab4d7 + c36326d commit f0c227c

File tree: 17 files changed (+794, -435 lines)

drivers/infiniband/hw/mlx5/odp.c

Lines changed: 6 additions & 2 deletions
@@ -1559,12 +1559,16 @@ int mlx5r_odp_create_eq(struct mlx5_ib_dev *dev, struct mlx5_ib_pf_eq *eq)
 	}

 	eq->irq_nb.notifier_call = mlx5_ib_eq_pf_int;
-	param = (struct mlx5_eq_param){
-		.irq_index = 0,
+	param = (struct mlx5_eq_param) {
 		.nent = MLX5_IB_NUM_PF_EQE,
 	};
 	param.mask[0] = 1ull << MLX5_EVENT_TYPE_PAGE_FAULT;
+	if (!zalloc_cpumask_var(&param.affinity, GFP_KERNEL)) {
+		err = -ENOMEM;
+		goto err_wq;
+	}
 	eq->core = mlx5_eq_create_generic(dev->mdev, &param);
+	free_cpumask_var(param.affinity);
 	if (IS_ERR(eq->core)) {
 		err = PTR_ERR(eq->core);
 		goto err_wq;

drivers/net/ethernet/mellanox/mlx5/core/en_main.c

Lines changed: 2 additions & 2 deletions
@@ -5114,7 +5114,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	mlx5e_set_netdev_mtu_boundaries(priv);
 	mlx5e_set_dev_port_mtu(priv);

-	mlx5_lag_add(mdev, netdev);
+	mlx5_lag_add_netdev(mdev, netdev);

 	mlx5e_enable_async_events(priv);
 	mlx5e_enable_blocking_events(priv);
@@ -5162,7 +5162,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 		priv->en_trap = NULL;
 	}
 	mlx5e_disable_async_events(priv);
-	mlx5_lag_remove(mdev);
+	mlx5_lag_remove_netdev(mdev, priv->netdev);
 	mlx5_vxlan_reset_to_default(mdev->vxlan);
 }

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c

Lines changed: 2 additions & 2 deletions
@@ -976,7 +976,7 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv)
 	if (MLX5_CAP_GEN(mdev, uplink_follow))
 		mlx5_modify_vport_admin_state(mdev, MLX5_VPORT_STATE_OP_MOD_UPLINK,
 					      0, 0, MLX5_VPORT_ADMIN_STATE_AUTO);
-	mlx5_lag_add(mdev, netdev);
+	mlx5_lag_add_netdev(mdev, netdev);
 	priv->events_nb.notifier_call = uplink_rep_async_event;
 	mlx5_notifier_register(mdev, &priv->events_nb);
 	mlx5e_dcbnl_initialize(priv);
@@ -1009,7 +1009,7 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv)
 	mlx5e_dcbnl_delete_app(priv);
 	mlx5_notifier_unregister(mdev, &priv->events_nb);
 	mlx5e_rep_tc_disable(priv);
-	mlx5_lag_remove(mdev);
+	mlx5_lag_remove_netdev(mdev, priv->netdev);
 }

 static MLX5E_DEFINE_STATS_GRP(sw_rep, 0);
