
yseraf commented Jun 9, 2020

In case of a BNA (Buffer Not Available) error, the GMAC sets the BNA bit and
triggers an RCOMP interrupt. On the next received end of frame, if the BNA
condition has not been corrected, only the BNA bit is set and no further RCOMP
interrupt is generated. Writing the DMA queue base address while Rx is
disabled guarantees an RCOMP interrupt, which fixes the race condition.

Signed-off-by: Yannick Lanz <[email protected]>

EDIT: The commit has changed since this original version.
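The recovery sequence described in the commit message can be sketched against a simulated register block. This is a hypothetical model, not the driver code: `struct sim_gem`, the helper names and the bit positions (modelled on the mainline `macb.h`, NCR.RE = bit 2 and RSR.BNA = bit 0) are assumptions. A real driver would go through its MMIO accessors and would also reset its own ring state before re-enabling Rx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, modelled on mainline macb.h (assumption). */
#define NCR_RE  (1u << 2)   /* receive enable */
#define RSR_BNA (1u << 0)   /* buffer not available */

/* Simulated register block; plain fields stand in for MMIO registers. */
struct sim_gem {
	uint32_t ncr;      /* network control */
	uint32_t rsr;      /* receive status */
	uint32_t rbqp;     /* receive buffer queue base pointer */
	int rcomp_armed;   /* model only: 1 if the next BNA will raise RCOMP */
};

/* Per the commit message, an RCOMP interrupt is only guaranteed when the
 * queue base is written while the receiver is disabled. */
static void sim_write_rbqp(struct sim_gem *g, uint32_t base)
{
	g->rbqp = base;
	g->rcomp_armed = !(g->ncr & NCR_RE);
}

void bna_recover(struct sim_gem *g, uint32_t ring_dma_base)
{
	assert((ring_dma_base & 0x3u) == 0);  /* ring assumed word-aligned */

	g->rsr &= ~RSR_BNA;                /* models writing 1 to clear BNA */
	g->ncr &= ~NCR_RE;                 /* 1. stop the receiver */
	sim_write_rbqp(g, ring_dma_base);  /* 2. point HW back at ring start */
	/* (a real driver would also reset its descriptors and tail here) */
	g->ncr |= NCR_RE;                  /* 3. restart the receiver */
}
```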

yseraf (author) commented Jun 9, 2020

On the SAMA5D4, after some time (weeks to months), the netif becomes unusable. When the problem appears, the MACB peripheral raises no more IRQs and no NET_RX softIRQ runs, yet the rx_resource_errors counter increments on every packet.

According to the SAMA5D4 datasheet, this error counter increments when a BNA condition is set.
The datasheet also specifies that a BNA condition generates an RCOMP interrupt with the BNA bit set in the RSR register. What it does not explain (and what our tests confirm) is that a second BNA condition, occurring before the Rx ring queue pointer has moved (i.e. while the condition has not been cleared), does not generate an interrupt.

In some cases, the BNA bit is cleared but the condition itself is not, and polling mode hands control back to the IRQ. From then on, packets can no longer be received (no more IRQs, no more softIRQs).
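The behavior reported above can be captured in a small state model. It is a hypothetical sketch of what was observed, not a description of documented GEM internals: once a BNA has been reported, further frames hitting the same unreleased descriptor only increment rx_resource_errors, and an RCOMP is raised again only after the queue pointer has moved. If the driver has already left polling mode at that point, nothing wakes it up again.

```c
#include <assert.h>

/* Behavioural model of the observed hardware; all names are hypothetical. */
struct bna_model {
	int bna_reported;             /* RCOMP already raised for this BNA */
	unsigned rx_resource_errors;  /* mirrors the driver statistic */
	unsigned rcomp_irqs;          /* interrupts the CPU would see */
};

/* Feed one incoming frame. slot_free is 1 if the descriptor at the
 * hardware queue pointer is free for the hardware to fill.
 * Returns 1 if this frame raised an RCOMP interrupt. */
int model_frame(struct bna_model *m, int slot_free)
{
	assert(slot_free == 0 || slot_free == 1);

	if (slot_free) {
		/* Normal reception: the queue pointer advances, which
		 * clears any pending BNA condition. */
		m->bna_reported = 0;
		m->rcomp_irqs++;
		return 1;
	}

	/* Descriptor still owned by software: BNA. */
	m->rx_resource_errors++;
	if (m->bna_reported)
		return 0;  /* observed: no new RCOMP while unresolved */

	m->bna_reported = 1;
	m->rcomp_irqs++;
	return 1;
}
```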

yseraf (author) commented Jun 9, 2020

We also don't know whether the same happens on the SAMA5D2/3 or the other CPUs using this peripheral. In any case, on the SAMA5D4, a high-intensity traffic generator (such as PacketSender) makes the netif unusable within a few minutes.
Finally, completely clearing the Rx ring on error is not elegant (we can lose up to 512 packets), but it is an easy way to remove the race condition; all our other attempts ended in race conditions.
Ideally, we would need more information about the internal behavior of the MACB to improve the fix.
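Clearing the whole Rx ring amounts to handing every descriptor back to the hardware and rewinding the software tail index; every descriptor still holding an unprocessed frame is lost, hence up to 512 packets with a 512-entry ring. A minimal sketch, assuming a simplified two-word descriptor in which bit 0 of the address word is the "used" (ownership) bit and bit 1 marks the last descriptor, similar to the GEM layout; `struct rx_desc` and `rx_ring_reset()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed control bits in the low bits of the (aligned) buffer address. */
#define RX_USED (1u << 0)  /* set by HW: descriptor holds a frame */
#define RX_WRAP (1u << 1)  /* marks the last descriptor of the ring */

struct rx_desc {
	uint32_t addr;  /* buffer address | RX_WRAP | RX_USED */
	uint32_t ctrl;  /* status written by HW (length, flags, ...) */
};

/* Hands every descriptor back to the hardware and rewinds the software
 * tail index. Returns how many received frames were discarded. */
size_t rx_ring_reset(struct rx_desc *ring, size_t n, size_t *rx_tail)
{
	size_t dropped = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		if (ring[i].addr & RX_USED)
			dropped++;  /* frame never processed: lost */
		ring[i].addr &= ~(RX_USED | RX_WRAP);
		ring[i].ctrl = 0;
	}
	ring[n - 1].addr |= RX_WRAP;  /* close the ring again */
	*rx_tail = 0;
	return dropped;
}
```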

@yseraf yseraf marked this pull request as draft June 10, 2020 09:24
@yseraf yseraf force-pushed the feature/fix-bna branch from 9f682b8 to 87346a0 Compare June 11, 2020 08:29
yseraf (author) commented Jun 11, 2020

The patch has been modified to free only one slot (dropping that packet) and clear the BNA condition without dropping the whole Rx ring.
Also, since the interrupts are disabled rather than masked, we need to re-check for received frames and for a BNA condition after re-enabling the Rx IRQ and leaving polling mode. Otherwise, an incoming packet may sit stale until the next reception (not really a problem), or, in case of a BNA condition, we deadlock (no more polling, no more interrupts).
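Re-checking after leaving polling mode is a common way to close this kind of race in a NAPI driver. A minimal sketch on a hypothetical queue model (`struct rxq`, `rx_work_pending()` and `rx_poll_complete()` are illustrative names; the comments say which kernel call or register each step stands in for): re-enable the Rx interrupt first, then look for work that arrived in the meantime, and if there is any, disable the interrupt again and reschedule the poll instead of waiting for an interrupt that will never come.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one Rx queue under NAPI. */
struct rxq {
	bool irq_enabled;     /* Rx interrupt sources enabled */
	bool poll_scheduled;  /* poll routine queued to run again */
	int frames_pending;   /* frames filled by HW, not yet processed */
	bool bna_pending;     /* BNA seen and not yet handled */
};

static bool rx_work_pending(const struct rxq *q)
{
	assert(q->frames_pending >= 0);
	return q->frames_pending > 0 || q->bna_pending;
}

/* Called when a poll pass used less than its budget. */
void rx_poll_complete(struct rxq *q)
{
	q->poll_scheduled = false;  /* stands in for napi_complete_done() */
	q->irq_enabled = true;      /* stands in for writing the IER register */

	/* Per the author, events that arrived while interrupts were
	 * disabled (not merely masked) raise no interrupt later, so
	 * check again now that interrupts are back on. */
	if (rx_work_pending(q)) {
		q->irq_enabled = false;    /* stands in for writing IDR */
		q->poll_scheduled = true;  /* stands in for napi_schedule() */
	}
}
```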

With this patch, the netif handles 30 Mbps of UDP traffic (higher rates not tested) without any issue.

@yseraf yseraf marked this pull request as ready for review June 11, 2020 08:36
@yseraf yseraf force-pushed the feature/fix-bna branch from 87346a0 to d4c4772 Compare June 11, 2020 10:54
yseraf added 2 commits June 14, 2020 18:02
Warning log level is more appropriate to avoid any confusion.

Signed-off-by: Yannick Lanz <[email protected]>
…ndition

for the GEM peripheral. Fix netif freeze in case of high incoming load.

Signed-off-by: Yannick Lanz <[email protected]>
@yseraf yseraf force-pushed the feature/fix-bna branch from d4c4772 to 3b542e6 Compare June 14, 2020 16:07
@yseraf yseraf changed the base branch from master to linux-5.4-at91 June 14, 2020 16:08
yseraf (author) commented Aug 3, 2020

Is anybody there? Should I submit my patch somewhere else?

@claudiubeznea

Hi @hilt0n! What packet size are you using? I assume you are sending traffic at line rate. Did you try adding MACB_CAPS_NEEDS_RSTONUBR as a capability for sama5d4_config? See:

static const struct macb_config sama5d4_config = {
	.caps = MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII | MACB_CAPS_NEEDS_RSTONUBR,
	.dma_burst_length = 4,
	.clk_init = macb_clk_init,
	.init = macb_init,
	.usrio = &macb_default_usrio,
};

If so, is the behavior the same? Is the performance lower than what you achieve with this patch?
Could you please share the performance results from before your changes, on the kernel you are using?

Thank you,
Claudiu

yseraf (author) commented Aug 21, 2020

Hi @claudiubeznea,

Thank you for the support. You are right about the packet count; 30 Mbit/s on its own doesn't say much.
I send 30k UDP packets/second (packet size of about 20-30 bytes).
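Since the question is really about packet rate versus line rate, a small helper makes the conversion explicit. This sketch assumes untagged Ethernet and IPv4 without options, and counts the 64-byte minimum frame, the 8-byte preamble/SFD and the 12-byte inter-frame gap; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed per-frame overheads, in bytes. */
#define UDP_HDR        8u
#define IPV4_HDR      20u
#define ETH_HDR       14u
#define ETH_FCS        4u
#define ETH_MIN_FRAME 64u  /* header + payload + FCS, padded if shorter */
#define PREAMBLE_SFD   8u
#define IFG           12u

/* Bytes one UDP datagram occupies on the wire, including padding,
 * preamble/SFD and the inter-frame gap. */
unsigned wire_bytes(unsigned udp_payload)
{
	unsigned frame = ETH_HDR + IPV4_HDR + UDP_HDR + udp_payload + ETH_FCS;

	if (frame < ETH_MIN_FRAME)
		frame = ETH_MIN_FRAME;
	return frame + PREAMBLE_SFD + IFG;
}

/* Wire bit rate (bit/s) for a given packet rate. */
double wire_bps(double pps, unsigned udp_payload)
{
	return pps * wire_bytes(udp_payload) * 8.0;
}

/* Maximum packet rate a link of link_bps can carry. */
double line_rate_pps(double link_bps, unsigned udp_payload)
{
	assert(link_bps > 0);
	return link_bps / (wire_bytes(udp_payload) * 8.0);
}
```

Under these assumptions, a 100 Mbit/s link carries at most about 148,809 minimum-size frames per second, so 30k packets/second with 20-30 byte UDP payloads is well below line rate.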

I just tested the MACB_CAPS_NEEDS_RSTONUBR flag and it doesn't fix the issue.
From my tests and recollection, a BNA error doesn't set the UBR flag but generates an RCOMP interrupt with the BNA flag set (at least on the SAMA5D4).
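As I read the mainline macb driver (an assumption worth verifying against the actual source), the MACB_CAPS_NEEDS_RSTONUBR workaround only acts on the RXUBR ("used bit read") interrupt, by toggling NCR.RE. The predicate below is a hypothetical sketch of that gating; it shows why an event reported as RCOMP with RSR.BNA set, as observed here, would bypass the workaround. The bit positions are modelled on `macb.h` and are also assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISR bit positions modelled on mainline macb.h (assumption). */
#define ISR_RCOMP (1u << 1)  /* receive complete */
#define ISR_RXUBR (1u << 2)  /* Rx used bit read */

/* Sketch of the RSTONUBR gating: toggle NCR.RE only when the capability
 * is set AND the interrupt status reports RXUBR. */
bool rstonubr_would_toggle_re(bool needs_rstonubr, uint32_t isr)
{
	return needs_rstonubr && (isr & ISR_RXUBR) != 0;
}
```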

The datasheet seems to be largely copied and pasted from other CPUs with (possibly older) versions of the peripheral.
Microchip support didn't help, since they only gave a summary of the datasheet.

Regarding performance, I tested with iperf3 using the default options:

Base

Client -> Server

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   111 MBytes  93.0 Mbits/sec   48             sender
[  5]   0.00-10.00  sec   110 MBytes  92.6 Mbits/sec                  receiver

Server -> Client

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   112 MBytes  93.5 Mbits/sec   85             sender
[  5]   0.00-10.01  sec   111 MBytes  93.0 Mbits/sec                  receiver

Fix

Client -> Server

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   111 MBytes  93.1 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   111 MBytes  92.7 Mbits/sec                  receiver

Server -> Client

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   113 MBytes  94.4 Mbits/sec   20             sender
[  5]   0.00-10.01  sec   112 MBytes  93.9 Mbits/sec                  receiver

The results are very similar. Do you have any suggestions for more relevant test parameters?

EDIT: The kernel is 4.19.78 (linux-at91); results updated.

@yseraf yseraf closed this Aug 9, 2021
@yseraf yseraf deleted the feature/fix-bna branch August 9, 2021 08:48
@claudiubeznea

Please contact the authors of f027418. According to commit tags they are:
Co-developed-by: Scott McNutt [email protected]
Signed-off-by: Scott McNutt [email protected]
Signed-off-by: Robert Hancock [email protected]

This commit is integrated in Linux mainline, and we cherry-picked it with all the mainline contribution tags and authorship. So what is missing here? Your patch was not integrated here, so it is not credited.

If you have questions about why you were not mentioned, please contact the original authors and contributors of f027418.

Note that it is also possible that other people faced the same issue you encountered and published a fix to mainline (you didn't do this; you only published it here, and this is not Linux mainline, it is a vendor tree). Their solution looks simpler and more elegant to me, and it differs significantly (at least in lines of code) from yours. Anyway, if you have authorship questions, please contact them. They are the authors.

And before making accusations such as:
"You visibly don't really understand how OSS work... When you use the work of someone else, you have to mention it otherwise, this is stolen...
My work on this patch:f027418"
please learn how mainline Linux contributions work. The commit you mentioned was integrated in Linux mainline, and we cherry-picked it from mainline with all the mainline contribution tags. This is how Linux stable cherry-picking works (except that we probably didn't mention the original commit ID from Torvalds' tree here). Note that the patch "net: macb: Fix lost RX packet wakeup race in NAPI receive" has its author and the rest of its tags exactly as they are in the public Linux kernel tree.

Have a good day!

yseraf (author) commented Oct 13, 2022

You are right, my apologies. I overreacted; that's why I deleted my post.

I think I was mostly annoyed that Atmel/Microchip never took this patch (which simply makes the Ethernet driver work) into account.
I'm happy to see that SAMA5 finally has a working Ethernet driver in mainline.

Regarding where bug-fix patches should be proposed, I'm a bit surprised that this is not a bottom-up process. I thought it worked bottom-up, with silicon vendors (like Microchip) pushing bug fixes to the ARM mainline, which then pushes them to the kernel mainline.
I'm not sure how to do it next time...
