
Conversation

@gclement

No description provided.

jogness and others added 30 commits May 31, 2020 17:59
Code relating to the safe context and anything dealing with the
previous log buffer implementation is no longer in use. Remove it.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
All messages printed via vprintk_deferred() were being
automatically treated as emergency messages.

Messages printed via vprintk_deferred() should be set to the
default loglevel. LOGLEVEL_SCHED is no longer relevant.

Also, enforce the loglevel mask for emergency messages.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
This does not work, as rtmutex cannot be taken in NMI context. As per
John, it is not needed.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
It is no longer provided by the printk core code.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Emergency messages exist as a mechanism for the kernel to
communicate critical information to users. It is not meant for
use by userspace. Only allow facility=0 messages to be
processed by the emergency message code.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
SEEK_DATA will seek to the last clear record. If this clear record
is no longer in the ring buffer, devkmsg_llseek() will go into an
infinite loop. Fix that by resetting the clear sequence if the old
clear record is no longer in the ring buffer.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
If messages which are injected via kmsg are dropped then they don't need
to be printed as warnings. This is to avoid latency spikes if the
interface decides to print a lot of important messages.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.

Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.

Reported-by: Scott Wood <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
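The copy-instead-of-lock pattern described in the kmsg dump entry above can be illustrated with a minimal C sketch; the struct and helper names here are hypothetical, not the actual printk internals.

  /* Hypothetical sketch: take a stack copy of the shared iterator so the
   * dump can run from any context without synchronization. */
  struct dump_iter {
          u64 cur_seq;
          u64 next_seq;
  };

  static void dump_all_records(const struct dump_iter *shared)
  {
          struct dump_iter iter = *shared;        /* local copy, no lock needed */

          while (read_next_record(&iter))         /* hypothetical helper */
                  emit_record(&iter);             /* hypothetical helper */
  }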
… wants has gone

When user-space wants to read the first message, that is when user->seq
is 0, and that message has gone, the code currently resets user->seq to
the current first seq automatically. This does not match mainline kernel
behavior.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/dev-kmsg#n39
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/printk/printk.c#n899

We should inform user-space that what it wants has gone by returning EPIPE
in such a scenario.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
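A hedged sketch of the new behaviour: when the record the reader asked for has already been overwritten, return -EPIPE instead of silently resynchronizing. It loosely follows the mainline devkmsg reader; `record_seq` stands in for the sequence number of the oldest available record.

  /* Sketch only: the message at user->seq is gone, so report data loss. */
  if (record_seq != user->seq) {
          user->seq = record_seq;   /* resync for the next read() */
          ret = -EPIPE;             /* tell user-space that messages were lost */
          goto out;
  }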
The syslog and kmsg_dump readers are provided buffers to fill.
Both try to maximize the provided buffer usage by calculating the
maximum number of messages that can fit. However, if after the
calculation, messages are dropped and new messages added, the
calculation will no longer match.

For syslog, add a check to make sure the provided buffer is not
overfilled.

For kmsg_dump, start over by recalculating the messages
available.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Instead of using an emergency loglevel to determine if atomic
messages should be printed, use oops_in_progress. This conforms
to the decision that latency-causing atomic messages never be
generated during normal operation.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The atomic console implementation requires that IER is synchronized
between atomic and non-atomic usage. However, it was implemented such
that console_atomic_lock was taken for all IER accesses, even
if the port was not a console.

The implementation also used a usage counter to keep track of IER
clear/restore windows. However, this is not needed because the
console_atomic_lock synchronization of IER access will prevent any
situation where IER is prematurely restored or left cleared.

Move the IER access functions to inline macros. They will only take
console_atomic_lock if the port is a console. Remove the
restore_ier() function by having clear_ier() return the prior IER
value so that the caller can restore it using set_ier(). Rename the
IER access functions to match other 8250 wrapper macros.

Suggested-by: Dick Hollenbeck <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
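A sketch of the clear/set pattern described above. It assumes a console_atomic_lock()/console_atomic_unlock() pair provided elsewhere in this series and uses the usual 8250 serial_in()/serial_out() accessors; treat the exact names and signatures as illustrative.

  static inline unsigned int serial8250_clear_IER(struct uart_8250_port *up)
  {
          unsigned int flags, prior;
          bool is_console = uart_console(&up->port);

          if (is_console)
                  console_atomic_lock(&flags);    /* assumed helper from this series */

          prior = serial_in(up, UART_IER);
          serial_out(up, UART_IER, 0);

          if (is_console)
                  console_atomic_unlock(flags);

          return prior;   /* caller restores it later via serial8250_set_IER(up, prior) */
  }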
A few 8250 implementations have their own IER access. If the port
is a console, wrap the accesses with console_atomic_lock.

Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
As preparation for replacing the embedded rwsem, give percpu-rwsem its
own lockdep_map.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Use bool where possible.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
As preparation to rework __percpu_down_read() move the
__this_cpu_inc() into it.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
In preparation for removing the embedded rwsem and building a custom
lock, extract the read-trylock primitive.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The filesystem freezer uses percpu-rwsem in a way that is effectively
write_non_owner() and achieves this with a few horrible hacks that
rely on the rwsem (!percpu) implementation.

When PREEMPT_RT replaces the rwsem implementation with a PI aware
variant this comes apart.

Remove the embedded rwsem and implement it using a waitqueue and an
atomic_t.

 - make readers_block an atomic, and use it, with the waitqueue
   for a blocking test-and-set write-side.

 - have the read-side wait for the 'lock' state to clear.

Have the waiters use FIFO queueing and mark them (reader/writer) with
a new WQ_FLAG. Use a custom wake_function to wake either a single
writer or all readers until a writer.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Juri Lelli <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
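A heavily simplified sketch of the waitqueue plus atomic_t scheme described above, leaving out the per-CPU reader fast path, the FIFO ordering, and the custom wake function; it is illustrative only and not the real percpu-rwsem code.

  static atomic_t block = ATOMIC_INIT(0);     /* 0 = no writer pending/active */
  static DECLARE_WAIT_QUEUE_HEAD(waiters);

  static void write_lock_sketch(void)
  {
          /* blocking test-and-set: sleep until we are the one flipping 0 -> 1 */
          wait_event(waiters, atomic_cmpxchg(&block, 0, 1) == 0);
  }

  static void write_unlock_sketch(void)
  {
          atomic_set_release(&block, 0);
          wake_up_all(&waiters);              /* readers and the next writer recheck */
  }

  static void read_lock_sketch(void)
  {
          /* read side waits for the 'lock' state to clear ... */
          wait_event(waiters, atomic_read(&block) == 0);
          /* ... the real code then bumps the per-CPU reader count and rechecks */
  }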
Now that __percpu_up_read() is only ever used from percpu_up_read(),
merge them; it's a small function.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
We are missing this annotation in percpu_down_write(). Correct
this.

Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.

Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.

Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
[bigeasy: remove the wrapper and use always spinlock_t and move it into
          the padding hole]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
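In code, the substitution looks roughly like the fragment below; the spinlock_t field placed in struct buffer_head is named as in the later mainline change and may differ in this series.

  /* before: bit spinlock in b_state, needs IRQs off explicitly */
  local_irq_save(flags);
  bit_spin_lock(BH_Uptodate_Lock, &first->b_state);

  /* after: a regular spinlock_t embedded in struct buffer_head */
  spin_lock_irqsave(&first->b_uptodate_lock, flags);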
The spinlock pkg_temp_lock has the potential of being taken in atomic
context because it can be acquired from the thermal IRQ vector.
It's static and of limited scope, so go ahead and make it a raw spinlock.

Signed-off-by: Clark Williams <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Since commit
   2887594 ("rcu: Add support for consolidated-RCU reader checking")

there is an additional check to ensure that an RCU-related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.

Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
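The annotation amounts to passing the lockdep condition as the optional fourth argument of list_for_each_entry_rcu(); the exact condition expression used in the patch is assumed here.

  struct pmu *pmu;

  /* walked under srcu_read_lock(&pmus_srcu) rather than rcu_read_lock() */
  list_for_each_entry_rcu(pmu, &pmus, entry, lockdep_is_held(&pmus_srcu)) {
          /* ... per-pmu work ... */
  }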
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context,
which does not work on RT.
Since the kmemleak operation is performed in atomic context, make it a
raw_spinlock_t so it can also be acquired on RT. This is used for
debugging and is not enabled by default in a production-like environment
(where performance/latency matters), so it makes sense to make it a
raw_spinlock_t instead of trying to get rid of the atomic context.
Also turn kmemleak_object->lock into a raw_spinlock_t, which is
acquired (nested) while kmemleak_lock is held.

The time spent in "echo scan > kmemleak" slightly improved on a 64-core box
with this patch applied after boot.

Acked-by: Catalin Marinas <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Liu Haitao <[email protected]>
Signed-off-by: Yongxin Liu <[email protected]>
[bigeasy: Redo the description. Merge the individual bits: He Zhe did
the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded the
patch.]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Use a typedef for the conditional function instead of defining it each time
in the function prototype.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated
mask is a subset of the provided mask based on the conditional function.
This memory allocation could be avoided by extending
smp_call_function_many() with the conditional function and performing the
remote function call based on the mask and the conditional function.

Rename smp_call_function_many() to smp_call_function_many_cond() and add
the smp_cond_func_t argument. If smp_cond_func_t is provided then it is
used before invoking the function.
Provide smp_call_function_many() with cond_func set to NULL.
Let on_each_cpu_cond_mask() use smp_call_function_many_cond().

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
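A sketch of the typedef and of how the conditional function is applied inside the cond-aware call path, avoiding the temporary cpumask allocation; the loop body is schematic.

  typedef bool (*smp_cond_func_t)(int cpu, void *info);

  /* inside smp_call_function_many_cond(), schematically: */
  for_each_cpu(cpu, mask) {
          if (cond_func && !cond_func(cpu, info))
                  continue;                 /* condition says skip this CPU */
          /* queue the call single data / send the IPI as before */
  }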
The allocation mask is no longer used by on_each_cpu_cond() and
on_each_cpu_cond_mask() and can be removed.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
vmw_fifo_ping_host() disables preemption around a test and a register
write via vmw_write(). The write function acquires a spinlock_t typed
lock which is not allowed in a preempt_disable()ed section on
PREEMPT_RT. This has been reported in the bugzilla.

It has been explained by Thomas Hellstrom that this preempt_disable()ed
section is not required for correctness.

Remove the preempt_disable() section.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206591
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The proc file `compact_unevictable_allowed' should allow 0 and 1 only.
The `extra*' attributes have been set properly, but without
proc_dointvec_minmax() as the `proc_handler' the limit will not be
enforced.

Use proc_dointvec_minmax() as the `proc_handler' to enforce the specified
valid range.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
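Roughly how the sysctl table entry looks with the range actually enforced; only the proc_handler changes, the extra1/extra2 bounds were already set.

  {
          .procname       = "compact_unevictable_allowed",
          .data           = &sysctl_compact_unevictable_allowed,
          .maxlen         = sizeof(int),
          .mode           = 0644,
          .proc_handler   = proc_dointvec_minmax,   /* was proc_dointvec */
          .extra1         = SYSCTL_ZERO,
          .extra2         = SYSCTL_ONE,
  },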
Josh Cartwright and others added 23 commits May 31, 2020 18:02
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU.  It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.

On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible.  Update
kvm_arch_vcpu_ioctl_run() to do so.

Cc: Christoffer Dall <[email protected]>
Reported-by: Manish Jaggi <[email protected]>
Signed-off-by: Josh Cartwright <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
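The substitution itself is mechanical; a sketch of the pattern:

  migrate_disable();      /* was: preempt_disable(); stays on this CPU, remains preemptible */

  /* ... update vgic and timer state for the current CPU ... */

  migrate_enable();       /* was: preempt_enable(); */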
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt-disabled
section, which does not work on -RT.

Delay freeing of memory until preemption is enabled again.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Currently the driver disables the clock and enables it one line later
when switching from periodic mode into one-shot mode.
This disable/enable cycle can be avoided, and it causes a needless warning on -RT.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
By default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile-time selection to allow a higher clock resolution.

(fixed up by Sami Pietikäinen <[email protected]>)

Signed-off-by: Benedikt Spranger <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Allow selecting RT.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Allow selecting RT.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).

Use local_irq_save() instead of local_irq_disable().

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.

This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that their hardware MPIC usage scenario does not involve
interrupts sent in directed delivery mode, and that the number of possible
pending interrupts is kept small. Secondly, this should incentivize the
development of a proper openpic emulation that would be better suited for RT.

Acked-by: Scott Wood <[email protected]>
Signed-off-by: Bogdan Purcareata <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <[email protected]>
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb(), so let's use the stack
address as the initial value.

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Allow selecting RT.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
The current highmem handling on -RT is not compatible and needs fixups.

Signed-off-by: Thomas Gleixner <[email protected]>
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G        W   E   4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4

This happens since ab8ed95 ("connector: fix out-of-order cn_proc netlink
message delivery"), which is in v4.7-rc6.

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix a lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().

Signed-off-by: Mike Galbraith <[email protected]>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.

Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.

Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.

Cc: [email protected]
Reported-by: Alexander Stein <[email protected]>
Tested-by: Alexander Stein <[email protected]>
Signed-off-by: Julia Cartwright <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
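A sketch of the conversion using the local-lock pattern; it is written against the mainline local_lock_t API for readability, while the 5.4-rt series used its own local-lock helpers, so take the exact names as illustrative.

  static DEFINE_PER_CPU(local_lock_t, stream_lock) = INIT_LOCAL_LOCK(stream_lock);

  /* in the multi_cpu decompressor: */
  local_lock(&stream_lock);                /* serializes per-CPU access, preemption stays on */
  stream = this_cpu_ptr(msblk->stream);    /* was: get_cpu_ptr(msblk->stream) */
  /* ... run the decompression ... */
  local_unlock(&stream_lock);              /* was: put_cpu_ptr(msblk->stream) */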
ioread8() operations to TPM MMIO addresses can stall the CPU when
immediately following a sequence of iowrite*()'s to the same region.

For example, cyclictest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in the CPU or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.

Signed-off-by: Haris Okanovic <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
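The workaround amounts to pairing each posted write with a cheap read so the write buffer is flushed immediately; a hedged sketch, with the access-register offset macro assumed from the tpm_tis driver:

  /* Sketch: write, then read the chip's access register to flush the
   * posted write now instead of stalling a later ioread8(). */
  static inline void tpm_write_flushed(void __iomem *base, u32 off, u8 val)
  {
          iowrite8(val, base + off);
          ioread8(base + TPM_ACCESS(0));   /* assumed offset macro for locality 0 */
  }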
To avoid allocation, allow RT tasks to cache one sigqueue struct in
the task struct.

Signed-off-by: Thomas Gleixner <[email protected]>
Creates long latencies for no value

Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
@noglitch noglitch self-assigned this Jun 12, 2020
@noglitch
Member

Now deployed as linux-5.4-at91-rt branch.
Thanks.

@noglitch noglitch closed this Jun 12, 2020
noglitch pushed a commit that referenced this pull request Feb 25, 2021
[ Upstream commit 347fb0c ]

While mounting a crafted image provided by user, kernel panics due to
the invalid chunk item whose end is less than start.

  [66.387422] loop: module loaded
  [66.389773] loop0: detected capacity change from 262144 to 0
  [66.427708] BTRFS: device fsid a62e00e8-e94e-4200-8217-12444de93c2e devid 1 transid 12 /dev/loop0 scanned by mount (613)
  [66.431061] BTRFS info (device loop0): disk space caching is enabled
  [66.431078] BTRFS info (device loop0): has skinny extents
  [66.437101] BTRFS error: insert state: end < start 29360127 37748736
  [66.437136] ------------[ cut here ]------------
  [66.437140] WARNING: CPU: 16 PID: 613 at fs/btrfs/extent_io.c:557 insert_state.cold+0x1a/0x46 [btrfs]
  [66.437369] CPU: 16 PID: 613 Comm: mount Tainted: G           O      5.11.0-rc1-custom #45
  [66.437374] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.437378] RIP: 0010:insert_state.cold+0x1a/0x46 [btrfs]
  [66.437420] RSP: 0018:ffff93e5414c3908 EFLAGS: 00010286
  [66.437427] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.437431] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.437434] RBP: ffff93e5414c3938 R08: 0000000000000001 R09: 0000000000000001
  [66.437438] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d72aa0
  [66.437441] R13: ffff8ec78bc71628 R14: 0000000000000000 R15: 0000000002400000
  [66.437447] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.437451] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.437455] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.437460] PKRU: 55555554
  [66.437464] Call Trace:
  [66.437475]  set_extent_bit+0x652/0x740 [btrfs]
  [66.437539]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.437576]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.437621]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.437674]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.437708]  ? kvm_sched_clock_read+0x18/0x40
  [66.437739]  open_ctree+0xb32/0x1734 [btrfs]
  [66.437781]  ? bdi_register_va+0x1b/0x20
  [66.437788]  ? super_setup_bdi_name+0x79/0xd0
  [66.437810]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.437854]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437873]  legacy_get_tree+0x34/0x60
  [66.437880]  vfs_get_tree+0x2d/0xc0
  [66.437888]  vfs_kern_mount.part.0+0x78/0xc0
  [66.437897]  vfs_kern_mount+0x13/0x20
  [66.437902]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.437940]  ? kfree+0x5ff/0x670
  [66.437944]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.437962]  legacy_get_tree+0x34/0x60
  [66.437974]  vfs_get_tree+0x2d/0xc0
  [66.437983]  path_mount+0x48c/0xd30
  [66.437998]  __x64_sys_mount+0x108/0x140
  [66.438011]  do_syscall_64+0x38/0x50
  [66.438018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.438023] RIP: 0033:0x7f0138827f6e
  [66.438033] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.438040] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.438044] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [66.438047] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [66.438050] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [66.438054] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [66.438078] irq event stamp: 18169
  [66.438082] hardirqs last  enabled at (18175): [<ffffffffb81154bf>] console_unlock+0x4ff/0x5f0
  [66.438088] hardirqs last disabled at (18180): [<ffffffffb8115427>] console_unlock+0x467/0x5f0
  [66.438092] softirqs last  enabled at (16910): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438097] softirqs last disabled at (16905): [<ffffffffb8a00fe2>] asm_call_irq_on_stack+0x12/0x20
  [66.438103] ---[ end trace e114b111db64298b ]---
  [66.438107] BTRFS error: found node 12582912 29360127 on insert of 37748736 29360127
  [66.438127] BTRFS critical: panic in extent_io_tree_panic:679: locking error: extent tree was modified by another thread while locked (errno=-17 Object already exists)
  [66.441069] ------------[ cut here ]------------
  [66.441072] kernel BUG at fs/btrfs/extent_io.c:679!
  [66.442064] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  [66.443018] CPU: 16 PID: 613 Comm: mount Tainted: G        W  O      5.11.0-rc1-custom #45
  [66.444538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014
  [66.446223] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [66.450878] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [66.451840] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [66.453141] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [66.454445] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [66.455743] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [66.457055] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [66.458356] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [66.459841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [66.460895] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [66.462196] PKRU: 55555554
  [66.462692] Call Trace:
  [66.463139]  set_extent_bit.cold+0x30/0x98 [btrfs]
  [66.464049]  set_extent_bits_nowait+0x1d/0x20 [btrfs]
  [66.490466]  add_extent_mapping+0x1e0/0x2f0 [btrfs]
  [66.514097]  read_one_chunk+0x33c/0x420 [btrfs]
  [66.534976]  btrfs_read_chunk_tree+0x6a4/0x870 [btrfs]
  [66.555718]  ? kvm_sched_clock_read+0x18/0x40
  [66.575758]  open_ctree+0xb32/0x1734 [btrfs]
  [66.595272]  ? bdi_register_va+0x1b/0x20
  [66.614638]  ? super_setup_bdi_name+0x79/0xd0
  [66.633809]  btrfs_mount_root.cold+0x12/0xeb [btrfs]
  [66.652938]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.671925]  legacy_get_tree+0x34/0x60
  [66.690300]  vfs_get_tree+0x2d/0xc0
  [66.708221]  vfs_kern_mount.part.0+0x78/0xc0
  [66.725808]  vfs_kern_mount+0x13/0x20
  [66.742730]  btrfs_mount+0x11f/0x3c0 [btrfs]
  [66.759350]  ? kfree+0x5ff/0x670
  [66.775441]  ? __kmalloc_track_caller+0x217/0x3b0
  [66.791750]  legacy_get_tree+0x34/0x60
  [66.807494]  vfs_get_tree+0x2d/0xc0
  [66.823349]  path_mount+0x48c/0xd30
  [66.838753]  __x64_sys_mount+0x108/0x140
  [66.854412]  do_syscall_64+0x38/0x50
  [66.869673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [66.885093] RIP: 0033:0x7f0138827f6e
  [66.945613] RSP: 002b:00007ffecd79edf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
  [66.977214] RAX: ffffffffffffffda RBX: 00007f013894c264 RCX: 00007f0138827f6e
  [66.994266] RDX: 00005593a4a41360 RSI: 00005593a4a33690 RDI: 00005593a4a3a6c0
  [67.011544] RBP: 00005593a4a33440 R08: 0000000000000000 R09: 0000000000000001
  [67.028836] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
  [67.045812] R13: 00005593a4a3a6c0 R14: 00005593a4a41360 R15: 00005593a4a33440
  [67.216138] ---[ end trace e114b111db64298c ]---
  [67.237089] RIP: 0010:extent_io_tree_panic.isra.0+0x23/0x25 [btrfs]
  [67.325317] RSP: 0018:ffff93e5414c3948 EFLAGS: 00010246
  [67.347946] RAX: 0000000000000000 RBX: 0000000001bfffff RCX: 0000000000000000
  [67.371343] RDX: 0000000000000000 RSI: ffffffffb90d4660 RDI: 00000000ffffffff
  [67.394757] RBP: ffff93e5414c3948 R08: 0000000000000001 R09: 0000000000000001
  [67.418409] R10: ffff93e5414c3658 R11: 0000000000000000 R12: ffff8ec782d728c0
  [67.441906] R13: ffff8ec78bc71628 R14: ffff8ec782d72aa0 R15: 0000000002400000
  [67.465436] FS:  00007f01386a8580(0000) GS:ffff8ec809000000(0000) knlGS:0000000000000000
  [67.511660] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [67.535047] CR2: 00007f01382fa000 CR3: 0000000109a34000 CR4: 0000000000750ee0
  [67.558449] PKRU: 55555554
  [67.581146] note: mount[613] exited with preempt_count 2

The image has a chunk item with logical start 37748736 and length
18446744073701163008 (-8M). The calculated end, 29360127, overflows.
EEXIST was caught by insert_state() because of the duplicate end, and
extent_io_tree_panic() was called.

Add an overflow check of the chunk item end to the tree checker so this
can be detected early at mount time.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208929
CC: [email protected] # 4.19+
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Su Yue <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
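The added tree-checker test boils down to a u64 wrap check on the chunk end; a minimal sketch (error reporting elided):

  if (unlikely(logical + length < logical)) {
          /* chunk end wraps past U64_MAX: corrupt item, reject at mount time */
          return -EUCLEAN;
  }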
cristibirsan pushed a commit that referenced this pull request Mar 30, 2023
commit f6044cc upstream.

When preparing an AES-CTR request, the driver copies the key provided by
the user into a data structure that is accessible by the firmware.
If the target device is QAT GEN4, the key size is rounded up by 16 since
a rounded up size is expected by the device.
If the key size is rounded up before the copy, the size used for copying
the key might be bigger than the size of the region containing the key,
causing an out-of-bounds read.

Fix by doing the copy first and then updating the keylen.

This is to fix the following warning reported by KASAN:

	[  138.150574] BUG: KASAN: global-out-of-bounds in qat_alg_skcipher_init_com.isra.0+0x197/0x250 [intel_qat]
	[  138.150641] Read of size 32 at addr ffffffff88c402c0 by task cryptomgr_test/2340

	[  138.150651] CPU: 15 PID: 2340 Comm: cryptomgr_test Not tainted 6.2.0-rc1+ #45
	[  138.150659] Hardware name: Intel Corporation ArcherCity/ArcherCity, BIOS EGSDCRB1.86B.0087.D13.2208261706 08/26/2022
	[  138.150663] Call Trace:
	[  138.150668]  <TASK>
	[  138.150922]  kasan_check_range+0x13a/0x1c0
	[  138.150931]  memcpy+0x1f/0x60
	[  138.150940]  qat_alg_skcipher_init_com.isra.0+0x197/0x250 [intel_qat]
	[  138.151006]  qat_alg_skcipher_init_sessions+0xc1/0x240 [intel_qat]
	[  138.151073]  crypto_skcipher_setkey+0x82/0x160
	[  138.151085]  ? prepare_keybuf+0xa2/0xd0
	[  138.151095]  test_skcipher_vec_cfg+0x2b8/0x800

Fixes: 67916c9 ("crypto: qat - add AES-CTR support for QAT GEN4 devices")
Cc: <[email protected]>
Reported-by: Vladis Dronov <[email protected]>
Signed-off-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Fiona Trahe <[email protected]>
Reviewed-by: Vladis Dronov <[email protected]>
Tested-by: Vladis Dronov <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
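A sketch of the reordering; `hw_key_buf` stands in for the firmware-visible key buffer and is a hypothetical name.

  /* before (out-of-bounds read possible): */
  keylen = round_up(keylen, 16);
  memcpy(hw_key_buf, key, keylen);      /* may read past the end of 'key' */

  /* after: */
  memcpy(hw_key_buf, key, keylen);      /* copy only what the caller provided */
  keylen = round_up(keylen, 16);        /* then report the padded size to the device */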
cristibirsan pushed a commit that referenced this pull request Mar 12, 2025
wiphy_unregister()/wiphy_free() has been recently decoupled from
wilc_netdev_cleanup() to fix a faulty error path in sdio/spi probe
functions. However this change introduced a new failure when simply
loading then unloading the driver:

  $ modprobe wilc1000-sdio; modprobe -r wilc1000-sdio
  WARNING: CPU: 0 PID: 115 at net/wireless/core.c:1145 wiphy_unregister+0x904/0xc40 [cfg80211]
  Modules linked in: wilc1000_sdio(-) wilc1000 cfg80211 bluetooth ecdh_generic ecc
  CPU: 0 UID: 0 PID: 115 Comm: modprobe Not tainted 6.13.0-rc6+ #45
  Hardware name: Atmel SAMA5
  Call trace:
   unwind_backtrace from show_stack+0x18/0x1c
   show_stack from dump_stack_lvl+0x44/0x70
   dump_stack_lvl from __warn+0x118/0x27c
   __warn from warn_slowpath_fmt+0xcc/0x140
   warn_slowpath_fmt from wiphy_unregister+0x904/0xc40 [cfg80211]
   wiphy_unregister [cfg80211] from wilc_sdio_remove+0xb0/0x15c [wilc1000_sdio]
   wilc_sdio_remove [wilc1000_sdio] from sdio_bus_remove+0x104/0x3f0
   sdio_bus_remove from device_release_driver_internal+0x424/0x5dc
   device_release_driver_internal from driver_detach+0x120/0x224
   driver_detach from bus_remove_driver+0x17c/0x314
   bus_remove_driver from sys_delete_module+0x310/0x46c
   sys_delete_module from ret_fast_syscall+0x0/0x1c
  Exception stack(0xd0acbfa8 to 0xd0acbff0)
  bfa0:                   0044b210 0044b210 0044b24c 00000800 00000000 00000000
  bfc0: 0044b210 0044b210 00000000 00000081 00000000 0044b210 00000000 00000000
  bfe0: 00448e24 b6af99c4 0043ea0d aea2e12c
  irq event stamp: 0
  hardirqs last  enabled at (0): [<00000000>] 0x0
  hardirqs last disabled at (0): [<c01588f0>] copy_process+0x1c4c/0x7bec
  softirqs last  enabled at (0): [<c0158944>] copy_process+0x1ca0/0x7bec
  softirqs last disabled at (0): [<00000000>] 0x0

The warning is triggered by the fact that there is still a
wireless_device linked to the wiphy we are unregistering, due to
wiphy_unregister() now being called after net device unregister (performed
in wilc_netdev_cleanup()). Fix this warning by moving wiphy_unregister()
after wilc_netdev_cleanup() in nominal paths (ie: driver removal).
wilc_netdev_cleanup() ordering is left untouched in error paths in the
probe function because the net device is not registered in those paths
(so the warning cannot trigger), yet the wiphy can still be registered,
and we still need some cleanup steps from wilc_netdev_cleanup().

Fixes: 1be9449 ("wifi: wilc1000: unregister wiphy only if it has been registered")
Signed-off-by: Alexis Lothoré <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
Link: https://patch.msgid.link/[email protected]
fedepell pushed a commit to fedepell/linux-at91 that referenced this pull request Aug 29, 2025
commit d527f51 upstream.

There is a UAF when xfstests on cifs:

  BUG: KASAN: use-after-free in smb2_is_network_name_deleted+0x27/0x160
  Read of size 4 at addr ffff88810103fc08 by task cifsd/923

  CPU: 1 PID: 923 Comm: cifsd Not tainted 6.1.0-rc4+ linux4sam#45
  ...
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_report+0x171/0x472
   kasan_report+0xad/0x130
   kasan_check_range+0x145/0x1a0
   smb2_is_network_name_deleted+0x27/0x160
   cifs_demultiplex_thread.cold+0x172/0x5a4
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30
   </TASK>

  Allocated by task 923:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   __kasan_slab_alloc+0x54/0x60
   kmem_cache_alloc+0x147/0x320
   mempool_alloc+0xe1/0x260
   cifs_small_buf_get+0x24/0x60
   allocate_buffers+0xa1/0x1c0
   cifs_demultiplex_thread+0x199/0x10d0
   kthread+0x165/0x1a0
   ret_from_fork+0x1f/0x30

  Freed by task 921:
   kasan_save_stack+0x1e/0x40
   kasan_set_track+0x21/0x30
   kasan_save_free_info+0x2a/0x40
   ____kasan_slab_free+0x143/0x1b0
   kmem_cache_free+0xe3/0x4d0
   cifs_small_buf_release+0x29/0x90
   SMB2_negotiate+0x8b7/0x1c60
   smb2_negotiate+0x51/0x70
   cifs_negotiate_protocol+0xf0/0x160
   cifs_get_smb_ses+0x5fa/0x13c0
   mount_get_conns+0x7a/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The UAF is because:

 mount(pid: 921)               | cifsd(pid: 923)
-------------------------------|-------------------------------
                               | cifs_demultiplex_thread
SMB2_negotiate                 |
 cifs_send_recv                |
  compound_send_recv           |
   smb_send_rqst               |
    wait_for_response          |
     wait_event_state      [1] |
                               |  standard_receive3
                               |   cifs_handle_standard
                               |    handle_mid
                               |     mid->resp_buf = buf;  [2]
                               |     dequeue_mid           [3]
     KILL the process      [4] |
    resp_iov[i].iov_base = buf |
 free_rsp_buf              [5] |
                               |   is_network_name_deleted [6]
                               |   callback

1. After sending the request to the server, wait for the response until
    mid->mid_state != SUBMITTED;
2. Receive the response from the server, and set it in the mid;
3. Set the mid state to RECEIVED;
4. Kill the process; the mid state is already RECEIVED, so the wait returns 0;
5. Handle and release the negotiate response;
6. UAF.

It can be easily reproduced by adding some delay in [3] - [6].

Only sync call has the problem since async call's callback is
executed in cifsd process.

Add an extra state to mark the mid as READY before waking up the
waiter, so it can get the response safely.

Fixes: ec637e3 ("[CIFS] Avoid extra large buffer allocation (and memcpy) in cifs_readpages")
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Zhang Xiaoxu <[email protected]>
Signed-off-by: Steve French <[email protected]>
[fs/cifs was moved to fs/smb/client since
38c8a9a ("smb: move client and server files to common directory fs/smb").
We apply the patch to fs/cifs with some minor context changes.]
Signed-off-by: He Zhe <[email protected]>
Signed-off-by: Xiangyu Chen <[email protected]>
[ chanho: Backported to v5.4.y ]
Signed-off-by: Chanho Min <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
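A minimal sketch of the idea, not the actual cifs code: the receiver publishes the buffer first and only then flips the state the waiter checks, so a killed waiter can never observe a state that lets it free a buffer still in use.

  /* cifsd side: make the response visible before signalling readiness */
  mid->resp_buf = buf;
  smp_store_release(&mid->mid_state, MID_RESPONSE_READY);  /* new state */
  wake_up(&server->response_q);

  /* waiter side: only consume the response once READY is observed */
  wait_event(server->response_q,
             smp_load_acquire(&midQ->mid_state) == MID_RESPONSE_READY);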