Commit 63f6b1e

bp3tk0vsmb49 authored and committed
x86/mce: Defer processing of early errors
BugLink: https://bugs.launchpad.net/bugs/1946788

[ Upstream commit 3bff147 ]

When a fatal machine check results in a system reset, Linux does not clear the error(s) from machine check bank(s) - hardware preserves the machine check banks across a warm reset.

During initialization of the kernel after the reboot, Linux reads, logs, and clears all machine check banks.

But there is a problem. In:

  5de97c9 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")

the call to mce_register_decode_chain() moved later in the boot sequence. This means that /dev/mcelog doesn't see those early error logs.

This was partially fixed by:

  cd9c57c ("x86/MCE: Dump MCE to dmesg if no consumers")

which made sure that the logs were not lost completely by printing to the console. But parsing console logs is error prone. Users of /dev/mcelog should expect to find any early errors logged to standard places.

Add a new flag MCP_QUEUE_LOG to machine_check_poll() to be used in early machine check initialization to indicate that any errors found should just be queued to genpool. When mcheck_late_init() is called it will call mce_schedule_work() to actually log and flush any errors queued in the genpool.

[ Based on an original patch, commit message by and completely productized by Tony Luck. ]

Fixes: 5de97c9 ("x86/mce: Factor out and deprecate the /dev/mcelog driver")
Reported-by: Sumanth Kamatala <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Kelsey Skunberg <[email protected]>
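The queue-early, flush-late idea in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: `err_record`, `queue_record()`, `register_consumer()`, `flush_records()` and `print_consumer()` are hypothetical names invented here, the fixed-size array stands in for the genpool, and the callback list stands in for the consumers that register later in boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mce: only the fields this sketch needs. */
struct err_record {
	int bank;
	unsigned long long status;
};

/* Hypothetical stand-in for the genpool: a small fixed-size FIFO. */
#define POOL_SIZE 8
static struct err_record pool[POOL_SIZE];
static size_t pool_len;

/* Hypothetical consumer list standing in for the late-registered consumers. */
typedef void (*consumer_fn)(const struct err_record *rec);
#define MAX_CONSUMERS 4
static consumer_fn consumers[MAX_CONSUMERS];
static size_t nr_consumers;

/* Number of records delivered to print_consumer() so far. */
static int delivered;

/* Early init: store the record only, notify nobody. Returns -1 if full. */
int queue_record(int bank, unsigned long long status)
{
	if (pool_len == POOL_SIZE)
		return -1;
	pool[pool_len].bank = bank;
	pool[pool_len].status = status;
	pool_len++;
	return 0;
}

/* Consumers show up later in the boot sequence. Returns -1 if full. */
int register_consumer(consumer_fn fn)
{
	assert(fn != NULL);
	if (nr_consumers == MAX_CONSUMERS)
		return -1;
	consumers[nr_consumers++] = fn;
	return 0;
}

/*
 * Late init: hand every queued record to every registered consumer,
 * then empty the pool. Returns the number of records flushed.
 */
size_t flush_records(void)
{
	size_t flushed = pool_len;

	for (size_t i = 0; i < pool_len; i++)
		for (size_t c = 0; c < nr_consumers; c++)
			consumers[c](&pool[i]);
	pool_len = 0;
	return flushed;
}

/* Example consumer: counts and prints each record it receives. */
void print_consumer(const struct err_record *rec)
{
	delivered++;
	printf("bank %d status 0x%llx\n", rec->bank, rec->status);
}
```

In this model, calling `queue_record()` before any `register_consumer()` call loses nothing: the records wait in the pool until `flush_records()` runs, which mirrors the ordering problem the commit addresses.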
1 parent c0eb7a2 commit 63f6b1e

2 files changed: +9 -3 lines changed

arch/x86/include/asm/mce.h

Lines changed: 1 addition & 0 deletions

@@ -265,6 +265,7 @@ enum mcp_flags {
 	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
 	MCP_UC		= BIT(1),	/* log uncorrected errors */
 	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
+	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
 };
 bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
 
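As a rough userspace illustration of how these flag bits combine and are tested, the sketch below copies the enum from the hunk above and adds a hypothetical helper, `pick_action()`, that is not part of the kernel. It assumes, based only on the enum comments, that MCP_DONTLOG takes precedence over MCP_QUEUE_LOG. The `BIT()` macro and `MCP_ALL_FLAGS` are local definitions written for this example.

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT(n) (1U << (n))

enum mcp_flags {
	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
	MCP_UC		= BIT(1),	/* log uncorrected errors */
	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
};

/* Hypothetical mask of every flag defined above. */
#define MCP_ALL_FLAGS (MCP_TIMESTAMP | MCP_UC | MCP_DONTLOG | MCP_QUEUE_LOG)

/* Hypothetical result type: what to do with an error that was found. */
enum log_action { ACT_SKIP, ACT_QUEUE, ACT_LOG };

/*
 * Hypothetical helper: decide how a found error is handled.
 * Assumes "don't log" wins over "queue", and "queue" wins over
 * immediate logging.
 */
enum log_action pick_action(unsigned int flags)
{
	/* Reject bits that are not defined in enum mcp_flags. */
	assert((flags & ~(unsigned int)MCP_ALL_FLAGS) == 0);

	if (flags & MCP_DONTLOG)
		return ACT_SKIP;
	if (flags & MCP_QUEUE_LOG)
		return ACT_QUEUE;
	return ACT_LOG;
}
```

Because each flag occupies its own bit, a caller can OR several together, and the new flag does not change the meaning of any existing combination.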

arch/x86/kernel/cpu/mce/core.c

Lines changed: 8 additions & 3 deletions

@@ -817,7 +817,10 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
 			goto clear_it;
 
-		mce_log(&m);
+		if (flags & MCP_QUEUE_LOG)
+			mce_gen_pool_add(&m);
+		else
+			mce_log(&m);
 
 clear_it:
 		/*
@@ -1618,10 +1621,12 @@ static void __mcheck_cpu_init_generic(void)
 		m_fl = MCP_DONTLOG;
 
 	/*
-	 * Log the machine checks left over from the previous reset.
+	 * Log the machine checks left over from the previous reset. Log them
+	 * only, do not start processing them. That will happen in mcheck_late_init()
+	 * when all consumers have been registered on the notifier chain.
 	 */
 	bitmap_fill(all_banks, MAX_NR_BANKS);
-	machine_check_poll(MCP_UC | m_fl, &all_banks);
+	machine_check_poll(MCP_UC | MCP_QUEUE_LOG | m_fl, &all_banks);
 
 	cr4_set_bits(X86_CR4_MCE);
 
