ot_flash: Many miscellaneous fixes and CTRL_REGWEN implementation #173

AlexJones0 · 2025-09-04T10:19:30Z

This PR is the first of a series of 3 PRs to improve the OpenTitan Flash Controller emulation and make it more feature complete, with a focus on the features that are needed to support keymgr integration.

This first PR primarily focuses on a making several small fixes and improvements to the existing ot_flash code - you can see much more detail about each change in the commit messages. Aside from this, it also makes a small addition with the implementation of the CTRL_REGWEN register, which hardware uses to disable the CONTROL register while there is an ongoing flash operation.

The connection of recoverable alerts on errors should also fix two OpenTitan flash_ctrl test targets that began failing after recent test changes that expect that certain alerts are fired by the device during the test.

This would just trace an "unsupported" flash error, but still isn't correct and should be avoided. Signed-off-by: Alex Jones <[email protected]>

Make some more readable helper functions for checking if the flash controller is in an operation, or whether it is in an ongoing operation that has been split across multiple executions (e.g. read size too big for FIFO). Signed-off-by: Alex Jones <[email protected]>

Both the op_read and op_prog implementations were wrong in different ways when the fifos are held in reset: - For read operations, the data is still read, it is just not pushed to the fifo. Where before the read would resume after the fifo exits reset, it now continues to finish the read and immediately clears. - For program operations, there was the possibility of an infinite loop as the fifos could never exit reset, but this would be caught by the initial assertion that the fifo is non-empty. As such, this is just a redundant check and so is removed. Signed-off-by: Alex Jones <[email protected]>

Signed-off-by: Alex Jones <[email protected]>

Fix a bug making CTRL_REGWEN writable, presumably due to bundling it with other REGWEN registers. The `CTRL_REGWEN` is only written by hardware, locking the `CONTROL` register during a flash operation. Signed-off-by: Alex Jones <[email protected]>

Signed-off-by: Alex Jones <[email protected]>

Based on the data page configuration behaviour and the info page cfg docs, I had previously assumed that `en=False` for an info page config meant that no protection rules apply (and thus no errors). This is not the case. From the RTL, the correct behaviour is instead to forbid any access (like having all fields as false). Signed-off-by: Alex Jones <[email protected]>

This commit actually implements the functionality of the CTRL_REGWEN register, which is set exclusively by hardware to indicate when an operation is progress, to lock software from writing to the `CONTROL` register in an attempt to potentially start a controller operation. Signed-off-by: Alex Jones <[email protected]>

Likely when this was first created it was intended to be used to signal complete ops with different delays, but currently we use a different `ot_flash_op_execute` and `ot_flash_op_complete` interface that performs all operations synchronously where possible. Since the `INIT` pseudo-operation / command is the only thing that uses this, refactor the function to be a callback for `init_complete` after some initialisation delay. Also add a note about refactoring the misleading `OtFlashOperation` enum. Signed-off-by: Alex Jones <[email protected]>

hw/opentitan/ot_flash.c

rivos-eblot · 2025-09-04T14:37:00Z

Note: I may have missed this point a long time ago, but the "synchronous" implementation (as in that performs
all operations synchronously where possible.) rings some bells:

QEMU I/O thread is shared by all devices, all vCPUs and is unique. IOW, if CONTROL.num is large, this may be a real issue, as the flash driver needs needs to yield, it won't be interrupted and may cause trouble for concurrent driver racing for the I/O thread. It needs to be evaluated (how long does this take to actually complete) but it might require to use an asynchronous implementation. TBC.

AlexJones0 · 2025-09-04T15:09:25Z

QEMU I/O thread is shared by all devices, all vCPUs and is unique. IOW, if CONTROL.num is large, this may be a real issue, as the flash driver needs needs to yield, it won't be interrupted and may cause trouble for concurrent driver racing for the I/O thread. It needs to be evaluated (how long does this take to actually complete) but it might require to use an asynchronous implementation. TBC.

I see, I hadn't considered that. You can only synchronously read or program up to the read/program FIFO size at once (16 words), so even though there is some logic to check permissions and update the backend I think this is still probably okay. If CONTROL.num is larger than 16, then the FIFO limits restrict how much data can be moved in a synchronous operation at once. The operation is then split into multiple synchronous operations based around the FIFO register writes/reads.

And also, with this PR the program FIFO cannot be filled before starting an operation, and so technically the logic is never programming more than one word at a time. It can still read 16 words at once, but I would expect writing to the backend to be more expensive than just reading?

I think the most expensive operation is actually therefore the bank erase, which may indeed take a while to zero out all of the backend data (or maybe it's essentially free, I'm not sure how the backend is implemented)? Perhaps if that does prove an issue we could look into making that specific operation asynchronous, in the same way that the initialize command is already handled?

When a software controller-issued operation causes an error (e.g. read error, program error, memory protection error, program resolution error, program type error, etc.) which is recoverable, it should also send a `recov_err` alert. Adding this allows the Earlgrey 1.0.0 flash_ctrl_mem_protection_test and flash_ctrl_test to pass again, as they now check for specific alerts. Also take the opportunity to improve the alert logic, modelling the ALERT_TEST register as an interface for transient alerts and adding an `alert_bm` for persistent latched alerts as is done in e.g. the OTP. This will be important for unrecoverable faults that can be caused by hardware operation errors later, which can only be disabled via resets. Signed-off-by: Alex Jones <[email protected]>

With the fixes that introduce recoverable alerts on SW operation errors, these two tests that started failing after having the alert catching mechanism (which expects certain alerts to appear in these tests due to bad operations) should pass again. Signed-off-by: Alex Jones <[email protected]>

rivos-eblot · 2025-09-04T15:50:24Z

I think the most expensive operation is actually therefore the bank erase, which may indeed take a while to zero out all of the backend data (or maybe it's essentially free, I'm not sure how the backend is implemented)? Perhaps if that does prove an issue we could look into making that specific operation asynchronous, in the same way that the initialize command is already handled?

Yeah I'm not saying it is an actual issue, just need to run a couple of runs and see how long it takes to complete in the worse case (using a host timer for example). Can be tracked as a dedicated issue, it should not prevent from delivering this work.

AlexJones0 added 9 commits September 4, 2025 11:15

[ot] hw/opentitan: ot_flash: Avoid incorrect execution on fifo reg R/W

9fdf464

This would just trace an "unsupported" flash error, but still isn't correct and should be avoided. Signed-off-by: Alex Jones <[email protected]>

[ot] hw/opentitan: ot_flash: Lock PROG_FIFO writes before operations

417677c

Signed-off-by: Alex Jones <[email protected]>

[ot] hw/opentitan: ot_flash: Correct phy_status.init_wip during init

0ec47f2

Signed-off-by: Alex Jones <[email protected]>

AlexJones0 requested review from jwnrt, ziuziakowska and rivos-eblot September 4, 2025 10:19

ziuziakowska approved these changes Sep 4, 2025

View reviewed changes

hw/opentitan/ot_flash.c Outdated Show resolved Hide resolved

rivos-eblot approved these changes Sep 4, 2025

View reviewed changes

AlexJones0 added 2 commits September 4, 2025 16:19

AlexJones0 force-pushed the eg-flash-fixes branch from ed5ccd7 to d54308c Compare September 4, 2025 15:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ot_flash: Many miscellaneous fixes and CTRL_REGWEN implementation #173

ot_flash: Many miscellaneous fixes and CTRL_REGWEN implementation #173

Uh oh!

AlexJones0 commented Sep 4, 2025

Uh oh!

Uh oh!

rivos-eblot commented Sep 4, 2025 •

edited

Loading

Uh oh!

AlexJones0 commented Sep 4, 2025 •

edited

Loading

Uh oh!

rivos-eblot commented Sep 4, 2025

Uh oh!

Uh oh!

ot_flash: Many miscellaneous fixes and CTRL_REGWEN implementation #173

Are you sure you want to change the base?

ot_flash: Many miscellaneous fixes and CTRL_REGWEN implementation #173

Uh oh!

Conversation

AlexJones0 commented Sep 4, 2025

Uh oh!

Uh oh!

rivos-eblot commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AlexJones0 commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rivos-eblot commented Sep 4, 2025

Uh oh!

Uh oh!

rivos-eblot commented Sep 4, 2025 •

edited

Loading

AlexJones0 commented Sep 4, 2025 •

edited

Loading