Specification for Transactional Storage as Default in FRAME

This issue will outline how I believe we should make transactional storage production usable within FRAME, and even a default part of all FRAME extrinsics.

## Steps

- [x] Add Storage Layer Limit https://github.com/paritytech/substrate/pull/10808
- [x] Make spawning more layers explicit, and implicitly keep only 1 layer if it already exists
- [ ] Name change

## Background

At a very high level, storage in the runtime has two main abstractions:

1. An underlying DB, which is normally written to a hard-disk.
2. An in-memory overlay which is used to keep track of read/written keys until the end of a block where those values are then committed to the DB.

When writing functions within the runtime, modifications to storage affect the values in the in-memory overlay. During block building, these changes are pooled together with the changes made by every other transaction in the block, both earlier and later ones. As a result, if storage is modified during an extrinsic, it is not possible to undo those specific changes.
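To make the two abstractions concrete, here is a toy model of a backing DB plus a single in-memory overlay. It is only a sketch: `Storage`, `end_block`, and the string keys are names invented for illustration, deletions are not modeled, and none of this is the actual Substrate implementation.

```rust
use std::collections::HashMap;

/// Toy model: a backing "database" plus one in-memory overlay of pending writes.
/// All names here are hypothetical and only illustrate the idea.
struct Storage {
    db: HashMap<String, u32>,
    overlay: HashMap<String, u32>,
}

impl Storage {
    fn new() -> Self {
        Self { db: HashMap::new(), overlay: HashMap::new() }
    }

    /// Writes always land in the overlay, never directly in the DB.
    fn set(&mut self, key: &str, value: u32) {
        self.overlay.insert(key.to_string(), value);
    }

    /// Reads check the overlay first and fall back to the DB.
    fn get(&self, key: &str) -> Option<u32> {
        self.overlay.get(key).or_else(|| self.db.get(key)).copied()
    }

    /// At the end of the block, everything in the overlay is committed to the DB.
    fn end_block(&mut self) {
        for (key, value) in self.overlay.drain() {
            self.db.insert(key, value);
        }
    }
}

/// Modifies storage and then fails. With a single shared overlay there is
/// nothing to roll back to, so the write survives the error.
fn set_value_then_error(storage: &mut Storage, v: u32) -> Result<(), &'static str> {
    storage.set("value", v);
    Err("never do this")
}
```

In this model, calling `set_value_then_error` returns an error but the written value is still visible afterwards and ends up in the DB once the block is finished.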

This has led to a critical "best practice" when doing runtime development: "verify first, write last", which says that you should never return an `Err` from an extrinsic after you have modified storage, as those changes will persist, and you may end up with an inconsistent state.

An easy example of this can be seen by executing the following:

```rust
#[test]
fn storage_modified_then_error() {
	fn set_value_then_error(v: u32) -> DispatchResult {
		Value::set(v);
		Err("never do this")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// `assert_noop` guarantees that the state root is unchanged after some function is called
		// and an error is returned.
		// This assertion will fail, because storage was already modified before the error was
		// returned.
		assert_noop!(set_value_then_error(3), "never do this");
	});
}
```

## What is `transactional`?

Transactional is a feature that was implemented specifically to address this problem. The original PR can be found here: https://github.com/paritytech/substrate/pull/6269.

Basically, the low level storage APIs now provide a way to spawn additional in-memory overlays and choose whether you want to commit those changes to the main storage overlay or not. When used properly, this can allow you to isolate changes which came about due to a specific extrinsic, and at any point, choose not to commit them.

```rust
#[test]
fn storage_modified_then_error_with_transactional() {
	#[transactional] // <-- This flag fixes our bug
	fn set_value_then_error_transactional(v: u32) -> DispatchResult {
		Value::set(v);
		Err("this is totally okay")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// This now succeeds!
		assert_noop!(set_value_then_error_transactional(3), "this is totally okay");
	});
}
```

Additionally, transactional functions can be nested, such that each time a new transactional layer is created, you can choose whether you want to commit those changes to the transactional layer below you. This means you can have storage modifying functions nested within other storage modifying functions, and have pretty much full control over what you do and do not want to commit to the final database.
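The nesting behavior can be sketched as a stack of overlays. This is a simplified model under my own assumptions: `LayeredOverlay` and its method names are hypothetical, and the real overlay in Substrate is organized differently.

```rust
use std::collections::HashMap;

/// Hypothetical model of nested transactional layers.
struct LayeredOverlay {
    /// `layers[0]` is the main overlay; later entries are nested layers.
    layers: Vec<HashMap<String, u32>>,
}

impl LayeredOverlay {
    fn new() -> Self {
        Self { layers: vec![HashMap::new()] }
    }

    /// Spawn a new layer on top of the current one.
    fn start_layer(&mut self) {
        self.layers.push(HashMap::new());
    }

    /// Commit the top layer into the layer directly below it.
    fn commit_layer(&mut self) {
        if self.layers.len() > 1 {
            let top = self.layers.pop().expect("length checked above");
            let below = self.layers.last_mut().expect("main overlay always exists");
            below.extend(top);
        }
    }

    /// Discard the top layer and every change made in it.
    fn rollback_layer(&mut self) {
        if self.layers.len() > 1 {
            self.layers.pop();
        }
    }

    /// Writes go to the top-most layer.
    fn set(&mut self, key: &str, value: u32) {
        let top = self.layers.last_mut().expect("main overlay always exists");
        top.insert(key.to_string(), value);
    }

    /// Reads search from the top-most layer downwards.
    fn get(&self, key: &str) -> Option<u32> {
        self.layers.iter().rev().find_map(|layer| layer.get(key).copied())
    }
}
```

Rolling back an inner layer leaves the outer layer's writes intact, while committing only pushes the changes one level down, so the outer layer can still be discarded later.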

## Problems with the current system

The current `transactional` system does not take into account limitations of the runtime in terms of computational or memory usage, thus it is not really "safe by default" to use in the runtime. A user can nest transactional layers as much as they want, and there really is no integration of this functionality specifically for benchmarking worst case scenarios.

### Computational Overhead

There is a non-zero cost to resolving a transactional layer into the overlay below it.

Committing a layer does not copy the values themselves: all values are stored on the heap, so only pointers are moved. The overhead therefore depends on the number of modified items, not on their size.

With multiple nested layers, each layer's entries have to be moved into the layer below it, and so on down to the main overlay.

The overhead is very low relative to other kinds of operations within the runtime, but still, at a high enough nesting level, is non-negligible.
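As a rough illustration of why the cost scales with the item count rather than the item size, consider merging one map into another in plain Rust. `merge_layer` is a hypothetical helper and not the actual overlay code.

```rust
use std::collections::HashMap;

/// Merges `top` into `below` and returns how many entries were moved.
/// Hypothetical helper for illustration only.
fn merge_layer(
    below: &mut HashMap<String, Vec<u8>>,
    top: HashMap<String, Vec<u8>>,
) -> usize {
    let moved = top.len();
    // Each `Vec<u8>` is moved, not cloned: only its pointer, length, and
    // capacity change owner, while the heap buffer itself is not copied.
    below.extend(top);
    moved
}
```

Merging a one-byte value and a one-megabyte value both count as a single moved entry here, so the work grows with the number of entries per layer multiplied by the number of layers they have to pass through.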

### Memory Overhead

From @athei, all of the transactional layers use client memory, not Wasm, so there should be no practical resource limitations here.

## What we want

The end goal for FRAME is to make it as easy as possible to write Pallets which are correct by default. The chance that users make the mistake of committing changes to storage, returning an error, and then expecting nothing to change is very high. This is especially true when we note that most Pallet developers are probably used to writing code exactly in this way from Smart Contract development on platforms like Ethereum.

As such, FRAME wants to take advantage of `transactional` as both usable and potentially even a default part of the Pallet development experience.

To do that, we must address some of the problems with the existing system.

## Proposed Solutions

To make this feature production ready, we need to address a couple of different problems.

### Hard limit to nesting transactional layers

As mentioned above, there are currently no limits in place for nesting transactional layers; however, we know that there is a non-zero resource impact when doing this. Even within software development, there are stack limits which, when reached, lead to stack overflows.

I propose we introduce a conservative hard limit of 10 nested transactional layers as a default in FRAME. Potentially we could allow users to override or bypass this limit through lower level functions, but when simply using the transactional feature, this limit should be enforced.

When the limit is reached, trying to spawn a new transactional layer will return an `Err`, which can be handled by the developer.
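A minimal sketch of how such a limit could behave, assuming a simple depth counter. `MAX_LAYERS`, `LayerLimitReached`, and `Layers` are hypothetical names chosen for this example; only the limit of 10 comes from the proposal above.

```rust
/// Hypothetical constant name; the value 10 is the limit proposed above.
const MAX_LAYERS: u8 = 10;

/// Hypothetical error returned when the nesting limit is hit.
#[derive(Debug, PartialEq)]
struct LayerLimitReached;

/// Tracks only the current nesting depth; the storage itself is omitted.
struct Layers {
    depth: u8,
}

impl Layers {
    /// Try to spawn a new layer, failing once the limit is reached.
    fn try_start_layer(&mut self) -> Result<(), LayerLimitReached> {
        if self.depth >= MAX_LAYERS {
            return Err(LayerLimitReached);
        }
        self.depth += 1;
        Ok(())
    }

    /// Close the top-most layer (by commit or rollback).
    fn close_layer(&mut self) {
        self.depth = self.depth.saturating_sub(1);
    }
}
```

The caller gets an ordinary `Err` instead of a panic, so it can decide whether to abort the whole extrinsic or continue without the extra layer.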

### Default single transaction layer per extrinsic

The overhead of a single transactional layer should be negligible for nearly all runtime functions, and the benefits in terms of developer experience are huge.

I propose that all extrinsics by default spawn a single transactional storage layer.

If that extrinsic returns `Err`, we can expect that all modified storage items will be reverted, and the underlying state root will be unaffected. However, if that function returns `Ok`, we will instead commit any changes which are present in this single transactional level, just like developers would expect from Smart Contract platforms.

By default, dispatching other calls within a call would not lead to generating more transactional layers.
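The intended default could look roughly like the wrapper below. This is a sketch under simplifying assumptions: `dispatch_extrinsic` is a hypothetical name, and the layer is modeled as a full scratch copy of the state, whereas a real overlay would only track the changed keys.

```rust
use std::collections::HashMap;

type State = HashMap<String, u32>;

/// Hypothetical wrapper: run `call` inside one storage layer and commit the
/// layer only if the call returns `Ok`.
fn dispatch_extrinsic<F>(state: &mut State, call: F) -> Result<(), &'static str>
where
    F: FnOnce(&mut State) -> Result<(), &'static str>,
{
    // Model the layer as a scratch copy of the current state.
    let mut layer = state.clone();
    let result = call(&mut layer);
    if result.is_ok() {
        // Commit: the layer's changes become visible in the main state.
        *state = layer;
    }
    // On `Err` the scratch copy is dropped, which reverts every write.
    result
}
```

With this wrapper in place, an extrinsic that writes and then returns `Err` leaves the state untouched, while the same writes followed by `Ok(())` are committed.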

### Dispatch with transactional layer

There are cases where users may want to dispatch another call within its own transactional layer, but within the limits defined above.

In that case, the user can call a specific `dispatch_with_transactional(call)` which will explicitly spawn a new transactional layer and then execute the call, allowing the user to handle the result.

Currently the `#[transactional]` tag is placed above different function definitions, but this does not really make sense if all extrinsics spawn at least one transactional layer by default. Instead, it should be up to the person writing the dispatch function to determine whether a function they are calling should be called within an additional transactional context.
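To illustrate the caller-side decision, here is a sketch in which an outer call isolates an inner call and tolerates its failure. The function below is only a stand-in for the proposed `dispatch_with_transactional(call)`: its signature, the `State` type, and the snapshot-based layer are assumptions made for this example.

```rust
use std::collections::HashMap;

type State = HashMap<&'static str, u32>;
type CallResult = Result<(), &'static str>;

/// Hypothetical stand-in for the proposed `dispatch_with_transactional(call)`.
/// The extra layer is modeled as a scratch copy of the state.
fn dispatch_with_transactional(
    state: &mut State,
    call: fn(&mut State) -> CallResult,
) -> CallResult {
    let mut layer = state.clone();
    let result = call(&mut layer);
    if result.is_ok() {
        *state = layer;
    }
    result
}

/// An inner call that writes to storage and then fails.
fn failing_inner(state: &mut State) -> CallResult {
    state.insert("inner", 1);
    Err("inner failed")
}

/// The outer call decides that the inner call needs its own layer and
/// handles the result itself instead of propagating the error.
fn outer(state: &mut State) -> CallResult {
    state.insert("outer", 1);
    if dispatch_with_transactional(state, failing_inner).is_err() {
        state.insert("inner_failed", 1);
    }
    Ok(())
}
```

After `outer` runs, its own writes are present and the write made by `failing_inner` is gone, which is the isolation the extra layer is meant to provide.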

### Annotation for safe without storage layer

Now that the default behavior of extrinsics will be to spawn at least one transactional layer, we can introduce an opt-in optimization where a user can state that a function is safe to be executed without its own transactional layer.

For example:

```rust
/// This function is safe to execute without an additional transactional storage layer.
#[without_transactional]
fn set_value(x: u32) -> DispatchResult {
    Self::check_value(x)?;
    MyStorage::set(x);
    Ok(())
}
```

When this function is called directly, a transactional layer should not spawn for it.

If a user called `dispatch_with_transactional` to this function, a transactional layer also does not need to spawn.
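One way to picture the opt-out is a per-call flag that the dispatcher checks before spawning a layer. Everything in this sketch is hypothetical: `Call`, `needs_layer`, `Dispatcher`, and `noop_call` are invented names, and the actual annotation would presumably be handled by the FRAME macros rather than a runtime flag.

```rust
type CallResult = Result<(), &'static str>;

/// Hypothetical call descriptor. `needs_layer: false` models a function
/// annotated with `#[without_transactional]`.
struct Call {
    needs_layer: bool,
    run: fn() -> CallResult,
}

/// Hypothetical dispatcher that counts how many layers it spawned.
struct Dispatcher {
    layers_spawned: u32,
}

impl Dispatcher {
    fn dispatch(&mut self, call: &Call) -> CallResult {
        if call.needs_layer {
            // A real implementation would open a storage layer here and
            // commit or roll it back depending on the result.
            self.layers_spawned += 1;
        }
        (call.run)()
    }
}

/// A call that verifies first and writes last, so it is safe without a layer.
fn noop_call() -> CallResult {
    Ok(())
}
```

Dispatching an annotated call leaves the layer count unchanged, while an unannotated call increments it.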

## Name Change

As a final part of this specification, I propose we drop the name `transactional` in favor of something that more clearly describes what is happening here. The term `transaction` is already confusing within the Substrate ecosystem, especially in the context of `extrinsics`. Also, it is not clear that the behavior here really has anything to do with `transactions`.

Instead, I propose we call these APIs `storage_layers`, and basically replace all uses of `transactional` with some appropriate use of that term.

For example:

* `dispatch_with_storage_layer`
* `#[without_storage_layer]`
* `get_storage_layer() -> u8`
* `add_storage_layer() -> Result`
* etc...
