Serializer support for easier object and collection converters #1562

@steveharter

Extend the existing converter model for collections and objects. The current converter model (with base class JsonConverter) was intended primarily to support value types, not collections and objects.

Extending the converter model to objects and collections will have the following benefits:

  • Better performance, especially for collection serialization.
    • This is due to less boxing on the common struct-based enumerator types such as List<T>.Enumerator and less re-processing of logic including determining what type of collection is being (de)serialized.
  • Making the converter model public allows others to quickly and easily handle several scenarios that are not possible today, including:
    • Better performance for a converter's Read() method, which would not require "read-ahead" for the async+stream cases (explained later).
    • Similarly, future AOT (Ahead-Of-Time compilation) deserialization will not require "read-ahead". Each POCO object would have its own converter.
    • Ability for a custom converter's Write() to support an async flush on the underlying stream when a threshold is met.
    • Converter support for the new reference handling feature. The underlying dictionary<$id, object> will be available to the converter.
    • Converter support for obtaining the json path (for deserialization) and the object path (for serialization). Currently this is internal and used to supplement JsonException properties.
    • Ability to have before- and after-callback for objects and collections to allow the serializer to do the bulk of the work and the custom code to perform any logic for serializing additional members, performing defaulting or validation, or adding an element to a collection.
  • A consistent public GetConverter API that allows one converter to forward to another converter.
    • Every serializable property will have a non-null converter obtainable by JsonSerializerOptions.GetConverter(). Currently, the built-in support for objects and collections does not have any converters that are returned from GetConverter(), since that logic lives in the "main loop". A converter that wants to forward to the built-in support must therefore manually re-enter the main (De)Serialize methods, which is slow and has issues including losing "json path" semantics for exceptions.
  • A loosely-typed (non-generic) mechanism to call a converter. This is important for converters that can't use generics (or call MakeGenericType in order to call a converter). For example, a converter that implements System.Runtime.Serialization.ISerializable. It is also used internally for the root object being returned for the non-generic deserialize\serialize methods.
  • Better understandability and maintainability of the code, leading to higher quality and easier feature implementation.
    • The existing "main loop" (explained later) for objects and collections has been difficult to extend for dictionaries and specialized\immutable collections which required a secondary internal converter model (which would go away with a new converter model). There have been several validation and consistency issues and in general this area needs to be refactored.
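
As a point of reference for the forwarding scenario above, here is a minimal sketch using only the existing public API. It forwards to the built-in converter for double, which GetConverter() already returns today; the proposal would make the same pattern work for objects and collections. The Celsius type and CelsiusConverter are hypothetical names invented for this illustration.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical wrapper type, used only for illustration.
public readonly struct Celsius
{
    public Celsius(double degrees) => Degrees = degrees;
    public double Degrees { get; }
}

// Forwards both Read and Write to whatever converter the options
// return for double, instead of re-entering JsonSerializer.
public sealed class CelsiusConverter : JsonConverter<Celsius>
{
    public override Celsius Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        var inner = (JsonConverter<double>)options.GetConverter(typeof(double));
        return new Celsius(inner.Read(ref reader, typeof(double), options));
    }

    public override void Write(Utf8JsonWriter writer, Celsius value, JsonSerializerOptions options)
    {
        var inner = (JsonConverter<double>)options.GetConverter(typeof(double));
        inner.Write(writer, value.Degrees, options);
    }
}
```

To try it, add a CelsiusConverter instance to JsonSerializerOptions.Converters and pass those options to JsonSerializer.Serialize or JsonSerializer.Deserialize<Celsius>.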

Background

Currently all internal value-type converters (e.g. String, DateTime, etc) use the same converter model (base class and infrastructure) as custom converters. This means the runtime and extensibility are co-dependent which allows for shared logic and essentially no limit on what can be done within a converter. The existing converter model has proven itself very flexible and performant.

However, currently collections and objects do not use the converter model. Instead they are implemented within a "main loop" consisting of static methods along with state classes.

The state classes (ReadStack, ReadStackFrame, WriteStack, WriteStackFrame) exist to support async-based streams where the call stack may need to unwind in order to read async or flush async from the underlying stream, and then continue once that is done. This is done to keep memory requirements low and increase throughput for large streams -- the serializer does not "drain" the stream ahead of time and instead has first-class support for streams and async.

The state classes along with the converter design support shared code for both sync and async support. This prevents having to write both an async and sync converter, for example, and prevents the overhead of using async and passing the stream down to almost every method. This shared code benefit applies to both the runtime and custom converters.

With a new converter model, the state classes will continue to remain for unwind\continuation support, but will also work alongside the normal CLR call stack where there will be ~1 call frame for each level in JSON (for each nested JSON array or object). This makes the code more performant.
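
The unwind-and-continue idea can be observed one level below the serializer with the existing public Utf8JsonReader and JsonReaderState types. The sketch below is not the serializer's internal ReadStack logic; it only shows, under that assumption, how reading stops when a buffer runs out and resumes from saved state once more data arrives. ChunkedReadDemo and CountTokens are hypothetical names.

```csharp
using System;
using System.Text.Json;

public static class ChunkedReadDemo
{
    // Counts JSON tokens spread across two buffers, carrying the
    // reader state from the first buffer over to the second.
    public static int CountTokens(byte[] first, byte[] second)
    {
        int count = 0;

        // isFinalBlock: false tells the reader that more data may follow,
        // so Read() returns false instead of throwing when data runs out.
        var reader = new Utf8JsonReader(first, isFinalBlock: false, state: default);
        while (reader.Read())
        {
            count++;
        }

        // "Unwind": save the state and keep the bytes not yet consumed.
        JsonReaderState state = reader.CurrentState;
        byte[] leftover = first.AsSpan((int)reader.BytesConsumed).ToArray();

        // "Continue": prepend the leftover bytes to the next chunk.
        byte[] next = new byte[leftover.Length + second.Length];
        leftover.CopyTo(next, 0);
        second.CopyTo(next, leftover.Length);

        reader = new Utf8JsonReader(next, isFinalBlock: true, state);
        while (reader.Read())
        {
            count++;
        }

        return count;
    }
}
```

In the serializer, the state classes carry the equivalent information (plus the current object or collection being populated) across an async read or flush.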

A limitation of the existing converter model is that it must "read-ahead" during deserialization to fully populate the buffer up to the end of the current JSON level. This read-ahead only occurs when the async+stream JsonSerializer deserialize methods are called and only when the current JSON for that converter starts with a StartArray or StartObject token. Read-ahead is required because the existing converter design does not support a mechanism to "unwind" (when data is exhausted) and "continue" (when more data is read) so the converter expects all data to be present, and expects that reader.Read() will never return false due to running out of data in the current buffer. If the converter does not start with StartArray or StartObject, then it is assumed the converter will not call reader.Read().

Similarly, a limitation of the existing converter model is that it does not support async flush of the underlying stream for the converter's write methods. Again, this only applies to the async+stream case and only when the converter performs multiple write operations that may hit a threshold. Note that the built-in implementation for objects and collections (which does not use converters) does support async flush (and async read) when thresholds are hit, but converters do not.
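
Both limitations show up in a typical custom collection converter written against the existing API. The following sketch (IntListConverter is a hypothetical name) has a Read() that loops until EndArray and cannot suspend if data runs out, and a Write() that issues many writes with no opportunity to flush in between.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class IntListConverter : JsonConverter<List<int>>
{
    public override List<int> Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        if (reader.TokenType != JsonTokenType.StartArray)
        {
            throw new JsonException("Expected the start of a JSON array.");
        }

        var list = new List<int>();

        // This loop has no way to unwind and resume later, so it relies on
        // the whole array already being in the buffer (hence the read-ahead).
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.EndArray)
            {
                return list;
            }

            list.Add(reader.GetInt32());
        }

        throw new JsonException("Unexpected end of data inside the array.");
    }

    public override void Write(Utf8JsonWriter writer, List<int> value, JsonSerializerOptions options)
    {
        // Every element is written before control returns to the serializer,
        // so the converter cannot trigger an async flush partway through.
        writer.WriteStartArray();
        foreach (int item in value)
        {
            writer.WriteNumberValue(item);
        }
        writer.WriteEndArray();
    }
}
```

Registering the converter through JsonSerializerOptions.Converters makes the serializer use it in place of the built-in List<int> handling.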

Proposed API

namespace System.Text.Json.Serialization
{
    // Existing type:
    public abstract class JsonConverter
    {
        // Existing:
        public abstract bool CanConvert(Type typeToConvert);

        // New object-based APIs not requiring generics:
        public virtual bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref object value);
        public virtual bool TryWrite(Utf8JsonWriter writer, object value, JsonSerializerOptions options, ref WriteStack state);
    }

    // Existing type:
    public abstract class JsonConverter<T> : JsonConverter
    {
        // Existing:
        protected internal JsonConverter() { }
        public override bool CanConvert(Type typeToConvert);
        public abstract T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public abstract void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options);

        // New:
        public override sealed bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref object value);
        public override sealed bool TryWrite(Utf8JsonWriter writer, object value, JsonSerializerOptions options, ref WriteStack state);
        public virtual bool TryRead(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options, ref ReadStack state, ref T value);
        public virtual bool TryWrite(Utf8JsonWriter writer, T value, JsonSerializerOptions options, ref WriteStack state);
    }

    // New type:
    public abstract class JsonObjectConverter<T> : JsonConverter<T>
    {
        public override sealed T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public override sealed void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options);
    }

    // New type:
    public abstract class JsonArrayConverter<TCollection, TElement> : JsonConverter<TCollection>
    {
        public override sealed TCollection Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options);
        public override sealed void Write(Utf8JsonWriter writer, TCollection value, JsonSerializerOptions options);
    }
}

The ReadStack* and WriteStack* structs will likely be renamed and will expose a few members, such as the JSON path, the dictionary for reference handling, and the state used for continuation after an async read\flush.

In addition to the above base classes, there will likely be:

  • Additional derived class for an object with callbacks for before-read, before-write, after-read, after-write and "GetMember" to lookup a CLR member based on JSON property name.
  • Additional derived class for a collection with callbacks for before-read, before-write, after-read, after-write and an "Add" method to add an element to the collection.
  • Exposing existing converters and converter factories primarily to support AOT compilation where the initialization phase requires compile-time knowledge of the converter for each property.

Labels: api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Text.Json