Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@ahsonkhan
Copy link

@ahsonkhan ahsonkhan commented Oct 23, 2019

See related PRs #41845, #41933

Now that the built-in JavaScriptEncoder implementations have been optimized, S.T.Json can leverage those rather than having its own (duplicate) logic. This is true, at least for the NeedsEscaping check.

This PR effectively reverts the optimizations made in #41845 since those went down to S.T.E.W.

There is no behavioral change here.

Ideally, we'd get rid of the encoding logic as well, when encoder == null, but the S.T.E.W implementation needs to be made faster first (for the common cases) to avoid the regression.

Something like the following change (which treats null as JavaScriptEncoder.Deafult across the board)
ahsonkhan@ced1ee0

else
{
// For performance when no encoder is specified, perform escaping here for Ascii and on the
// first occurrence of a non-Ascii character, then call into the default encoder.
while (indexOfFirstByteToEscape < value.Length)
{
byte val = value[indexOfFirstByteToEscape];
if (IsAsciiValue(val))
{
if (NeedsEscaping(val))
{
EscapeNextBytes(val, destination, ref written);
indexOfFirstByteToEscape++;
}
else
{
destination[written] = val;
written++;
indexOfFirstByteToEscape++;
}
}
else
{
// Fall back to default encoder.
destination = destination.Slice(written);
value = value.Slice(indexOfFirstByteToEscape);
EscapeString(value, destination, JavaScriptEncoder.Default, ref written);
break;
}
}

cc @steveharter, @GrabYourPitchforks, @gfoidl

@ahsonkhan ahsonkhan added this to the 5.0 milestone Oct 23, 2019
Copy link

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the SIMD code was moved into the default encoder? lgtm

}
return encoder == null ?
JavaScriptEncoder.Default.FindFirstCharacterToEncodeUtf8(value) :
encoder.FindFirstCharacterToEncodeUtf8(value);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be:

(encoder ?? JavaScriptEncoder.Default).FindFirstCharacterToEncodeUtf8(value)

Copy link
Author

@ahsonkhan ahsonkhan Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. This also produces smallest asm.

idx = encoder == null ?
JavaScriptEncoder.Default.FindFirstCharacterToEncode(ptr, value.Length) :
encoder.FindFirstCharacterToEncode(ptr, value.Length);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goto getting us here now?

Copy link
Author

@ahsonkhan ahsonkhan Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much perf benefit on its own, but we do produce smaller disassembly without having multiple return points in a fixed block, which could help with inlining.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? it seems like the difference would be a jmp vs a ret; is there some stack space cleanup?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, re-checked the assembly diff, and removing the goto is better in terms of asm size.

// Some implementations of JavascriptEncoder.FindFirstCharacterToEncode may not accept
// null pointers and gaurd against that. Hence, check up-front and fall down to return -1.
if (value.IsEmpty)
{
    return -1; // All characters are allowed.
}

fixed (char* ptr = value)
{
    return (encoder ?? JavaScriptEncoder.Default).FindFirstCharacterToEncode(ptr, value.Length);
}

versus

public static unsafe int NeedsEscaping(ReadOnlySpan<char> value, JavaScriptEncoder encoder)
{
    fixed (char* ptr = value)
    {
        int idx = 0;

        // Some implementations of JavascriptEncoder.FindFirstCharacterToEncode may not accept
        // null pointers and gaurd against that. Hence, check up-front and fall down to return -1.
        if (value.IsEmpty)
        {
            idx = -1; // All characters are allowed.
            goto Return;
        }

        idx = (encoder ?? JavaScriptEncoder.Default).FindFirstCharacterToEncode(ptr, value.Length);

    Return:
        return idx;
    }
}

See diff (3 fewer instructions total):
https://www.diffchecker.com/uqTkCcBl

@ahsonkhan
Copy link
Author

I assume the SIMD code was moved into the default encoder?

Yep.

@ahsonkhan
Copy link
Author

Unrelated test failure: https://github.com/dotnet/corefx/issues/41968

using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices; // Do not remove. Needed for Int32LsbToHexDigit when !BUILDING_INBOX_LIBRARY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when !BUILDING_INBOX_LIBRARY

Put into #if !BUILDING_INBOX_LIBRARY to make this (code-) explicit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I will do this separately (for master) since it is primarily a cosmetic change and I want to cherry-pick this change for 3.1 (was waiting on CI to be green for a while :p).

@ahsonkhan ahsonkhan merged commit daadeef into dotnet:master Oct 23, 2019
@ahsonkhan ahsonkhan deleted the UseDefaultEncoderWhenNull branch October 23, 2019 08:09
ahsonkhan added a commit to ahsonkhan/corefx that referenced this pull request Oct 23, 2019
…sEscaping (dotnet#42023)

* When encoder is null, use JavaScriptEncoder.Default to check for
NeedsEscaping.

* Remove unnecessary unsafe keyword and add comment to using directive.

* Address feedback.

* Remove gotos and move the IsEmpty check outside the fixed block.
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…sEscaping (dotnet/corefx#42023)

* When encoder is null, use JavaScriptEncoder.Default to check for
NeedsEscaping.

* Remove unnecessary unsafe keyword and add comment to using directive.

* Address feedback.

* Remove gotos and move the IsEmpty check outside the fixed block.


Commit migrated from dotnet/corefx@daadeef
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants