[WIP] Add flag to the %A printf specifier that escapes control characters #2558
|
It makes a lot of sense :) Can we get an idea if performance changed / improved on large strings? I'm seeing we had List.rev but were using indexer on the string rather than Seq.cast. |
|
Neither |
|
@saul I see that the code is not printing the whole string, but is it correct that I've made another implementation of the function:
```fsharp
let formatString2 (s:string) =
    // Pre-size the builder to the input length to limit reallocations.
    let builder = System.Text.StringBuilder(s.Length)
    builder.Append("\"") |> ignore
    for c in s do
        builder.Append(formatChar false c) |> ignore
    builder.Append("\"") |> ignore
    string builder
```
and it seems to run a lot faster (tried on a 4 MB XML file): I think using |
|
That's a shame - I was hoping that writing it in idiomatic F# would be performant enough. I'll try your 'large string' benchmark tomorrow. Thanks for looking into this @smoothdeveloper |
|
@saul I'm sure you can turn my code into idiomatic F#, Seq.iter and making local function for builder.Append, we will have best of both worlds :) |
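For reference, the suggestion above (keep the `StringBuilder`, but drive it with an iteration function and a local `Append` wrapper) could look roughly like the sketch below. It uses `String.iter`, which a later comment says was the chosen approach. `formatCharStub` and `formatStringIter` are hypothetical names, and the stub only covers a few characters because the PR's real `formatChar` is not shown in full here.

```fsharp
open System.Text

// Hypothetical stand-in for the PR's formatChar; it escapes only a few
// characters so that the sketch is self-contained.
let formatCharStub (c: char) : string =
    match c with
    | '\n' -> "\\n"
    | '"' -> "\\\""
    | '\\' -> "\\\\"
    | c -> string c

let formatStringIter (s: string) : string =
    let builder = StringBuilder(s.Length + 2)
    // Local helper so that call sites do not need `|> ignore`.
    let append (text: string) = builder.Append(text) |> ignore
    append "\""
    s |> String.iter (fun c -> append (formatCharStub c))
    append "\""
    builder.ToString()
```

The mutable builder stays hidden inside the function, so callers still see a pure `string -> string` function.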
|
My suggestion is that we should use |
|
I thought that I had already commented that this would be a breaking change, lots of code uses %A to output readable text, and undoubtedly works around or perhaps relies on the current quirky behavior, so an alternate formatting char would be useful. |
|
@dsyme @KevinRansom What about if a string is nested within a record? How would we print it with %A then? Are you suggesting that we keep all string formatting the same unless a different flag has been supplied? How do we do %+A with this new string formatting flag then? Personally I think it makes more sense if we accept it as a breaking change - if people want the old behaviour it's very easily replicated with I think another flag just adds even more complexity to printf. I did a straw poll of seasoned F# developers at my office today and about 10% of them knew what %+A did. I think the discoverability of flags is shoddy and it would be best if we were correct by default. |
|
FSharp has many thousands of active customers, and a breaking change here will affect a portion of them in some way. The pain addressed by this PR is probably insufficient to justify a language breaking change. Certainly if C# had this feature there is no way it would get past the breaking change committee. I have seen code reviews where ToString() modifications were rejected. I'm sure that we can figure out a non-breaking way of adding the type of formatting you are suggesting. |
|
Thanks @KevinRansom, that's fine - I'll implement it as a '@' flag for now and update this PR in due course :) Also @smoothdeveloper I did some perf tests today, and here are the results: This is the average of 25 runs of each benchmark, parsing a .dll file and a .json file. The first 6 are very similar, with each run they tend to swap numbers. I think I'll go with the String.iter solution and update this PR. |
|
Latest commit: Current behaviour:
I'm going to mark this WIP as I still have some work to do (will update original post). |
|
👍, I like it with |
```fsharp
let o =
    Microsoft.FSharp.Text.StructuredPrintfImpl.FormatOptions.Default
    |> fun o ->
        if useZeroWidth then { o with PrintWidth = 0}
```
Just use let bindings - that's normal style in FSharp.Core. If you like give all the o different names.
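A minimal sketch of what that review comment is asking for, assuming a simplified options record. `Options`, `defaultOptions`, `buildOptions` and the `ShowProperties` field are hypothetical stand-ins, not the real `FormatOptions` type:

```fsharp
// Hypothetical stand-in for the real FormatOptions record.
type Options = { PrintWidth: int; ShowProperties: bool }

let defaultOptions = { PrintWidth = 80; ShowProperties = false }

let buildOptions (useZeroWidth: bool) (showProperties: bool) : Options =
    // Each step gets its own name instead of being piped through `fun o -> ...`.
    let o1 = defaultOptions
    let o2 = if useZeroWidth then { o1 with PrintWidth = 0 } else o1
    let o3 = if showProperties then { o2 with ShowProperties = true } else o2
    o3
```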
```fsharp
fieldInfo.GetValue(obj)

let formatChar isChar c =
    match c with
```
I'm not sure but perhaps we should be escaping some entire classes of Unicode characters as well. See https://en.wikipedia.org/wiki/Template:General_Category_(Unicode). Though I suppose it's not strictly necessary - the rule is that F# should be able to parse what it prints.
Am I right in saying that F# files are parsed as UTF-8? In which case I don't think we need to escape anything else other than what's there already.
The relevant part of the spec will be the part on strings - and yes, I believe the ones you've got are the only characters that must be escaped. Even some of those need not be - like \t in strings - but I think you're aiming for a string that looks "good" rather than full of strange characters.
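To make the rule discussed in this thread concrete ("F# should be able to parse what it prints"), here is a rough sketch of an escaping function. It is not the PR's `formatChar`: `escapeChar` and `escapeString` are hypothetical names, and the choice to print other control characters as `\uXXXX` is an assumption made for readability.

```fsharp
// Hypothetical helper, not the PR's implementation.
// isChar = true means the result goes inside a char literal ('...'),
// false means it goes inside a string literal ("...").
let escapeChar (isChar: bool) (c: char) : string =
    match c with
    | '\\' -> "\\\\"
    | '"' when not isChar -> "\\\""   // only needed inside a string literal
    | '\'' when isChar -> "\\'"       // only needed inside a char literal
    | '\n' -> "\\n"
    | '\r' -> "\\r"
    | '\t' -> "\\t"
    | '\b' -> "\\b"
    // Assumption: remaining control characters are shown as \uXXXX escapes.
    | c when System.Char.IsControl c -> sprintf "\\u%04X" (int c)
    | c -> string c

// Wraps the escaped characters in double quotes, like a string literal.
let escapeString (s: string) : string =
    "\"" + String.collect (escapeChar false) s + "\""
```

Non-control Unicode characters pass through unchanged, which matches the view above that only a small set of characters must be escaped.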
|
```fsharp
[<Test>]
member __.``Standard characters are correctly escaped``() =
    test "%A" "\n" "\"\\n\""
```
These tests will need updating. Also cover +@A and @+A etc.
|
@dsyme See my latest commit - how do I add the new SprintfAFormatSpecifiers.fs and FsiEscapesStringsByDefault.fs QA tests to the fsharpqa suite? @cartermp Where should I raise the RFC? Do you want me to do it in https://github.com/fsharp/fslang-suggestions? |
|
Add the test cases to this file: visualfsharp/tests/fsharpqa/Source/Printing/env.lst Something like: Kevin |
|
Thanks @KevinRansom, I'll make that change when I'm back. The more pressing issue is that the tests don't compile as they're being compiled by F# 4.1 which doesn't support the @ flag. Do I need to make these QA tests instead of unit tests too? |
|
@saul I think just an issue here to track the work + state the basic design of the format string is good enough. Edit: Yup, RFC |
That's odd. Have you run |
|
@dsyme I will take a look, right now. |
Yes, add the RFC there, thanks! |
|
Sadly ci_part1 works fine on my machine .... doh!!! that was sauls master branch!!! trying again with the real deal. |
|
When building FSharp.Core.dll for the portable profiles we get this: I think there may be an incompatibility in the portable profile that causes us to prefer IEnumerable, rather than IEnumerable<char> Yep!!! profile 7 version of string does not support IEnumerable<char> .... wow!!!!! extracted using reflector from net45 mscorlib Just testing the fix, I'll send a PR when it's done. |
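The actual portable7 fix is not shown in this thread. One way to avoid depending on `string` implementing `IEnumerable<char>` at all is to walk the string by index, as in the hedged sketch below. `escapeByIndex` is a hypothetical name, and `escape` stands for whatever per-character formatter is in use.

```fsharp
// Hypothetical sketch: iterate by index rather than enumerating the string,
// so nothing here relies on string implementing IEnumerable<char>.
let escapeByIndex (escape: char -> string) (s: string) : string =
    let builder = System.Text.StringBuilder(s.Length)
    for i in 0 .. s.Length - 1 do
        builder.Append(escape (s.[i])) |> ignore
    builder.ToString()
```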
|
Now it looks like a regression here: Compiled with built compiler: Compiled with released compiler: I have pushed the portable7 fix to your fork. I will let you take a look at the regression. Kevin |
|
Thanks @KevinRansom - I've committed a fix for that regression and added the new test to the suite. If the tests go green then it's ready. |
|
@saul a couple more test failures that look like they are related to the change. |
|
Thanks @KevinRansom - I believe I know what the second error is, but I'm afraid I've no idea how I could have caused the first one. I'll get around to fixing the latter ASAP! |
|
@dotnet-bot test this please |
|
@KevinRansom Now that we don't build PCLs anymore, are the concerns there no longer valid? Also conflicts. |
|
@saul do you think we should still pursue this PR? The conflict is easy to resolve: PrintfTests.fs has moved from below src to below test. |
|
please reopen this PR when you are ready to pursue it further. Kevin |



This PR adds a flag to the %A format specifier: @. This flag escapes control characters and other characters that need to be escaped in F# source code string literals.
Before
After
This allows us to directly copy strings that have been formatted with %@A straight into code - and they'll 'just work'.
Remaining work: