Description
Updates
- Update 1: Removed reference to assemblyQualifiedNameWithinTypeArgument, which was left over from previous iterations.
- Update 2: Disallowed raw "]" completely in assembly name identifiers to simplify the spec. These now have to be escaped regardless of whether the assembly name appears in a generic type argument.
- Update 3: Merged assembly names and type names into single format.
- Update 4: Clarified difference between SerString and the canonical form. Misread the spec.
Rationale
This proposal attempts to fully specify reflection-notation serialized types for inclusion in ECMA-335 (referred to from here on as "the CLI").
In metadata, when a type is persisted as the value of a fixed or named argument, such as in the following code block, it is serialized in a SerString in its canonical form.
    [Export(typeof(ILogger))]
SerString and the canonical form are documented as follows (see _II.23.3 Custom attributes_):
- If the parameter kind is string, (middle line in above diagram) then the blob contains a SerString – a PackedLen count of bytes, followed by the UTF8 characters. If the string is null, its PackedLen has the value 0xFF (with no following characters). If the string is empty (“”), then PackedLen has the value 0x00 (with no following characters).
- If the parameter kind is System.Type, (also, the middle line in above diagram) its value is stored as a SerString (as defined in the previous paragraph), representing its canonical name. The canonical name is its full type name, followed optionally by the assembly where it is defined, its version, culture and public-key-token. If the assembly name is omitted, the CLI looks first in the current assembly, and then in the system library (mscorlib); in these two special cases, it is permitted to omit the assembly-name, version, culture and public-key-token.
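To make the quoted SerString rules concrete, here is a minimal Python sketch of an encoder and decoder. It assumes that PackedLen is the compressed unsigned integer encoding described in ECMA-335 II.23.2 (one, two, or four bytes); the function names are made up for illustration and are not part of any library.

```python
def encode_packed_len(n):
    """Encode a length as a compressed unsigned integer (assumed II.23.2 scheme)."""
    if n < 0 or n > 0x1FFFFFFF:
        raise ValueError("length out of range for PackedLen")
    if n <= 0x7F:
        return bytes([n])                       # 1 byte:  0xxxxxxx
    if n <= 0x3FFF:
        return bytes([0x80 | (n >> 8), n & 0xFF])  # 2 bytes: 10xxxxxx xxxxxxxx
    return bytes([0xC0 | (n >> 24), (n >> 16) & 0xFF,
                  (n >> 8) & 0xFF, n & 0xFF])   # 4 bytes: 110xxxxx ...


def encode_ser_string(value):
    """Encode a string (or None) as a SerString."""
    if value is None:
        return b"\xff"                          # null string: 0xFF, no characters
    data = value.encode("utf-8")
    return encode_packed_len(len(data)) + data  # empty string yields b"\x00"


def decode_ser_string(blob):
    """Decode a SerString that starts at the beginning of blob."""
    first = blob[0]
    if first == 0xFF:
        return None
    if (first & 0x80) == 0:
        length, start = first, 1
    elif (first & 0xC0) == 0x80:
        length, start = ((first & 0x3F) << 8) | blob[1], 2
    else:
        length = ((first & 0x1F) << 24) | (blob[1] << 16) | (blob[2] << 8) | blob[3]
        start = 4
    return blob[start:start + length].decode("utf-8")
```

Note that this only covers the framing. What the decoded string means when the parameter kind is System.Type is exactly the part the quoted text leaves open.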
The last paragraph is underspecified and does not give metadata readers or other inspectors enough information to consume and interpret this canonical form.
The documentation for Type.GetType also attempts to document a similar format, but it too falls short. And while nothing in the CLI or on MSDN indicates a relationship between the canonical name and the type name you pass to Reflection's Type.GetType, they are clearly related.
Based on this, I've attempted to write up the grammar behind these formats as a single format. Note that I've used a custom form of BNF (Backus-Naur Form); if that puts an unpleasant taste in your mouth, I'm sorry in advance. :)
My hope is first to work towards an agreement on the format, and then move on to figuring out how to actually represent and document this within the CLI itself (that's where I hope @CarolEidt comes in).
Proposed Format
Format of a full type name or assembly-qualified name in "reflection-notation"
The key is as follows:
Symbol: <name>
Optional: [<name>]
Literal: ","
Or: <pointer>
<array>
<format> ::=
<assemblyQualifiedName>
<fullName>
<assemblyQualifiedName> ::=
<fullName> "," <assemblyName>
<fullName> ::=
<declaringTypeName>[<nestedTypeNames>][<genericTypeArguments>][<pointerOrArray>][<byReference>]
<declaringTypeName> ::=
<simpleTypeName>
<nestedTypeNames> ::=
[<nestedTypeNames>] "+" <nestedTypeName>
<nestedTypeName> ::=
<simpleTypeName>
<simpleTypeName> ::=
[<whitespace>] <identifier>
<genericTypeArguments> ::=
"[" <genericTypeArgumentsList> "]"
<genericTypeArgumentsList> ::=
[<genericTypeArgumentsList> ","] <genericTypeArgument>
<genericTypeArgument> ::=
<genericTypeArgumentFullName>
<genericTypeArgumentAssemblyQualifiedName>
<genericTypeArgumentFullName> ::=
<fullName>
<genericTypeArgumentAssemblyQualifiedName> ::=
"[" <assemblyQualifiedName> "]"
<pointerOrArray> ::=
[<pointerOrArray>]<pointer>
[<pointerOrArray>]<array>
<byReference> ::=
"&"
<pointer> ::=
"*"
<array> ::=
<szArray>
<singleDimensionalArray>
<multiDimensionalArray>
<szArray> ::=
"[]"
<singleDimensionalArray> ::=
"[*]"
<multiDimensionalArray> ::=
"[" <arrayDimensionSeparator> "]"
<arrayDimensionSeparator> ::=
[<arrayDimensionSeparator>] ","
<identifier> ::=
[<identifier>]<identifierChar>
[<identifier>]<escapedChar>
<identifierChar> ::=
any unicode character except <delimiter>
<escapedChar> ::=
"\" <delimiter>
<whitespace> ::=
[<whitespace>] " "
<delimiter> ::=
"*"
"["
"]"
","
"\"
"&"
"+"
<assemblyName> ::=
<name>[<components>]
<name> ::=
[<whitespace>] <identifierOrQuotedIdentifier> [<whitespace>]
<components> ::=
[<components>]<component>
<component> ::=
"," <componentName> "=" <componentValue>
<componentName> ::=
<identifierOrQuotedIdentifier>
<componentValue> ::=
""""
<identifierOrQuotedIdentifier>
<identifierOrQuotedIdentifier> ::=
<identifier>
""" <quotedIdentifier> """
<identifier> ::=
[<identifier>]<identifierChar>
[<identifier>]<escapedChar>
<quotedIdentifier> ::=
[<quotedIdentifier>]<quotedIdentifierChar>
[<quotedIdentifier>]<escapedChar>
<quotedIdentifierChar> ::=
any unicode character except """
<identifierChar> ::=
any unicode character except <delimiter>
<escapedChar> ::=
"\" <delimiter>
<whitespace> ::=
[<whitespace>] " "
<delimiter> ::=
","
"="
"""
"\"
"]"
Notes
I've written an implementation of a decoder of the above format for inclusion as part of System.Reflection.Metadata, 1.2.
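As a rough illustration of what decoding this format involves, the Python sketch below handles two small pieces of the grammar: splitting an `<assemblyQualifiedName>` into its `<fullName>` and `<assemblyName>`, and unescaping a type-name `<identifier>`. It is not the System.Reflection.Metadata implementation; the function names are hypothetical, and the split relies on the observation that, within `<fullName>`, an unescaped "," can only occur inside square brackets.

```python
# Delimiters for type-name identifiers, taken from the first <delimiter> rule.
TYPE_NAME_DELIMITERS = set("*[],\\&+")


def split_assembly_qualified_name(text):
    """Return (full_name, assembly_name); assembly_name is None if absent.

    Splits at the first unescaped "," that is outside any "[...]" pair.
    """
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2          # skip the backslash and the delimiter it escapes
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i], text[i + 1:]
        i += 1
    return text, None


def unescape_identifier(text):
    """Validate a type-name <identifier> and remove its escapes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # <escapedChar> ::= "\" <delimiter>
            if i + 1 >= len(text) or text[i + 1] not in TYPE_NAME_DELIMITERS:
                raise ValueError("backslash must be followed by a delimiter")
            out.append(text[i + 1])
            i += 2
        elif ch in TYPE_NAME_DELIMITERS:
            raise ValueError("unescaped delimiter %r in identifier" % ch)
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, `split_assembly_qualified_name("System.Int32[,], mscorlib")` returns `("System.Int32[,]", " mscorlib")`: the comma inside the array brackets is skipped, and the leading space stays with the assembly name, where `<name>` permits it. This sketch is a strict reader; it does not attempt to reproduce any Type.GetType leniency.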
Questions
- What do we do about types that are valid and can appear in metadata, but are not currently represented by either reflection or ildasm with a textual equivalent? For example, function pointers or modifiers? What does C++/CLI even persist when I pass long::typeid or (const int*)::typeid as the value of a fixed or named argument? Should we disallow them?
- Reflection has lots of corner-case issues and inconsistencies in how it handles certain things, such as trailing characters and unclosed quotes. What should we do about them? Should we mimic them in the spec? Or should we spec the format a little tighter and treat these inconsistencies as quirks of Type.GetType?
We've decided not to mimic these quirks. Writers will be held to the above format; readers can choose to allow more.