Encoding of prefix in data segment entries #1439

@nagisa

Hi,

In WebAssembly 1.0 the data segment encoding was specified as

datasec ::= vec(data)
data ::= memidx expr vec(byte)

and with the merge of the bulk-memory proposal (AFAIU) it has become

datasec ::= vec(data)
data ::= 0x00 expr vec(byte)
       | 0x01 vec(byte)
       | 0x02 memidx expr vec(byte)

This seems backwards-compatible at first glance, but it actually isn't, because memidx uses the LEB128 encoding. In the original specification, 00, 80 00, 80 80 00, 80 80 80 00 and 80 80 80 80 00 were all valid byte patterns for the prefix of a data entry, since each of them decodes to memory index 0. As a result, binary-encoded modules that are valid per the currently published WebAssembly specification are no longer valid under the draft.
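To make the overlap concrete, here is a minimal sketch of an unsigned LEB128 decoder for 32-bit values. The function name `decode_u32_leb128` is my own and is not taken from the spec or from any particular implementation; it only illustrates why all five patterns are the same index.

```python
def decode_u32_leb128(data, pos=0):
    """Decode an unsigned 32-bit LEB128 integer starting at data[pos].

    Returns (value, bytes_consumed). Hypothetical helper for illustration.
    """
    result = 0
    shift = 0
    for i in range(5):  # a u32 occupies at most ceil(32 / 7) = 5 bytes
        if pos + i >= len(data):
            raise ValueError("truncated LEB128 integer")
        byte = data[pos + i]
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if byte & 0x80 == 0:              # high bit clear: last byte
            if result > 0xFFFFFFFF:
                raise ValueError("value does not fit in 32 bits")
            return result, i + 1
        shift += 7
    raise ValueError("LEB128 integer longer than 5 bytes")


# The five prefixes that WebAssembly 1.0 accepted for memory index 0.
PATTERNS = ["00", "80 00", "80 80 00", "80 80 80 00", "80 80 80 80 00"]

for hex_bytes in PATTERNS:
    value, consumed = decode_u32_leb128(bytes.fromhex(hex_bytes))
    print(f"{hex_bytes:<16} -> value {value}, {consumed} byte(s)")
```

Every pattern decodes to value 0 and differs only in the number of bytes consumed, which is why a decoder that matches on the single literal byte 0x00 rejects the padded forms.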

I wonder if it would make sense to adjust the definition of data as follows:

datasec ::= vec(data)
data ::= 0x00 expr vec(byte)
       | 0x01 vec(byte)
       | 0x02 memidx expr vec(byte)
       | 0x80 0x00 expr vec(byte)
       | 0x80 0x80 0x00 expr vec(byte)
       | 0x80 0x80 0x80 0x00 expr vec(byte)
       | 0x80 0x80 0x80 0x80 0x00 expr vec(byte)

or perhaps even

datasec ::= vec(data)
data ::= 0x00 expr vec(byte)
       | 0x01 vec(byte)
       | 0x02 memidx expr vec(byte)
       | 0x80 memidx expr vec(byte) // note, though, that this ends up accepting `80 80 80 80 80 00` as an additional valid encoding.

in order to maintain backwards compatibility? It seems like it'd be a pretty painless change to make.
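For illustration, a prefix parser for the first alternative above could look roughly like this. It is only a sketch: `parse_data_prefix`, `read_u32_leb128` and the returned tuple shape are names I made up, and the parsing of `expr` and `vec(byte)` that follows the prefix is left out.

```python
def read_u32_leb128(buf, pos):
    """Read an unsigned 32-bit LEB128 integer; return (value, next_pos)."""
    result = 0
    for i in range(5):  # a u32 occupies at most 5 bytes
        if pos + i >= len(buf):
            raise ValueError("truncated LEB128 integer")
        byte = buf[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            if result > 0xFFFFFFFF:
                raise ValueError("value does not fit in 32 bits")
            return result, pos + i + 1
    raise ValueError("LEB128 integer longer than 5 bytes")


def parse_data_prefix(buf):
    """Classify the prefix of one data entry.

    Returns (mode, memidx, next_pos), where next_pos is the offset of the
    first byte after the prefix. Hypothetical helper for illustration.
    """
    if not buf:
        raise ValueError("empty data entry")
    first = buf[0]
    if first == 0x00:
        return ("active", 0, 1)
    if first == 0x01:
        return ("passive", None, 1)
    if first == 0x02:
        memidx, pos = read_u32_leb128(buf, 1)
        return ("active", memidx, pos)
    if first == 0x80:
        # Legacy padded zero: one to four 0x80 bytes followed by 0x00.
        pos = 1
        while pos < 5 and pos < len(buf) and buf[pos] == 0x80:
            pos += 1
        if pos < 5 and pos < len(buf) and buf[pos] == 0x00:
            return ("active", 0, pos + 1)
        raise ValueError("malformed padded memory index")
    raise ValueError(f"unknown data segment prefix 0x{first:02x}")
```

For example, `parse_data_prefix(bytes.fromhex("80 80 00 41 00 0b"))` yields `("active", 0, 3)`, while the six-byte form `80 80 80 80 80 00` is rejected, matching the first alternative rather than the second.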
