addressed PR feedback and more

Sergey Kanzhelev · Sergey Kanzhelev · commit 253cfb6d4237 · 2019-02-27T10:45:06.000-08:00
diff --git a/index.html b/index.html
@@ -54,6 +54,7 @@
         <section id='abstract' data-include="spec/01-abstract.md" data-include-format='markdown'></section>
         <section id='sotd' data-include="spec/02-sotd.md" data-include-format='markdown'></section>
 
-        <section data-include="spec/20-BINARY_FORMAT.md" data-include-format='markdown'></section>
+        <section data-include="spec/20-binary-format.md" data-include-format='markdown'></section>
+        <section data-include="spec/31-parsing-algoritm.md" data-include-format='markdown' class="informative"></section>
   </body>
 </html>
diff --git a/spec/20-binary-format.md b/spec/20-binary-format.md
@@ -49,48 +49,6 @@ If padding of the field is required (`traceparent` needs to be serialized into
 the bigger buffer) - any number of bytes can be appended to the end of the
 serialized value.
 
-## De-serialization of `traceparent`
-
-Let's assume the algorithm takes a buffer and can set and shift cursor in the
-buffer as well as validate whether the end of the buffer was reached or will be
-reached after reading the given number of bytes. De-serialization of
-`traceparent` should be done in the following sequence:
-
-1. If buffer is empty - return invalid status `BUFFER_EMPTY`. Set a cursor to
-   the first byte.
-2. Read the `version` byte at the cursor position. Shift cursor to `1` byte.
-3. If all three fields (`trace-id`, `parent-id`, `trace-flags`) already read -
-   return them with the status `OK` if `version` is `0` or status
-   `DOWNGRADED_TO_ZERO` otherwise.
-4. If at the end of the buffer return invalid status `TRACEPARENT_INCOMPLETE`.
-   Otherwise read the field identifier byte at the cursor position. Field
-   identifier should be read as unsigned byte assuming big-endian bits order.
-    1. If `0` - check that remaining buffer size is more or equal to `16` bytes.
-       If shorter - return invalid status `TRACE_ID_TOO_SHORT`. Otherwise read
-       the next `16` bytes for `trace-id` and shift cursor to the end of those
-       `16` bytes. Go to step `3`. If `trace-id` is represented as a byte array
-       - first byte should be set into the first element of that array. See
-         comment in serialization section.
-    2. If `1` - check that remaining buffer size is more or equal to `8` bytes.
-       If shorter - return invalid status `PARENT_ID_TOO_SHORT`. Otherwise read
-       the next `8` bytes for `parent-id` and shift cursor to the end of those
-       `8` bytes. Go to step `3`.
-    3. If `2` - check the remaining size of the buffer. If at the end of the
-       buffer - return invalid status. Otherwise - read the `trace-flags`
-       byte. Least significant bit will represent `recorded` value. Go to step
-       `3`.
-    4. In case of any other value - if `version` read at step `2` is `0` -
-       return invalid status `INVALID_FIELD_ID`. If `version` has any other
-       value - `INCOMPATIBLE_VERSION`.
-
-_Note_, that invalid status names are given for readability and not part of the
-specification.
-
-_Note_, that parsing should not treat any additional bytes in the end of the
-buffer as an invalid status. Those fields can be added for padding purposes.
-Optionally implementation can check that the buffer is longer than `29` bytes as
-a very first step if this check is not expensive.
-
 ## `traceparent` example
 
 ``` js
@@ -105,7 +63,7 @@ This corresponds to:
 - `trace-id` is
   `{75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54}` or
   `4bf92f3577b34da6a3ce929d000e4736`.
-- `span-id` is `{52, 240, 103, 170, 11, 169, 2, 183}` or `34f067aa0ba902b7`.
+- `parent-id` is `{52, 240, 103, 170, 11, 169, 2, 183}` or `34f067aa0ba902b7`.
 - `trace-flags` is `1` with the meaning `recorded` is true.
 
 ## `tracestate` binary format
@@ -128,4 +86,16 @@ value-len       = 1BYTE ; length of the value string
 Zero length key (`key-len == 0`) indicates the end of the `tracestate`. So when
 `tracestate` should be serialized into the buffer that is longer than it
 requires - `{ 0, 0 }` (field id `0` and key-len `0`) will indicate the end of
-the `tracestate`.
+the `tracestate`.
+
+## `tracestate` example
+
+``` js
+{ 0,  3,  102, 111, 111,  16,  51, 52, 102, 48, 54, 55, 97, 97, 48, 98, 97, 57, 48, 50, 98, 55,
+  0,  3,   98,  97, 114,   4,  48, 46, 50, 53 }
+
+```
+
+This corresponds to 2 tracestate entries:
+
+`foo=34f067aa0ba902b7,bar=0.25`
diff --git a/spec/21-BINARY_FORMAT_RATIONALE.md b/spec/21-BINARY_FORMAT_RATIONALE.md
diff --git a/spec/21-binary-format-rationale.md b/spec/21-binary-format-rationale.md
@@ -0,0 +1,25 @@
+# Rationale for decision on binary format
+
+The binary format is similar to protobuf encoding but does not reference
+the protobuf project. It places a one-byte field identifier in front of
+each field value.
+
+## Field identifiers
+
+The protocol uses field identifiers for fields like `trace-id`, `parent-id`,
+`trace-flags` and tracestate entries. The purpose of the field
+identifiers is twofold. First, they allow existing fields to be removed
+or new ones to be added going forward. Second, they provide an additional
+layer of validation of the format.
+
+## How can we add new fields
+
+If we follow the rule that new ids are always appended at the end of the
+buffer, we can add up to 127. After that we can either use varint encoding
+or just reserve 255 as a continuation byte. The assumption at the moment
+is that the specification will never get to this point.
+
+## Why custom binary protocol
+
+We didn't find a non-proprietary, widely used binary protocol that can be
+used in this specification.
diff --git a/spec/31-parsing-algoritm.md b/spec/31-parsing-algoritm.md
@@ -0,0 +1,86 @@
+# De-serialization algorithms
+
+This is a non-normative section that describes de-serialization algorithms
+that may be used to parse `traceparent` and `tracestate` field values.
+
+## De-serialization of `traceparent`
+
+Let's assume the algorithm takes a buffer (a byte array) and can set
+and shift a cursor in the buffer, as well as check whether the end of
+the buffer was reached or would be reached after reading a given number
+of bytes. This algorithm can also work on a stream of bytes.
+De-serialization of `traceparent` MAY be done in the following sequence:
+
+1. If buffer is empty - RETURN invalid status `BUFFER_EMPTY`. Set a cursor to
+   the first byte.
+2. Read the `version` byte at the cursor position. Shift the cursor by `1` byte.
+3. If at the end of the buffer RETURN invalid status `TRACEPARENT_INCOMPLETE`.
+4. **Parse `trace-id`**. Read the field identifier byte at the cursor
+   position and shift the cursor by `1` byte. If NOT `0` - go to step
+   `8. Report invalid field`. Otherwise - check that the remaining buffer
+   size is at least `16` bytes. If shorter - RETURN invalid status
+   `TRACE_ID_TOO_SHORT`. Otherwise read the next `16` bytes for `trace-id`
+   and shift the cursor to the end of those `16` bytes.
+5. **Parse `parent-id`**. Read the field identifier byte at the cursor
+   position and shift the cursor by `1` byte. If NOT `1` - go to step
+   `8. Report invalid field`. Otherwise - check that the remaining buffer
+   size is at least `8` bytes. If shorter - RETURN invalid status
+   `PARENT_ID_TOO_SHORT`. Otherwise read the next `8` bytes for `parent-id`
+   and shift the cursor to the end of those `8` bytes.
+6. **Parse `trace-flags`**. Read the field identifier byte at the cursor
+   position and shift the cursor by `1` byte. If NOT `2` - go to step
+   `8. Report invalid field`. Otherwise - check the remaining size of
+   the buffer. If at the end of the buffer - RETURN invalid status.
+   Otherwise - read the `trace-flags` byte. The least significant bit
+   represents the `recorded` value.
+7. RETURN status `OK` if `version` is `0` or status `DOWNGRADED_TO_ZERO`
+   otherwise.
+8. **Report invalid field**. If `version` is `0` RETURN invalid status
+   `INVALID_FIELD_ID`. If `version` has any other value - RETURN invalid
+   status `INCOMPATIBLE_VERSION`.
+
+_Note_ that invalid status names are given for readability and are not part
+of the specification.
+
+_Note_ that parsing should not treat any additional bytes at the end of the
+buffer as an invalid status. Those bytes can be added for padding purposes.
+Optionally, an implementation can check that the buffer is at least `29` bytes
+long as a very first step if this check is not expensive.
+
+## De-serialization of `tracestate`
+
+Let's assume the algorithm takes a buffer (a byte array) and can set
+and shift a cursor in the buffer, as well as check whether the end of
+the buffer was reached or would be reached after reading a given number
+of bytes. The algorithm also uses the `version` value parsed from
+`traceparent`. If `version` was not given - value `0` SHOULD be used.
+This algorithm can also work on a stream of bytes. De-serialization of
+`tracestate` MAY be done in the following sequence:
+
+1. If at the end of the buffer - RETURN status `OK`. Otherwise set a
+   cursor to the first byte.
+2. **Parse `list-member` field identifier**. Read the field identifier
+   byte at the cursor position and shift the cursor by `1` byte. If NOT
+   `0` and `version` is `0` RETURN invalid status `INVALID_FIELD_ID`. If NOT
+   `0` and `version` has any other value - RETURN `INCOMPATIBLE_VERSION`.
+3. **Parse key**.
+   1. If at the end of the buffer - RETURN status `OK`. This situation
+      indicates that `tracestate` value was padded with `0`.
+   2. Read the `key-len` byte. Shift the cursor by `1` byte. If the value
+      of `key-len` is `0` - RETURN status `OK`. This situation indicates
+      an explicit end of the `tracestate`.
+   3. Check that buffer has `key-len` more bytes. If not - RETURN
+      `KEY_TOO_SHORT`.
+   4. Read `key-len` bytes as `key`. Shift the cursor by `key-len` bytes.
+4. **Parse value**.
+   1. If at the end of the buffer - RETURN status `INCOMPLETE_LIST_MEMBER`.
+   2. Read the `value-len` byte. Shift the cursor by `1` byte. If the value of
+      `value-len` is `0` - add `list-member` with the `key` and empty
+      `value` to the `tracestate` list. RETURN status `OK`.
+   3. Check that buffer has `value-len` more bytes. If not - RETURN
+      `VALUE_TOO_SHORT`.
+   4. Read `value-len` bytes as `value`. Shift the cursor by `value-len`
+      bytes.
+   5. Add `list-member` with the `key` and `value` to the `tracestate`
+      list.
+5. If at the end of the buffer - RETURN status `OK`. Otherwise go to step `2`.
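
The two de-serialization sequences added in `spec/31-parsing-algoritm.md` can be sketched in Python as below. This is a minimal sketch, not part of the commit and not a reference implementation. The function names, the returned tuple shape, and the status `TRACE_FLAGS_MISSING` (the text leaves that status unnamed) are assumptions made for illustration. Two further assumptions: keys and values are decoded as ASCII, and the end of the buffer is checked before every field identifier, which the unpadded `tracestate` example in `spec/20-binary-format.md` requires.

```python
from typing import Dict, List, Optional, Tuple

# (field id, name, payload size, status when the payload is cut short)
TRACEPARENT_LAYOUT = [
    (0, "trace-id", 16, "TRACE_ID_TOO_SHORT"),
    (1, "parent-id", 8, "PARENT_ID_TOO_SHORT"),
    (2, "trace-flags", 1, "TRACE_FLAGS_MISSING"),  # hypothetical name, unnamed in the text
]

def invalid_field(version: int) -> str:
    """Step 'Report invalid field': the status depends on the version byte."""
    return "INVALID_FIELD_ID" if version == 0 else "INCOMPATIBLE_VERSION"

def parse_traceparent(buf: bytes) -> Tuple[str, Optional[Dict[str, bytes]]]:
    if not buf:
        return "BUFFER_EMPTY", None
    version, pos = buf[0], 1
    fields: Dict[str, bytes] = {}
    for field_id, name, size, short_status in TRACEPARENT_LAYOUT:
        if pos >= len(buf):
            return "TRACEPARENT_INCOMPLETE", None
        if buf[pos] != field_id:
            return invalid_field(version), None
        pos += 1  # consume the field identifier byte
        if len(buf) - pos < size:
            return short_status, None
        fields[name] = buf[pos:pos + size]
        pos += size
    # Bytes after trace-flags are treated as padding and ignored.
    return ("OK" if version == 0 else "DOWNGRADED_TO_ZERO"), fields

def parse_tracestate(buf: bytes, version: int = 0) -> Tuple[str, List[Tuple[str, str]]]:
    members: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(buf):
        field_id = buf[pos]
        pos += 1
        if field_id != 0:
            return invalid_field(version), members
        if pos >= len(buf):
            return "OK", members  # buffer was padded with a single 0
        key_len = buf[pos]
        pos += 1
        if key_len == 0:
            return "OK", members  # explicit end of tracestate
        if len(buf) - pos < key_len:
            return "KEY_TOO_SHORT", members
        key = buf[pos:pos + key_len].decode("ascii")  # assumption: ASCII keys
        pos += key_len
        if pos >= len(buf):
            return "INCOMPLETE_LIST_MEMBER", members
        value_len = buf[pos]
        pos += 1
        if value_len == 0:
            members.append((key, ""))
            return "OK", members  # the written steps stop after an empty value
        if len(buf) - pos < value_len:
            return "VALUE_TOO_SHORT", members
        value = buf[pos:pos + value_len].decode("ascii")  # assumption: ASCII values
        pos += value_len
        members.append((key, value))
    return "OK", members

# Buffers rebuilt from the field values listed in spec/20-binary-format.md.
trace_id = bytes([75, 249, 47, 53, 119, 179, 77, 166, 163, 206, 146, 157, 0, 14, 71, 54])
parent_id = bytes([52, 240, 103, 170, 11, 169, 2, 183])
traceparent = bytes([0, 0]) + trace_id + bytes([1]) + parent_id + bytes([2, 1])
status, fields = parse_traceparent(traceparent)
print(status, fields["trace-id"].hex())  # OK 4bf92f3577b34da6a3ce929d000e4736

tracestate = bytes([0, 3, 102, 111, 111, 16]) + b"34f067aa0ba902b7" + bytes([0, 3, 98, 97, 114, 4]) + b"0.25"
print(parse_tracestate(tracestate))  # ('OK', [('foo', '34f067aa0ba902b7'), ('bar', '0.25')])
```

The `recorded` flag is the least significant bit of the flags byte, so a caller would read it as `fields["trace-flags"][0] & 1`. Driving `traceparent` parsing from a small layout table keeps the three near-identical steps in one loop; an implementation that needs per-field handling may prefer to unroll it.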