bits vs bytes: a complete specification of the memory model #10547

This proposal distinguishes between how a value is stored in memory (its bytes) and how it is represented according to information theory (its bits).

In some cases, such as the `u8` type, these are the same thing. In other cases, they can differ for various reasons, such as endianness or aggregate padding.

The idea is that the following two operations would both be well-defined but could produce different results; neither would be defined in terms of the other:

byte casting:

```zig
fn byteCast(a: u32) f32 {
    // `&a` is a `*const u32`, so the reinterpreted pointer must also be const.
    return @ptrCast(*const f32, &a).*;
}
```

bit casting:

```zig
fn bitCast(a: u32) f32 {
    return @bitCast(f32, a);
}
```
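For comparison, here is a sketch of the same two functions in the syntax of Zig 0.11 and later, where `@ptrCast` and `@bitCast` take their destination type from the result location instead of a first argument. For `u32` to `f32` the two happen to agree, since both types are 4 bytes wide with no padding and both reads use the target's byte order. The test value `0x3F800000` is the IEEE 754 single-precision encoding of `1.0`.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Byte cast: read the memory that holds `a` as if it held an f32.
fn byteCast(a: u32) f32 {
    const p: *const f32 = @ptrCast(&a);
    return p.*;
}

// Bit cast: reinterpret the 32 bits of `a`'s value as an f32.
fn bitCast(a: u32) f32 {
    return @bitCast(a);
}

test "u32 to f32: byte cast and bit cast agree" {
    // 0x3F800000 is the IEEE 754 single-precision encoding of 1.0.
    try expect(byteCast(0x3F800000) == 1.0);
    try expect(bitCast(0x3F800000) == 1.0);
}
```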

Byte casting is done via pointer reinterpretation, or via `extern union` field aliasing. Bit casting is done via the `@bitCast` language primitive. They are both well defined but distinct:

 * **byte casting** - reinterprets memory that is stored in contiguous bytes from one type to another. This is affected by padding and endianness.
 * **bit casting** - reinterprets bits according to information theory. Regardless of padding, alignment, or endianness, if two types need the same number of bits to represent their values, a value of one type can be bit cast to the other.
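A minimal sketch of why endianness affects only the byte view, assuming a recent Zig standard library: masking inspects the bits of a value, while `std.mem.asBytes` inspects the bytes that hold it in memory. The helper `firstByte` is hypothetical and exists only for this illustration.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical helper: the byte stored at the lowest address of `v`.
fn firstByte(v: u32) u8 {
    return std.mem.asBytes(&v)[0];
}

test "byte view depends on endianness, bit view does not" {
    const x: u32 = 0x12345678;

    // Bit view: the lowest 8 bits of the value. Masking operates on the
    // integer value, so this is 0x78 on every target.
    try expect((x & 0xff) == 0x78);

    // Byte view: 0x78 on little-endian targets, 0x12 on big-endian targets.
    const b = firstByte(x);
    try expect(b == 0x78 or b == 0x12);

    // Storing the value in little-endian order makes the byte view
    // line up with the bit view on every target.
    try expect(firstByte(std.mem.nativeToLittle(u32, x)) == 0x78);
}
```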

Byte casting is easy to understand; you can almost implement it by accident. Here we focus on bit casting and what it means. First, some prerequisites:

 * `@bitSizeOf` vs `@sizeOf` - `@sizeOf` counts bytes and includes padding; for example, `@sizeOf(u24) == 4`. `@bitSizeOf` corresponds to information theory and ignores padding; in the same example, `@bitSizeOf(u24) == 24`. The bit size of a struct, whether or not it is `packed` or `extern`, is the sum of the `@bitSizeOf` of its fields.
 * `@bitOffsetOf` vs `@byteOffsetOf` - the byte offset is the difference in memory address between a field and the base pointer of the aggregate. The bit offset is the number of lower bits that precede the field in a hypothetical integer whose bit count equals the `@bitSizeOf` of the aggregate.
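The following sketch, assuming a recent Zig compiler, contrasts the bit and byte builtins on types where current Zig already has well-defined answers. `P` and `E` are illustrative names. The `extern struct` assertions assume a target where `u32` is 4-byte aligned, as on common 32- and 64-bit targets. The last lines compute the proposed bit size of `E` by hand, because current Zig does not compute bit sizes of non-packed structs by summing their fields.

```zig
const std = @import("std");
const expect = std.testing.expect;

const P = packed struct { a: u1, b: u24, c: u7 };
const E = extern struct { a: u8, b: u32 };

test "bit size/offset vs byte size/offset" {
    // u24: 24 bits of information, stored in 4 bytes.
    try expect(@bitSizeOf(u24) == 24);
    try expect(@sizeOf(u24) == 4);

    // Packed struct: bit offsets are counted from the least significant bit.
    try expect(@bitSizeOf(P) == 32);
    try expect(@bitOffsetOf(P, "a") == 0);
    try expect(@bitOffsetOf(P, "b") == 1);
    try expect(@bitOffsetOf(P, "c") == 25);

    // Extern struct: the C layout puts 3 bytes of padding before `b`.
    try expect(@offsetOf(E, "b") == 4);
    try expect(@sizeOf(E) == 8);

    // Under the proposal, the bit size of E would be the sum of its
    // fields' bit sizes, ignoring the padding.
    const proposed_bits = @bitSizeOf(u8) + @bitSizeOf(u32);
    try expect(proposed_bits == 40);
}
```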

With this proposal, every type, whether or not it has a well-defined memory layout (a property of its *bytes*), has a hypothetical integer whose bit count equals the type's `@bitSizeOf`. We call this integer the type's **fundamental int**. `@bitCast` is defined as follows:

 1. convert from the source type to its fundamental int.
 2. convert from the fundamental int to the destination type.

Attempting to `@bitCast` between two types that have differing `@bitSizeOf` values is a compile error. Note that one can obtain the fundamental int for a type by bit casting the value to an unsigned integer with the same `@bitSizeOf`.
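The two-step definition can already be tried with packed structs, which in Zig 0.11 and later are backed by an integer whose least significant bits hold the first field; as far as we know, `@bitCast` on them already behaves the way this proposal describes. The sketch assumes that version, and `A` and `B` are illustrative 16-bit types with different field boundaries. The proposal would extend the same rule to types with no defined byte layout.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Two types with the same bit size (16) but different field boundaries.
const A = packed struct { lo: u4, hi: u12 };
const B = packed struct { x: u8, y: u8 };

test "bit cast as source -> fundamental int -> destination" {
    const a = A{ .lo = 0xF, .hi = 0xABC };

    // Step 1: source type to its fundamental int (an unsigned int of the
    // same bit size). The first field occupies the least significant bits.
    const fundamental: u16 = @bitCast(a);
    try expect(fundamental == 0xABCF);

    // Step 2: fundamental int to the destination type.
    const b: B = @bitCast(fundamental);
    try expect(b.x == 0xCF);
    try expect(b.y == 0xAB);

    // A direct cast matches the result of the two steps.
    const direct: B = @bitCast(a);
    try expect(direct.x == b.x and direct.y == b.y);
}
```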

The motivation for this proposal is:
 * To complete the specification of how these bit-related functions work.
 * To make composing packed structs, and `align(0)` in general, useful.
 * To make it possible to optimize types such as `??u8`, whose fundamental int would be a `u10`: 8 payload bits plus one presence bit for each optional layer.
 * To keep the protection of a type system while still allowing data-oriented design tricks that store information compactly.
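To make the `??u8` arithmetic concrete, here is a hand-written sketch of a round trip between `??u8` and a `u10`, assuming Zig 0.11 or later. The proposal does not fix a bit order for optionals, so the layout below (payload in bits 0 to 7, inner presence bit 8, outer presence bit 9) is an assumption, and `toFundamental`/`fromFundamental` are hypothetical names.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical encoding of `??u8` into its 10-bit fundamental int.
// Assumed layout (not specified by the proposal):
//   bits 0..7: the u8 payload
//   bit 8:     set when the inner optional is non-null
//   bit 9:     set when the outer optional is non-null
// Bits that carry no information (e.g. the payload of a null) are zero.
fn toFundamental(v: ??u8) u10 {
    const inner = v orelse return 0; // outer null
    const byte = inner orelse return 1 << 9; // outer non-null, inner null
    return (1 << 9) | (1 << 8) | @as(u10, byte);
}

fn fromFundamental(i: u10) ??u8 {
    if ((i & (1 << 9)) == 0) return null; // outer null
    if ((i & (1 << 8)) == 0) return @as(?u8, null); // inner null
    const byte: u8 = @truncate(i);
    return byte;
}

test "??u8 round-trips through a u10" {
    const cases = [_]??u8{ null, @as(?u8, null), 0, 42, 255 };
    for (cases) |c| {
        try std.testing.expectEqual(c, fromFundamental(toFundamental(c)));
    }
}
```

Three states of the outer and inner presence bits, plus 256 payload values in the fully present case, fit in 10 bits; a byte-based layout would typically spend more space on the two flags.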

With this proposal, one could convert between structs that have no well-defined byte representation, like this:

```zig
const std = @import("std");
const expect = std.testing.expect;

const S = struct {
    name: []const u8,
    ok: ?bool,
};

const Other = struct {
    name_ptr: [*]const u8,
    name_len: usize,
    ok_present: u1,
    ok_flag: u1,
};

test "example" {
    var s = S{
        .name = "hello",
        .ok = true,
    };
    var other = @bitCast(Other, s);

    try expect(std.mem.eql(u8, other.name_ptr[0..other.name_len], "hello"));
    try expect(other.ok_present == 1);
    try expect(other.ok_flag == 1);
}
```
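Since the proposed cast does not compile today, the following hand-written function sketches what `@bitCast(Other, s)` would compute in the example above. It repeats the `S` and `Other` declarations so it stands alone, and it assumes Zig 0.11 or later (`@intFromBool`). It also assumes a slice contributes its pointer bits before its length bits, and `?bool` contributes a presence bit before the bool bit; these are inferred from the field order of `Other`, not stated by the proposal. `toOther` is a hypothetical name.

```zig
const std = @import("std");
const expect = std.testing.expect;

const S = struct {
    name: []const u8,
    ok: ?bool,
};

const Other = struct {
    name_ptr: [*]const u8,
    name_len: usize,
    ok_present: u1,
    ok_flag: u1,
};

// Hypothetical stand-in for the proposed `@bitCast(Other, s)`, written field by field.
fn toOther(s: S) Other {
    return .{
        .name_ptr = s.name.ptr,
        .name_len = s.name.len,
        .ok_present = @intFromBool(s.ok != null),
        .ok_flag = if (s.ok) |b| @intFromBool(b) else 0,
    };
}

test "hand-written equivalent of the proposed cast" {
    const other = toOther(.{ .name = "hello", .ok = true });
    try expect(std.mem.eql(u8, other.name_ptr[0..other.name_len], "hello"));
    try expect(other.ok_present == 1);
    try expect(other.ok_flag == 1);

    const none = toOther(.{ .name = "", .ok = null });
    try expect(none.ok_present == 0 and none.ok_flag == 0);
}
```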
