bits vs bytes: a complete specification of the memory model #10547

@andrewrk

This proposal makes a distinction between the in-memory storage of a value and the storage of a value according to information theory.

In some cases, these are the same thing, such as with a u8 type. In other cases they can be different for various reasons, such as endianness or aggregate padding.

The idea here is that these two operations would both be well-defined, but would produce different results; they would not be defined in terms of each other:

byte casting:

fn byteCast(a: u32) f32 {
    // Parameters are immutable, so &a is a *const u32; the target pointer must be const too.
    return @ptrCast(*const f32, &a).*;
}

bit casting:

fn bitCast(a: u32) f32 {
    return @bitCast(f32, a);
}

Byte casting is done via pointer reinterpretation, or via extern union field aliasing. Bit casting is done via the @bitCast language primitive. They are both well defined but distinct:

  • byte casting - reinterprets memory that is stored in contiguous bytes from one type to another. This is affected by padding and endianness.
  • bit casting - reinterprets bits according to information theory. Regardless of padding, alignment, or endianness, if two types take the same number of bits to represent, a value can be bit cast from one to the other.
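To make the "affected by endianness" point concrete, here is a minimal sketch of byte casting through extern union field aliasing. The `Overlay` type and its field names are hypothetical and exist only for illustration. The test accepts either byte order, since the result depends on the target.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical type for illustration: both fields occupy the same 4 bytes.
const Overlay = extern union {
    bytes: [4]u8,
    value: u32,
};

test "byte cast depends on byte order" {
    var w = Overlay{ .bytes = [4]u8{ 0x00, 0x02, 0x03, 0x04 } };
    w.bytes[0] = 0x01; // runtime store, so the read below goes through memory

    // Reading the other field reinterprets the same bytes as a u32.
    const v = w.value;
    const little = v == 0x04030201; // what a little-endian target produces
    const big = v == 0x01020304; // what a big-endian target produces
    try expect(little or big);
    try expect(little != big);
}
```

The same four bytes yield two different integers depending on the target, which is exactly the kind of dependency that bit casting is meant to avoid.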

Byte casting is easy to understand; you can almost implement it by accident. Here we focus on bit casting and what it means. First, some prerequisites:

  • @bitSizeOf vs @sizeOf - @sizeOf corresponds to bytes and takes padding into account. As an example, @sizeOf(u24) == 4. Meanwhile, @bitSizeOf corresponds to information theory and ignores padding. In this example, @bitSizeOf(u24) == 24. The bit size of a struct, whether it is packed, extern, or neither, is the sum of @bitSizeOf for each field.
  • @bitOffsetOf vs @byteOffsetOf - the byte offset is the difference in memory address between a field and the base pointer of the aggregate. The bit offset is the number of lower bits that precede the field in a hypothetical integer whose bit count equals the @bitSizeOf of the aggregate.
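The two prerequisites can be checked with a short sketch. The `Flags` struct is a hypothetical example, not part of the proposal. It is packed because packed structs already have a defined bit layout, whereas the proposal would extend the same reasoning to every struct. The byte size of u24 is only bounded here rather than pinned to 4, since that figure depends on the target.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical packed struct for illustration: 3 + 5 + 8 = 16 bits.
const Flags = packed struct {
    kind: u3,
    level: u5,
    id: u8,
};

test "bit size and bit offset" {
    // Bit size counts only the bits that carry information.
    try expect(@bitSizeOf(u24) == 24);
    // Byte size may include padding, so it is at least as large.
    try expect(@sizeOf(u24) * 8 >= @bitSizeOf(u24));

    // The bit size of the aggregate is the sum of its fields' bit sizes.
    try expect(@bitSizeOf(Flags) == 3 + 5 + 8);

    // Each bit offset is the number of lower bits preceding the field.
    try expect(@bitOffsetOf(Flags, "kind") == 0);
    try expect(@bitOffsetOf(Flags, "level") == 3);
    try expect(@bitOffsetOf(Flags, "id") == 8);
}
```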

With this proposal, each type, regardless of whether it has a well-defined memory layout (a property that concerns bytes, not bits), has a hypothetical integer with a number of bits equal to the @bitSizeOf of that type. We call this integer the type's fundamental int. @bitCast is defined as follows:

  1. convert from the source type to its fundamental int.
  2. convert from the fundamental int to the destination type.

Attempting to @bitCast between two types that have differing @bitSizeOf values is a compile error. Note that one can obtain the fundamental int for a type by bit casting the value to an unsigned integer.
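As a rough illustration of obtaining a fundamental int, the sketch below bit casts a packed struct to an unsigned integer of the same bit size and back. `Nibbles` is a hypothetical type. The code uses the two-argument `@bitCast(T, value)` form that appears throughout this issue, so it assumes a compiler of that era. On Zig 0.11 and later the destination type is inferred instead, for example `@as(u8, @bitCast(n))`.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical packed struct for illustration: two 4-bit fields, 8 bits total.
const Nibbles = packed struct {
    lo: u4,
    hi: u4,
};

test "fundamental int of a packed struct" {
    var n = Nibbles{ .lo = 0x1, .hi = 0x0 };
    n.hi = 0x2; // runtime store, so the casts below happen at runtime

    // Step 1: source type to its fundamental int (a u8 here).
    // The first field occupies the lowest bits.
    const bits = @bitCast(u8, n);
    try expect(bits == 0x21);

    // Step 2: fundamental int to the destination type.
    const back = @bitCast(Nibbles, bits);
    try expect(back.lo == 0x1 and back.hi == 0x2);
}
```

Casting to a u16 instead would be rejected at compile time, because the bit sizes differ.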

The motivation for this proposal is:

  • To complete the specification of how these bit related functions work.
  • To make composing packed structs useful and to make align(0) useful in general.
  • To make it possible to optimize things such as ??u8, whose fundamental integer would be a u10.
  • To have the protection of a type system but also allow Data-Oriented-Design tricks, storing information in compact ways.

With this proposal, one would be able to convert between structs, even though they have no well-defined byte representation, like this:

const std = @import("std");
const expect = std.testing.expect;

const S = struct {
    name: []const u8,
    ok: ?bool,
};

const Other = struct {
    name_ptr: [*]const u8,
    name_len: usize,
    ok_present: u1,
    ok_flag: u1,
};

test "example" {
    const s = S{
        .name = "hello",
        .ok = true,
    };
    const other = @bitCast(Other, s);

    try expect(std.mem.eql(u8, other.name_ptr[0..other.name_len], "hello"));
    try expect(other.ok_present == 1);
    try expect(other.ok_flag == 1);
}


Labels:

  • breaking - Implementing this issue could cause existing code to no longer compile or have different behavior.
  • proposal - This issue suggests modifications. If it also has the "accepted" label then it is planned.
