Skip to content

Proposal: 'typedef' for zig #5132

@ghost

Description

There is a related proposal for distinct types, but it doesn't explore all the opportunities that a more generic form of distinct types could provide. This proposal is an attempt to do just that.

It is also an attempt at finding a new core language abstraction in zig which hopefully is both simple (conceptually) and expressive enough to be worthwhile.

'typedef' in zig could become a common platform for any additional or improved feature that is "comptime only" in nature. This in contrast to implementing each such comptime feature in separation and from scratch.

Being a "comptime only" feature, typedef wouldn't add much (if any) complexity to the codegen stage, and rather require changes to the earlier compilation stages instead.

That there is a similar (though more limited) typedef feature in C will hopefully help make typedef in zig discoverable for newcomers.

Overall, I'd say this proposal is a relatively logical and natural addition to zig, as there are no "new" concepts introduced. It's rather a merger of existing ones:

  • Types are values
  • Types can be namespaces
  • Methods are just functions (unified call syntax)
  • Coercion of types
  • Casting between types
  • Comptime and comptime reflection
  • The existence of builtin types (primitives) that have more language support than user defined types can have, e.g some syntax will only work with builtin types.
  • Syntax relying on built-in types/enums for configuration

Use cases:

Can be implemented on top of typedef:

To avoid making this text even longer than it is, I will mostly refer back from comments in other issues to cover the use cases that way, instead of writing in detail about all the use cases here. Other times, a feature might map so trivially to typedef that it is natural to include it in the summary section below.

The focus of this post is instead on the principles and framework that the implementations for those use cases would be based on.


Usage perspective:

Introducing the typedef syntax:

const MyInt = typedef(u32, .Distinct);        // restricted coercion
const MyInt2 = typedef(u32, .Alias) {         // unrestricted coercion
  fn increment(self: MyInt2) MyInt2 {return self+1;}    // const x: MyInt2 = 10; const y = x.increment(); assert(y==11);
}
const TaggedFloat = typedef(f64, .Tag{.txt="percentage"});   // reflection with std.....hasTag()

Then a more elaborate (but very generic) example that is explained in detail below:

// .Tdconf = "Typedef config" (instance of zig built-in tagged union)
const Td = typedef(T, .Tdconf) {
  const FOO = 100;
  fn method(self: T) bool{
     // ...
  }
};

test "typedef" {
  const t : T = createInstanceOf(T);
  const td : Td = @as(Td, t)
}
// T, t, Td, td, and .Tdconf will frequently be referenced below:

Read the above as:

  • Td is a (runtime) type alias of the type T. They have the exact same runtime representation.
    • Casting with @as(,) from Td to T is always possible, while casting from T to Td may be restricted if desirable. The default is that casting is always possible in both directions.
    • T is the base (type) of Td
  • Td is comptime distinct from T. They can be separated from each other at compile time.
    • Can also separate between two different typedefs, not just between typedefs and the base type.
  • T can be any type. Primitive, struct, enum, array... doesn't matter.
  • Td has an (optionally) attached scope for top level declarations (no fields!)
  • If the attached scope in the declaration of Td was omitted:
    • The namespace (members) of Td and T are the same
    • td.method() points to function in T
  • Otherwise, (namespace/scope was attached):
    • Td.member does not point to T.member. The namespaces of Td and T are separate
    • td.method() points to a function in Td
    • Methods declared in T still work, e.g T.origMethod(td,param)
  • Fields and other non-method members like td.field, td.len, td[2] are still accessible, (as if t was substituted for td), no matter whether Td had an attached namespace or not.
  • When td is used in an expression, the compiler will try to auto coerce td to the type the expression expects.
    • Example: If T happens to be f64, then td + td might coerce to f64 + f64
      and the expression returns f64.
  • Td is configured with an instance of a zig built-in tagged union, here .Tdconf.
    • The different union members carry parameter "paylods" depending on their type. In many cases void is sufficient, and you can write .Tdconf (instead of .Tdconf2{.param = value})

The different typedef configs allow for user specified comptime behavior regarding coercion, unified call syntax, member access, available syntax, restricted syntax, custom validation, etc.

Creating a standardized way to get more control over these compile time aspects is what allows other features to be built on top of typedef.

One typedef config will usually only affect a small subset of comptime behavior at a time. E.g typedef(u32, .Distinct) would be orthogonal to typedef(MyStruct, .Tag{.txt="mytag"}) with regards to which compilation behavior they would parametrize or modify, even though they are both
ultimately based on typedef.

Continuing the list:

  • .Tdconf might affect Td and/or td in the following ways:
    • impose restrictions on T in the typedef declaration. E.g some configs might demand T to be a numeric type.
    • how type coercion works when attempting to coerce from ..
      • td to other typedefs also wrapping T
      • td to T directly
      • some instance of T (wrapped in a typedef or otherwise) to Td
    • which member access syntax on td that is enabled
    • behavior of td in expressions
    • restrictions and validation on the namespaces of T and/or Td ( at the declaration time of Td)

And finally:

  • Tdconf supports comptime reflection, so user code can inspect the data contained in the config instance for a particular typedef type or instance. Td-configs are just union members after all.
    • @typeInfo exposes the typedef data, or a new builtin could be introduced: @typedefInfo

Implementation perspective:

// current syntax
const T1   = u32;
const T2   = SomeStruct;
const T3   = []const u8;
const A_T1 = T1;

// typedef syntax
const TD_T1 = typedef(T1, .Distinct) {};    // keyword takes type, config value, and optional scope
const TD_T2 = typedef(T2, .Distinct);       // typedef can wrap/copy all zig types.
const TD_T3 = typedef(T3, .Distinct) {};    // .. primitives or not
const TD_TD_T1 = typedef(TD_D1, .Distinct); // typedef may be nested
const TD_R = typedef(u32, .ResourceHandle); // config can be something else than .Distinct

Representation of typedef:

Let a typedef "type" be represented by the shorthand TD(T,N,B,S,C).

  • T : type = base zig type
  • N : u32 = typedef id number
  • B : u32 = base typedef id number.
  • S : u32/pointer = attached scope, pointer/id
  • C : u32/pointer = builtin tagged union config pointer/id

Where TD(T,0,0,0,0) is "reserved" as being equivalent to the plain zig type T

T is always a plain zig type, whether the typedef declaration "wrapped" a normal zig type or another typedef.

N is a unique (nonzero) ID that is assigned per declaration of a typedef. N may be zero only when the shorthand represents a plain zig type.

B is included to enable nesting of typedefs, with B equal to the id (N) of the base typedef,
or with B = 0 if the base is a plain zig type.

S is an ID or pointer corresponding to the attached namespace of a typedef declaration. S == 0 (null) if there was no namespace attached.

C stores an ID or pointer corresponding to a typedef config (created per declaration, but there exists only one instance per unique config) where C == 0 (null) only when the shorthand represents a plain zig type, not a typedef.

Example:

// NOTE: refer to the context above
// ID field N set to zero for plain zig types/type aliases
TdTypeOf(T1)   -> TD(u32,0,0,0,0)
TdTypeOf(T2)   -> TD(SomeStruct,0,0,0,0)
TdTypeOf(A_T1) -> TD(SomeStruct,0,0,0,0)

// c1, c2 != 0, and c1 != c2
TdTypeOf(TD_T2)   -> TD(SomeStruct,n,0,s,c1) // n != 0, and s != 0 (has attached scope)
TdTypeOf(TD_D_T1) -> TD(u32,m,n,0,c1)        // m != 0 and m != n (typedef nesting)
TdTypeOf(TD_R)    -> TD(u32,l,0,0,c2)        // l != 0

Zig compiler and typedef:

I don't know as much about the zig compiler as others here, so I have to go by assumptions and hope they are valid or at least not completely off.

My assumptions are that:

  • The shorthand TD(T,N,B,S,C) is a sufficient representation of typedef declarations for a
    powerful typedef feature in zig:
    • Note: N and B are used for separation of typedefs wrapping the same base type, and for controlling coercion mechanics.
    • Note: S adds flexibility and control with regard to namespaces and member access.
    • Note: C adds comptime logic parametrization that is standardized in the form of a union, allowing ..
      • a typedef to invoke non-standard behavior if that is necessary to make a language feature work.
      • a typedef (wrapping a primitive type) to be as predictable to the compiler as if the typedef was a primitive itself.
  • For each typedef declaration encountered, the compiler can add entries into a table (or table
    like structure), with columns corresponding to the typedef-shorthand.
  • The scope (top level declarations) and config instance attached to a given typedef can be stored in separate list structures, and be referenced from the table.
  • All desired comptime logic or processing involving typedef is possible (specifically, there is sufficient information available) if the following are true:
    • There exists a table with rows corresponding to all typedef declarations, and columns equivalent to the shorthand TD(T,N,B,S,C)
    • ZigValue has a field tdId for the typedef id N.
      • Note: As per the section above, when the ZigValue
        is a plain zig type (not a typedef), tdId is equal to zero.
      • Note: Coercion and casting between typedefs wrapping the same base type
        boils down to modifying the tdId field of the ZigValue.
      • Note/Example: Coercion or casting from a typedef to its base type would
        simply set tdId to zero in the result value.
  • instructions for doing arbitrary mathematical operations on numeric arrays can be generated prior to the codegen stage
    • Note/Example: Multiplication of complex numbers.

Another look at typedef declarations
const Td = typedef(T, .Tdconf) {
  const FOO = 100;
};

const TdTd = typedef(Td, .Tdconf);

test ".." {
    const t : T = T{};

    const myTd1 : Td = t; // will this work?
    const myTd2 = @as(Td,t);

    const myTdTd = @as(TdTd, myTd1);

    const expr = 100 + Td.FOO;
}

The following are the results from the typedef declarations and casts above, assuming T is a base type:

  • T is a ZigValue of type "type" or "meta", with its typedef ID equal to zero
  • Td is a ZigValue exactly the same as T, except that the typedef ID n is non-zero and uniquely generated for this particular declaration
  • TdTd is a ZigValue exactly the same as T, but has a non-zero typedef id m, with m != n.
  • Typedef config collection, changes:
    • Add .Tdconf once to the collection, and mark it with id or pointer tc. Added once only due to memoization of equivalent payloads for Td and TdTd.
  • Typedef scope collection, changes:
    • Add the scope that contains the declaration const FOO = 100 to the collection, and mark it with id or pointer s
  • Typedef table, changes:
    • New row for Td, TD(T,N,B,S,C) => TD(T,n,0,s, tc)
    • New row for TdTd, TD(T,N,B,S,C) => TD(T,m,n,0, tc)

Casting and coercing between typedefs essentially involve changing the typedef id associated with a ZigValue according to rules that are parametrized by the typedef configs.

Handling of assignments, typedef perspective only. t is a ZigValue of type T, with typedef id equal to 0:

  • myTd1:
    • Compiler will check typeid of Td to see if it's non-zero, which is the case here.
    • Compiler must either accept or reject the coercion expressed as 0 -> n in terms of the required change in typedef id, depending on the coercion rules defined by the typedef config.
    • Compiler looks up tc in the table using typedef id n, and can make a decision.
    • If the coercion is accepted, then myTd1 will be a ZigValue of type T, with a typedef id equal to n.
  • myTd2
    • Casting to a typedef might be restricted, so as above, the compiler will have to look up the typedef config of Td to determine the correct action.
  • myTdTd
    • Compiler must either accept or reject the coercion expressed as n -> m. In this case,
      it can be taken into account that the base-typedef id of TdTd is non-zero and equal to n, so the hierarchical relation between TdTd and Td can be determined by comparing these numbers. The decision might involve querying typedef configs of both Td and TdTd.
  • expr
    • T is not necessarily a struct, and Td was declared with an attached scope, overriding any scope the base type might define.
    • Td is just a ZigValue of type T with non-zero typedef ID. The declaration Td.FOO is not directly associated with Td. The compiler must replace Td.FOO with something akin to getTypedefStructDecl(s,"FOO"), by looking up the attached scope s in the "typedef table".

Mapping typedef to use cases:

The capabilities of typedefs come in form of

  • (1) Attached namespace and "syntax views" of a type
  • (2) Coercion control
  • (3) Providing payloads for comptime reflection
  • (4) Defining additional language primitive types/typedefs
  • (5) Comptime validation of the specified base

As specified before, all typedef configs are members of a built-in union. Below is a subset of possible typedef configs that would solve some existing github issues, while spanning enough of the "typedef framework" to get decent initial test coverage of it.

Only if these four typedef configs are possible to include in an architecturally sound manner
should additional features (more typedef configs) be considered.

Core typedef configs:

  • .Alias : void (1)
    • Used to mimic const T2 = T1, which creates a "dumb alias" where T1 and T2 automatically coerce to each other without needing any explicit casting. Still, a useful config to have when using typedef solely for the purpose of defining methods.
  • .Distinct : void (2)
    • For creating a "smart alias" with more restricted coercion mechanics.
  • .Tag: []const u8 (3)
    • Attaches a user specified "text payload" to a type, where that text is available at comptime through reflection.
  • .Complex : void (4)
    • For creating a "built-in" type supporting operators like + and *, that under the hood is a typedef wrapping a numeric array of length 2. Configs like these only work together with a specific subset of types in typedef declarations.

Additional typedef configs: Extensive list of possible or theoretically viable typedef configs, where not all of them might be worth it or possible to include.

  • .Matrix : [2]usize (Dimensions) (4)
    • Allowed for typedef wrapping a numeric array with length equal to the product of the entries in the dimension array payload. Could support arithmetic operators similar to how operators would work for .Complex.
  • .ResourceHandle : void (2)
    • For integer types only. For when you want an integer that does not need to coerce in expressions.
  • .Measure : void (2, 4)
    • For numeric types only. Used for "units of measure", where each individual unit is expressed as a typedef wrapping the same measure group typedef.
  • .MeasureGroup : void (2, 4)
    • For numeric types only. Declare one typedef (wrapping a numeric type) with this config for each group of units (like 'Currency' or 'SI')
  • .ConfigurableDistinct : (DistinctConfig : enum) (2)
    • If there is need for alternative distinct type coercion behavior different from what the default .Distinct provides
  • .Encapsulated : (EncapsulationConfig : enum) (1)
    • To wrap an existing type in a typedef that has fewer methods, fields, and other syntax exposed than the original type.
    • Different encapsulation levels could be: FieldsHide, FieldsReadOnly, and MethodsOnly.
  • .FixedPointInt : usize (Amount of fractional bits) (4)
    • For integer types only. Multiplication or division of FixedPointInt instances may return a result with a fractional bit count different from that of (either) operand.
  • .NamedArray : []const u8 (1, 4)
    • For arrays only, where each comma separated "string" in the payload is mapped to a position in the array so that the entries can be accessed with field syntax. (A form of syntax sugar) Amount of entries must correspond to array length. Related to Builtin vector properties #4961. Examples:
      • const Color = typedef([4]u8, .NamedVector{.fields="r,g,b,a"})
      • const Pos3d = typedef([3]f64, .NamedVector{.fields="x,y,z"})
  • .Embedding : type (4, 5)

1: Attached namespace and "syntax views" of a type

Currently, primitive types cannot have methods, even though the only difference between methods and functions are the syntax used for calling them. This becomes an artificial restriction on primitive types.

Also, you cannot add methods to 3rd party user defined types without modifying the 3rd party source code.

With typedef, given a type T you want to define a method on, where T happens to be either a primitive or a 3rd party struct type: Simply cast or coerce T to a typedef Td wrapping T, where the attached scope of Td contains the method.

Typedef provides a form of "method extensions" (#1170), but without introducing any confusion to where a method is defined. It's defined either in the base type, or in the typewrap. Case closed. If methods from the base types are to be included in the typedef, they have to be included by explicit delegation.

It is logical for the "syntax view" of a type to mirror the type (in terms of structure) as closely as possible, and especially for a systems language like zig, but there are times when a different syntax view could be significantly more ergonomic or safe.

Typedef could be a way to offer alternative "syntax views" in zig, either by disabling syntax or enabling syntax sugar depending on the application.


2: Coercion control

Looking at typedef coercion through the lens of the shorthand TD(T,N,B,S,C), with n,m,l != 0, none of n,m,l being equal , and x reads as "not considered".

Lef => Right            // read, "Left" coerces to "Right"

TD(T,l,m,x,x) => TD(T,m,n,x,x)  // coercion down, one level
TD(T,m,n,x,x) => TD(T,n,0,x,x)  // coercion down, one level
TD(T,n,0,x,x) => TD(T,0,0,0,0)  // coercion one level down, to the base type (final level)
TD(T,l,m,x,x) => TD(T,0,0,0,0)  // coercion down, multiple levels

TD(T,n,0,x,x) => TD(T,m,n,x,x)  // coercion up, one level
TD(T,0,0,0,0) => TD(T,m,n,x,x)  // coercion up, multiple levels

As can be seen above, the hierarchical relationship between typedefs can be determined by looking at the entries N and B of the shorthand, and then typedef coercions can be categorized in various ways:

  • Amount of "levels" jumped for one coercion
  • Coercion being "upwards" or "downwards"
  • Compatible vs incompatible base types (redundant with existing type system)

From that, you can have various distinct type behavior based on typedef. For instances tdd of type typedef(T,.Distinct): Whenever a coercion to or from tdd is requested, accept or reject (compile error) depending on the "category" of the given coercion.

A coercion rule set that maps almost exactly to this comment in the distinct types issue #1595, is:

  • Only allow coercions that are one level down to, or from, a distinct instance (typedef with the distinct config)

Different rule sets can of course be applied to different use cases.


2: Reflection on typedef configs

Just like normal types support comptime reflection through @TypeInfo, so could typedef support comptime reflection on the config payload.

This would allow comptime tags to be implemented:

const TaggedType = typedef(T, .Tag{.txt="mytag"})
test "" {
  const tt = @as(TaggedType,getT());
  comptime assert(std.meta.hasTag(tt,"mytag")) // ok
  comptime assert(std.meta.hasTag(tt,"notthetag")) // fails
}

To add tags to a typedef as opposed to normal type, one would have to resort to typedef nesting.


4: Additional zig built-in types leveraging typedef

There are many builtin primitive types in zig that get special treatment with respect to syntax:

  • Indexing syntax on arrays and slices
  • Arithmetic operators on integers and floats,
  • Bitwise operators on integers
  • If-statements on optionals or booleans
  • Automatic coercion from non-optional to optional type
  • Properties like .len for arrays and slices

Some primitive types (primitive in the syntax support sense) are arguably even generic, like [N]SomeType, or ?SomeType.

The idea to be presented here is that a typedef that wraps a primitive base type will be very predictable for the compiler to deal with. The base type is a primitive, and the config payload is standardized because it is a builtin union member.

Definition of a typedef primitive Tp:

  • The plain base type T wrapped by Tp must be a primitive zig type, like bool or a numeric array.
  • The typedef config used in the declaration of Tp adds enough context to T for the compiler to treat Tp as a "new" type.
  • Tp is treated as a zig primitive in the sense that syntax can be unavailable for normal user defined types, but still be available for instances of Tp.
  • Tp is "memoized". Two typedef declarations with the same base type and config payload will not be distinguishable at comptime.
    • Note: Because of this memoization, it might be best to introduce a second keyword typedefprimitive, which starts of as an alias for typedef, but differs in the following ways:
      • Some typedef configs only work with typedef, and others only with typedefprimitive.
        • Possibly, there could be two different unions, but in the "typedef table" described above, all typedef configs would belong in the same column.
      • A primitive typedef declaration receiving the same arguments (base type and config union member) as a previous primitive typedef declaration, will not assign a new unique typedef ID (that is, add a row to the typedef table), but reuse the previous typedef ID that was associated with the given arguments.

Mathematical types can often be modeled as numeric arrays, thus defining additional mathematical "primitive" types in zig could in many cases be done by wrapping a numeric array in a typedef and attach metadata (the typedef config).

Example, complex numbers:

const Complex = std.builtin.Complex(f64); // typedefprimitive([2]f64, .Complex) under the hood, with attached namespace

test "Complex number" {
  const c1 = Complex.init(1.0, 3.2);
  const c2 = Complex.init(-43, 12.5);
  const c3 = c1.pow(c2) + c1 * c2.conj() - (c1*c1+c1)/Complex.init(12.0, 15);
  const r3 = c3.re(); // or c3.re for that matter (syntax view)
  const i3 = c3.im();
}

Having arithmetic operators working for a complex number "typedef" is IMO perfectly feasible,
as per the reasoning given in this issue comment by @skyfex. The functions or procedures necessary for complex number arithmetic does not need any control flow or to return any errors, could be defined as built-in functions, and cannot really be compared to operator overloading where everyone is able to customize how arithmetic operations behave in a multitude of ways.

The compiler would have enough information available to handle the cases where one or both of the operands in an arithmetic expression is a complex (typedef) primitive, and insert instructions corresponding to the appropriate builtin function.

This principle extends to fixed point types and matrices also, even despite the fact that arithmetic operation on these (mathematical) types may return a result with a different type than the type of either or both of the operands. Matrix4x4 * Matrix4x3 yields Matrix4x2. Still, this is unproblematic. The matrix dimensions are attached to the operands in form of the typedef config payload. Because of that, the compiler can ensure that the operands are compatible and determine the correct output "typedefprimitive".

In the codegen stage, these mathematical types would just be arrays.


5: Comptime validation of the specified base

Typedef is a comptime feature, so whenever a typedef is declared some comptime code can be triggered to run. This could include various validation functions that could be defined as builtins, basing it all on the existing reflection capabilities in zig.

Examples could be checking that a struct is a superset of another (embeds), or that the fields in a struct follow the convention set by a typedef config. Example:

const WidgetAnchorFlags = typedef(u4, .BitFlag){
  const SPARE : u4 = 0;
  const LEFT : u4 = 1;
  const RIGHT : u4 = 2;
  const TOP : u4 = 4;
  const BOTTOM : u4 = 8;
};

When the typedef config is .Bitflag, the compiler may enforce the existence of top level declarations with type equal to the base type, that each must have a unique value, that together must fully "cover" all the bits of the base type (which obviously has to be a uint), and where only one of the declarations can be something else than a power of two (the spare).

Additionally, as a bitflag could fit the definition of a typedef primitive established earlier, syntax sugar could be enabled, so that for example widgetAnchors.LEFT would map to widgetAnchors & WidgetAnchorFlags != 0. With primitive types (whether typedef based or not), syntax can be adjusted to improve usability.


Discussion and questions for the zig community:

  • Which subset of this proposal (even if it's the "empty subset") might be the best option for zig?
  • Are there any (whether possibly or certainly) problematic edge cases that does not work with the framework presented here? Examples would be great. Some I can think of myself are:
    • Combining multiple typedef configs is only possible through successive wrapping
    • There can be conflicts between typedef configs
    • If creating a "generic typedef" with a function returning a typedef, zig generic memoization could work against the principle of typedefs being distinguishable from each other at compile time.
      • Technically, a typedef is not directly a type, so memoization could perhaps be disabled for generic typedefs without it impacting the memoization behavior of regular generic types.
  • Would it be possible to make @Vector (and maybe the proposed @Matrix) into typedefs wrapping numeric arrays while still having them map down to LLVM in codegen?
  • Does stage1 have to support all language semantics supported by stage2?
  • Are there any possible use cases not mentioned here that could utilize the typedef framework?
  • Would it be feasible to incrementally implement typedef in zig by extending the amount of typedef config union members bit by bit until the desired simplicity vs features tradeoff is met?

Metadata

Metadata

Assignees

No one assigned

    Labels

    proposalThis issue suggests modifications. If it also has the "accepted" label then it is planned.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions