Skip to content

Conversation

@juj
Copy link
Collaborator

@juj juj commented Mar 21, 2022

This allows users to get a compile time warning if they try to pass a statically null pointer value to a system header that is not intended to accept such a value.

@juj juj force-pushed the attribute_nonnull branch 2 times, most recently from 7c3eeff to a6feeea Compare March 21, 2022 16:06
@sbc100
Copy link
Collaborator

sbc100 commented Mar 21, 2022

Is it worth using a macro here in case these headers ever need to be used on a non-clang compiler? (could be it shorter and less visually noisy too).

@sbc100
Copy link
Collaborator

sbc100 commented Mar 21, 2022

Makes me wonder if we also want to use __restrict on some of these?

@juj
Copy link
Collaborator Author

juj commented Mar 22, 2022

I did actually first do this with a EMSCRIPTEN_NONNULL directive, however backed out on it, because

int emscripten_futex_wait(volatile void *addr EMSCRIPTEN_NONNULL, uint32_t val, double maxWaitMilliseconds);

vs

int emscripten_futex_wait(volatile void *addr __attribute__((nonnull)), uint32_t val, double maxWaitMilliseconds);

I found that in IDE (and looks like also on GitHub), using EMSCRIPTEN_NONNULL loses syntax highlighting, so visually it did parse off worse to my eye than directly using __attribute__((unused)) which at least highlights partially.

Also an indirection can cause developers to need to look up what the macro means, if they don't guess its meaning (or doubt themselves).

Finally, our system headers aren't particularly include clean anymore, so I wish not to add to the include chain dependency. Having almost every include file include one more header adds to that load. (not sure how much that matters for compilation unit perf though nowadays.. in the past ~decade+ ago it used to be noticeable, but I might be a dinosaur on this)

in case these headers ever need to be used on a non-clang compiler?

I wonder if that should be a use case even. In a distant future if that becomes a thing, we can look into it then?

@juj
Copy link
Collaborator Author

juj commented Mar 22, 2022

Makes me wonder if we also want to use __restrict on some of these?

My understanding is that restrict enables more optimizations on the implementation side of the function, not outside the function on the caller side (expect than to warn if one does attempt to pass aliasing pointers to the function). Most of these functions are implemented in JS side, so there wouldn't be any optimizations that LLVM would be able to perform there.

Skimming all of these declarations, I was not able to find any that would receive two or more pointers of the same type (pointers to different types are already always assumed to not alias by the compiler).

@juj
Copy link
Collaborator Author

juj commented Mar 22, 2022

I learn that there are two different types of nonnull annotations in Clang: __attribute__((nonnull)) and _Nonnull.

The first tells the compiler that it can optimize based on the assumption that the pointer will never be null. The second annotation does not affect optimizations, but simply emits diagnostics if a static null is passed. Adjusted the code to use _Nonnull in the places where we actually have error checking of inputs in place that should not get optimized out.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 22, 2022

I did actually first do this with a EMSCRIPTEN_NONNULL directive, however backed out on it, because

int emscripten_futex_wait(volatile void *addr EMSCRIPTEN_NONNULL, uint32_t val, double maxWaitMilliseconds);

vs

int emscripten_futex_wait(volatile void *addr __attribute__((nonnull)), uint32_t val, double maxWaitMilliseconds);

I found that in IDE (and looks like also on GitHub), using EMSCRIPTEN_NONNULL loses syntax highlighting, so visually it did parse off worse to my eye than directly using __attribute__((unused)) which at least highlights partially.

That argument makes some sense to me.

Also an indirection can cause developers to need to look up what the macro means, if they don't guess its meaning (or doubt themselves).

Finally, our system headers aren't particularly include clean anymore, so I wish not to add to the include chain dependency. Having almost every include file include one more header adds to that load. (not sure how much that matters for compilation unit perf though nowadays.. in the past ~decade+ ago it used to be noticeable, but I might be a dinosaur on this)

There are some cases where include-bloat is an issue, but for things like this example, we already have split up our includes nad have a very-tiny header called em_types.h that is designed to be included everywhere. That is why em_types.h was created. Tiny C-only headers like this have near-zero cost compared the C++ header bloat the most compilation unit experience. I think we should be including em_types.h everywhere, and assume is has a non-measurable overhead unless proven otherwise.

in case these headers ever need to be used on a non-clang compiler?

I wonder if that should be a use case even. In a distant future if that becomes a thing, we can look into it then?

It used to be the case the the emscripten headers we designed to also be running able through gcc and/or msvc but I don't think I've actually don't that in years, so maybe its fine to make them clang-only? (I guess both these new attributes are gcc-compatible though?)

uint16_t emscripten_atomic_exchange_u16(void/*uint16_t*/ *addr, uint16_t newVal);
uint32_t emscripten_atomic_exchange_u32(void/*uint32_t*/ *addr, uint32_t newVal);
uint64_t emscripten_atomic_exchange_u64(void/*uint64_t*/ *addr, uint64_t newVal);
uint8_t emscripten_atomic_exchange_u8(void/*uint8_t*/ *addr __attribute__((nonnull)), uint8_t newVal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it also be nice to have the compiler warning if a user passes NULL here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will warn if a user passes a null here - that is what __attribute__((nonnull)) will do. But in addition to issuing the warning, the compiler can optimize code inside the function, and at the calls to that function with the expectation that any non-statically-inferrable value passed here will not be a null.

So i.e. if one had silly code like

void foo(const char *x __attribute__((nonnull)))
{
	printf("%s\n", x);
}

extern const char *bar(void);

int main()
{
	const char *s = bar();
	foo(s);
	if (!s) // This can be assumed never to be true, since the contract with foo says it can only be called with nonnull pointers. Clang could issue a warning here.
	{
		// we cannot get here!
		printf("s was null!\n");
	}
}

However now I see that Clang does not actually do this kind of analysis when trying out in practice. :/

To answer the difference between __attribute__((nonnull)) and _Nonnull: if instead the function foo read

void foo(const char *x __attribute__((nonnull)))
{
    if (x)
	printf("%s\n", x);
}

the compiler will issue a diagnostic warning, because if (x) is tautologically true: the contract with x says it will never be null.

So if we have a function that does want to do something different when x actually is null, then we want to just use the weaker _Nonnull semantics:

void foo(const char * _Nonnull x) // issue a warning if x is null, but don't affect codegen
{
    if (x)
	printf("%s\n", x);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, from your earlier description had though that one triggered optimization and the triggers warnings, but its seem that the both trigger warning and in addition the the __attribute__ form allows optimization.

This makes sense to me now.

if (!plce.isActive) {
printf("Requesting pointer lock..\n");
ret = emscripten_request_pointerlock(0, 1);
ret = emscripten_request_pointerlock("#canvas", 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the effect of passing zero here previously?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under the old semantics 0 meant to use #canvas, but that support was deprecated.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % formatting

@juj juj force-pushed the attribute_nonnull branch from 02f0da5 to 98e3d98 Compare March 6, 2023 12:09
@juj juj enabled auto-merge (squash) March 6, 2023 12:32
@juj
Copy link
Collaborator Author

juj commented Mar 6, 2023

Updated the PR for formatting.

Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still lgtm.

In practice, did you notice any new warnings showing up in real world code? i.e. did this reveal any latent bugs in real code?

@node_pthreads
def test_emscripten_futexes(self):
self.set_setting('USE_PTHREADS')
self.emcc_args += ['-Wno-nonnull']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment as to why this warning is being disabled? Is it strictly needed for the test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the test is explicitly checking that passing null returns in EINVAL error, which would trigger a diagnostic in the test.

@juj juj merged commit 5441b40 into emscripten-core:main Mar 6, 2023
@juj
Copy link
Collaborator Author

juj commented Mar 7, 2023

In practice, did you notice any new warnings showing up in real world code? i.e. did this reveal any latent bugs in real code?

I recall I started this PR after I saw a bug in Unity's code that would have been caught if these annotations had existed. I do not know this PR revealing any new bugs in Emscripten's codebase itself.

impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
…s. (emscripten-core#16545)

* Add use of Clang  __attribute__((nonnull)) in Emscripten system headers.

* Fix annotation on emscripten_futex_wake

* Remove nonnull on emscripten_fetch_wait

* Fix annotation and tests

* Add use of _Nonnull attribute

* Fix test

* Break up long lines in webgl2_ext.h

* Add notes about null pointers in emmalloc.h
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Mar 17, 2023
…s. (emscripten-core#16545)

* Add use of Clang  __attribute__((nonnull)) in Emscripten system headers.

* Fix annotation on emscripten_futex_wake

* Remove nonnull on emscripten_fetch_wait

* Fix annotation and tests

* Add use of _Nonnull attribute

* Fix test

* Break up long lines in webgl2_ext.h

* Add notes about null pointers in emmalloc.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants