Skip to content

Conversation

xingxue-ibm
Copy link
Contributor

The direct lock data structure has bit 0 (the least significant bit) of the first 32-bit word set to 1 to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit 0 of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because its memory layout of a 64-bit address is (low word, high word). However, this causes problems for big-endian where the memory layout of a 64-bit address is (high word, low word). If an address of the indirect lock table entry is something like 0x110035300, i.e., (0x1, 0x10035300), it is treated as a direct lock. This patch defines struct kmp_base_tas_lock with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member poll, the second word (the low word) is used. This patch also changes places where poll is not already explicitly specified for checking/setting tags.

@xingxue-ibm xingxue-ibm added the openmp:libomp OpenMP host runtime label Jan 23, 2024
@xingxue-ibm xingxue-ibm self-assigned this Jan 23, 2024
@@ -120,8 +120,15 @@ extern void __kmp_validate_locks(void);

struct kmp_base_tas_lock {
// KMP_LOCK_FREE(tas) => unlocked; locked: (gtid+1) of owning thread
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ && KMP_ARCH_PPC64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I question whether KMP_ARCH_PPC64 should be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to use KMP_ARCH_PPC64 to limit the scope. This issue may also affect s390x which I believe is also big-endian. I've changed to guard the order flipping with #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ && __LP64__. @iii-i please let me know if it does not work for s390x. The reason for guarding it with __LP64__ is the original ordering works as designed in 32-bit mode both little- and big-endian. If these members are flipped for 32-bit big-endian, tags of a direct lock will take the second word while the address of the entry allocated from the indirect lock table will take the first word.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to use KMP_ARCH_PPC64 to limit the scope. This issue may also affect s390x which I believe is also big-endian.

Ya, I was also thinking about other big-endian processors that might be affected like mips64 or if sparc support is eventually added sparc64.

@iii-i
Copy link
Member

iii-i commented Jan 24, 2024

How did you find this? I would like to check if this affects s390x. If the repro is simple enough, maybe consider adding a test?

@xingxue-ibm xingxue-ibm force-pushed the openmp-lock-big-endian branch from 52fa82c to d47954a Compare January 24, 2024 15:37
@xingxue-ibm
Copy link
Contributor Author

How did you find this? I would like to check if this affects s390x. If the repro is simple enough, maybe consider adding a test?

There were 36 libomp/runtime LIT test affected on AIX in 64-bit, including critical/omp_critical_with_hint.c, and many test cases in the test/parallel and worksharing directories. Investigating these failures shows the root cause is the memory layout of the address in 64-bit big-endian. Since s390x is big-endian, this may affect s390x as well. In terms of adding a new test case for this issue, I am not sure it will be necessary because without this patch, test cases mentioned above would be failing.

@iii-i iii-i requested a review from uweigand January 30, 2024 01:42
@iii-i
Copy link
Member

iii-i commented Jan 30, 2024

I think this currently works on s390x by accident, since heap and executable addresses look roughly like this: 0x2aa2e7e0000. The leading 0x2aa tends to stay there regardless of ASLR, and it is even.

The changes look good to me. I tested them on s390x and there are no test failures.

@xingxue-ibm
Copy link
Contributor Author

Gentle ping...Are there any further comments or concerns?

Copy link
Collaborator

@kkwli kkwli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks.

@xingxue-ibm xingxue-ibm merged commit ac97562 into llvm:main Feb 13, 2024
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 14, 2024
…mbers swapped for big-endian (llvm#79188)

The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because its memory layout of a
64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.

(cherry picked from commit ac97562)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 16, 2024
…mbers swapped for big-endian (llvm#79188)

The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because its memory layout of a
64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.

(cherry picked from commit ac97562)
@pointhex pointhex mentioned this pull request May 7, 2024
@xingxue-ibm xingxue-ibm deleted the openmp-lock-big-endian branch May 24, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomp OpenMP host runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants