Use the unaligned SSE memory access primitive. #7996
Closed
…inel-linker-black-magic v4.0.x: Make C and Fortran types for MPI sentinels agree in size
Both opal_hwloc_base_get_relative_locality() and _get_locality_string() iterate over hwloc levels to build the proc locality information. Unfortunately, NUMA nodes have not been in those normal levels since hwloc 2.0. We have to explicitly look at the special NUMA level to get that locality info. I am factoring the core of the iterations out into dedicated "_by_depth" functions and calling them again for the NUMA level at the end of the loops. Thanks to Hatem Elshazly for reporting the NUMA communicator split failure at https://www.mail-archive.com/[email protected]/msg33589.html It looks like only the opal_hwloc_base_get_locality_string() part is needed to fix that split, but there's no reason not to fix get_relative_locality() as well. Signed-off-by: Brice Goglin <[email protected]> (cherry picked from commit ea80a20)
not being defined. related to open-mpi#7201 Signed-off-by: Howard Pritchard <[email protected]>
These -D's are for C compilation, not Fortran compilation. Remove this useless statement. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit f4a47a5)
Automake's Fortran compilation rules inexplicably use CPPFLAGS and AM_CPPFLAGS. Unfortunately, this can cause problems in some cases (e.g., picking up already-installed mpi.mod in a system-default include search path). So in relevant module-using Fortran compilation Makefile.am's, zero out CPPFLAGS and AM_CPPFLAGS. This has a side-effect of requiring that we compile the one .c file in the F08 library in a new, separate subdirectory (with its own Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out). Signed-off-by: Jeff Squyres <[email protected]> Signed-off-by: Gilles Gouaillardet <[email protected]> (cherry picked from commit ab398f4)
…win-again v4.0.x: Fortran fixes
- increase the maximum number of segments to allow applications to be launched on some Ubuntu configurations Signed-off-by: Sergey Oblomov <[email protected]> (cherry picked from commit f742f28)
…egments-v4.0 OSHMEM/SEGMENTS: increase max number of segments - v4.0
Refs open-mpi#7362 Signed-off-by: Brice Goglin <[email protected]> (cherry picked from commit 329d445)
Signed-off-by: Geoffrey Paulsen <[email protected]>
Adding PMIx v3.1.5rc2 from: https://github.com/openpmix/openpmix/releases/tag/v3.1.5rc2 Signed-off-by: Geoffrey Paulsen <[email protected]>
….5rc2 Adding PMIx v3.1.5rc2
…4.0.3rc4 Reving to VERSION v4.0.3rc4
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1 Fixes open-mpi#7362 Signed-off-by: Brice Goglin <[email protected]> (cherry picked from commit 907ad85)
Topic/pr7201 to v40x
- pgcc18 defines __GNUC__ similar to Intel compilers. So we must check for pgi higher up, or else configury will mistake it for gcc. Signed-off-by: Austen Lauria <[email protected]> (cherry picked from commit 14785de)
v4.0.x: Fix pgcc18 support.
This commit addresses two issues in osc/rdma:
1) It is erroneous to attach regions that overlap. This was being
allowed but the standard does not allow overlapping attachments.
2) Overlapping registration regions (4k alignment of attachments)
appear to be allowed. Add attachment bases to the bookkeeping
structure so we can keep better track of what can be detached.
It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.
References open-mpi#7384
Signed-off-by: Nathan Hjelm <[email protected]>
(cherry picked from commit 6649aef)
Signed-off-by: Nathan Hjelm <[email protected]>
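The two cases described in the osc/rdma commit message above can be illustrated with a minimal sketch. This is not the osc/rdma code: the function names and the 4096-byte page size are assumptions made for illustration only. It shows how two attachments can be disjoint byte ranges (case 1 does not apply) while their page-aligned registration regions still overlap (case 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size, used only for this illustration. */
#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)

/* True when [a, a+a_len) and [b, b+b_len) share at least one byte (case 1). */
bool attachments_overlap(uintptr_t a, size_t a_len, uintptr_t b, size_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a < b + b_len && b < a + a_len;
}

/* Round an address down / up to a page boundary. */
uintptr_t page_floor(uintptr_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

uintptr_t page_ceil(uintptr_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* True when the page-aligned regions covering the two attachments
 * overlap, even if the attachments themselves do not (case 2). */
bool registrations_overlap(uintptr_t a, size_t a_len, uintptr_t b, size_t b_len)
{
    uintptr_t a_lo = page_floor(a), a_hi = page_ceil(a + a_len);
    uintptr_t b_lo = page_floor(b), b_hi = page_ceil(b + b_len);
    return a_lo < b_hi && b_lo < a_hi;
}
```

For example, attachments at 0x1000 and 0x1100, each 0x100 bytes long, do not overlap as byte ranges, but both fall in the same 4k page, so their registration regions do.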
This commit increases the osc_rdma_max_attach variable from 32 to 64. The new default is kept low due to the small number of registration resources on some systems (Cray Aries). A larger max attachment value can be set by the user on other systems. Signed-off-by: Nathan Hjelm <[email protected]> (cherry picked from commit 54c8233) Signed-off-by: Nathan Hjelm <[email protected]>
…ize zero Signed-off-by: Joseph Schuchart <[email protected]> (cherry picked from commit 06bbcf4)
…x_ci_for_release_branches_v4 Enabled Mellanox CI for release branches (changes for v4.0.x branch).
Signed-off-by: Artem Ryabov <[email protected]>
Correctly set baseptr in contiguous shared memory window with local size zero (v4.0.x)
This commit changes the behavior of the individual sharedfp component. If the component cannot create either the datafile or the metadatafile during File_open, it no longer raises an error at that point. This allows applications that do not use shared file pointer operations to continue execution without any issue. If, however, the user subsequently calls MPI_File_write_shared or similar operations, an error will be raised. Fixes issue open-mpi#7429 Signed-off-by: Edgar Gabriel <[email protected]> (cherry picked from commit df6e3e5)
Signed-off-by: Geoffrey Paulsen <[email protected]>
The CI is triggered only upon a PR creation or by special PR comments. Signed-off-by: Artem Ryabov <[email protected]>
Signed-off-by: Thomas Jahns <[email protected]>
Signed-off-by: Artem Ryabov <[email protected]>
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig` - correct the value of `blocking_fence` Signed-off-by: Tsubasa Yanagibashi <[email protected]> (cherry picked from commit a07a83d)
…ummy-module-v4.0.x sharedfp/individual: defer error when not being able to open datafile
do not check some input parameters when an {in,out}degree is zero
Thanks Junchao Zhang for analyzing and reporting this issue.
Signed-off-by: Gilles Gouaillardet <[email protected]>
(cherry picked from commit 5655d64)
…n-cint-not-equal-to-finteger v4.1.x: fortran.m4: disallow when sizeof(int) != sizeof(INTEGER)
mpi/c: fix param checks in [I]Neighbor_alltoall{v,w}
Signed-off-by: Artem Polyakov <[email protected]> (cherry picked from commit c72f295)
Signed-off-by: Joshua Hursey <[email protected]>
Add logic to handle different architectural capabilities. Detect the compiler flags necessary to build specialized versions of the MPI_OP. Once the different flavors (AVX512, AVX2, AVX) are built, detect at runtime which is the best match with the current processor capabilities. Add validation checks for loadu 256 and 512 bits. Add validation tests for MPI_Op. Signed-off-by: Jeff Squyres <[email protected]> Signed-off-by: Gilles Gouaillardet <[email protected]> Signed-off-by: dongzhong <[email protected]> Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 14b3c70)
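The runtime selection step described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the component's actual detection code: the enum and function names are hypothetical, and it relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets with those compilers. Elsewhere it falls back to a generic flavor.

```c
#include <assert.h>

/* Hypothetical flavor identifiers, ordered from least to most capable. */
typedef enum {
    OP_FLAVOR_GENERIC = 0,
    OP_FLAVOR_AVX,
    OP_FLAVOR_AVX2,
    OP_FLAVOR_AVX512
} op_flavor_t;

/* Pick the most capable flavor the running processor supports. */
op_flavor_t select_op_flavor(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        return OP_FLAVOR_AVX512;
    }
    if (__builtin_cpu_supports("avx2")) {
        return OP_FLAVOR_AVX2;
    }
    if (__builtin_cpu_supports("avx")) {
        return OP_FLAVOR_AVX;
    }
#endif
    return OP_FLAVOR_GENERIC;
}

/* Human-readable name of a flavor, e.g. for verbose output. */
const char *op_flavor_name(op_flavor_t flavor)
{
    assert(flavor >= OP_FLAVOR_GENERIC && flavor <= OP_FLAVOR_AVX512);
    switch (flavor) {
    case OP_FLAVOR_AVX512: return "avx512";
    case OP_FLAVOR_AVX2:   return "avx2";
    case OP_FLAVOR_AVX:    return "avx";
    default:               return "generic";
    }
}
```

Checking from the most capable flavor downward means the first match is the best one, and a build that lacks a given flavor can simply skip that branch.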
…ngth v4.1.x: Add supports for MPI_OP using AVX512, AVX2 and MMX
v4.1.x: schizo/slurm: Fix binding detection
v4.1.x: schizo/jsm: Disable binding when direct launched
bugfix: provider selection did not differentiate between IPv4 and IPv6 addresses, which could leave some nodes unable to communicate with each other. Add a check for address format to provider selection to ensure that all nodes use the same address format. Signed-off-by: Nikola Dancejic <[email protected]> (cherry picked from commit 7e46371)
The missing include file causes an error when using an external version of LibEvent. Signed-off-by: tomhers <[email protected]> (cherry picked from commit 88f9d2c)
Signed-off-by: Geoffrey Paulsen <[email protected]>
…r_SLURM_binding Adding SLURM binding policy change to README
Signed-off-by: Joseph Schuchart <[email protected]> (cherry picked from commit eebc451)
(v4.1.x) osc/rdma: fail query_btls if no endpoint for non-local peer is found
v4.1.x: common/ofi: added address format check to fix provider selection
If building Open MPI with sanitizers, e.g. $ configure CC=clang CFLAGS=-fsanitize=address .... the configure test programs are also built with the sanitizers and will report errors, causing configure to fail. Signed-off-by: Christoph Niethammer <[email protected]>
…de_file v4.1.x: BTL/OFI: Fix missing include file.
…atter V4.1.x hcoll reduce scatter
…v4.1.x v4.1: Fix memory leak in configure, which prevents leak sanitizer usage
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:
allgather
allgatherv
allreduce
alltoall
alltoallv
barrier
bcast
gather
reduce
reduce_scatter_block
reduce_scatter
scatter
These results were gathered using the ompi-collectives-tuning package
and then averaged amongst the results gathered from multiple OMPI
developers on their clusters.
You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3
Signed-off-by: William Zhang <[email protected]>
(cherry picked from commit ce40cfb)
coll/tuned: Change the default collective algorithm selection
The btl/ofi does not currently utilize the common ofi include/exclude list. Added verification code similar to the mtl/ofi that will check if the info object is in the include or exclude list. If it isn't in the include list or is in the exclude list, validate_info will return OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint when calling getinfo, instead filtering the provider during validate_info. This patch also moves the is_in_list MTL function into common code and adds additional debugging output to the BTL to match the MTL standard. Signed-off-by: William Zhang <[email protected]> (cherry picked from commit 9b8f463)
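The include/exclude filtering described above can be sketched as follows. This is an illustration only and does not reproduce the Open MPI sources: `name_in_list` and `validate_provider` are hypothetical names, the lists are assumed to be comma-separated strings, a NULL list is assumed to mean "not set", and the return value -1 stands in for OPAL_ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when `name` matches one complete entry of the comma-separated `list`. */
bool name_in_list(const char *list, const char *name)
{
    assert(name != NULL);
    if (list == NULL) {
        return false;
    }
    size_t name_len = strlen(name);
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t) (end - p) : strlen(p);
        if (len == name_len && strncmp(p, name, len) == 0) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* Returns 0 to accept the provider and -1 to reject it. A provider is
 * rejected when an include list is set and does not contain it, or when
 * an exclude list is set and does contain it. */
int validate_provider(const char *include, const char *exclude,
                      const char *provider)
{
    if (include != NULL && !name_in_list(include, provider)) {
        return -1;
    }
    if (exclude != NULL && name_in_list(exclude, provider)) {
        return -1;
    }
    return 0;
}
```

Comparing whole entries rather than substrings avoids accepting a provider whose name is merely a prefix of a listed one.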
(`prte_hwloc_base_get_locality_string` never returns locality string with L0). Signed-off-by: Mikhail Kurnosov <[email protected]> (cherry picked from commit 4708458)
…strng v4.1.x: opal/hwloc: fix a typo in parsing locality string: L0 changed to L1
v4.1.x: btl/ofi: Use common provider include/exclude list
Alter the test to validate misaligned data. Fixes open-mpi#7954. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit b6d71aa) Signed-off-by: Brian Barrett <[email protected]>
Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit c4e88a4) Signed-off-by: Brian Barrett <[email protected]>
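For context on the title of this pull request: the aligned SSE2 load `_mm_load_si128` requires a 16-byte-aligned address, while `_mm_loadu_si128` accepts any address. User buffers passed to a reduction are not guaranteed to be 16-byte aligned, so the unaligned primitives are the safe choice. The sketch below is a minimal illustration under that assumption, not the code changed by this pull request; the function names are hypothetical and it compiles only on x86 targets with SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* inout[i] += in[i] for n 32-bit integers. Neither buffer needs to be
 * 16-byte aligned because only unaligned loads and stores are used. */
void sum_int32_sse2(const int32_t *in, int32_t *inout, size_t n)
{
    assert(in != NULL && inout != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *) (in + i));
        __m128i b = _mm_loadu_si128((const __m128i *) (inout + i));
        _mm_storeu_si128((__m128i *) (inout + i), _mm_add_epi32(a, b));
    }
    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        inout[i] += in[i];
    }
}

/* Demo on deliberately misaligned data: the arrays are 16-byte aligned,
 * so starting at element 1 gives addresses that are only 4-byte aligned. */
int32_t misaligned_demo_checksum(void)
{
    _Alignas(16) int32_t src[9];
    _Alignas(16) int32_t dst[9];
    for (int i = 0; i < 9; i++) {
        src[i] = i;
        dst[i] = 10 * i;
    }
    sum_int32_sse2(src + 1, dst + 1, 8);
    int32_t total = 0;
    for (int i = 0; i < 9; i++) {
        total += dst[i];
    }
    return total;
}
```

Replacing the unaligned loads in this sketch with `_mm_load_si128` would typically fault on the offset buffers used by the demo, which is the kind of failure a misaligned-data test is meant to expose.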
Backport of #7957 into the v4.1.x branch.