[libcxx] Optimize std::generate for segmented iterators #163006

c8ef · 2025-10-11T15:23:10Z

This patch attempts to optimize the performance of std::generate for segmented iterators. Below are the benchmark numbers from libcxx\test\benchmarks\algorithms\modifying\generate.bench.cpp. Test cases that use segmented iterators have also been added.

before

std::generate(deque<int>)/32           194 ns          193 ns      3733333
std::generate(deque<int>)/50           276 ns          276 ns      2488889
std::generate(deque<int>)/1024        5096 ns         5022 ns       112000
std::generate(deque<int>)/8192       40806 ns        40806 ns        17231

after

std::generate(deque<int>)/32           106 ns          105 ns      6400000
std::generate(deque<int>)/50           139 ns          138 ns      4977778
std::generate(deque<int>)/1024        2713 ns         2699 ns       248889
std::generate(deque<int>)/8192       18983 ns        19252 ns        37333
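
For readers unfamiliar with the idea, the gain from segment-wise iteration can be illustrated with a toy container. This is only a sketch under stated assumptions: `BlockedInts` and `generate_blocked` are hypothetical names invented for illustration, and the code does not use libc++'s internal `__for_each_segment` or `__segmented_iterator_traits`. It shows the general technique: walk the storage one contiguous block at a time so the inner loop carries no per-element block-boundary check.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical toy container: storage split into fixed-size blocks,
// loosely resembling how a deque lays out its elements.
struct BlockedInts {
  static constexpr std::size_t kBlock = 4;
  std::vector<std::array<int, kBlock>> blocks;
};

// Fill every element by visiting one block at a time. The inner loop runs
// over a contiguous range of ints, so it is a plain pointer loop.
template <class Generator>
void generate_blocked(BlockedInts& c, Generator gen) {
  for (auto& block : c.blocks) {
    int* first = block.data();
    int* last  = first + block.size();
    for (; first != last; ++first)
      *first = gen();
  }
}
```

Elements are still written in order, so a stateful generator observes the same call sequence as with an element-by-element loop.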

llvmbot · 2025-10-11T15:23:40Z

@llvm/pr-subscribers-libcxx

Author: Connector Switch (c8ef)

Changes

Part of #102817.

Full diff: https://github.com/llvm/llvm-project/pull/163006.diff

2 Files Affected:

(modified) libcxx/include/__algorithm/generate.h (+25-2)
(modified) libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp (+11)

diff --git a/libcxx/include/__algorithm/generate.h b/libcxx/include/__algorithm/generate.h
index c95b527402f5d..91e2ada7daf77 100644
--- a/libcxx/include/__algorithm/generate.h
+++ b/libcxx/include/__algorithm/generate.h
@@ -9,7 +9,10 @@
 #ifndef _LIBCPP___ALGORITHM_GENERATE_H
 #define _LIBCPP___ALGORITHM_GENERATE_H
 
+#include <__algorithm/for_each_segment.h>
 #include <__config>
+#include <__iterator/segmented_iterator.h>
+#include <__type_traits/enable_if.h>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
@@ -17,13 +20,33 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class _ForwardIterator, class _Generator>
+template <class _ForwardIterator, class _Sent, class _Generator>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
-generate(_ForwardIterator __first, _ForwardIterator __last, _Generator __gen) {
+__generate(_ForwardIterator __first, _Sent __last, _Generator __gen) {
   for (; __first != __last; ++__first)
     *__first = __gen();
 }
 
+#ifndef _LIBCPP_CXX03_LANG
+template <class _SegmentedIterator,
+          class _Generator,
+          __enable_if_t<__is_segmented_iterator_v<_SegmentedIterator>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
+_SegmentedIterator __generate(_SegmentedIterator __first, _SegmentedIterator __last, _Generator& __gen) {
+  using __local_iterator_t = typename __segmented_iterator_traits<_SegmentedIterator>::__local_iterator;
+  std::__for_each_segment(__first, __last, [&](__local_iterator_t __lfirst, __local_iterator_t __llast) {
+    std::__generate(__lfirst, __llast, __gen);
+  });
+  return __last;
+}
+#endif // !_LIBCPP_CXX03_LANG
+
+template <class _ForwardIterator, class _Generator>
+inline _LIBCPP_HIDE_FROM_ABI
+_LIBCPP_CONSTEXPR_SINCE_CXX20 void generate(_ForwardIterator __first, _ForwardIterator __last, _Generator __gen) {
+  std::__generate(__first, __last, __gen);
+}
+
 _LIBCPP_END_NAMESPACE_STD
 
 #endif // _LIBCPP___ALGORITHM_GENERATE_H
diff --git a/libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp b/libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp
index 29d32d7156742..4591d7ece4645 100644
--- a/libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp
@@ -16,6 +16,7 @@
 
 #include <algorithm>
 #include <cassert>
+#include <deque>
 
 #include "test_macros.h"
 #include "test_iterators.h"
@@ -51,12 +52,22 @@ test()
     assert(ia[3] == 1);
 }
 
+void deque_test() {
+  int sizes[] = {0, 1, 2, 1023, 1024, 1025, 2047, 2048, 2049};
+  for (const int size : sizes) {
+    std::deque<int> d(size);
+    std::generate(d.begin(), d.end(), gen_test());
+    assert(std::all_of(d.begin(), d.end(), [](int x) { return x == 1; }));
+  }
+}
+
 int main(int, char**)
 {
     test<forward_iterator<int*> >();
     test<bidirectional_iterator<int*> >();
     test<random_access_iterator<int*> >();
     test<int*>();
+    deque_test();
 
 #if TEST_STD_VER > 17
     static_assert(test_constexpr());

philnik777

Can we instead just forward to std::for_each?

c8ef · 2025-10-13T10:37:04Z

Can we instead just forward to std::for_each?

You mean like the following?

template<class ForwardIt, class Generator>
void generate(ForwardIt first, ForwardIt last, Generator gen) {
    std::for_each(first, last, [&gen](auto& element) {
        element = gen();
    });
}

Will test this tonight.

c8ef · 2025-10-13T10:39:02Z

To some extent, I think the current implementation is also acceptable since it uses the for_each_segment utility.

c8ef · 2025-10-13T14:56:03Z

std::for_each(first, last, [&gen](auto& element) {
        element = gen();
    });

std::generate(deque<int>)/32           220 ns          220 ns      3200000
std::generate(deque<int>)/50           321 ns          322 ns      2133333
std::generate(deque<int>)/1024        5808 ns         5720 ns       112000
std::generate(deque<int>)/8192       46257 ns        46527 ns        15448

Forwarding this to std::for_each seems to make it even slower than the current implementation.

template <class _ForwardIterator, class _Generator>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void
generate(_ForwardIterator __first, _ForwardIterator __last, _Generator __gen) {
  std::for_each(__first, __last, [&](auto& __element) { __element = __gen(); });
}

philnik777 · 2025-10-13T15:04:17Z

Have you enabled optimizations?

c8ef · 2025-10-13T15:14:16Z

std::generate(deque<int>)/32          16.4 ns         16.4 ns     44800000
std::generate(deque<int>)/50          24.9 ns         25.1 ns     28000000
std::generate(deque<int>)/1024         288 ns          289 ns      2488889
std::generate(deque<int>)/8192        2284 ns         2295 ns       320000

std::generate(deque<int>)/32          16.5 ns         16.1 ns     40727273
std::generate(deque<int>)/50          24.9 ns         25.1 ns     28000000
std::generate(deque<int>)/1024         288 ns          289 ns      2488889
std::generate(deque<int>)/8192        2192 ns         2197 ns       320000

Have you enabled optimizations?

It seems that the default ./bin/llvm-lit generate.bench.cpp does not enable optimizations (or my configuration is incorrect). With optimizations enabled, both implementations show the expected performance.

c8ef · 2025-10-13T15:53:23Z

This looks weird...

  | -- Performing Test HAVE_STD_REGEX -- failed to compile
  | -- Compiling and running to test HAVE_GNU_POSIX_REGEX
  | -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile
  | -- Compiling and running to test HAVE_POSIX_REGEX
  | -- Performing Test HAVE_POSIX_REGEX -- failed to compile
  | CMake Error at CMakeLists.txt:315 (message):
  | Failed to determine the source files for the regular expression backend

c8ef · 2025-10-13T16:34:06Z

While both implementations offer the same performance, the version that forwards to for_each fails to build the benchmark: the configure step fails at the regex check. That is odd, because locally, using ninja cxx to build a file containing #include <regex> doesn't produce an error.

@philnik777 Could you please help take a look?

philnik777 · 2025-10-13T18:03:20Z

This usually indicates that your code doesn't work in C++03 mode. You should try to run lit with e.g. --param=std=c++03 and see whether that fails.

c8ef · 2025-10-14T18:29:36Z

This usually indicates that your code doesn't work in C++03 mode. You should try to run lit with e.g. --param=std=c++03 and see whether that fails.

# .---command stderr------------
# | In file included from C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:14:
# | In file included from C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/algorithm:1861:
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:34:5: error: no matching function for call to '__invoke'
# |    34 |     std::__invoke(__f, std::__invoke(__proj, *__first));
# |       |     ^~~~~~~~~~~~~
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:57:8: note: in instantiation of function template specialization 'std::__for_each<LifetimeIterator, LifetimeIterator, (lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34), std::__identity>' requested here
# |    57 |   std::__for_each(__first, __last, __f, __proj);
# |       |        ^
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:8: note: in instantiation of function template specialization 'std::for_each<LifetimeIterator, (lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34)>' requested here
# |    24 |   std::for_each(__first, __last, [&](auto& __element) { __element = __gen(); });
# |       |        ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:710:47: note: in instantiation of function template specialization 'std::generate<LifetimeIterator, (lambda at C://llvm-project/libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp:651:14)>' requested here
# |   710 |   test(simple_in, [&](I b, I e) { (void) std::generate(b, e, gen); });
# |       |                                               ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:710:33: note: while substituting into a lambda expression here
# |   710 |   test(simple_in, [&](I b, I e) { (void) std::generate(b, e, gen); });
# |       |                                 ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:763:3: note: in instantiation of function template specialization 'test<LifetimeIterator>' requested here
# |   763 |   test<LifetimeIterator>();
# |       |   ^
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__type_traits/invoke.h:88:69: note: candidate template ignored: substitution failure [with _Args = <(lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34) &, LifetimeIterator::Reference>]: no type named 'type' in 'std::__invoke_result_impl<void, (lambda at C:/Users/Mario/Documents/llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34) &, LifetimeIterator::Reference>'
# |    85 | using __invoke_result_t _LIBCPP_NODEBUG = typename __invoke_result<_Args...>::type;
# |       | ~~~~~
# |    86 |
# |    87 | template <class... _Args>
# |    88 | _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR __invoke_result_t<_Args...> __invoke(_Args&&... __args)
# |       |                                                                     ^
# | In file included from C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:14:
# | In file included from C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/algorithm:1861:
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:34:5: error: no matching function for call to '__invoke'
# |    34 |     std::__invoke(__f, std::__invoke(__proj, *__first));
# |       |     ^~~~~~~~~~~~~
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/for_each.h:57:8: note: in instantiation of function template specialization 'std::__for_each<ConstexprIterator, ConstexprIterator, (lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34), std::__identity>' requested here
# |    57 |   std::__for_each(__first, __last, __f, __proj);
# |       |        ^
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:8: note: in instantiation of function template specialization 'std::for_each<ConstexprIterator, (lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34)>' requested here
# |    24 |   std::for_each(__first, __last, [&](auto& __element) { __element = __gen(); });
# |       |        ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:710:47: note: in instantiation of function template specialization 'std::generate<ConstexprIterator, (lambda at C://llvm-project/libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp:651:14)>' requested here
# |   710 |   test(simple_in, [&](I b, I e) { (void) std::generate(b, e, gen); });
# |       |                                               ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:710:33: note: while substituting into a lambda expression here
# |   710 |   test(simple_in, [&](I b, I e) { (void) std::generate(b, e, gen); });
# |       |                                 ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:765:17: note: in instantiation of function template specialization 'test<ConstexprIterator>' requested here
# |   765 |   static_assert(test<ConstexprIterator>());
# |       |                 ^
# | C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__type_traits/invoke.h:88:69: note: candidate template ignored: substitution failure [with _Args = <(lambda at C://llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34) &, ConstexprIterator::Reference>]: no type named 'type' in 'std::__invoke_result_impl<void, (lambda at C:/Users/Mario/Documents/llvm-project/build/libcxx/test-suite-install/include/c++/v1/__algorithm/generate.h:24:34) &, ConstexprIterator::Reference>'
# |    85 | using __invoke_result_t _LIBCPP_NODEBUG = typename __invoke_result<_Args...>::type;
# |       | ~~~~~
# |    86 |
# |    87 | template <class... _Args>
# |    88 | _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR __invoke_result_t<_Args...> __invoke(_Args&&... __args)
# |       |                                                                     ^
# | C:\\llvm-project\libcxx\test\std\algorithms\robust_against_proxy_iterators_lifetime_bugs.pass.cpp:765:17: error: static assertion expression is not an integral constant expression
# |   765 |   static_assert(test<ConstexprIterator>());
# |       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~
# | 3 errors generated.
# `-----------------------------
# error: command failed with exit status: 1

I still haven't figured out why the regex check failed, but one of the test error messages is above. I suspect it relates to std::__invoke. Is it possible to revert to the original version without forwarding to std::for_each?
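
One plausible reading of the log, not confirmed in the thread at this point: the failing test uses iterators whose `Reference` is a proxy object returned by value, and a lambda parameter declared as `auto&` cannot bind to such a temporary. The sketch below reproduces the same kind of failure with `std::vector<bool>`, whose iterators also yield a proxy; `fill_true` is a hypothetical name used only for illustration, and generic lambdas require C++14.

```cpp
#include <algorithm>
#include <vector>

// std::vector<bool>::iterator dereferences to a proxy object returned by
// value (a prvalue), not to bool&.
inline void fill_true(std::vector<bool>& bits) {
  // With `auto&` this call would not compile: a non-const lvalue reference
  // cannot bind to the temporary proxy.
  // std::for_each(bits.begin(), bits.end(), [](auto& e) { e = true; });

  // A forwarding reference binds to real references and to proxies alike.
  std::for_each(bits.begin(), bits.end(), [](auto&& e) { e = true; });
}
```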

libcxx/include/__algorithm/generate.h

c8ef · 2025-10-15T16:01:36Z

It's really weird that the regex check still isn't working. Even worse, it isn't producing a CMakeError log that I can use to reproduce the failed compilation. I ran the full test suite and found that the failures are only related to fs (symlink), iostream, and locale, which seem to have little relevance to std::generate.

********************
Failed Tests (17):
  llvm-libc++-mingw.cfg.in :: std/input.output/filesystems/fs.op.funcs/fs.op.rename/rename.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/input.output/iostream.format/ext.manip/get_money.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/input.output/iostream.format/ext.manip/put_money.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/input.output/iostream.format/output.streams/ostream.formatted/ostream.inserters.arithmetic/long_double.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_en_US.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_overlong.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_en_US.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_ru_RU.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_zh_CN.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/localization/locale.categories/category.numeric/locale.nm.put/facet.num.put.members/put_long_double.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/strings/string.conversions/to_string.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/strings/string.conversions/to_wstring.pass.cpp
  llvm-libc++-mingw.cfg.in :: std/time/time.duration/time.duration.nonmember/ostream.pass.cpp

c8ef · 2025-10-15T16:20:47Z

I also searched the GitHub Actions CMakeError log for REGEX, but it didn't return any hits.

philnik777 · 2025-10-16T07:42:36Z

The final issue is that auto&& is unavailable in C++ standards prior to C++14. : (

[]<class T>(T&& v) { ... } should work.

c8ef · 2025-10-16T07:50:31Z

The final issue is that auto&& is unavailable in C++ standards prior to C++14. : (

[]<class T>(T&& v) { ... } should work.

<source>:2:17: warning: explicit template parameter list for lambdas is a C++20 extension [-Wc++20-extensions]
    2 |     auto fn = []<class T>(T&& v) { return v + 1; };

This gives a Clang ICE in C++11 mode: https://godbolt.org/z/P4j6r5cra 😆

c8ef · 2025-10-16T07:51:46Z

I think I prefer the functor approach to make sure it works.

c8ef · 2025-10-17T02:25:45Z

The Windows ARM MinGW CI failure seems unrelated.

libcxx/docs/ReleaseNotes/22.rst

libcxx/include/__algorithm/generate.h

libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp

libcxx/include/__algorithm/generate.h

Co-authored-by: A. Jiang <[email protected]>

philnik777

Thanks!

c8ef · 2025-10-20T11:37:18Z

Thanks for the valuable feedback!

Part of #102817.

This is a natural follow-up to #163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs c++17), resulting in improved performance for segmented iterators.

before:

```
std::generate_n(deque<int>)/32          17.5 ns         17.3 ns     40727273
std::generate_n(deque<int>)/50          25.7 ns         25.5 ns     26352941
std::generate_n(deque<int>)/1024         490 ns          487 ns      1445161
std::generate_n(deque<int>)/8192        3908 ns         3924 ns       179200
```

after:

```
std::generate_n(deque<int>)/32          11.1 ns         11.0 ns     64000000
std::generate_n(deque<int>)/50          16.1 ns         16.0 ns     44800000
std::generate_n(deque<int>)/1024         291 ns          292 ns      2357895
std::generate_n(deque<int>)/8192        2269 ns         2250 ns       298667
```

Part of llvm#102817.

This patch attempts to optimize the performance of `std::generate` for segmented iterators. Below are the benchmark numbers from `libcxx\test\benchmarks\algorithms\modifying\generate.bench.cpp`. Test cases that use segmented iterators have also been added.

- before

```
std::generate(deque<int>)/32           194 ns          193 ns      3733333
std::generate(deque<int>)/50           276 ns          276 ns      2488889
std::generate(deque<int>)/1024        5096 ns         5022 ns       112000
std::generate(deque<int>)/8192       40806 ns        40806 ns        17231
```

- after

```
std::generate(deque<int>)/32           106 ns          105 ns      6400000
std::generate(deque<int>)/50           139 ns          138 ns      4977778
std::generate(deque<int>)/1024        2713 ns         2699 ns       248889
std::generate(deque<int>)/8192       18983 ns        19252 ns        37333
```

---------

Co-authored-by: A. Jiang <[email protected]>
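One of the later commits in this pull request is titled "forward std::generate to std::for_each". The following is a minimal user-level sketch of that idea, not libc++'s actual code: `generate_via_for_each` and `make_counting_deque` are hypothetical names introduced only for illustration. It assumes that whatever segment-aware traversal `std::for_each` performs on a `std::deque` is then inherited by the generate helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical helper (not part of libc++): expresses "generate" as a
// for_each whose callback assigns the next generated value to each element.
template <class ForwardIt, class Generator>
void generate_via_for_each(ForwardIt first, ForwardIt last, Generator gen) {
  using Ref = decltype(*first); // reference type of the iterator, e.g. int&
  // Capture the generator by reference so its state advances across calls.
  std::for_each(first, last, [&gen](Ref element) { element = gen(); });
}

// Hypothetical usage helper: fills a deque with 0, 1, ..., n - 1.
inline std::deque<int> make_counting_deque(std::size_t n) {
  std::deque<int> values(n);
  int next = 0;
  generate_via_for_each(values.begin(), values.end(), [&next] { return next++; });
  return values;
}
```

Because the non-parallel `std::for_each` visits elements in order, the generator is still invoked once per element from first to last, which is what `std::generate` requires.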

Part of llvm#102817.

This is a natural follow-up to llvm#163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs c++17), resulting in improved performance for segmented iterators.

before:

```
std::generate_n(deque<int>)/32          17.5 ns         17.3 ns     40727273
std::generate_n(deque<int>)/50          25.7 ns         25.5 ns     26352941
std::generate_n(deque<int>)/1024         490 ns          487 ns      1445161
std::generate_n(deque<int>)/8192        3908 ns         3924 ns       179200
```

after:

```
std::generate_n(deque<int>)/32          11.1 ns         11.0 ns     64000000
std::generate_n(deque<int>)/50          16.1 ns         16.0 ns     44800000
std::generate_n(deque<int>)/1024         291 ns          292 ns      2357895
std::generate_n(deque<int>)/8192        2269 ns         2250 ns       298667
```
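The forwarding described above can be sketched in user code as follows. This is only an illustration under stated assumptions: `for_each_n_sketch` is a hypothetical plain-loop stand-in for the internal `std::__for_each_n` (which user code should not call), and `generate_n_via_for_each_n` and `make_squares` are likewise made-up names. The stand-in has no segment-aware fast path; it only shows the shape of the forwarding.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a counted for_each: applies f to n consecutive
// elements starting at first and returns the iterator past the last one.
template <class InputIt, class Size, class Func>
InputIt for_each_n_sketch(InputIt first, Size n, Func f) {
  for (; n > 0; --n, ++first)
    f(*first);
  return first;
}

// Hypothetical helper: "generate_n" written as a counted for_each whose
// callback assigns the next generated value to each visited element.
template <class OutputIt, class Size, class Generator>
OutputIt generate_n_via_for_each_n(OutputIt first, Size n, Generator gen) {
  using Ref = decltype(*first); // reference type of the iterator, e.g. int&
  return for_each_n_sketch(first, n, [&gen](Ref element) { element = gen(); });
}

// Hypothetical usage helper: fills a deque with 0, 1, 4, 9, ...
inline std::deque<int> make_squares(std::size_t n) {
  std::deque<int> values(n);
  int i = 0;
  generate_n_via_for_each_n(values.begin(), n, [&i] {
    int square = i * i;
    ++i;
    return square;
  });
  return values;
}
```

The design point is that only the counted traversal needs to know about segments; the generate wrapper stays a thin adapter around it.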

[libcxx] Optimize std::generate for segmented iterators

962695a

c8ef requested a review from a team as a code owner October 11, 2025 15:23

llvmbot added the libc++ label (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.) Oct 11, 2025

c8ef requested review from frederick-vs-ja, ldionne and philnik777 October 11, 2025 15:28

c8ef added 2 commits October 11, 2025 23:56

fix copy ci

2f00429

fix missing &

2526110

philnik777 reviewed Oct 13, 2025

c8ef added 2 commits October 13, 2025 23:14

forward std::generate to std::for_each

4589580

forward std::generate to std::for_each

bb58a98

c8ef requested a review from philnik777 October 14, 2025 18:30

frederick-vs-ja reviewed Oct 15, 2025

libcxx/include/__algorithm/generate.h (outdated, resolved)

c8ef added 2 commits October 15, 2025 22:54

use std::forward

f1e883f

Merge branch 'main' into seg-generate

b03bfdf

c8ef requested a review from frederick-vs-ja October 15, 2025 17:12

c8ef added 2 commits October 16, 2025 01:42

get rid of auto&&

eaa9f0f

include

cea8b3d

c8ef added 3 commits October 16, 2025 22:04

address review comments

36ee86c

Merge branch 'main' into seg-generate

8136d79

release notes

13f7843

c8ef requested a review from frederick-vs-ja October 17, 2025 02:25

frederick-vs-ja reviewed Oct 17, 2025

libcxx/docs/ReleaseNotes/22.rst (outdated, resolved)

libcxx/include/__algorithm/generate.h (outdated, resolved)

libcxx/test/std/algorithms/alg.modifying.operations/alg.generate/generate.pass.cpp (resolved)

philnik777 reviewed Oct 17, 2025

libcxx/include/__algorithm/generate.h (outdated, resolved)

c8ef added 3 commits October 17, 2025 22:52

Merge branch 'main' into seg-generate

79ca702

revert test

7b1c4a0

decltype

4c52a1a

c8ef requested review from frederick-vs-ja and philnik777 October 18, 2025 06:00

frederick-vs-ja reviewed Oct 18, 2025

libcxx/include/__algorithm/generate.h (outdated, resolved)

Apply suggestion from @frederick-vs-ja

82bb2ee

Co-authored-by: A. Jiang <[email protected]>

c8ef requested a review from frederick-vs-ja October 18, 2025 12:42

philnik777 approved these changes Oct 20, 2025

frederick-vs-ja approved these changes Oct 20, 2025

c8ef merged commit 46e8816 into llvm:main Oct 20, 2025
79 checks passed

c8ef deleted the seg-generate branch October 20, 2025 11:37

c8ef mentioned this pull request Oct 20, 2025

[libcxx] Optimize std::generate_n for segmented iterators #164266

Merged

Conversation

c8ef commented Oct 11, 2025 (edited)

llvmbot commented Oct 11, 2025

philnik777 left a comment

c8ef commented Oct 13, 2025

c8ef commented Oct 13, 2025

c8ef commented Oct 13, 2025

philnik777 commented Oct 13, 2025

c8ef commented Oct 13, 2025

c8ef commented Oct 13, 2025

c8ef commented Oct 13, 2025

philnik777 commented Oct 13, 2025

c8ef commented Oct 14, 2025

c8ef commented Oct 15, 2025

c8ef commented Oct 15, 2025

philnik777 commented Oct 16, 2025

c8ef commented Oct 16, 2025

c8ef commented Oct 16, 2025

c8ef commented Oct 17, 2025

philnik777 left a comment

c8ef commented Oct 20, 2025

4 participants

c8ef commented Oct 11, 2025 (edited)