Huge amount of llvm code generated by comparison kernels, potentially slowing compile times

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Incremental compile times feel like they have gotten slower some time ago. While investigating I notices that a large amount of llvm code is generated for the comparison kernels, especially dyn and dict versions. These end up calling the generic `BooleanArray::from_iter` method which therefore gets duplicated for each possible iterator type.

```
$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)
    Finished dev [unoptimized + debuginfo] target(s) in 35.58s
  Lines           Copies        Function name
  -----           ------        -------------
  3156218 (100%)  73225 (100%)  (TOTAL)
   292166 (9.3%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
   112200 (3.6%)   1545 (2.1%)  core::iter::traits::iterator::Iterator::fold
    92112 (2.9%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict
    85538 (2.7%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold::enumerate::{{closure}}
    84972 (2.7%)   1164 (1.6%)  <arrow::array::array_boolean::BooleanArray as core::iter::traits::collect::FromIterator<Ptr>>::from_iter::{{closure}}
    83903 (2.7%)   1478 (2.0%)  core::iter::adapters::map::map_fold::{{closure}}
    80639 (2.6%)   1533 (2.1%)  core::option::Option<T>::map
    80304 (2.5%)    912 (1.2%)  arrow::compute::kernels::comparison::cmp_dict::{{closure}}
    78427 (2.5%)   1478 (2.0%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    65942 (2.1%)   1236 (1.7%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::fold
    59070 (1.9%)   1517 (2.1%)  core::iter::traits::iterator::Iterator::for_each
    39108 (1.2%)   2360 (3.2%)  core::iter::adapters::map::Map<I,F>::new
    38034 (1.2%)    137 (0.2%)  arrow::util::trusted_len::trusted_len_unzip
    37336 (1.2%)   1511 (2.1%)  core::iter::traits::iterator::Iterator::for_each::call::{{closure}}
    33218 (1.1%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    29970 (0.9%)     81 (0.1%)  arrow::compute::kernels::take::take_primitive
    27904 (0.9%)   2360 (3.2%)  core::iter::traits::iterator::Iterator::map
    27631 (0.9%)    249 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
    27392 (0.9%)    202 (0.3%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::size_hint
    26906 (0.9%)    140 (0.2%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (0.8%)    136 (0.2%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (0.8%)    277 (0.4%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (0.7%)    257 (0.4%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (0.6%)    215 (0.3%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18150 (0.6%)    415 (0.6%)  core::option::Option<T>::ok_or_else
    17976 (0.6%)    168 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
    17430 (0.6%)   1478 (2.0%)  core::iter::adapters::map::map_fold
```

Removing the dyn and dict comparison kernels gives the following number of lines (but this is of course not a realistic option):

```
$ cargo llvm-lines | head -n30
   Compiling arrow v16.0.0 (/home/jhorstmann/Source/github/apache/arrow-rs/arrow)

  Lines           Copies        Function name
  -----           ------        -------------
  1737110 (100%)  41764 (100%)  (TOTAL)
    49753 (2.9%)    939 (2.2%)  core::option::Option<T>::map
    38034 (2.2%)    137 (0.3%)  arrow::util::trusted_len::trusted_len_unzip
    33218 (1.9%)     62 (0.1%)  core::slice::sort::partition_in_blocks
    30112 (1.7%)    385 (0.9%)  core::iter::traits::iterator::Iterator::fold
    29970 (1.7%)     81 (0.2%)  arrow::compute::kernels::take::take_primitive
    26906 (1.5%)    140 (0.3%)  arrow::array::array_primitive::PrimitiveArray<T>::from_trusted_len_iter
    26764 (1.5%)    136 (0.3%)  arrow::buffer::mutable::MutableBuffer::try_from_trusted_len_iter
    25676 (1.5%)    277 (0.7%)  core::iter::traits::iterator::Iterator::try_fold
    22117 (1.3%)    257 (0.6%)  <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
    19091 (1.1%)    215 (0.5%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
    18337 (1.1%)    318 (0.8%)  core::iter::adapters::map::map_fold::{{closure}}
    16947 (1.0%)    318 (0.8%)  <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
    16576 (1.0%)     64 (0.2%)  arrow::compute::kernels::cast::pack_numeric_to_dictionary
    16177 (0.9%)    261 (0.6%)  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
    15683 (0.9%)     41 (0.1%)  <arrow::array::array_string::GenericStringArray<OffsetSize> as core::iter::traits::collect::FromIterator<core::option::Option<Ptr>>>::from_iter
    14998 (0.9%)    838 (2.0%)  core::iter::adapters::map::Map<I,F>::new
    14586 (0.8%)     86 (0.2%)  arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
    14551 (0.8%)     96 (0.2%)  arrow::buffer::mutable::MutableBuffer::extend_from_iter
    14145 (0.8%)     77 (0.2%)  <arrow::array::array_primitive::PrimitiveArray<T> as core::iter::traits::collect::FromIterator<Ptr>>::from_iter
    13912 (0.8%)     39 (0.1%)  arrow::array::array::print_long_array
    13830 (0.8%)    357 (0.9%)  core::iter::traits::iterator::Iterator::for_each
    13743 (0.8%)     27 (0.1%)  arrow::compute::kernels::filter::filter_primitive
    13726 (0.8%)     62 (0.1%)  core::slice::sort::partition
    13125 (0.8%)     98 (0.2%)  <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::try_fold::{{closure}}
    12256 (0.7%)     64 (0.2%)  arrow::array::builder::PrimitiveDictionaryBuilder<K,V>::append
    11804 (0.7%)     62 (0.1%)  core::slice::sort::partition_equal
    11215 (0.6%)     88 (0.2%)  <arrow::buffer::immutable::Buffer as core::iter::traits::collect::FromIterator<T>>::from_iter
```

This seems to indicate that nearly half of the llvm code is caused by the comparison kernels.

**Describe the solution you'd like**

I'm not sure whether there is a good solution for this, but wanted to document these findings.

The kernels use macros very extensively, but replacing these with generic functions will probably lead to the same amount of llvm lines because of monomorphization. Maybe some of the non-generic common code could be extracted into helper methods.

**Describe alternatives you've considered**

The `dyn` kernels could also be put behind a feature flag, but that would again complicate the test matrix.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Huge amount of llvm code generated by comparison kernels, potentially slowing compile times #1858

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions