Slow loop fusion when multiplying a column vector with a row vector #20875

@zsoerenm

I am using Julia v0.6.0-pre.alpha.34.
I ran this simple function, which should be fast thanks to loop fusion:

function test_perf1()
    range = 1:2000000
    range_transp = collect(range)'  # 1×N row vector
    steering_vector = complex.(ones(4,1), ones(4,1))  # 4×1 column vector
    sum_signal = zeros(Complex{Float64}, 4, length(range))
    # Fused broadcast: column .* row expands to a 4×N result, updated in place
    sum_signal .+=
        steering_vector .*
        cis.((2 * pi * 1.023e6 / 4e6) .* range_transp .+ (40 * pi / 180))
    return sum_signal
end

This results in:

BenchmarkTools.Trial: 
  memory estimate:  137.33 MiB
  allocs estimate:  19
  --------------
  minimum time:     1.328 s (0.98% GC)
  median time:      1.364 s (2.87% GC)
  mean time:        1.363 s (2.99% GC)
  maximum time:     1.398 s (5.34% GC)
  --------------
  samples:          4
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

However, when I remove the dot after cis, I get this result:

BenchmarkTools.Trial: 
  memory estimate:  183.12 MiB
  allocs estimate:  97
  --------------
  minimum time:     440.628 ms (17.45% GC)
  median time:      443.680 ms (17.70% GC)
  mean time:        445.445 ms (17.72% GC)
  maximum time:     458.517 ms (19.00% GC)
  --------------
  samples:          12
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

With fusion the memory consumption is lower (137.33 MiB vs. 183.12 MiB), but the fused code is about 3x slower (1.328 s vs. 440.628 ms minimum time).
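One possible explanation, offered as a sketch rather than a confirmed diagnosis: a fused broadcast evaluates the whole expression once per output element, so cis is called for each of the 4×N entries, whereas the unfused version computes cis once per column of the 1×N row and then multiplies. The example below contrasts the two forms on a smaller input. The function names `fused!` and `split!` and the size `N = 1000` are illustrative assumptions, not taken from the code above.

```julia
# Hypothetical helpers for illustration; not part of the original report.

function fused!(out, steering_vector, n, k, phi)
    # Single fused loop over all 4×N outputs:
    # cis is evaluated once per output element.
    out .+= steering_vector .* cis.(k .* n .+ phi)
    return out
end

function split!(out, steering_vector, n, k, phi)
    # Materialize the 1×N phase row first: cis is evaluated once per column,
    # at the cost of one temporary array.
    phase = cis.(k .* n .+ phi)
    out .+= steering_vector .* phase
    return out
end

N = 1000                                   # smaller than the 2000000 used above
n = collect(1:N)'                          # 1×N row vector
steering_vector = complex.(ones(4, 1), ones(4, 1))
k = 2 * pi * 1.023e6 / 4e6
phi = 40 * pi / 180

a = fused!(zeros(Complex{Float64}, 4, N), steering_vector, n, k, phi)
b = split!(zeros(Complex{Float64}, 4, N), steering_vector, n, k, phi)
```

Both functions should produce the same 4×N result; they differ only in how often cis runs and in whether a temporary row is allocated. Timing them (for example with BenchmarkTools' `@btime`) on the full-size input would show whether this accounts for the gap reported above.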

Labels: broadcast (Applying a function over a collection), performance (Must go faster)
