CPP extension for overdrive effect in functional #580
@@ -0,0 +1,52 @@

```cpp
#include <torch/extension.h>

namespace torch {
namespace audio {

template <typename scalar_t>
void overdrive_cpu_kernel(
    at::TensorAccessor<scalar_t, 2> waveform_accessor,
    at::TensorAccessor<scalar_t, 2> temp_accessor,
    at::TensorAccessor<scalar_t, 1> last_in_accessor,
    at::TensorAccessor<scalar_t, 1> last_out_accessor,
    at::TensorAccessor<scalar_t, 2> output_waveform_accessor) {
  int64_t n_frames = waveform_accessor.size(1);
  int64_t n_channels = waveform_accessor.size(0);

  for (int64_t i_channel = 0; i_channel < n_channels; ++i_channel) {
    for (int64_t i_frame = 0; i_frame < n_frames; ++i_frame) {
      last_out_accessor[i_channel] = temp_accessor[i_channel][i_frame] -
          last_in_accessor[i_channel] + 0.995 * last_out_accessor[i_channel];
      last_in_accessor[i_channel] = temp_accessor[i_channel][i_frame];
```
You're setting the value of `last_in` to the value of `temp` for the current iteration so that the next iteration can use it. But instead you could just read from `temp` all the time (except for the first iteration), right? I added a similar comment for the Python code above.

Yes, the first iteration needs to be handled; then we can remove the `last_in` variable.
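For illustration, a minimal sketch of what that suggestion could look like (hypothetical, not part of the patch; names mirror the kernel above). The previous `temp` sample is read directly, and the first frame falls back to the incoming `last_in` state; if `last_in` has to carry state across calls, it would still need to be written once after the loop.

```cpp
// Sketch only: drop the per-frame last_in update and read the previous temp
// sample directly; frame 0 uses the state passed in through last_in.
for (int64_t i_channel = 0; i_channel < n_channels; ++i_channel) {
  for (int64_t i_frame = 0; i_frame < n_frames; ++i_frame) {
    scalar_t prev = (i_frame == 0)
        ? last_in_accessor[i_channel]
        : temp_accessor[i_channel][i_frame - 1];
    last_out_accessor[i_channel] = temp_accessor[i_channel][i_frame] -
        prev + 0.995 * last_out_accessor[i_channel];
    output_waveform_accessor[i_channel][i_frame] =
        waveform_accessor[i_channel][i_frame] * 0.5 +
        last_out_accessor[i_channel] * 0.75;
  }
}
```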
```cpp
      output_waveform_accessor[i_channel][i_frame] =
          waveform_accessor[i_channel][i_frame] * 0.5 +
          last_out_accessor[i_channel] * 0.75;
    }
  }
}

void _overdrive_helper_cpu(
    at::Tensor& waveform,
    at::Tensor& temp,
    at::Tensor& last_in,
    at::Tensor& last_out,
    at::Tensor& output_waveform) {
  AT_DISPATCH_FLOATING_TYPES(waveform.scalar_type(), "overdrive_cpu", ([&] {
    overdrive_cpu_kernel<scalar_t>(
        waveform.accessor<scalar_t, 2>(),
        temp.accessor<scalar_t, 2>(),
        last_in.accessor<scalar_t, 1>(),
        last_out.accessor<scalar_t, 1>(),
        output_waveform.accessor<scalar_t, 2>());
  }));
}

} // namespace audio
} // namespace torch

PYBIND11_MODULE(_torch_overdrive, m) {
  m.def(
      "_overdrive_helper",
      &torch::audio::_overdrive_helper_cpu,
      "Executes helper loop for overdrive effect");
}
```
Depending on the amount of work you might benefit from using `parallel_for`. Most PyTorch CPU operators are parallelized, unless there's no obvious need due to memory-boundedness.
Another issue with plain C/C++ code in extensions, for now, is autovectorization: we can't ship AVX2 code without a CPU-capability-based dispatch. That means C code in extensions like this is for now restricted to SSE and related instruction sets.
Of course this is taken care of when you call into `at::` operations directly, since they each take advantage of being part of libtorch.
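As a rough sketch (not from the PR), the channel loop could be wrapped in `at::parallel_for` from `ATen/Parallel.h`: each task gets a disjoint range of channels, so the sequential frame loop keeps its per-channel state private. The grain size of 1 is an arbitrary placeholder.

```cpp
// Sketch: parallelize only the outer channel loop; the frame loop stays sequential.
at::parallel_for(0, n_channels, /*grain_size=*/1, [&](int64_t begin, int64_t end) {
  for (int64_t i_channel = begin; i_channel < end; ++i_channel) {
    for (int64_t i_frame = 0; i_frame < n_frames; ++i_frame) {
      last_out_accessor[i_channel] = temp_accessor[i_channel][i_frame] -
          last_in_accessor[i_channel] + 0.995 * last_out_accessor[i_channel];
      last_in_accessor[i_channel] = temp_accessor[i_channel][i_frame];
      output_waveform_accessor[i_channel][i_frame] =
          waveform_accessor[i_channel][i_frame] * 0.5 +
          last_out_accessor[i_channel] * 0.75;
    }
  }
});
```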
@cpuhrsch, yeah, parallelization can be applied only to the channels loop. I was not sure how `parallel_for` treats the inner sequential loop, so I kept it without `parallel_for`. A parallel thread won't interfere with another parallel thread's inner loop, right?
@bhargavkathivarapu - I'm not sure what you mean by "interfere with" exactly. As in, shared variables or creating integers etc.? In this particular case the inner loops are independent of each other, given that they differ in `i_channel`. The pointers and such will still be picked up as shared variables, but as long as you don't write to a single memory location from multiple threads concurrently, there's no issue.
By default PyTorch uses OpenMP, which yields this implementation. Look into OpenMP's `omp parallel` (here is what looks like a good explanation) for some more detail on what that means.
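To make the "no concurrent writes to the same location" point concrete, here is a small standalone example of the pattern being discussed (hypothetical names, not torchaudio code): every OpenMP thread owns whole rows, so the shared container is read by all threads but each element is written by exactly one thread. Compile with `-fopenmp` for the pragma to take effect.

```cpp
#include <cstdint>
#include <vector>

// Each thread processes whole rows; the per-row running state lives in a local
// variable, and rows[r][c] is written only by the thread that owns row r.
void row_filter(std::vector<std::vector<float>>& rows) {
#pragma omp parallel for
  for (int64_t r = 0; r < static_cast<int64_t>(rows.size()); ++r) {
    float state = 0.0f;  // private to this thread / this row
    for (std::size_t c = 0; c < rows[r].size(); ++c) {
      state = 0.995f * state + rows[r][c];
      rows[r][c] = state;
    }
  }
}
```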