
Conversation


@ahans ahans commented Feb 11, 2024

Here's a first version of this exercise to get the discussion started.

The C++-specific parts of the description are still missing. I wanted to wait with writing that until we've reached consensus on the approach.

Since the introduction of the parallel versions of the STL algorithms in C++17, I think using them is the most elegant way of solving this exercise. However, to make that work with GCC, TBB needs to be present (without TBB it will just fall back to sequential execution). I added the respective find_package line to CMakeLists.txt, but I'm not sure if that is an acceptable prerequisite here. Using std::thread manually would be the obvious alternative. However, due to the overhead of thread creation, that would only be beneficial for huge texts; in practice one would probably rely on thread pools and avoid creating new threads on each call to the function.
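For concreteness, the STL-algorithms approach could look roughly like this. This is a sketch, not the exercise's actual example solution, and the function names are illustrative. To actually run in parallel you would pass `std::execution::par` from `<execution>` as the first argument to `std::transform`, which with GCC's libstdc++ requires linking TBB; the plain overload shown here compiles everywhere but runs sequentially.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <unordered_map>
#include <vector>

// Letter frequency of a single text, case-insensitive. The cast to
// unsigned char keeps std::tolower well-defined for all byte values.
std::unordered_map<char, std::size_t> frequency(std::string_view text) {
    std::unordered_map<char, std::size_t> freq;
    for (auto c : text)
        freq[static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)))] += 1;
    return freq;
}

// Per-text frequencies via std::transform, then a sequential merge.
// Passing std::execution::par as the first argument would parallelize
// the transform; the plain overload below runs sequentially.
std::unordered_map<char, std::size_t> parallel_frequency(
    std::vector<std::string_view> const& texts) {
    std::vector<std::unordered_map<char, std::size_t>> freqs(texts.size());
    std::transform(texts.cbegin(), texts.cend(), freqs.begin(),
                   [](auto const& text) { return frequency(text); });
    std::unordered_map<char, std::size_t> totals;
    for (auto const& f : freqs)
        for (auto const& [ch, n] : f) totals[ch] += n;
    return totals;
}
```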

I also included a Catch2 benchmark that can be turned on with a define. I have seen Google benchmark used elsewhere. I could switch the benchmark to use that, but I think that since we have Catch2 already available anyway, it makes sense to use its built-in benchmarking tool.

I also thought about how to automatically check that things actually do run in parallel. However, I think there's no robust (let alone portable) way of doing that. So just like the other tracks, we'd have to trust the student implements something that is actually parallel. The unit tests can only check for correct results.

Let me know what you think!

@github-actions

Hello. Thanks for opening a PR on Exercism 🙂

We ask that all changes to Exercism are discussed on our Community Forum before being opened on GitHub. To enforce this, we automatically close all PRs that are submitted. That doesn't mean your PR is rejected but that we want the initial discussion about it to happen on our forum where a wide range of key contributors across the Exercism ecosystem can weigh in.

You can use this link to copy this into a new topic on the forum. If we decide the PR is appropriate, we'll reopen it and continue with it, so please don't delete your local branch.

If you're interested in learning more about this auto-responder, please read this blog post.


Note: If this PR has been pre-approved, please link back to this PR on the forum thread and a maintainer or staff member will reopen it.

@github-actions github-actions bot closed this Feb 11, 2024
@ahans ahans mentioned this pull request Feb 11, 2024
@ahans ahans force-pushed the feature/parallel-letter-freq branch from 4cad24d to 8eef740 on February 11, 2024 22:34

vaeng commented Feb 12, 2024

Hey there,

thank you for the PR. Lots of good ideas.

You raise a lot of valid points. I think this would be best done in an approach and an article.
We have 20 seconds per execution on the test-runner. This includes spinning up the docker container, compiling the code and the tests and converting the results to json.

I would rather not run benchmarks in that timeframe. Currently, with test runner v3 there is no interface to display timing information back to the user.

All tests use Catch2 v3 on the test-runner. So if we were to switch to Google Test or something else, we would have to change the test-runner, which I would rather not do for a single exercise.

As with many other exercises, we cannot force students to write a "correct" implementation if the incorrect (non-parallel) approach produces the same results for the unit tests.


ahans commented Feb 12, 2024

You raise a lot of valid points. I think this would be best done in an approach and an article. We have 20 seconds per execution on the test-runner. This includes spinning up the docker container, compiling the code and the tests and converting the results to json.

I didn't mean to run the benchmark as part of the normal online evaluation. It's supposed to be something a student can use offline. The same approach is followed on (I think) the Rust track. The question was more about which benchmarking framework we want to use. I would prefer Catch2's, given that Catch2 is already in place anyway, but I have seen Google Benchmark being used as well, so I can imagine that for consistency we'd want to go with that.

The other question is about TBB. Would it be acceptable to include that in the runner image? The STL algorithms-based approach would also work without it, but then the runner would definitely execute things sequentially and not in parallel. We could also make the requirement optional and explain in an article/approach what students would have to do to run it locally with parallelism enabled.

@vaeng vaeng added the x:module/practice-exercise (Work on Practice Exercises) and x:rep/large (Large amount of reputation) labels on Feb 12, 2024

ahans commented Feb 16, 2024

Some guidance on how to continue here would be appreciated!


vaeng commented Feb 22, 2024

I made a test runner and modified the test-runner to use TBB. It is just 3 MB more than the current runner image, so I think it is okay to "upgrade".

Apart from the test-runner I would also need to update the GitHub Actions to accept TBB solutions.

We would definitely need a [.docs/introduction.append.md file](https://exercism.org/docs/building/tracks/practice-exercises#h-documentation-files) to explain the routine for the benchmark, and also how to install TBB for Ubuntu, macOS, and Windows, and maybe how to deactivate that part of the testing locally if that is somehow not possible.

Can you add such a file?

@siebenschlaefer siebenschlaefer left a comment

Well done!

Just three things bother me:

  • I'd rather have all the modifications of the global config.json that have nothing to do with this exercise in a different PR
  • the location of the example solution in the config.json of this exercise needs ".meta/"
  • passing a char to the one-argument version of std::tolower() is at least a code smell; you could either cast the argument to unsigned char or change the loop variable c to unsigned char.

@vaeng probably found a solution for TBB
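The `std::tolower()` point can be shown with a minimal self-contained sketch (the `frequency` signature mirrors the excerpt quoted below; the cast is the fix being suggested):

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <unordered_map>

// Passing a plain char to std::tolower() is undefined behaviour when char
// is signed and the value is negative (bytes >= 0x80, e.g. in UTF-8 text).
// Converting to unsigned char first keeps the argument in the valid range.
std::unordered_map<char, std::size_t> frequency(std::string_view text) {
    std::unordered_map<char, std::size_t> freq;
    for (auto c : text)
        freq[static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)))] += 1;
    return freq;
}
```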

],
"example": [
"example.cpp",
"example.h"

Suggested change
"example.h"
".meta/example.h"

[[nodiscard]] std::unordered_map<char, size_t> frequency(
std::string_view const text) {
std::unordered_map<char, size_t> freq;
for (auto c : text) freq[std::tolower(c)] += 1;

Suggested change
for (auto c : text) freq[std::tolower(c)] += 1;
for (auto c : text) freq[std::tolower(static_cast<unsigned char>(c))] += 1;

// determine frequencies of texts in parallel
std::transform(std::execution::par, texts.cbegin(), texts.cend(),
freqs.begin(),
[](auto const& text) { return frequency(text); });

You could consider changing the parameter type to just auto,
see blog posts #1 and #2 by Arthur O'Dwyer.

std::vector<std::string_view> const& texts) {
std::vector<std::unordered_map<char, size_t>> freqs(texts.size());
// determine frequencies of texts in parallel
std::transform(std::execution::par, texts.cbegin(), texts.cend(),

You could consider changing par to par_unseq.

set(exercise_cpp "")
endif()

add_definitions("-DCATCH_CONFIG_ENABLE_BENCHMARKING")

As long as we're still figuring out whether and how to support benchmarking, I would remove that line.

# Run the tests on every build
add_custom_target(test_${exercise} ALL DEPENDS ${exercise} COMMAND ${exercise})

target_link_libraries(${exercise} PRIVATE TBB::tbb)

Do you mind moving that line up, before set_target_properties in line 40, just to keep all the calls to target_link_libraries() somewhat close together?


ahans commented Feb 24, 2024

I think I addressed most comments. The introduction.append.md file needs to be filled, I can certainly add something there.

Regarding the benchmark, I changed a few more things around:

  • There is a new CMake flag EXERCISM_INCLUDE_BENCHMARK now. When it is set, the relevant code gets included and the required Catch2 define is set.
  • The benchmark is in its own test now.
  • Texts are now randomly generated and a bit larger, so that you can actually see a runtime difference between sequential and parallel execution.

I made TBB optional now. Things work without TBB available; it's just that execution won't be parallel then. I also pulled that into the GCC/Clang block. I believe that MSVC uses its own implementation and doesn't require TBB. I have a Windows machine with MSVC available that I can use to verify this.
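The optional-TBB wiring described here could look roughly like this in CMake (a sketch; the `${exercise}` target name follows the CMakeLists excerpts above, and the GCC/Clang guard is an assumption about how the block is structured):

```cmake
# Only GCC/Clang need TBB for the parallel STL backend; MSVC ships its own.
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU|Clang")
    find_package(TBB QUIET)
    if(TBB_FOUND)
        target_link_libraries(${exercise} PRIVATE TBB::tbb)
    endif()
    # Without TBB the code still builds, but execution falls back
    # to sequential with libstdc++.
endif()
```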


ahans commented Feb 24, 2024

I made a test runner and modified the test-runner to use TBB. It is just 3 MB more than the current runner image, so I think it is okay to "upgrade".

Will you open a separate PR for those or do you want me to do it? I suppose that would go into the cpp-test-runner repo?

Now with TBB being optional, I think it'd also be fine to just skip that. The test runner doesn't need to actually run things in parallel, IMHO.

Apart from the test-runner I would also need to update the GitHub Actions to accept TBB solutions.

Same question as above. With TBB optional, do we need that?

One argument could be that we also want to allow students to use TBB directly. I'm not sure about that. For the C++17 parallel algorithms, TBB's presence is an implementation detail (and probably not even on all platforms), so that alone would not be a good reason to expose it. Would we want to offer students more than what standard C++ provides?


ahans commented Feb 24, 2024

We would definitely need a [.docs/introduction.append.md file](https://exercism.org/docs/building/tracks/practice-exercises#h-documentation-files) to explain the routine for the benchmark, and also how to install TBB for Ubuntu, macOS, and Windows, and maybe how to deactivate that part of the testing locally if that is somehow not possible.

Can you add such a file?

Looking at the linked documentation, I think you meant instructions.append.md, right?


vaeng commented Feb 25, 2024

I like the optional route you have taken.
This way we can keep everything as it is, but might still add TBB to the test-runner in the future.

We should make use of the append file to explain the reasons for TBB and how it can be used in this context.
This is also the correct file to explain the benchmarking part and how to activate it.


ahans commented Feb 29, 2024

We should make use of the append file to explain the reasons for TBB and how it can be used in this context.
This is also the correct file to explain the benchmarking part and how to activate it.

Sorry for the silence for so long.

I added that file now. I also reduced the size of the texts for the benchmark; when testing under Windows it was running for quite a long time, and 10 KiB texts also show a speedup.

Let me know if there's more you'd like changed.

freqs.begin(), [](auto text) { return frequency(text); });

// combine individual frequencies (this happens sequentially)
std::unordered_map<char, size_t> totals{};

If you want to go deeper into parallel, you could use reduce or do transform_reduce above.
Is there a reason against it that I don't see?

@ahans (the PR author) replied:

std::reduce doesn't work on references, but always on values. So here, using std::reduce would require creating a new std::unordered_map in each step. That would create enough overhead to make it overall slower than just doing it sequentially. Honestly, using reduce here didn't even occur to me before your comment, probably because the way I do it now is also a common pattern when doing GPU programming: Parallelize what's easily parallelizable (determining letter frequency per text) and do the final reduce on the host (because we're dealing with comparatively few elements anyway). For a parallel reduce to make sense here, we'd need much more than ten texts. With thousands it could start to make sense, but maybe not even then. Either way, the reduce would need a different strategy than just using std::reduce. I can't think of anything that would justify the effort in the present case.
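The trade-off described here can be made concrete with a small sketch (the `merge` and `combine` names are illustrative): a sequential `std::reduce` over the per-text maps does work, but its value semantics force a map copy at every combination step, which is exactly the overhead mentioned above.

```cpp
#include <cstddef>
#include <numeric>
#include <unordered_map>
#include <vector>

using Freq = std::unordered_map<char, std::size_t>;

// std::reduce operates on values: the accumulator is passed and returned
// by value, so each combination step copies a whole map -- the overhead
// that makes this slower than a plain sequential loop for few texts.
Freq merge(Freq acc, Freq const& f) {
    for (auto const& [ch, n] : f) acc[ch] += n;
    return acc;
}

Freq combine(std::vector<Freq> const& freqs) {
    return std::reduce(freqs.cbegin(), freqs.cend(), Freq{}, merge);
}
```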

@vaeng vaeng left a comment

Thanks for the new content.
Every new sentence should be on a new line in the append document.


ahans commented Mar 1, 2024

Every new sentence should be on a new line in the append document.

I updated the document accordingly, but I wonder: what's the rationale for this?


vaeng commented Mar 1, 2024

Every new sentence should be on a new line in the append document.

I updated the document accordingly, but I wonder: what's the rationale for this?

https://exercism.org/docs/building/markdown/markdown#h-one-sentence-per-line


ahans commented Mar 1, 2024

Every new sentence should be on a new line in the append document.

I updated the document accordingly, but I wonder: what's the rationale for this?

https://exercism.org/docs/building/markdown/markdown#h-one-sentence-per-line

Interesting points! Thanks for that! I actually had line breaks between sentences; I think that doesn't conflict with the points in your link and makes it easier to read in a plain editor that doesn't support markdown rendering. But it's apparently not done in other markdown documents around here. So now mine also has a strict one-to-one relationship between sentences and lines.

@ahans ahans requested review from siebenschlaefer and vaeng March 1, 2024 12:22
@siebenschlaefer siebenschlaefer left a comment

Looks great!

@vaeng vaeng left a comment

Thanks for the changes.

What can we do about the macOS issue?


ahans commented Mar 1, 2024

Thanks for the changes.

What can we do about the macOS issue?

I only noticed this now. I guess #ifdef-ing it out is the only reasonable way here. In the doc I even say that Apple Clang doesn't support it. Looking at that table, I also realize that what I wrote about GCC/Clang may be only partly true: it looks like TBB is used by libstdc++, but not by libc++, where parallel algorithms are still an experimental feature. I will try to address the CI build issue as well as update the doc.

@vaeng vaeng merged commit 1dbcaac into exercism:main Mar 2, 2024
@ahans ahans mentioned this pull request Apr 16, 2024