Add parallel-letter-frequency #800
Hello. Thanks for opening a PR on Exercism 🙂 We ask that all changes to Exercism are discussed on our Community Forum before being opened on GitHub. To enforce this, we automatically close all PRs that are submitted. That doesn't mean your PR is rejected but that we want the initial discussion about it to happen on our forum where a wide range of key contributors across the Exercism ecosystem can weigh in. You can use this link to copy this into a new topic on the forum. If we decide the PR is appropriate, we'll reopen it and continue with it, so please don't delete your local branch. If you're interested in learning more about this auto-responder, please read this blog post. Note: If this PR has been pre-approved, please link back to this PR on the forum thread and a maintainer or staff member will reopen it.
Force-pushed from 4cad24d to 8eef740.
Hey there, thank you for the PR. Lots of good ideas, and you raise a lot of valid points. I think this would be best done in an approach and an article. I would rather not run benchmarks in that timeframe. Currently, with test runner v3 there is no interface to display timing information back to the user. All tests use Catch2 v3 on the test-runner, so if we switched to gtest or something else, we would have to change the test-runner, which I would rather not do for a single exercise. As with many other exercises, we cannot force students to do a "correct" implementation if the incorrect (non-parallel) approach produces the same results for the unit tests.
I didn't mean to run the benchmark as part of the normal online evaluation. It's supposed to be something a student can use offline. The same approach is followed on (I think) the Rust track. The question was more about which benchmarking framework we want to use. I would prefer Catch2's, given that Catch2 is already in place anyway, but I have seen Google Benchmark being used as well, so I can imagine that for consistency we'd want to go with that. The other question is about TBB. Would it be acceptable to include that in the runner image? The STL algorithms-based approach would also work without it, but then the runner would definitely execute things sequentially and not in parallel. We could also make the requirement optional and explain in an article/approach what students would have to do to run it locally with parallelism enabled.
Some guidance on how to continue here would be appreciated!
I made a test runner and modified the test-runner to use TBB. It is just 3 MB more than the current runner image, so I think it is okay to "upgrade". Apart from the test-runner I would also need to update the GitHub Actions to accept TBB solutions. We would definitely need a […] file. Can you add such a file?
siebenschlaefer left a comment
Well done!
Just three things bother me:
- I'd rather have all the modifications of the global `config.json` that have nothing to do with this exercise in a different PR
- the location of the example solution in the `config.json` of this exercise needs ".meta/"
- passing a `char` to the one-argument version of `std::tolower()` is at least a code smell; you could either cast the argument to `unsigned char` or change the loop variable `c` to `unsigned char`.
@vaeng probably found a solution for TBB
```
  ],
  "example": [
    "example.cpp",
    "example.h"
```
```diff
- "example.h"
+ ".meta/example.h"
```
```cpp
[[nodiscard]] std::unordered_map<char, size_t> frequency(
    std::string_view const text) {
  std::unordered_map<char, size_t> freq;
  for (auto c : text) freq[std::tolower(c)] += 1;
```
```diff
- for (auto c : text) freq[std::tolower(c)] += 1;
+ for (auto c : text) freq[std::tolower(static_cast<unsigned char>(c))] += 1;
```
```cpp
  // determine frequencies of texts in parallel
  std::transform(std::execution::par, texts.cbegin(), texts.cend(),
                 freqs.begin(),
                 [](auto const& text) { return frequency(text); });
```
```cpp
    std::vector<std::string_view> const& texts) {
  std::vector<std::unordered_map<char, size_t>> freqs(texts.size());
  // determine frequencies of texts in parallel
  std::transform(std::execution::par, texts.cbegin(), texts.cend(),
```
You could consider changing `par` to `par_unseq`.
```cmake
  set(exercise_cpp "")
endif()

add_definitions("-DCATCH_CONFIG_ENABLE_BENCHMARKING")
```
As long as we're still figuring out whether and how to support benchmarking, I would remove that line.
```cmake
# Run the tests on every build
add_custom_target(test_${exercise} ALL DEPENDS ${exercise} COMMAND ${exercise})

target_link_libraries(${exercise} PRIVATE TBB::tbb)
```
Do you mind moving that line up, before `set_target_properties` in line 40, just to keep all the calls to `target_link_libraries()` somewhat close together?
Co-authored-by: Matthias <[email protected]>
I think I addressed most comments. Regarding the benchmark, I changed a few more things around:
I made TBB optional now. Things work without TBB available; it's just that execution won't be parallel then. I also pulled that into the GCC/Clang block. I believe that MSVC uses its own implementation and doesn't require TBB. I have a Windows machine with MSVC available that I can use to verify this.
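A minimal sketch of what that optional-TBB wiring could look like in the exercise's `CMakeLists.txt`, reusing the target names from the snippets above (the `QUIET` lookup is my assumption, not necessarily the PR's exact code):

```cmake
# Look for TBB, but do not fail if it is absent; without it the C++17
# parallel algorithms simply fall back to sequential execution on
# GCC/Clang with libstdc++ (MSVC ships its own backend).
find_package(TBB QUIET)
if(TBB_FOUND)
    target_link_libraries(${exercise} PRIVATE TBB::tbb)
endif()
```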
Will you open a separate PR for those, or do you want me to do it? I suppose that would go into the […] repository. Now with TBB being optional, I think it'd also be fine to just skip that. The test runner doesn't need to actually run things in parallel, IMHO.
Same question as above. With TBB optional, do we need that? One argument could be that we also want to allow students to use TBB directly. I'm not sure about that. For the C++17 parallel algorithms, TBB's presence is an implementation detail (and probably not even needed on all platforms), so that alone would not be a good reason to expose it. Would we want to give students more than what standard C++ has to offer?
Looking at the linked documentation, I think you meant […].
I like the optional route you have taken. We should make use of the append file to explain the reasons for TBB and how it can be used in this context.
Sorry for the long silence. I added that file now. I also reduced the size of the texts for the benchmark: when testing under Windows it was running for quite a long time, and 10 KiB texts also show a speedup. Let me know if there's more you'd like changed.
```cpp
                 freqs.begin(), [](auto text) { return frequency(text); });

  // combine individual frequencies (this happens sequentially)
  std::unordered_map<char, size_t> totals{};
```
If you want to go deeper into parallelism, you could use `reduce` or do a `transform_reduce` above.
Is there a reason against it that I don't see?
`std::reduce` doesn't work on references, but always on values. So here, using `std::reduce` would require creating a new `std::unordered_map` in each step. That would create enough overhead to make it overall slower than just doing it sequentially.

Honestly, using reduce here didn't even occur to me before your comment, probably because the way I do it now is also a common pattern in GPU programming: parallelize what's easily parallelizable (determining letter frequency per text) and do the final reduce on the host (because we're dealing with comparatively few elements anyway).

For a parallel reduce to make sense here, we'd need much more than ten texts. With thousands it could start to make sense, but maybe not even then. Either way, the reduce would need a different strategy than just using `std::reduce`. I can't think of anything that would justify the effort in the present case.
vaeng left a comment
Thanks for the new content.
Every new sentence should be on a new line in the append document.
exercises/practice/parallel-letter-frequency/.docs/instructions.append.md
I have updated the document accordingly. But I wonder, what's the rationale for this?
https://exercism.org/docs/building/markdown/markdown#h-one-sentence-per-line |
Interesting points, thanks! I actually had line breaks between sentences. I think that doesn't conflict with the points in your link and makes it easier to read in a plain editor that doesn't render Markdown. But it's apparently not done in the other Markdown documents around here, so now mine also has a strict 1:1 relationship between sentences and lines.
siebenschlaefer left a comment
Looks great!
Thanks for the changes.
What can we do about the macOS issue?
I only noticed this now. I guess […]
Here's a first version of this exercise to get the discussion started.
The C++-specific parts of the description are still missing. I wanted to hold off on writing those until we've reached consensus on the approach.
Since the introduction of the parallel versions of the STL algorithms in C++17, I think using those is the most elegant way of solving this exercise. However, to make that work with GCC, TBB needs to be present (without TBB it will just fall back to sequential execution). I added the respective `find_package` line to `CMakeLists.txt`, but I'm not sure if that is an acceptable prerequisite here.

Using `std::thread` manually would be the obvious alternative. However, due to the overhead of thread creation, that would only be beneficial for huge texts, and in practice one would probably rely on thread pools and avoid the creation of new threads with each call to the function.

I also included a Catch2 benchmark that can be turned on with a define. I have seen Google Benchmark used elsewhere. I could switch the benchmark to use that, but since we have Catch2 available anyway, I think it makes sense to use its built-in benchmarking tool.
I also thought about how to automatically check that things actually do run in parallel. However, I think there's no robust (let alone portable) way of doing that. So, just like on the other tracks, we'd have to trust that the student implements something that is actually parallel. The unit tests can only check for correct results.
Let me know what you think!