Add parallel-letter-frequency #800
Hello. Thanks for opening a PR on Exercism 🙂 We ask that all changes to Exercism are discussed on our Community Forum before being opened on GitHub. To enforce this, we automatically close all PRs that are submitted. That doesn't mean your PR is rejected but that we want the initial discussion about it to happen on our forum where a wide range of key contributors across the Exercism ecosystem can weigh in. You can use this link to copy this into a new topic on the forum. If we decide the PR is appropriate, we'll reopen it and continue with it, so please don't delete your local branch. If you're interested in learning more about this auto-responder, please read this blog post. Note: If this PR has been pre-approved, please link back to this PR on the forum thread and a maintainer or staff member will reopen it.
Force-pushed from 4cad24d to 8eef740.
Hey there, thank you for the PR. Lots of good ideas, and you raise a lot of valid points. I think this would be best done in an approach and an article. I would rather not run benchmarks in that timeframe. Currently, with test runner v3 there is no interface to display timing information back to the user. All tests use Catch2 v3 on the test-runner, so if we switched to gtest or something else, we would have to change the test-runner, which I would rather not do for a single exercise. As with many other exercises, we cannot force students to do a "correct" implementation if the incorrect (non-parallel) approach produces the same results for the unit tests.
I didn't mean to run the benchmark as part of the normal online evaluation. It's supposed to be something a student can use offline. The same approach is followed on (I think) the Rust track. The question was more about which benchmarking framework we want to use. I would prefer Catch2's, given that Catch2 is already in place anyway, but I have seen Google Benchmark being used as well, so I can imagine that for consistency we'd want to go with that. The other question is about TBB. Would it be acceptable to include that in the runner image? The STL algorithms-based approach would also work without it, but then the runner would definitely execute things sequentially and not in parallel. We could also make the requirement optional and explain in an article/approach what students would have to do to run it locally with parallelism enabled.
Some guidance on how to continue here would be appreciated!
I made a test runner and modified the test-runner to use TBB. It is just 3 MB more than the current runner image, so I think it is okay to "upgrade". Apart from the test-runner I would also need to update the GitHub Actions to accept TBB solutions. We would definitely need a […] file. Can you add such a file?
siebenschlaefer left a comment
Well done!
Just three things bother me:
- I'd rather have all the modifications of the global `config.json` that have nothing to do with this exercise in a different PR
- the location of the example solution in the `config.json` of this exercise needs ".meta/"
- passing a `char` to the one-argument version of `std::tolower()` is at least a code smell; you could either cast the argument to `unsigned char` or change the loop variable `c` to `unsigned char`.
@vaeng probably found a solution for TBB
```
  ],
  "example": [
    "example.cpp",
    "example.h"
```
```diff
- "example.h"
+ ".meta/example.h"
```
```cpp
[[nodiscard]] std::unordered_map<char, size_t> frequency(
    std::string_view const text) {
  std::unordered_map<char, size_t> freq;
  for (auto c : text) freq[std::tolower(c)] += 1;
```
```diff
- for (auto c : text) freq[std::tolower(c)] += 1;
+ for (auto c : text) freq[std::tolower(static_cast<unsigned char>(c))] += 1;
```
```cpp
  // determine frequencies of texts in parallel
  std::transform(std::execution::par, texts.cbegin(), texts.cend(),
                 freqs.begin(),
                 [](auto const& text) { return frequency(text); });
```
```cpp
    std::vector<std::string_view> const& texts) {
  std::vector<std::unordered_map<char, size_t>> freqs(texts.size());
  // determine frequencies of texts in parallel
  std::transform(std::execution::par, texts.cbegin(), texts.cend(),
```
You could consider changing `par` to `par_unseq`.
```cmake
  set(exercise_cpp "")
endif()

add_definitions("-DCATCH_CONFIG_ENABLE_BENCHMARKING")
```
As long as we're still figuring out whether and how to support benchmarking, I would remove that line.
```cmake
# Run the tests on every build
add_custom_target(test_${exercise} ALL DEPENDS ${exercise} COMMAND ${exercise})

target_link_libraries(${exercise} PRIVATE TBB::tbb)
```
Do you mind moving that line up, before `set_target_properties` in line 40, just to keep all the calls to `target_link_libraries()` somewhat close together?
Co-authored-by: Matthias <[email protected]>
I think I addressed most comments. Regarding the benchmark, I changed a few more things around:
I made TBB optional now. Things work without TBB available; it's just that execution won't be parallel then. I also pulled that into the GCC/Clang block. I believe that MSVC uses its own implementation and doesn't require TBB. I have a Windows machine with MSVC available that I can use to verify this.
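A minimal sketch of what that optional-TBB wiring could look like in the exercise's `CMakeLists.txt`, reusing the target names from the snippets above (the `QUIET` lookup is my assumption, not necessarily the PR's exact code):

```cmake
# Look for TBB, but do not fail if it is absent; without it the C++17
# parallel algorithms simply fall back to sequential execution on
# GCC/Clang with libstdc++ (MSVC ships its own backend).
find_package(TBB QUIET)
if(TBB_FOUND)
    target_link_libraries(${exercise} PRIVATE TBB::tbb)
endif()
```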
Will you open a separate PR for those, or do you want me to do it? I suppose that would go into the […] repository. Now with TBB being optional, I think it'd also be fine to just skip that. The test runner doesn't need to actually run things in parallel, IMHO.
Same question as above. With TBB optional, do we need that? One argument could be that we also want to allow students to use TBB directly. I'm not sure about that. For the C++17 parallel algorithms, TBB's presence is an implementation detail (and probably not even needed on all platforms), so that alone would not be a good reason to expose it. Would we want to give students more than what standard C++ has to offer?
Looking at the linked documentation, I think you meant […].
I like the optional route you have taken. We should make use of the append file to explain the reasons for TBB and how it can be used in this context.
Sorry for the long silence. I added that file now. I also reduced the size of the texts for the benchmark: when testing under Windows it was running for quite a long time, and 10 KiB texts also show a speedup. Let me know if there's more you'd like changed.
```cpp
                 freqs.begin(), [](auto text) { return frequency(text); });

  // combine individual frequencies (this happens sequentially)
  std::unordered_map<char, size_t> totals{};
```
If you want to go deeper into parallelism, you could use `reduce` or do a `transform_reduce` above.
Is there a reason against it that I don't see?
`std::reduce` doesn't work on references, but always on values. So here, using `std::reduce` would require creating a new `std::unordered_map` in each step. That would create enough overhead to make it overall slower than just doing it sequentially.

Honestly, using reduce here didn't even occur to me before your comment, probably because the way I do it now is also a common pattern in GPU programming: parallelize what's easily parallelizable (determining letter frequency per text) and do the final reduce on the host (because we're dealing with comparatively few elements anyway).

For a parallel reduce to make sense here, we'd need much more than ten texts. With thousands it could start to make sense, but maybe not even then. Either way, the reduce would need a different strategy than just using `std::reduce`. I can't think of anything that would justify the effort in the present case.
vaeng left a comment
Thanks for the new content.
Every new sentence should be on a new line in the append document.
exercises/practice/parallel-letter-frequency/.docs/instructions.append.md
I have updated the document accordingly. But I wonder, what's the rationale for this?
https://exercism.org/docs/building/markdown/markdown#h-one-sentence-per-line |
Interesting points, thanks! I actually had line breaks between sentences. I think that doesn't conflict with the points in your link and makes it easier to read in a plain editor that doesn't render Markdown. But it's apparently not done in the other Markdown documents around here, so now mine also has a strict 1:1 relationship between sentences and lines.
siebenschlaefer left a comment
Looks great!
Thanks for the changes.
What can we do about the macOS issue?
I only noticed this now. I guess […]
Here's a first version of this exercise to get the discussion started.
The C++-specific parts of the description are still missing. I wanted to hold off on writing those until we've reached consensus on the approach.
Since the introduction of the parallel versions of the STL algorithms in C++17, I think using those is the most elegant way of solving this exercise. However, to make that work with GCC, TBB needs to be present (without TBB it will just fall back to sequential execution). I added the respective `find_package` line to `CMakeLists.txt`, but I'm not sure if that is an acceptable prerequisite here.

Using `std::thread` manually would be the obvious alternative. However, due to the overhead of thread creation, that would only be beneficial for huge texts, and in practice one would probably rely on thread pools and avoid the creation of new threads with each call to the function.

I also included a Catch2 benchmark that can be turned on with a define. I have seen Google Benchmark used elsewhere. I could switch the benchmark to use that, but since we have Catch2 available anyway, I think it makes sense to use its built-in benchmarking tool.
I also thought about how to automatically check that things actually do run in parallel. However, I think there's no robust (let alone portable) way of doing that. So, just like on the other tracks, we'd have to trust that the student implements something that is actually parallel. The unit tests can only check for correct results.
Let me know what you think!